All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Preliminary BTRFS Encryption
@ 2016-09-13 13:39 Anand Jain
  2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
                   ` (11 more replies)
  0 siblings, 12 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick example
on the cli usage. Please try out, let me know if I have missed something.

Also would like to mention that a review from the security experts is due,
which is important and I believe those review comments can be accommodated
without major changes from here.

Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.

As of now these patch set supports encryption on per subvolume, as
managing properties on per subvolume is a kind of core to btrfs, which is
easier for data center solution-ing, seamlessly persistent and easy to
manage.


Steps:
-----

Make sure following kernel TFMs are compiled in.
# cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
name         : ctr(aes)
name         : cbc(aes)

Create encrypted subvolume.
# btrfs su create -e 'ctr(aes)' /btrfs/e1
Create subvolume '/btrfs/e1'
Passphrase: 
Again passphrase: 

A key is created and its hash is updated into the subvolume item,
and then added to the system keyctl.
# btrfs su show /btrfs/e1 | egrep -i encrypt
	Encryption: 		ctr(aes)@btrfs:75197c8e (594790215)

# keyctl show 594790215
Keyring
 594790215 --alsw-v      0     0  logon: btrfs:75197c8e


Now any file data extents under the subvol /btrfs/e1 will be
encrypted.

You may revoke key using keyctl or btrfs(8) as below.
# btrfs su encrypt -k out /btrfs/e1

# btrfs su show /btrfs/e1 | egrep -i encrypt
	Encryption: 		ctr(aes)@btrfs:75197c8e (Required key not available)

# keyctl show 594790215
Keyring
Unable to dump key: Key has been revoked

As the key hash is updated, If you provide wrong passphrase in the next
key in, it won't add key to the system. So we have key verification
from the day1.

# btrfs su encrypt -k in /btrfs/e1
Passphrase: 
Again passphrase: 
ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service

ERROR: key set failed: Key was rejected by service

# btrfs su encrypt -k in /btrfs/e1
Passphrase: 
Again passphrase: 
key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'

Now if you revoke the key the read / write fails with key error.

# md5sum /btrfs/e1/2k-test-file 
8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file

# btrfs su encrypt -k out /btrfs/e1

# md5sum /btrfs/e1/2k-test-file 
md5sum: /btrfs/e1/2k-test-file: Key has been revoked

# cp /tfs/1k-test-file /btrfs/e1/
cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked

Plain text memory scratches for security reason is pending. As there are some
key revoke notification challenges to coincide with encryption context switch,
which I do believe should be fixed in the due course, but is not a roadblock
at this stage.

Thanks, Anand


Anand Jain (1):
  btrfs: Encryption: Add btrfs encryption support

 fs/btrfs/Makefile               |   4 +-
 fs/btrfs/btrfs_inode.h          |   6 +
 fs/btrfs/compression.c          |  30 +-
 fs/btrfs/compression.h          |  10 +-
 fs/btrfs/ctree.h                |   4 +
 fs/btrfs/disk-io.c              |   3 +
 fs/btrfs/encrypt.c              | 807 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/encrypt.h              |  94 +++++
 fs/btrfs/inode.c                | 255 ++++++++++++-
 fs/btrfs/ioctl.c                |  67 ++++
 fs/btrfs/lzo.c                  |   2 +-
 fs/btrfs/props.c                | 331 +++++++++++++++-
 fs/btrfs/super.c                |  27 +-
 fs/btrfs/tests/crypto-tests.c   | 376 +++++++++++++++++++
 fs/btrfs/tests/crypto-tests.h   |  38 ++
 fs/btrfs/zlib.c                 |   2 +-
 include/uapi/linux/btrfs_tree.h |   6 +-
 17 files changed, 2027 insertions(+), 35 deletions(-)
 create mode 100644 fs/btrfs/encrypt.c
 create mode 100644 fs/btrfs/encrypt.h
 create mode 100755 fs/btrfs/tests/crypto-tests.c
 create mode 100755 fs/btrfs/tests/crypto-tests.h

Anand Jain (2):
  btrfs-progs: make wait_for_commit non static
  btrfs-progs: add encryption support

 Makefile.in       |   5 +-
 btrfs-list.c      |  33 ++++
 cmds-filesystem.c |   4 +-
 cmds-restore.c    |  16 ++
 cmds-subvolume.c  | 112 ++++++++++++--
 commands.h        |   1 +
 ctree.h           |   5 +-
 encrypt.c         | 455 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 encrypt.h         |  46 ++++++
 props.c           |   4 +
 utils.h           |   2 +
 11 files changed, 665 insertions(+), 18 deletions(-)
 create mode 100644 encrypt.c
 create mode 100644 encrypt.h

Anand Jain (1):
  fstests: btrfs: support encryption

 common/filter.btrfs |   2 +-
 common/rc           |   2 +-
 tests/btrfs/041     |   2 +
 tests/btrfs/041.out |  13 ++++
 tests/btrfs/052     |  12 +++
 tests/btrfs/052.out | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/079     |   2 +
 tests/btrfs/125     |   2 +-
 tests/generic/297   |   6 +-
 tests/generic/298   |   2 +-
 10 files changed, 251 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] btrfs: Encryption: Add btrfs encryption support
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
@ 2016-09-13 13:39 ` Anand Jain
  2016-09-13 14:12   ` kbuild test robot
                     ` (2 more replies)
  2016-09-13 13:39 ` [PATCH 1/2] btrfs-progs: make wait_for_commit non static Anand Jain
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Adds encryption support. Based on v4.7-rc3.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/Makefile               |   4 +-
 fs/btrfs/btrfs_inode.h          |   6 +
 fs/btrfs/compression.c          |  30 +-
 fs/btrfs/compression.h          |  10 +-
 fs/btrfs/ctree.h                |   4 +
 fs/btrfs/disk-io.c              |   3 +
 fs/btrfs/encrypt.c              | 807 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/encrypt.h              |  94 +++++
 fs/btrfs/inode.c                | 255 ++++++++++++-
 fs/btrfs/ioctl.c                |  67 ++++
 fs/btrfs/lzo.c                  |   2 +-
 fs/btrfs/props.c                | 331 +++++++++++++++-
 fs/btrfs/super.c                |  27 +-
 fs/btrfs/tests/crypto-tests.c   | 376 +++++++++++++++++++
 fs/btrfs/tests/crypto-tests.h   |  38 ++
 fs/btrfs/zlib.c                 |   2 +-
 include/uapi/linux/btrfs_tree.h |   6 +-
 17 files changed, 2027 insertions(+), 35 deletions(-)
 create mode 100644 fs/btrfs/encrypt.c
 create mode 100644 fs/btrfs/encrypt.h
 create mode 100755 fs/btrfs/tests/crypto-tests.c
 create mode 100755 fs/btrfs/tests/crypto-tests.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17a80b0..c185b2f18953 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o hash.o free-space-tree.o
+	   uuid-tree.o props.o hash.o free-space-tree.o encrypt.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
@@ -17,4 +17,4 @@ btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
 btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
 	tests/extent-buffer-tests.o tests/btrfs-tests.o \
 	tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \
-	tests/free-space-tree-tests.o
+	tests/free-space-tree-tests.o tests/crypto-tests.o
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4919aedb5fc1..8d2ce6c0e384 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -24,6 +24,7 @@
 #include "extent_io.h"
 #include "ordered-data.h"
 #include "delayed-inode.h"
+#include "encrypt.h"
 
 /*
  * ordered_data_close is set by truncate when a file that used
@@ -207,6 +208,11 @@ struct btrfs_inode {
 	struct rw_semaphore dio_sem;
 
 	struct inode vfs_inode;
+
+	unsigned char key_payload[BTRFS_CRYPTO_KEY_SIZE];
+	u32 key_len;
+	unsigned char cryptoiv[BTRFS_CRYPTO_IV_SIZE];
+	u32 iv_len;
 };
 
 extern unsigned char btrfs_filetype_table[];
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 658c39b70fba..fc65f8831a1d 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -41,6 +41,7 @@
 #include "compression.h"
 #include "extent_io.h"
 #include "extent_map.h"
+#include "encrypt.h"
 
 struct compressed_bio {
 	/* number of bios pending for this compressed extent */
@@ -83,7 +84,7 @@ struct compressed_bio {
 
 static int btrfs_decompress_biovec(int type, struct page **pages_in,
 				   u64 disk_start, struct bio_vec *bvec,
-				   int vcnt, size_t srclen);
+				   int vcnt, size_t srclen, struct bio *bio);
 
 static inline int compressed_bio_size(struct btrfs_root *root,
 				      unsigned long disk_size)
@@ -180,9 +181,9 @@ static void end_compressed_bio_read(struct bio *bio)
 				      cb->start,
 				      cb->orig_bio->bi_io_vec,
 				      cb->orig_bio->bi_vcnt,
-				      cb->compressed_len);
+				      cb->compressed_len, cb->orig_bio);
 csum_failed:
-	if (ret)
+	if (ret && ret != -ENOKEY)
 		cb->errors = 1;
 
 	/* release the compressed pages */
@@ -754,15 +755,21 @@ static struct {
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
 	&btrfs_lzo_compress,
+	&btrfs_encrypt_ops,
 };
 
 void __init btrfs_init_compress(void)
 {
 	int i;
+	int type;
 
 	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
 		struct list_head *workspace;
 
+		type = i + 1;
+		if (type == BTRFS_ENCRYPT_AES)
+			continue;
+
 		INIT_LIST_HEAD(&btrfs_comp_ws[i].idle_ws);
 		spin_lock_init(&btrfs_comp_ws[i].ws_lock);
 		atomic_set(&btrfs_comp_ws[i].total_ws, 0);
@@ -801,6 +808,10 @@ static struct list_head *find_workspace(int type)
 	atomic_t *total_ws		= &btrfs_comp_ws[idx].total_ws;
 	wait_queue_head_t *ws_wait	= &btrfs_comp_ws[idx].ws_wait;
 	int *free_ws			= &btrfs_comp_ws[idx].free_ws;
+
+	if (type == BTRFS_ENCRYPT_AES)
+		return NULL;
+
 again:
 	spin_lock(ws_lock);
 	if (!list_empty(idle_ws)) {
@@ -867,6 +878,9 @@ static void free_workspace(int type, struct list_head *workspace)
 	wait_queue_head_t *ws_wait	= &btrfs_comp_ws[idx].ws_wait;
 	int *free_ws			= &btrfs_comp_ws[idx].free_ws;
 
+	if (!workspace)
+		return;
+
 	spin_lock(ws_lock);
 	if (*free_ws < num_online_cpus()) {
 		list_add(workspace, idle_ws);
@@ -894,8 +908,12 @@ static void free_workspaces(void)
 {
 	struct list_head *workspace;
 	int i;
+	int type;
 
 	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
+		type = i + 1;
+		if (type == BTRFS_ENCRYPT_AES)
+			continue;
 		while (!list_empty(&btrfs_comp_ws[i].idle_ws)) {
 			workspace = btrfs_comp_ws[i].idle_ws.next;
 			list_del(workspace);
@@ -931,7 +949,7 @@ int btrfs_compress_pages(int type, struct address_space *mapping,
 			 unsigned long *out_pages,
 			 unsigned long *total_in,
 			 unsigned long *total_out,
-			 unsigned long max_out)
+			 unsigned long max_out, int flags)
 {
 	struct list_head *workspace;
 	int ret;
@@ -942,7 +960,7 @@ int btrfs_compress_pages(int type, struct address_space *mapping,
 						      start, len, pages,
 						      nr_dest_pages, out_pages,
 						      total_in, total_out,
-						      max_out);
+						      max_out, flags);
 	free_workspace(type, workspace);
 	return ret;
 }
@@ -965,7 +983,7 @@ int btrfs_compress_pages(int type, struct address_space *mapping,
  */
 static int btrfs_decompress_biovec(int type, struct page **pages_in,
 				   u64 disk_start, struct bio_vec *bvec,
-				   int vcnt, size_t srclen)
+				   int vcnt, size_t srclen, struct bio *bio)
 {
 	struct list_head *workspace;
 	int ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index f49d8b8c0f00..b90820f3898c 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -29,7 +29,7 @@ int btrfs_compress_pages(int type, struct address_space *mapping,
 			 unsigned long *out_pages,
 			 unsigned long *total_in,
 			 unsigned long *total_out,
-			 unsigned long max_out);
+			 unsigned long max_out, int flags);
 int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
 		     unsigned long start_byte, size_t srclen, size_t destlen);
 int btrfs_decompress_buf2page(char *buf, unsigned long buf_start,
@@ -53,8 +53,9 @@ enum btrfs_compression_type {
 	BTRFS_COMPRESS_NONE  = 0,
 	BTRFS_COMPRESS_ZLIB  = 1,
 	BTRFS_COMPRESS_LZO   = 2,
-	BTRFS_COMPRESS_TYPES = 2,
-	BTRFS_COMPRESS_LAST  = 3,
+	BTRFS_ENCRYPT_AES    = 3,
+	BTRFS_COMPRESS_TYPES = 3,
+	BTRFS_COMPRESS_LAST  = 4,
 };
 
 struct btrfs_compress_op {
@@ -70,7 +71,7 @@ struct btrfs_compress_op {
 			      unsigned long *out_pages,
 			      unsigned long *total_in,
 			      unsigned long *total_out,
-			      unsigned long max_out);
+			      unsigned long max_out, int flags);
 
 	int (*decompress_biovec)(struct list_head *workspace,
 				 struct page **pages_in,
@@ -88,5 +89,6 @@ struct btrfs_compress_op {
 
 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
+extern const struct btrfs_compress_op btrfs_encrypt_ops;
 
 #endif
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 101c3cfd3f7c..aa3b3a4da923 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -40,6 +40,7 @@
 #include "extent_io.h"
 #include "extent_map.h"
 #include "async-thread.h"
+#include "encrypt.h"
 
 struct btrfs_trans_handle;
 struct btrfs_transaction;
@@ -1255,6 +1256,8 @@ struct btrfs_root {
 
 	/* For qgroup metadata space reserve */
 	atomic_t qgroup_meta_rsv;
+
+	char crypto_keytag[BTRFS_CRYPTO_KEYTAG_SIZE];
 };
 
 /*
@@ -1384,6 +1387,7 @@ do {                                                                   \
 #define BTRFS_INODE_NOATIME		(1 << 9)
 #define BTRFS_INODE_DIRSYNC		(1 << 10)
 #define BTRFS_INODE_COMPRESS		(1 << 11)
+#define BTRFS_INODE_ENCRYPT		(1 << 12)
 
 #define BTRFS_INODE_ROOT_ITEM_INIT	(1 << 31)
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1142127f6e5e..8517e7c5968f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -50,6 +50,7 @@
 #include "sysfs.h"
 #include "qgroup.h"
 #include "compression.h"
+#include "encrypt.h"
 
 #ifdef CONFIG_X86
 #include <asm/cpufeature.h>
@@ -1302,6 +1303,8 @@ static void __setup_root(u32 nodesize, u32 sectorsize, u32 stripesize,
 	root->anon_dev = 0;
 
 	spin_lock_init(&root->root_item_lock);
+
+	memset(root->crypto_keytag, 0, BTRFS_CRYPTO_KEYTAG_SIZE);
 }
 
 static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/encrypt.c b/fs/btrfs/encrypt.c
new file mode 100644
index 000000000000..ff22295a617c
--- /dev/null
+++ b/fs/btrfs/encrypt.c
@@ -0,0 +1,807 @@
+/*
+ * Copyright (C) 2016 Oracle.  All rights reserved.
+ * Author: Anand Jain (anand.jain@oracle.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include <linux/string.h>
+#include <linux/crypto.h>
+#include <linux/scatterlist.h>
+#include <linux/pagemap.h>
+#include <keys/user-type.h>
+#include "compression.h"
+#include <linux/slab.h>
+#include <linux/keyctl.h>
+#include <linux/key-type.h>
+#include <linux/cred.h>
+#include <keys/user-type.h>
+#include "ctree.h"
+#include "btrfs_inode.h"
+#include "props.h"
+#include "hash.h"
+#include "encrypt.h"
+#include "xattr.h"
+
+static const struct btrfs_encrypt_algorithm {
+	const char *name;
+	size_t keylen;
+	size_t ivlen;
+	int type_index;
+} btrfs_encrypt_algorithm_supported[] = {
+	{"ctr(aes)", 16, 16, BTRFS_ENCRYPT_AES}
+};
+
+int get_encrypt_type_index(char *type_name)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(btrfs_encrypt_algorithm_supported); i++)
+		if (!strcmp(btrfs_encrypt_algorithm_supported[i].name, type_name))
+			return btrfs_encrypt_algorithm_supported[i].type_index;
+
+	return -EINVAL;
+}
+
+/*
+ * Returns cipher alg key size if the encryption type is found
+ * otherwise 0
+ */
+size_t get_encrypt_type_len(char *type)
+{
+	int i;
+	for (i = 0; i < ARRAY_SIZE(btrfs_encrypt_algorithm_supported); i++)
+		if (!strcmp(btrfs_encrypt_algorithm_supported[i].name, type))
+			return btrfs_encrypt_algorithm_supported[i].keylen;
+
+	return 0;
+}
+
+void btrfs_disable_encrypt_inode(struct inode *inode)
+{
+	if (BTRFS_I(inode)->force_compress == BTRFS_ENCRYPT_AES)
+		BTRFS_I(inode)->force_compress = 0;
+}
+
+/*
+ * Helper to get the key.
+ * The key can be in
+ *                system keyring or
+ *                in a file in the external USB drive
+ * As of now only keyring type is supported.
+ */
+int btrfs_request_key(char *key_tag, void *key_data)
+{
+	int ret;
+	const struct user_key_payload *payload;
+	struct key *btrfs_key = NULL;
+
+	ret = 0;
+
+	btrfs_key = request_key(BTRFS_CRYPTO_KEY_TYPE, key_tag, NULL);
+	if (IS_ERR(btrfs_key)) {
+		ret = PTR_ERR(btrfs_key);
+		btrfs_key = NULL;
+		return ret;
+	}
+
+	ret = key_validate(btrfs_key);
+	if (ret < 0) {
+		key_put(btrfs_key);
+		return ret;
+	}
+
+	down_read(&btrfs_key->sem);
+	payload = user_key_payload(btrfs_key);
+	if (IS_ERR_OR_NULL(payload)) {
+		pr_err("get payload failed\n");
+		ret = PTR_ERR(payload);
+		goto out;
+	}
+
+	if (payload->datalen != BTRFS_CRYPTO_KEY_SIZE) {
+		pr_err("payload datalen does not match the expected\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	memcpy(key_data, payload->data, BTRFS_CRYPTO_KEY_SIZE);
+
+out:
+	up_read(&btrfs_key->sem);
+	key_put(btrfs_key);
+
+	return ret;
+}
+
+#if !(BTRFS_CRYPTO_TEST_BYDUMMYENC | BTRFS_CRYPTO_TEST_BYDUMMYKEY)
+static int btrfs_get_cipher_name_from_inode(struct inode *inode,
+					unsigned char *cipher_name)
+{
+	struct btrfs_root *root;
+
+	root = BTRFS_I(inode)->root;
+	memcpy(cipher_name, root->root_item.encrypt_algo,
+				BTRFS_CRYPTO_TFM_NAME_SIZE);
+	cipher_name[BTRFS_CRYPTO_TFM_NAME_SIZE] = '\0';
+	if (strlen(cipher_name))
+		return 0;
+
+	if (root->fs_info->compress_type == BTRFS_ENCRYPT_AES) {
+		memset(cipher_name, 0, BTRFS_CRYPTO_TFM_NAME_SIZE);
+		memcpy(cipher_name, "ctr(aes)",
+				BTRFS_CRYPTO_TFM_NAME_SIZE);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+#endif
+
+int btrfs_check_keytag(char *keytag)
+{
+	unsigned char keydata[BTRFS_CRYPTO_KEY_SIZE];
+	return btrfs_request_key(keytag, keydata);
+}
+
+int btrfs_validate_keytag(struct inode *inode, unsigned char *keytag)
+{
+	int ret;
+	u32 seed = 0;
+	u32 keyhash = ~(u32)0;
+	struct btrfs_root_item *ri;
+	unsigned char keydata[BTRFS_CRYPTO_KEY_SIZE];
+
+	ri = &(BTRFS_I(inode)->root->root_item);
+	if (!ri->crypto_keyhash)
+		return -ENOTSUPP;
+
+	ret = btrfs_request_key(keytag, keydata);
+	if (ret)
+		return ret;
+
+	keyhash = btrfs_crc32c(seed, keydata, BTRFS_CRYPTO_KEY_SIZE);
+	if (keyhash != ri->crypto_keyhash) {
+		/* wrong key */
+		pr_err("BTRFS: %pU wrong key: hash %u expected %u\n",
+				ri->uuid, keyhash, ri->crypto_keyhash);
+		return -EKEYREJECTED;
+	}
+
+	return 0;
+}
+
+int btrfs_set_keyhash(struct inode *inode, char *keytag)
+{
+	int ret;
+	u32 seed = 0;
+	u32 keyhash = ~(u32)0;
+	struct btrfs_root *root;
+	unsigned char keydata[BTRFS_CRYPTO_KEY_SIZE+1];
+
+	ret = btrfs_request_key(keytag, keydata);
+	if (ret)
+		return ret;
+
+	keyhash = btrfs_crc32c(seed, keydata, BTRFS_CRYPTO_KEY_SIZE);
+	root = BTRFS_I(inode)->root;
+	root->root_item.crypto_keyhash = keyhash;
+	return 0;
+}
+
+int btrfs_check_key_access(struct inode *inode)
+{
+	int ret;
+	u32 seed = 0;
+	u32 keyhash = ~(u32)0;
+	struct btrfs_root_item *ri;
+	char keytag[BTRFS_CRYPTO_KEYTAG_SIZE + 1];
+	unsigned char keydata[BTRFS_CRYPTO_KEY_SIZE + 1];
+
+	ri = &(BTRFS_I(inode)->root->root_item);
+	if (!ri->crypto_keyhash)
+		return -ENOKEY;
+
+	strncpy(keytag, BTRFS_I(inode)->root->crypto_keytag,
+					BTRFS_CRYPTO_KEYTAG_SIZE);
+	keytag[BTRFS_CRYPTO_KEYTAG_SIZE] = '\0';
+	ret = btrfs_request_key(keytag, keydata);
+	if (ret)
+		return ret;
+
+	keyhash = btrfs_crc32c(seed, keydata, BTRFS_CRYPTO_KEY_SIZE);
+	/*
+	 * what if there is different key with the same keytag
+	 * check with the hash helps to eliminate this case.
+	 */
+	if (ri->crypto_keyhash != keyhash)
+		return -EKEYREJECTED;
+
+	return 0;
+}
+
+int btrfs_get_master_key(struct inode *inode,
+					unsigned char *key)
+{
+	int ret;
+	char keytag[BTRFS_CRYPTO_KEYTAG_SIZE + 1];
+	struct btrfs_root_item *ri;
+	u32 keyhash = ~(u32)0;
+	u32 seed = 0;
+	unsigned char keydata[BTRFS_CRYPTO_KEY_SIZE + 1];
+
+	ri = &(BTRFS_I(inode)->root->root_item);
+	if (strlen(BTRFS_I(inode)->root->crypto_keytag)) {
+		strncpy(keytag, BTRFS_I(inode)->root->crypto_keytag,
+					BTRFS_CRYPTO_KEYTAG_SIZE);
+	} else {
+		pr_err("BTRFS: %lu btrfs_get_master_key no keytag\n",
+						inode->i_ino);
+		return -EINVAL;
+	}
+	keytag[BTRFS_CRYPTO_KEYTAG_SIZE] = '\0';
+
+	ret = btrfs_request_key(keytag, keydata);
+	if (ret)
+		return ret;
+
+	keyhash = btrfs_crc32c(seed, keydata, BTRFS_CRYPTO_KEY_SIZE);
+
+	/*
+	 * what if there is different key with the same keytag
+	 * checking with the hash helps to eliminate this case.
+	 */
+	if (ri->crypto_keyhash && ri->crypto_keyhash != keyhash) {
+		/* wrong key */
+		pr_err("BTRFS: %pU wrong key: hash %u expected %u\n",
+				ri->uuid, keyhash, ri->crypto_keyhash);
+		return -EKEYREJECTED;
+	}
+
+	memcpy(key, keydata, BTRFS_CRYPTO_KEY_SIZE);
+
+	return 0;
+}
+
+#if !(BTRFS_CRYPTO_TEST_BYDUMMYENC | BTRFS_CRYPTO_TEST_BYDUMMYKEY)
+static int btrfs_get_iv_from_inode(struct inode *inode,
+				unsigned char *iv, size_t *iv_size)
+{
+	if (!BTRFS_I(inode)->iv_len)
+		return -EINVAL;
+
+	memcpy(iv, BTRFS_I(inode)->cryptoiv, BTRFS_I(inode)->iv_len);
+
+	*iv_size = BTRFS_I(inode)->iv_len;
+
+	return 0;
+}
+#endif
+
+int btrfs_update_key_to_binode(struct inode *inode)
+{
+	int ret;
+	unsigned char keydata[BTRFS_CRYPTO_KEY_SIZE];
+
+	ret = btrfs_get_master_key(inode, keydata);
+	if (ret)
+		return ret;
+
+	memcpy(BTRFS_I(inode)->key_payload, keydata,
+					BTRFS_CRYPTO_KEY_SIZE);
+
+	BTRFS_I(inode)->key_len = BTRFS_CRYPTO_KEY_SIZE;
+	return ret;
+}
+
+int btrfs_blkcipher(int encrypt, struct btrfs_blkcipher_req *btrfs_req,
+						char *data, size_t len)
+{
+	int ret = -EFAULT;
+	struct scatterlist sg;
+	unsigned int ivsize = 0;
+	unsigned int blksize = 0;
+	char *cipher = "cbc(aes)";
+	struct blkcipher_desc desc;
+	struct crypto_blkcipher *blkcipher = NULL;
+
+	blkcipher = crypto_alloc_blkcipher(cipher, 0, 0);
+	if (IS_ERR(blkcipher)) {
+		pr_err("BTRFS: crypto, allocate blkcipher handle for %s\n", cipher);
+		return -PTR_ERR(blkcipher);
+	}
+
+	blksize = crypto_blkcipher_blocksize(blkcipher);
+	if (len < blksize) {
+		pr_err("BTRFS: crypto, blk can't work with len %lu\n", len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (crypto_blkcipher_setkey(blkcipher, btrfs_req->key,
+					btrfs_req->key_len)) {
+		pr_err("BTRFS: crypto, key could not be set\n");
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	ivsize = crypto_blkcipher_ivsize(blkcipher);
+	if (ivsize != btrfs_req->iv_len) {
+		pr_err("BTRFS: crypto, length differs from expected length\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	crypto_blkcipher_set_iv(blkcipher, btrfs_req->cryptoiv,
+					btrfs_req->iv_len);
+
+	desc.flags = 0;
+	desc.tfm = blkcipher;
+	sg_init_one(&sg, data, len);
+
+	if (encrypt) {
+		/* encrypt data in place */
+		ret = crypto_blkcipher_encrypt(&desc, &sg, &sg, len);
+	} else {
+		/* decrypt data in place */
+		ret = crypto_blkcipher_decrypt(&desc, &sg, &sg, len);
+	}
+
+out:
+	crypto_free_blkcipher(blkcipher);
+	return ret;
+}
+
+int btrfs_cipher_iv(int encrypt, struct inode *inode,
+					char *data, size_t len)
+{
+	int ret;
+	struct btrfs_blkcipher_req btrfs_req;
+	unsigned char key[BTRFS_CRYPTO_KEY_SIZE];
+	unsigned char *iv = BTRFS_CRYPTO_IV_IV;
+
+	ret = btrfs_get_master_key(inode, key);
+	if (ret) {
+		pr_err("BTRFS: crypto, %lu btrfs_get_master_key failed to '%s' iv\n",
+				inode->i_ino, encrypt?"encrypt":"decrypt");
+		return ret;
+	}
+
+	memcpy(btrfs_req.key, key, BTRFS_CRYPTO_KEY_SIZE);
+	btrfs_req.key_len = BTRFS_CRYPTO_KEY_SIZE;
+	memcpy(btrfs_req.cryptoiv, iv, BTRFS_CRYPTO_IV_SIZE);
+	btrfs_req.iv_len = BTRFS_CRYPTO_IV_SIZE;
+
+	ret = btrfs_blkcipher(encrypt, &btrfs_req, data, len);
+
+	return ret;
+}
+
+static void btrfs_ablkcipher_cb(struct crypto_async_request *req, int error)
+{
+	struct btrfs_ablkcipher_result *cb_result = req->data;
+
+	if (error == -EINPROGRESS)
+		return;
+
+	cb_result->err = error;
+
+	complete(&cb_result->completion);
+}
+
+int btrfs_do_ablkcipher(int enc, struct page *page, unsigned long len,
+				struct btrfs_ablkcipher_req_data *btrfs_req)
+{
+	int ret = 0;
+	char *ivdata = NULL;
+	unsigned int ivsize = 0;
+	unsigned int ivdata_size;
+	unsigned int ablksize = 0;
+	struct ablkcipher_request *req = NULL;
+	struct crypto_ablkcipher *ablkcipher = NULL;
+	int key_len = btrfs_req->key_len;
+
+	ablkcipher = crypto_alloc_ablkcipher(btrfs_req->cipher_name, 0, 0);
+	if (IS_ERR(ablkcipher)) {
+		ret = PTR_ERR(ablkcipher);
+		pr_err("BTRFS: crypto, allocate cipher engine '%s' failed: %d\n",
+					btrfs_req->cipher_name, ret);
+		return ret;
+	}
+
+	ablksize = crypto_ablkcipher_blocksize(ablkcipher);
+	/* we can't cipher a block less the ciper block size */
+	if (len < ablksize) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (ablksize > BTRFS_CRYPTO_KEY_SIZE)
+		BUG_ON("Incompatible key for the cipher\n");
+
+	ivsize = crypto_ablkcipher_ivsize(ablkcipher);
+	ivdata = btrfs_req->iv;
+	ivdata_size = btrfs_req->iv_size;
+
+	if (ivsize != ivdata_size) {
+		BUG_ON("IV length differs from expected length\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	req = ablkcipher_request_alloc(ablkcipher, GFP_KERNEL);
+	if (IS_ERR(req)) {
+		pr_info("BTRFS: crypto, could not allocate request queue\n");
+		ret = PTR_ERR(req);
+		goto out;
+	}
+	btrfs_req->tfm = ablkcipher;
+	btrfs_req->req = req;
+
+	ablkcipher_request_set_tfm(req, ablkcipher);
+	ablkcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				btrfs_ablkcipher_cb, &btrfs_req->cb_result);
+
+	ret = crypto_ablkcipher_setkey(ablkcipher, btrfs_req->key, key_len);
+	if (ret) {
+		pr_err("BTRFS: crypto, cipher '%s' set key failed: len %u %d\n",
+				btrfs_req->cipher_name, key_len, ret);
+		goto out;
+	}
+
+	sg_init_table(&btrfs_req->sg_src, 1);
+	sg_set_page(&btrfs_req->sg_src, page, len, 0);
+	ablkcipher_request_set_crypt(req, &btrfs_req->sg_src,
+				&btrfs_req->sg_src, len, ivdata);
+
+	init_completion(&btrfs_req->cb_result.completion);
+
+	if (enc)
+		ret = crypto_ablkcipher_encrypt(btrfs_req->req);
+	else
+		ret = crypto_ablkcipher_decrypt(btrfs_req->req);
+
+	switch (ret) {
+	case 0:
+		break;
+	case -EINPROGRESS:
+	case -EBUSY:
+		ret = wait_for_completion_interruptible(
+					&btrfs_req->cb_result.completion);
+		if (!ret && !btrfs_req->cb_result.err) {
+			reinit_completion(&btrfs_req->cb_result.completion);
+			break;
+		}
+	default:
+		pr_info("crypto engine: %d result %d\n",
+					ret, btrfs_req->cb_result.err);
+		break;
+	}
+	init_completion(&btrfs_req->cb_result.completion);
+
+out:
+	if (ablkcipher)
+		crypto_free_ablkcipher(ablkcipher);
+	if (req)
+		ablkcipher_request_free(req);
+
+	return ret;
+}
+
+static int btrfs_do_ablkcipher_by_inode(int enc, struct page *page,
+				unsigned long len, struct inode *inode)
+{
+	int ret;
+	struct btrfs_ablkcipher_req_data btrfs_req;
+
+	if (!inode) {
+		BUG_ON("BTRFS: crypto, needs inode\n");
+		return -EINVAL;
+	}
+	memset(&btrfs_req, 0, sizeof(struct btrfs_ablkcipher_req_data));
+
+#if BTRFS_CRYPTO_TEST_BYDUMMYENC
+	if (len < PAGE_SIZE) {
+		char *in;
+		in = kmap(page);
+		/*
+		 * not scratched with zero, so to have
+		 * higher chance of catching bugs
+		 */
+		memset(in+len, 'z', PAGE_SIZE - len);
+		kunmap(page);
+	}
+	ret = 0;
+#elif BTRFS_CRYPTO_TEST_BYDUMMYKEY
+	/*
+	 * This is for testing only, especially the extents ops,
+	 * we don't worry about security here
+	 */
+	strncpy(btrfs_req.cipher_name, "ctr(aes)", BTRFS_CRYPTO_TFM_NAME_SIZE);
+	strncpy(btrfs_req.key, BTRFS_CRYPTO_IV_IV, BTRFS_CRYPTO_KEY_SIZE);
+	strncpy(btrfs_req.iv, BTRFS_CRYPTO_IV_IV, BTRFS_CRYPTO_IV_SIZE);
+	btrfs_req.key_len = BTRFS_CRYPTO_KEY_SIZE;
+	btrfs_req.iv_size = BTRFS_CRYPTO_IV_SIZE;
+	ret = btrfs_do_ablkcipher(enc, page, len, &btrfs_req);
+#else
+
+	/* Get the cipher engine name */
+	ret = btrfs_get_cipher_name_from_inode(inode, btrfs_req.cipher_name);
+	if (ret) {
+		pr_err("BTRFS: Error: Invalid cipher name: '%d'\n", ret);
+		return -EINVAL;
+	}
+
+	/* Get the Key */
+	if (BTRFS_I(inode)->key_len)
+		memcpy(btrfs_req.key, BTRFS_I(inode)->key_payload,
+						BTRFS_CRYPTO_KEY_SIZE);
+	else
+		ret = btrfs_get_master_key(inode, btrfs_req.key);
+
+	btrfs_req.key_len = BTRFS_CRYPTO_KEY_SIZE;
+
+	if (ret) {
+		/* Error getting key */
+		if (enc) {
+			/* For encrypt its an error*/
+			pr_err("BTRFS: crypto, '%lu' Get key failed: %d\n",
+						inode->i_ino, ret);
+		} else {
+			/*
+			 * For decrypt, the user with no key, may access
+			 * ciphertext
+			 */
+			if (ret == -ENOKEY || ret == -EKEYREVOKED)
+				ret = 0;
+			else
+				pr_err("BTRFS: crypto, '%lu' Get key failed: %d\n",
+						inode->i_ino, ret);
+		}
+		return ret;
+	}
+
+	ret = btrfs_get_iv_from_inode(inode, btrfs_req.iv,
+					&btrfs_req.iv_size);
+	if (ret) {
+		pr_err("BTRFS: crypto, can't get cryptoiv\n");
+		return ret;
+	}
+
+	ret = btrfs_do_ablkcipher(enc, page, len, &btrfs_req);
+#endif
+
+	return ret;
+}
+
+
+static int btrfs_encrypt_pages(struct list_head *na_ws,
+			struct address_space *mapping, u64 start,
+			unsigned long len, struct page **pages,
+			unsigned long nr_pages, unsigned long *nr_out_pages,
+			unsigned long *total_in, unsigned long *total_out,
+			unsigned long na_max_out, int dont_align)
+{
+	int ret;
+	struct page *in_page;
+	struct page *out_page = NULL;
+	char *in;
+	char *out;
+	unsigned long bytes_left = len;
+	unsigned long cur_page_len = 0;
+	unsigned long cur_page_len_for_out = 0;
+	unsigned long i;
+	struct inode *inode;
+	u64 blocksize;
+
+	*total_in = 0;
+	*nr_out_pages = 0;
+	*total_out = 0;
+	if (!len)
+		return 0;
+
+	if (!mapping && !mapping->host) {
+		WARN_ON("BTRFS: crypto, need mapped pages\n");
+		return -EINVAL;
+	}
+
+	inode = mapping->host;
+	blocksize = BTRFS_I(inode)->root->sectorsize;
+	if (blocksize != PAGE_SIZE)
+		pr_err("BTRFS: crypto, fatal, blocksize not same as page size\n");
+
+	for (i = 0; i < nr_pages; i++) {
+
+		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+		cur_page_len = min(bytes_left, PAGE_SIZE);
+		out_page = alloc_page(GFP_NOFS| __GFP_HIGHMEM);
+
+		in = kmap(in_page);
+		out = kmap(out_page);
+		memset(out, 0, PAGE_SIZE);
+		memcpy(out, in, cur_page_len);
+		kunmap(out_page);
+		kunmap(in_page);
+		if (dont_align)
+			cur_page_len_for_out = cur_page_len;
+		else
+			cur_page_len_for_out = ALIGN(cur_page_len, blocksize);
+
+		ret = btrfs_do_ablkcipher_by_inode(1, out_page,
+						cur_page_len_for_out, inode);
+		if (ret) {
+			__free_page(out_page);
+			return ret;
+		}
+		put_page(in_page);
+
+		pages[i] = out_page;
+		*nr_out_pages = *nr_out_pages + 1;
+		*total_in += cur_page_len;
+		*total_out += cur_page_len_for_out;
+
+		start += cur_page_len;
+		bytes_left = bytes_left - cur_page_len;
+		if (!bytes_left)
+			break;
+	}
+
+	return ret;
+}
+
+static int btrfs_decrypt_pages(struct list_head *na_ws, unsigned char *in,
+			struct page *out_page, unsigned long na_start_byte,
+			size_t in_size, size_t out_size)
+{
+	int ret;
+	char *out_addr;
+	struct address_space *mapping;
+	struct inode *inode;
+
+	if (!out_page)
+		return -EINVAL;
+
+	if (in_size > PAGE_SIZE) {
+		WARN_ON("BTRFS: crypto, cant decrypt more than pagesize\n");
+		return -EINVAL;
+	}
+
+	mapping = out_page->mapping;
+	if (!mapping && !mapping->host) {
+		WARN_ON("BTRFS: crypto, Need mapped pages\n");
+		return -EINVAL;
+	}
+
+	inode = mapping->host;
+
+	out_addr = kmap_atomic(out_page);
+	memcpy(out_addr, in, in_size);
+	kunmap_atomic(out_addr);
+
+	ret = btrfs_do_ablkcipher_by_inode(0, out_page, in_size, inode);
+
+#if BTRFS_CRYPTO_INFO_POTENTIAL_BUG
+	if (na_start_byte) {
+		pr_err("BTRFS: crypto, a context that a out start is not zero %lu\n",
+						na_start_byte);
+		BUG_ON(1);
+	}
+#endif
+
+	return ret;
+}
+
+static int btrfs_decrypt_pages_bio(struct list_head *na_ws,
+		struct page **in_pages, u64 disk_start, struct bio_vec *bvec,
+		int bi_vcnt, size_t in_len)
+{
+	char *in;
+	char *out;
+	int ret = 0;
+	int more = 0;
+	struct page *in_page;
+	struct page *out_page;
+	unsigned long bytes_left;
+	unsigned long total_in_pages;
+	unsigned long cur_page_len;
+	unsigned long processed_len = 0;
+	unsigned long page_in_index = 0;
+	unsigned long page_out_index = 0;
+	unsigned long saved_page_out_index = 0;
+	unsigned long pg_offset = 0;
+	struct address_space *mapping;
+	struct inode *inode;
+	total_in_pages = DIV_ROUND_UP(in_len, PAGE_SIZE);
+
+	if (na_ws)
+		return -EINVAL;
+
+	out_page = bvec[page_out_index].bv_page;
+	mapping = out_page->mapping;
+	if (!mapping && !mapping->host) {
+		WARN_ON("BTRFS: crypto, need mapped page\n");
+		return -EINVAL;
+	}
+
+	inode = mapping->host;
+
+#if BTRFS_CRYPTO_INFO_POTENTIAL_BUG
+	/* Hope the call here is an inode specific, or its not ? */
+	if (bi_vcnt > 1) {
+		int i;
+		struct inode *tmp_i;
+		for (i = 0; i < bi_vcnt; i++) {
+			tmp_i = (bvec[i].bv_page)->mapping->host;
+			if (tmp_i != inode)
+				pr_err("BTRFS: crypto, pages of diff files %lu and %lu\n",
+					tmp_i->i_ino, inode->i_ino);
+		}
+	}
+#endif
+
+	bytes_left = in_len;
+
+#if BTRFS_CRYPTO_INFO_POTENTIAL_BUG
+	if (total_in_pages < bi_vcnt)
+		pr_err("BTRFS: crypto, untested: pages to be decrypted is less than expected, "\
+			"total_in_pages %lu out_nr_pages %d in_len %lu\n",
+					total_in_pages, bi_vcnt, in_len);
+#endif
+
+	for (page_in_index = 0; page_in_index < total_in_pages;
+						page_in_index++) {
+		cur_page_len = min(bytes_left, PAGE_SIZE);
+		saved_page_out_index = page_out_index;
+
+		in_page = in_pages[page_in_index];
+		in = kmap(in_page);
+		more = btrfs_decompress_buf2page(in, processed_len,
+				processed_len + cur_page_len, disk_start,
+				bvec, bi_vcnt, &page_out_index, &pg_offset);
+		kunmap(in_page);
+
+		/*
+		 * if page_out_index is incremented then we know data to
+		 * decrypt is in the outpage.
+		 */
+		if (!more || saved_page_out_index != page_out_index) {
+			out_page = bvec[saved_page_out_index].bv_page;
+			ret = btrfs_do_ablkcipher_by_inode(0, out_page,
+						cur_page_len, inode);
+			if (ret)
+				return ret;
+
+			if (cur_page_len < PAGE_SIZE) {
+				out = kmap(out_page);
+				memset(out + cur_page_len, 0,
+						PAGE_SIZE - cur_page_len);
+				kunmap(out_page);
+			}
+		}
+
+		bytes_left = bytes_left - cur_page_len;
+		processed_len = processed_len + cur_page_len;
+		if (!more)
+			break;
+	}
+	return 0;
+}
+
+const struct btrfs_compress_op btrfs_encrypt_ops = {
+	.alloc_workspace	= NULL,
+	.free_workspace		= NULL,
+	.compress_pages		= btrfs_encrypt_pages,
+	.decompress_biovec	= btrfs_decrypt_pages_bio,
+	.decompress		= btrfs_decrypt_pages,
+};
diff --git a/fs/btrfs/encrypt.h b/fs/btrfs/encrypt.h
new file mode 100644
index 000000000000..8e794da9d8f5
--- /dev/null
+++ b/fs/btrfs/encrypt.h
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2016 Oracle.  All rights reserved.
+ * Author: Anand Jain, (anand.jain@oracle.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __ENCRYPT__
+#define __ENCRYPT__
+/*
+ * Encryption sub features defines.
+ */
+#ifndef BTRFS_CRYPT_SUB_FEATURES
+//testing
+	//enable method
+	#define BTRFS_CRYPTO_TEST_ENABLE_BYMNTOPT	0
+	//key choice
+	#define BTRFS_CRYPTO_TEST_BYDUMMYKEY		0 //off rest
+	#define BTRFS_CRYPTO_TEST_BYDUMMYENC		0 //off rest
+
+//debug
+	#define BTRFS_CRYPTO_INFO_POTENTIAL_BUG 	1
+
+//feature
+	#define BTRFS_CRYPTO_KEY_TYPE_LOGON		1
+#endif
+
+#define BTRFS_CRYPTO_TFM_NAME_SIZE	16
+#define BTRFS_CRYPTO_KEYTAG_SIZE	16
+#define BTRFS_CRYPTO_KEY_SIZE		16
+#define BTRFS_CRYPTO_IV_SIZE		16
+#define BTRFS_CRYPTO_IV_IV 	\
+	"\x12\x34\x56\x78\x90\xab\xcd\xef\x12\x34\x56\x78\x90\xab\xcd\xef"
+#if BTRFS_CRYPTO_KEY_TYPE_LOGON
+	#define BTRFS_CRYPTO_KEY_TYPE &key_type_logon
+#else
+	#define BTRFS_CRYPTO_KEY_TYPE &key_type_user
+#endif
+
+struct btrfs_ablkcipher_result {
+	struct completion completion;
+	int err;
+};
+
+struct btrfs_ablkcipher_req_data {
+	char cipher_name[17];
+	struct scatterlist sg_src;
+	struct crypto_ablkcipher *tfm;
+	struct ablkcipher_request *req;
+	unsigned char key[BTRFS_CRYPTO_KEY_SIZE];
+	size_t key_len;
+	unsigned char iv[BTRFS_CRYPTO_IV_SIZE];
+	size_t iv_size;
+	struct btrfs_ablkcipher_result cb_result;
+};
+
+struct btrfs_blkcipher_req {
+	unsigned char key[BTRFS_CRYPTO_KEY_SIZE];
+	size_t key_len;
+	unsigned char cryptoiv[BTRFS_CRYPTO_IV_SIZE];
+	size_t iv_len;
+};
+
+int get_encrypt_type_index(char *type_name);
+size_t get_encrypt_type_len(char *encryption_type);
+int btrfs_update_key_to_binode(struct inode *inode);
+int btrfs_validate_keytag(struct inode *inode, unsigned char *keytag);
+int btrfs_check_keytag(char *keytag);
+int btrfs_set_keyhash(struct inode *inode, char *keytag);
+int btrfs_request_key(char *key_tag, void *key_data);
+int btrfs_key_get(struct inode *inode);
+void btrfs_key_put(struct inode *inode);
+int btrfs_check_key_access(struct inode *inode);
+int btrfs_do_ablkcipher(int enc, struct page *page, unsigned long len,
+			struct btrfs_ablkcipher_req_data *btrfs_req);
+int btrfs_get_master_key(struct inode *inode,
+					unsigned char *keydata);
+int btrfs_cipher_iv(int encrypt, struct inode *inode,
+					char *data, size_t len);
+void btrfs_disable_encrypt_inode(struct inode *inode);
+void print_hex(char *key, size_t len, char *prefix);
+#endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8b1212e8f7a8..f07c86245c70 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -60,6 +60,7 @@
 #include "hash.h"
 #include "props.h"
 #include "qgroup.h"
+#include "encrypt.h"
 
 struct btrfs_iget_args {
 	struct btrfs_key *location;
@@ -206,10 +207,13 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans,
 		}
 		btrfs_set_file_extent_compression(leaf, ei,
 						  compress_type);
+		if (compress_type == BTRFS_ENCRYPT_AES)
+			btrfs_set_file_extent_encryption(leaf, ei, 1);
 	} else {
 		page = find_get_page(inode->i_mapping,
 				     start >> PAGE_SHIFT);
 		btrfs_set_file_extent_compression(leaf, ei, 0);
+		btrfs_set_file_extent_encryption(leaf, ei, 0);
 		kaddr = kmap_atomic(page);
 		offset = start & (PAGE_SIZE - 1);
 		write_extent_buffer(leaf, kaddr + offset, ptr, size);
@@ -386,6 +390,154 @@ static inline int inode_need_compress(struct inode *inode)
 	return 0;
 }
 
+static int btrfs_inline_extent_able(struct inode *inode,
+				u64 start, u64 end, size_t data_len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u64 isize = i_size_read(inode);
+	u64 actual_end = min(end + 1, isize);
+
+	if (start > 0 ||
+	    actual_end > root->sectorsize ||
+	    data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
+	    (!data_len &&
+	    (actual_end & (root->sectorsize - 1)) == 0) ||
+	    end + 1 < isize ||
+	    data_len > root->fs_info->max_inline) {
+		return 0;
+	}
+
+	return 1;
+}
+
+/*
+ * In crypto bailout is only when its inevitable, in the long run we
+ * should merge this to compress_file_range() though.
+ */
+static noinline int encrypt_file_range(struct inode *inode,
+			struct page *locked_page, u64 start, u64 end,
+			struct async_cow *async_cow, int *num_added)
+{
+	int ret = 0;
+	u64 actual_end;
+	unsigned long len = 0;
+	unsigned long nr_pages;
+	int may_inline_dont_align = -1; //test with btrfs/035
+	struct page **pages = NULL;
+	unsigned long total_in = 0;
+	unsigned long ram_bytes = 0;
+	unsigned long total_out = 0;
+	unsigned long nr_pages_ret = 0;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int encode_type = root->fs_info->compress_type;
+
+	if (BTRFS_I(inode)->force_compress)
+		encode_type = BTRFS_I(inode)->force_compress;
+
+	if ((end - start + 1) < SZ_16K &&
+	    (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
+		btrfs_add_inode_defrag(NULL, inode);
+
+	actual_end = min_t(u64, i_size_read(inode), end + 1);
+	if (actual_end < start)
+		actual_end = end + 1;
+
+again:
+	if (actual_end <= start)
+		return 0;
+
+	len = min_t(unsigned long, actual_end - start, SZ_128K);
+
+	nr_pages = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
+	nr_pages = min_t(unsigned long, nr_pages, SZ_128K / PAGE_SIZE);
+	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
+	if (!pages) {
+		pr_err("BTRFS: Fatal: kcalloc for encrypt page list failed\n");
+		goto inevitable_bailout;
+	}
+
+	extent_range_clear_dirty_for_io(inode, start, end);
+
+	total_in = 0;
+	total_out = 0;
+	nr_pages_ret = 0;
+
+	if (len == actual_end)
+		may_inline_dont_align = btrfs_inline_extent_able(inode,
+							start, end, len);
+	else
+		may_inline_dont_align = 0;
+	ret = btrfs_compress_pages(encode_type, inode->i_mapping, start,
+				len, pages, nr_pages, &nr_pages_ret,
+				&total_in, &total_out, SZ_128K,
+				may_inline_dont_align);
+	if (ret) {
+		kfree(pages);
+		goto inevitable_bailout;
+	}
+
+	if (may_inline_dont_align) {
+
+		ret = cow_file_range_inline(root, inode, start, end,
+					total_out, encode_type, pages);
+
+		if (!ret) {
+			extent_clear_unlock_delalloc(inode, start, end,
+				NULL, EXTENT_DELALLOC | EXTENT_DEFRAG,
+				PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
+				PAGE_END_WRITEBACK);
+			return 0;
+		}
+
+		if (ret < 0)
+			goto inevitable_bailout;
+	}
+
+	ram_bytes = ALIGN(total_in, PAGE_SIZE);
+
+	ret = add_async_extent(async_cow, start, ram_bytes, total_out, pages,
+						nr_pages_ret, encode_type);
+	*num_added += 1;
+	if (start + total_in < end) {
+		start += total_in;
+		pages = NULL;
+		cond_resched();
+		goto again;
+	}
+
+	return ret;
+
+inevitable_bailout:
+	if (start == 0) {
+		ret = cow_file_range_inline(root, inode, start, end,
+					0, BTRFS_COMPRESS_NONE, NULL);
+		if (ret <= 0) {
+			unsigned long clear_flags = EXTENT_DELALLOC |
+							EXTENT_DEFRAG;
+			unsigned long page_error_op = PAGE_UNLOCK |
+				PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
+				PAGE_END_WRITEBACK;
+
+			clear_flags |= (ret < 0) ? EXTENT_DO_ACCOUNTING : 0;
+			page_error_op |= (ret < 0) ? PAGE_SET_ERROR : 0;
+
+			extent_clear_unlock_delalloc(inode, start, end, NULL,
+						clear_flags, page_error_op);
+			return ret;
+		}
+	}
+	if (page_offset(locked_page) >= start &&
+				page_offset(locked_page) <= end)
+			__set_page_dirty_nobuffers(locked_page);
+
+	extent_range_redirty_for_io(inode, start, end);
+	ret = add_async_extent(async_cow, start, end - start + 1,
+				0, NULL, 0, BTRFS_COMPRESS_NONE);
+	*num_added += 1;
+
+	return ret;
+}
+
 /*
  * we create compressed extents in two phases.  The first
  * phase compresses a range of pages that have already been
@@ -510,7 +662,7 @@ again:
 					   nr_pages, &nr_pages_ret,
 					   &total_in,
 					   &total_compressed,
-					   max_compressed);
+					   max_compressed, 0);
 
 		if (!ret) {
 			unsigned long offset = total_compressed &
@@ -1087,12 +1239,26 @@ out_unlock:
 static noinline void async_cow_start(struct btrfs_work *work)
 {
 	struct async_cow *async_cow;
+	struct inode *inode;
 	int num_added = 0;
+	int encode_type;
+
 	async_cow = container_of(work, struct async_cow, work);
+	inode = async_cow->inode;
+	encode_type = BTRFS_I(inode)->root->fs_info->compress_type;
 
-	compress_file_range(async_cow->inode, async_cow->locked_page,
+	if (BTRFS_I(inode)->force_compress)
+		encode_type = BTRFS_I(inode)->force_compress;
+
+	if (encode_type == BTRFS_ENCRYPT_AES)
+		encrypt_file_range(async_cow->inode, async_cow->locked_page,
+			    async_cow->start, async_cow->end, async_cow,
+			    &num_added);
+	else
+		compress_file_range(async_cow->inode, async_cow->locked_page,
 			    async_cow->start, async_cow->end, async_cow,
 			    &num_added);
+
 	if (num_added == 0) {
 		btrfs_add_delayed_iput(async_cow->inode);
 		async_cow->inode = NULL;
@@ -6735,6 +6901,8 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
 	ret = btrfs_decompress(compress_type, tmp, page,
 			       extent_offset, inline_size, max_size);
+	if (ret && ret == -ENOKEY)
+		ret = 0;
 	kfree(tmp);
 	return ret;
 }
@@ -9249,6 +9417,11 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->i_otime.tv_sec = 0;
 	ei->i_otime.tv_nsec = 0;
 
+	memset(ei->key_payload, 0, BTRFS_CRYPTO_KEY_SIZE);
+	ei->key_len = 0;
+	memset(ei->cryptoiv, 0, BTRFS_CRYPTO_IV_SIZE);
+	ei->iv_len = 0;
+
 	inode = &ei->vfs_inode;
 	extent_map_tree_init(&ei->extent_tree);
 	extent_io_tree_init(&ei->io_tree, &inode->i_data);
@@ -9689,6 +9862,50 @@ out:
 	return ret;
 }
 
+static int btrfs_check_fops_move_crypto(struct btrfs_root *src,
+					struct btrfs_root *dest)
+{
+	u64 src_flags;
+	u64 dest_flags;
+
+	src_flags = btrfs_root_flags(&src->root_item);
+	dest_flags = btrfs_root_flags(&dest->root_item);
+
+	if (src == dest)
+		return 0;
+
+	/*
+	 * Move from non-encrypted sv to encrypted sv is a thing
+	 * as usual
+	 */
+	if (!(src_flags & BTRFS_ROOT_SUBVOL_ENCRYPT))
+		return 0;
+
+	/*
+	 * Here we are sure src is encrypted but not dest.
+	 * This means asking for reverse encryption which
+	 * is nosupp as of now. Workaround is to use cp instead.
+	 */
+	if (!(dest_flags & BTRFS_ROOT_SUBVOL_ENCRYPT))
+		return -EOPNOTSUPP;
+
+	/*
+	 * Move to different sv, but having same key hash.
+	 * As of now there is only one encryption policy so just
+	 * approve the move, but its a 'crypto-fixme' when there
+	 * are mulitple encryption policies
+	 */
+	if (src->root_item.crypto_keyhash ==
+			dest->root_item.crypto_keyhash)
+		return 0;
+
+	/*
+	 * Any thing else no supp
+	 */
+
+	return -EOPNOTSUPP;
+}
+
 static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 			   struct inode *new_dir, struct dentry *new_dentry,
 			   unsigned int flags)
@@ -9705,6 +9922,21 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	u64 old_ino = btrfs_ino(old_inode);
 	bool log_pinned = false;
 
+	u64 root_flags;
+	u64 dest_flags;
+	/*
+	 * File move across subvol with potentially a different/no
+	 * encryption key is not supported as if now.
+	 */
+	root_flags = btrfs_root_flags(&root->root_item);
+	dest_flags = btrfs_root_flags(&dest->root_item);
+	if ((root_flags & BTRFS_ROOT_SUBVOL_ENCRYPT) ||
+		(dest_flags & BTRFS_ROOT_SUBVOL_ENCRYPT)) {
+		ret = btrfs_check_fops_move_crypto(root, dest);
+		if (ret)
+			return ret;
+	}
+
 	if (btrfs_ino(new_dir) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)
 		return -EPERM;
 
@@ -10403,6 +10635,25 @@ static int btrfs_permission(struct inode *inode, int mask)
 		if (BTRFS_I(inode)->flags & BTRFS_INODE_READONLY)
 			return -EACCES;
 	}
+
+	if (S_ISREG(mode)) {
+		int ret = 0;
+		u64 root_flags;
+
+		root_flags = btrfs_root_flags(&root->root_item);
+		if (root_flags & BTRFS_ROOT_SUBVOL_ENCRYPT) {
+			ret = btrfs_check_key_access(inode);
+			if (ret) {
+				return ret;
+			}
+			if (!BTRFS_I(inode)->key_len) {
+				ret = btrfs_update_key_to_binode(inode);
+				if (ret)
+					return ret;
+			}
+		}
+	}
+
 	return generic_permission(inode, mask);
 }
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 05173563e4a6..0bcc0a357c02 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2144,8 +2144,15 @@ static noinline int btrfs_ioctl_tree_search(struct file *file,
 	int ret;
 	size_t buf_size;
 
+	/*
+	 * Allow tree serach by non root, so that non root
+	 * user can find info about the subvol they own/create.
+	 * BTRFS_CRYPTO_fixme: check if safe?.
+	 */
+#if 0
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
+#endif
 
 	uargs = (struct btrfs_ioctl_search_args __user *)argp;
 
@@ -2622,6 +2629,27 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
 			}
 			/* compression requires us to start the IO */
 			if ((range->flags & BTRFS_DEFRAG_RANGE_COMPRESS)) {
+#if !(BTRFS_CRYPTO_TEST_BYDUMMYKEY | BTRFS_CRYPTO_TEST_BYDUMMYENC)
+				/*
+				 * Check if user is trying to encrypt the file/root
+				 * which isn't under an encrypt subvol, as their isn't
+				 * key for that, unless we are under the defines
+				 * BTRFS_CRYPTO_TEST_BYDUMMYKEY or
+				 * BTRFS_CRYPTO_TEST_BYDUMMYENC
+				 * which means its testing context so don't really
+				 * worry about the key.
+				 */
+				if (range->compress_type == BTRFS_ENCRYPT_AES) {
+					u64 root_flags = btrfs_root_flags(&root->root_item);
+					/*
+					 * Presence of a valid key is already been verified
+					 * at the permission, just reject encrypt compress
+					 * type on a non encrypt subvol.
+					 */
+					if (!(root_flags & BTRFS_ROOT_SUBVOL_ENCRYPT))
+						return -EOPNOTSUPP;
+				}
+#endif
 				range->flags |= BTRFS_DEFRAG_RANGE_START_IO;
 				range->extent_thresh = (u32)-1;
 			}
@@ -3844,6 +3872,41 @@ out:
 	return ret;
 }
 
+static int btrfs_encode_check_fops(struct btrfs_root *src,
+			struct btrfs_root *dest, int fops)
+{
+#if 0
+	u64 src_flags;
+	u64 dest_flags;
+
+	src_flags = btrfs_root_flags(&src->root_item);
+	dest_flags = btrfs_root_flags(&dest->root_item);
+
+	if (src_flags & BTRFS_ROOT_SUBVOL_ENCRYPT)
+		return -EOPNOTSUPP;
+
+	if (dest_flags & BTRFS_ROOT_SUBVOL_ENCRYPT)
+		return -EOPNOTSUPP;
+
+	return 0;
+#else
+
+	switch(fops) {
+	case 1: //clone; not tested for per file compress/encrypt
+		if (src == dest)
+			return 0;
+
+		if (src->root_item.crypto_keyhash ==
+			dest->root_item.crypto_keyhash)
+			return 0;
+
+		return -EOPNOTSUPP;
+		break;
+	}
+	return -EINVAL;
+#endif
+}
+
 static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
 					u64 off, u64 olen, u64 destoff)
 {
@@ -3873,6 +3936,10 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
 	    src->i_sb != inode->i_sb)
 		return -EXDEV;
 
+	ret = btrfs_encode_check_fops(BTRFS_I(src)->root, root, 1);
+	if (ret)
+		return ret;
+
 	/* don't make the dst file partly checksummed */
 	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
 	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 1adfbe7be6b8..46ed0fc5c137 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -92,7 +92,7 @@ static int lzo_compress_pages(struct list_head *ws,
 			      unsigned long *out_pages,
 			      unsigned long *total_in,
 			      unsigned long *total_out,
-			      unsigned long max_out)
+			      unsigned long max_out, int flags)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
 	int ret = 0;
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 36992128c746..a1c74ce29b9e 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -17,38 +17,74 @@
  */
 
 #include <linux/hashtable.h>
+#include <linux/random.h>
 #include "props.h"
 #include "btrfs_inode.h"
 #include "hash.h"
 #include "transaction.h"
 #include "xattr.h"
 #include "compression.h"
+#include "encrypt.h"
 
 #define BTRFS_PROP_HANDLERS_HT_BITS 8
 static DEFINE_HASHTABLE(prop_handlers_ht, BTRFS_PROP_HANDLERS_HT_BITS);
 
+#define BTRFS_PROP_INHERIT_NONE		(1U << 0)
+#define BTRFS_PROP_INHERIT_FOR_DIR	(1U << 1)
+#define BTRFS_PROP_INHERIT_FOR_CLONE	(1U << 2)
+#define BTRFS_PROP_INHERIT_FOR_SUBVOL	(1U << 3)
+
 struct prop_handler {
 	struct hlist_node node;
 	const char *xattr_name;
-	int (*validate)(const char *value, size_t len);
+	int (*validate)(struct inode *inode, const char *value, size_t len);
 	int (*apply)(struct inode *inode, const char *value, size_t len);
 	const char *(*extract)(struct inode *inode);
 	int inheritable;
 };
 
-static int prop_compression_validate(const char *value, size_t len);
+static int prop_compression_validate(struct inode *inode, const char *value, size_t len);
 static int prop_compression_apply(struct inode *inode,
 				  const char *value,
 				  size_t len);
 static const char *prop_compression_extract(struct inode *inode);
 
+static int prop_encrypt_validate(struct inode *inode, const char *value, size_t len);
+static int prop_encrypt_apply(struct inode *inode,
+				  const char *value, size_t len);
+static const char *prop_encrypt_extract(struct inode *inode);
+static int prop_cryptoiv_validate(struct inode *inode, const char *value, size_t len);
+static int prop_cryptoiv_apply(struct inode *inode,
+				  const char *value, size_t len);
+static const char *prop_cryptoiv_extract(struct inode *inode);
+
 static struct prop_handler prop_handlers[] = {
 	{
 		.xattr_name = XATTR_BTRFS_PREFIX "compression",
 		.validate = prop_compression_validate,
 		.apply = prop_compression_apply,
 		.extract = prop_compression_extract,
-		.inheritable = 1
+		.inheritable = BTRFS_PROP_INHERIT_FOR_DIR| \
+				BTRFS_PROP_INHERIT_FOR_CLONE| \
+				BTRFS_PROP_INHERIT_FOR_SUBVOL,
+	},
+	{
+		.xattr_name = XATTR_BTRFS_PREFIX "encrypt",
+		.validate = prop_encrypt_validate,
+		.apply = prop_encrypt_apply,
+		.extract = prop_encrypt_extract,
+		.inheritable = BTRFS_PROP_INHERIT_FOR_DIR| \
+				BTRFS_PROP_INHERIT_FOR_CLONE| \
+				BTRFS_PROP_INHERIT_FOR_SUBVOL,
+	},
+	{
+		.xattr_name = XATTR_BTRFS_PREFIX "cryptoiv",
+		.validate = prop_cryptoiv_validate,
+		.apply = prop_cryptoiv_apply,
+		.extract = prop_cryptoiv_extract,
+		.inheritable = BTRFS_PROP_INHERIT_FOR_DIR| \
+				BTRFS_PROP_INHERIT_FOR_CLONE| \
+				BTRFS_PROP_INHERIT_FOR_SUBVOL,
 	},
 };
 
@@ -127,15 +163,19 @@ static int __btrfs_set_prop(struct btrfs_trans_handle *trans,
 		return ret;
 	}
 
-	ret = handler->validate(value, value_len);
-	if (ret)
+	ret = handler->validate(inode, value, value_len);
+	if (ret) {
 		return ret;
+	}
 	ret = __btrfs_setxattr(trans, inode, handler->xattr_name,
 			       value, value_len, flags);
-	if (ret)
+	if (ret) {
 		return ret;
+	}
 	ret = handler->apply(inode, value, value_len);
-	if (ret) {
+	if (ret && ret != -EKEYREJECTED) {
+		pr_err("BTRFS: property apply failed %s %d %s %lu\n",
+					name, ret, value, value_len);
 		__btrfs_setxattr(trans, inode, handler->xattr_name,
 				 NULL, 0, flags);
 		return ret;
@@ -143,7 +183,7 @@ static int __btrfs_set_prop(struct btrfs_trans_handle *trans,
 
 	set_bit(BTRFS_INODE_HAS_PROPS, &BTRFS_I(inode)->runtime_flags);
 
-	return 0;
+	return ret;
 }
 
 int btrfs_set_prop(struct inode *inode,
@@ -276,13 +316,15 @@ static void inode_prop_iterator(void *ctx,
 	int ret;
 
 	ret = handler->apply(inode, value, len);
-	if (unlikely(ret))
-		btrfs_warn(root->fs_info,
+	if (unlikely(ret)) {
+		if (ret != -ENOKEY && ret != -EKEYREVOKED)
+			btrfs_warn(root->fs_info,
 			   "error applying prop %s to ino %llu (root %llu): %d",
 			   handler->xattr_name, btrfs_ino(inode),
 			   root->root_key.objectid, ret);
-	else
+	} else {
 		set_bit(BTRFS_INODE_HAS_PROPS, &BTRFS_I(inode)->runtime_flags);
+	}
 }
 
 int btrfs_load_inode_props(struct inode *inode, struct btrfs_path *path)
@@ -296,6 +338,20 @@ int btrfs_load_inode_props(struct inode *inode, struct btrfs_path *path)
 	return ret;
 }
 
+static int btrfs_create_iv(char **ivdata, unsigned int ivsize)
+{
+	char *tmp;
+	tmp = kmalloc(ivsize+1, GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+	get_random_bytes(tmp, ivsize);
+	tmp[ivsize] = '\0';
+
+	*ivdata = tmp;
+
+	return 0;
+}
+
 static int inherit_props(struct btrfs_trans_handle *trans,
 			 struct inode *inode,
 			 struct inode *parent)
@@ -313,6 +369,10 @@ static int inherit_props(struct btrfs_trans_handle *trans,
 		const char *value;
 		u64 num_bytes;
 
+		/*
+		 * BTRFS_CRYPTO_fixme:
+		 * should be inheritable only by files inode type
+		 */
 		if (!h->inheritable)
 			continue;
 
@@ -323,13 +383,37 @@ static int inherit_props(struct btrfs_trans_handle *trans,
 		num_bytes = btrfs_calc_trans_metadata_size(root, 1);
 		ret = btrfs_block_rsv_add(root, trans->block_rsv,
 					  num_bytes, BTRFS_RESERVE_NO_FLUSH);
-		if (ret)
+		if (ret) {
+			if (!strcmp(h->xattr_name, "btrfs.encrypt") ||
+				!strcmp(h->xattr_name, "btrfs.cryptoiv"))
+				kfree(value);
 			goto out;
-		ret = __btrfs_set_prop(trans, inode, h->xattr_name,
+		}
+		if (!strcmp(h->xattr_name, "btrfs.cryptoiv"))
+			ret = __btrfs_set_prop(trans, inode, h->xattr_name,
+				       value, BTRFS_CRYPTO_IV_SIZE, 0);
+		else
+			ret = __btrfs_set_prop(trans, inode, h->xattr_name,
 				       value, strlen(value), 0);
+		if (ret) {
+			pr_err("BTRFS: %lu failed to inherit '%s': %d\n",
+					inode->i_ino, h->xattr_name, ret);
+			if (!strcmp(h->xattr_name, "btrfs.encrypt") ||
+				!strcmp(h->xattr_name, "btrfs.cryptoiv"))
+				btrfs_disable_encrypt_inode(inode);
+			dump_stack();
+		}
+
 		btrfs_block_rsv_release(root, trans->block_rsv, num_bytes);
-		if (ret)
+		if (ret) {
+			if (!strcmp(h->xattr_name, "btrfs.encrypt") ||
+				!strcmp(h->xattr_name, "btrfs.cryptoiv"))
+				kfree(value);
 			goto out;
+		}
+		if (!strcmp(h->xattr_name, "btrfs.encrypt") ||
+			!strcmp(h->xattr_name, "btrfs.cryptoiv"))
+			kfree(value);
 	}
 	ret = 0;
 out:
@@ -376,8 +460,11 @@ int btrfs_subvol_inherit_props(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static int prop_compression_validate(const char *value, size_t len)
+static int prop_compression_validate(struct inode *inode, const char *value, size_t len)
 {
+	if (BTRFS_I(inode)->force_compress == BTRFS_ENCRYPT_AES)
+		return -ENOTSUPP;
+
 	if (!strncmp("lzo", value, len))
 		return 0;
 	else if (!strncmp("zlib", value, len))
@@ -426,4 +513,218 @@ static const char *prop_compression_extract(struct inode *inode)
 	return NULL;
 }
 
+static int btrfs_split_key_type(const char *val, size_t len,
+					char *tfm, char *keytag)
+{
+	char *tmp;
+	char *tmp2;
+	char tmp1[BTRFS_CRYPTO_KEYTAG_SIZE + BTRFS_CRYPTO_TFM_NAME_SIZE + 1];
 
+	if (len > BTRFS_CRYPTO_KEYTAG_SIZE + BTRFS_CRYPTO_TFM_NAME_SIZE) {
+		return -EINVAL;
+	}
+	memcpy(tmp1, val, len);
+	tmp1[len] = '\0';
+	tmp = tmp1;
+	tmp2 = strsep(&tmp, "@");
+	if (!tmp2)
+		return -EINVAL;
+
+	if (strlen(tmp2) > BTRFS_CRYPTO_TFM_NAME_SIZE ||
+			strlen(tmp) > BTRFS_CRYPTO_KEYTAG_SIZE)
+		return -EINVAL;
+
+	strcpy(tfm, tmp2);
+	strcpy(keytag, tmp);
+
+	return 0;
+}
+
+/*
+ * The required foramt in the value is <crypto_algo>@<key_tag>
+ * eg: btrfs.encrypt="ctr(aes)@btrfs:61e0d004"
+ */
+static int prop_encrypt_validate(struct inode *inode,
+					const char *value, size_t len)
+{
+	int ret;
+	size_t keylen;
+	char keytag[BTRFS_CRYPTO_KEYTAG_SIZE + 1];
+	char keyalgo[BTRFS_CRYPTO_TFM_NAME_SIZE + 1];
+
+	if (BTRFS_I(inode)->force_compress == BTRFS_COMPRESS_ZLIB ||
+		BTRFS_I(inode)->force_compress == BTRFS_COMPRESS_LZO)
+		return -ENOTSUPP;
+
+	if (!len)
+		return 0;
+
+	if (len > (BTRFS_CRYPTO_TFM_NAME_SIZE + BTRFS_CRYPTO_KEYTAG_SIZE ))
+		return -EINVAL;
+
+	ret = btrfs_split_key_type(value, len, keyalgo, keytag);
+	if (ret) {
+		pr_err("BTRFS: %lu mal formed value '%s' %lu\n",
+					inode->i_ino, value, len);
+		return ret;
+	}
+
+	keylen = get_encrypt_type_len(keyalgo);
+	if (!keylen)
+		return -ENOTSUPP;
+
+	ret = btrfs_check_keytag(keytag);
+	if (!ret)
+		return ret;
+
+	ret = btrfs_validate_keytag(inode, keytag);
+	// check if its newly being set
+	if (ret == -ENOTSUPP)
+		ret = 0;
+
+	return ret;
+}
+
+static int prop_encrypt_apply(struct inode *inode,
+				const char *value, size_t len)
+{
+	int ret;
+	u64 root_flags;
+	char keytag[BTRFS_CRYPTO_KEYTAG_SIZE];
+	char keyalgo[BTRFS_CRYPTO_TFM_NAME_SIZE];
+	struct btrfs_root_item *root_item;
+	struct btrfs_root *root;
+
+	root_item = &(BTRFS_I(inode)->root->root_item);
+	root = BTRFS_I(inode)->root;
+
+	if (len == 0) {
+		/* means disable encryption */
+		return -EOPNOTSUPP;
+	}
+
+	ret = btrfs_split_key_type(value, len, keyalgo, keytag);
+	if (ret)
+		return ret;
+
+	/* do it only for the subvol or snapshot */
+	if (btrfs_ino(inode) == BTRFS_FIRST_FREE_OBJECTID) {
+		if (!root_item->crypto_keyhash) {
+			pr_info("BTRFS: subvol %pU enable encryption '%s'\n",
+							root_item->uuid, keyalgo);
+			/*
+			 * We are here when xattribute being set for the first time
+			 */
+			ret = btrfs_set_keyhash(inode, keytag);
+			if (!ret) {
+				root_flags = btrfs_root_flags(root_item);
+				btrfs_set_root_flags(root_item,
+					root_flags | BTRFS_ROOT_SUBVOL_ENCRYPT);
+
+				strncpy(root_item->encrypt_algo, keyalgo,
+						BTRFS_CRYPTO_TFM_NAME_SIZE);
+			}
+		} else {
+			ret = btrfs_validate_keytag(inode, keytag);
+		}
+		if (!ret)
+			strncpy(root->crypto_keytag, keytag,
+						BTRFS_CRYPTO_KEYTAG_SIZE);
+	}
+
+	if (!ret) {
+		BTRFS_I(inode)->flags |= BTRFS_INODE_ENCRYPT;
+		BTRFS_I(inode)->force_compress = get_encrypt_type_index(keyalgo);
+	}
+
+	return ret;
+}
+
+static int tuplet_encrypt_tfm_and_tag(char *val_out, char *tfm, char *tag)
+{
+	char tmp_tag[BTRFS_CRYPTO_KEYTAG_SIZE + 1];
+	char tmp_tfm[BTRFS_CRYPTO_TFM_NAME_SIZE + 1];
+	int sz = BTRFS_CRYPTO_TFM_NAME_SIZE + BTRFS_CRYPTO_KEYTAG_SIZE + 1;
+
+	memcpy(tmp_tag, tag, BTRFS_CRYPTO_KEYTAG_SIZE);
+	memcpy(tmp_tfm, tfm, BTRFS_CRYPTO_TFM_NAME_SIZE);
+
+	tmp_tag[BTRFS_CRYPTO_KEYTAG_SIZE] = '\0';
+	tmp_tfm[BTRFS_CRYPTO_TFM_NAME_SIZE] = '\0';
+
+	return snprintf(val_out, sz, "%s@%s", tmp_tfm, tmp_tag);
+}
+
+static const char *prop_encrypt_extract(struct inode *inode)
+{
+	struct btrfs_root *root;
+	char val[BTRFS_CRYPTO_TFM_NAME_SIZE + BTRFS_CRYPTO_KEYTAG_SIZE + 1];
+
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_ENCRYPT))
+		return NULL;
+
+	root = BTRFS_I(inode)->root;
+
+	tuplet_encrypt_tfm_and_tag(val, root->root_item.encrypt_algo,
+							root->crypto_keytag);
+
+	return kstrdup(val, GFP_NOFS);
+}
+
+static int prop_cryptoiv_validate(struct inode *inode,
+					const char *value, size_t len)
+{
+	if (len < BTRFS_CRYPTO_IV_SIZE)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int prop_cryptoiv_apply(struct inode *inode,
+				const char *value, size_t len)
+{
+	int ret;
+	char *tmp_val;
+
+	if (!strlen(BTRFS_I(inode)->root->crypto_keytag))
+		return -ENOKEY;
+
+	tmp_val = kmemdup(value, len, GFP_KERNEL);
+	/* decrypt iv and apply to binode */
+	ret = btrfs_cipher_iv(0, inode, tmp_val, len);
+	if (ret) {
+		pr_err("BTRFS: %lu prop_cryptoiv_apply failed ret %d len %lu\n",
+			inode->i_ino, ret, len);
+		return ret;
+	}
+
+	memcpy(BTRFS_I(inode)->cryptoiv, tmp_val, len);
+	BTRFS_I(inode)->iv_len = len;
+
+	kfree(tmp_val);
+	return 0;
+}
+
+static const char *prop_cryptoiv_extract(struct inode *inode)
+{
+	int ret;
+	char *ivdata = NULL;
+
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_ENCRYPT))
+		return NULL;
+
+	ret = btrfs_create_iv(&ivdata, BTRFS_CRYPTO_IV_SIZE);
+	if (ret)
+		return NULL;
+
+	/* Encrypt iv with master key */
+	ret = btrfs_cipher_iv(1, inode, ivdata,
+					BTRFS_CRYPTO_IV_SIZE);
+	if (ret) {
+		pr_err("BTRFS Error: %lu iv encrypt failed: %d\n",
+						inode->i_ino, ret);
+		kfree(ivdata);
+		return NULL;
+	}
+	return ivdata;
+}
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4339b6613f19..b90fc1cfad2f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -59,10 +59,10 @@
 #include "free-space-cache.h"
 #include "backref.h"
 #include "tests/btrfs-tests.h"
-
 #include "qgroup.h"
 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
+#include "encrypt.h"
 
 static const struct super_operations btrfs_super_ops;
 static struct file_system_type btrfs_fs_type;
@@ -92,6 +92,9 @@ const char *btrfs_decode_error(int errno)
 	case -ENOENT:
 		errstr = "No such entry";
 		break;
+	case -ENOKEY:
+		errstr = "Required key not available";
+		break;
 	}
 
 	return errstr;
@@ -491,6 +494,15 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
 				btrfs_clear_opt(info->mount_opt, NODATASUM);
 				btrfs_set_fs_incompat(info, COMPRESS_LZO);
 				no_compress = 0;
+#if BTRFS_CRYPTO_TEST_ENABLE_BYMNTOPT
+			} else if (strcmp(args[0].from, "ctr(aes)") == 0) {
+				compress_type = "ctr(aes)";
+				info->compress_type = BTRFS_ENCRYPT_AES;
+				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_clear_opt(info->mount_opt, NODATACOW);
+				btrfs_clear_opt(info->mount_opt, NODATASUM);
+				no_compress = 0;
+#endif
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				btrfs_clear_opt(info->mount_opt, COMPRESS);
@@ -1208,10 +1220,19 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 					     num_online_cpus() + 2, 8))
 		seq_printf(seq, ",thread_pool=%d", info->thread_pool_size);
 	if (btrfs_test_opt(root, COMPRESS)) {
-		if (info->compress_type == BTRFS_COMPRESS_ZLIB)
+		switch(info->compress_type) {
+		case BTRFS_COMPRESS_ZLIB:
 			compress_type = "zlib";
-		else
+			break;
+		case BTRFS_COMPRESS_LZO:
 			compress_type = "lzo";
+			break;
+		case BTRFS_ENCRYPT_AES:
+			compress_type = "ctr(aes)";
+			break;
+		default:
+			compress_type = "error";
+		}
 		if (btrfs_test_opt(root, FORCE_COMPRESS))
 			seq_printf(seq, ",compress-force=%s", compress_type);
 		else
diff --git a/fs/btrfs/tests/crypto-tests.c b/fs/btrfs/tests/crypto-tests.c
new file mode 100755
index 000000000000..917c5837cc3f
--- /dev/null
+++ b/fs/btrfs/tests/crypto-tests.c
@@ -0,0 +1,376 @@
+#include <linux/highmem.h>
+#include <linux/slab.h>
+#include <linux/pagemap.h>
+#include <linux/scatterlist.h>
+#include <linux/random.h>
+#include <keys/user-type.h>
+#include "../extent_io.h"
+#include "../encrypt.h"
+#include "../hash.h"
+#include "crypto-tests.h"
+
+struct page *known_data_page = 0;
+char *known_data_str = 0;
+struct key *btrfs_key = 0;
+
+int __blkcipher(int encrypt, char *str, size_t sz)
+{
+	return 0;
+}
+
+int __ablkcipher(int enc, char *cipher_name, struct page *page,
+						unsigned long len)
+{
+	struct btrfs_ablkcipher_req_data btrfs_req;
+	char *key_str;
+
+	memset(&btrfs_req, 0, sizeof(btrfs_req));
+	key_str = kstrdup(
+	"\x12\x34\x56\x78\x90\xab\xcd\xef\x12\x34\x56\x78\x90\xab\xcd\xef",
+			GFP_NOFS);
+	memcpy(btrfs_req.key, key_str, 16);
+
+	strcpy(btrfs_req.cipher_name, cipher_name);
+	return btrfs_do_ablkcipher(enc, page, len, &btrfs_req);
+}
+
+bool is_same_as_known_data_page(char *a, char *b, size_t sz)
+{
+	return !memcmp(a, b, sz);
+}
+
+void __check_same_print(char *a, char *b, size_t sz, int for_encrypt)
+{
+	if (is_same_as_known_data_page(a, b, sz)) {
+		if (for_encrypt)
+			printk("_BTRFS_: encrypt failed !!!\n");
+		else
+			printk("_BTRFS_: decrypt success\n");
+	} else {
+		if (for_encrypt)
+			printk("_BTRFS_: encrypt success\n");
+		else
+			printk("_BTRFS_: decrypt failed !!!\n");
+	}
+}
+
+void test_pr_result(struct page *page_in, int for_encrypt)
+{
+	char *a = page_address(page_in);
+	char *b = page_address(known_data_page);
+
+	__check_same_print(a, b, TEST_DATA_SIZE, for_encrypt);
+}
+
+void test_pr_result_str(char *a, int for_encrypt)
+{
+	__check_same_print(a, known_data_str, TEST_DATA_SIZE, for_encrypt);
+}
+
+void test_init(void)
+{
+	char *kaddr;
+	char *str = "deadbeef";
+	unsigned long dlen = strlen(str);
+	unsigned long offset;
+
+	if (known_data_page)
+		return;
+
+	if (TEST_DATA_SIZE > PAGE_SIZE) {
+		printk("_BTRFS_: TEST_DATA_PAGE is bigger than PAGE_SIZE\n");
+		return;
+	}
+
+	known_data_page = alloc_page(GFP_NOFS);
+	//known_data_page = get_zeroed_page(GFP_NOFS);
+	if (!known_data_page) {
+		printk("_BTRFS_: FAILED to alloc page\n");
+		return;
+	}
+
+	/* Fill known data */
+	kaddr = page_address(known_data_page);
+	for (offset = 0; offset < TEST_DATA_SIZE; offset = offset + dlen)
+		memcpy(kaddr + offset, str, dlen);
+
+	flush_kernel_dcache_page(known_data_page);
+}
+
+void test_fini(void)
+{
+	if (known_data_page)
+		__free_page(known_data_page);
+}
+
+
+void test_print_data(const char *str, char *prefix, size_t sz, int print_as_str)
+{
+	int i;
+	printk("_BTRFS_: %s: sz %lu: ", prefix, sz);
+
+	if (print_as_str)
+		for (i = 0; i < sz; i++) printk("%c", str[i]);
+	else
+		for (i = 0; i < sz; i++) printk("%02x ", 0xF & str[i]);
+
+	printk("\n");
+}
+
+struct page *test_alloc_page_cpy_known_data(void)
+{
+	struct page *page;
+	char *kaddr;
+	char *kaddr_known_data;
+
+	page = alloc_page(GFP_NOFS|__GFP_HIGHMEM);
+	if (!page) {
+		printk("_BTRFS_: FAILED to alloc page\n");
+		return NULL;
+	}
+	kaddr = kmap(page);
+
+	if (!known_data_page)
+		test_init();
+	kaddr_known_data = kmap(known_data_page);
+
+	memcpy(kaddr, kaddr_known_data, TEST_DATA_SIZE);
+
+	kunmap(page);
+	kunmap(known_data_page);
+
+	return page;
+}
+
+char *test_alloc_known_data_str(void)
+{
+	char *str;
+
+	known_data_str = kzalloc(TEST_DATA_SIZE, GFP_NOFS);
+	strncpy(known_data_str, "This is test", TEST_DATA_SIZE);
+
+	str = kzalloc(TEST_DATA_SIZE, GFP_NOFS);
+	memcpy(str, known_data_str, TEST_DATA_SIZE);
+	return str;
+}
+
+void test_blkcipher(void)
+{
+	int ret;
+	char *str;
+
+	str = test_alloc_known_data_str();
+
+	printk("_BTRFS_: ------ testing blkcipher start ------\n");
+	ret = __blkcipher(1, str, TEST_DATA_SIZE);
+	if (ret) goto out;
+	test_pr_result_str(str, 1);
+	ret = __blkcipher(0, str, TEST_DATA_SIZE);
+	if (ret) goto out;
+	test_pr_result_str(str, 0);
+	printk("_BTRFS_: ------ testing blkcipher end ------\n");
+
+out:
+	kfree(str);
+	kfree(known_data_str);
+	known_data_str = NULL;
+}
+
+void test_ablkcipher(void)
+{
+	struct page *page;
+
+	test_init();
+	page = test_alloc_page_cpy_known_data();
+
+	printk("_BTRFS_: ------- testing ablkcipher start ---------\n");
+	__ablkcipher(1, "cts(cbc(aes))", page, TEST_DATA_SIZE);
+	test_pr_result(page, 1);
+	__ablkcipher(0, "cts(cbc(aes))", page, TEST_DATA_SIZE);
+	test_pr_result(page, 0);
+
+	__ablkcipher(1, "ctr(aes)", page, TEST_DATA_SIZE);
+	test_pr_result(page, 1);
+	__ablkcipher(0, "ctr(aes)", page, TEST_DATA_SIZE);
+	test_pr_result(page, 0);
+	printk("_BTRFS_: ------ testing ablkcipher end ------------\n\n");
+
+	__free_page(page);
+
+	test_fini();
+}
+
+bool does_pages_match(struct address_space *mapping, u64 start, unsigned long len,
+			unsigned long nr_page, struct page **pages)
+{
+	int ret;
+	char *in;
+	char *out;
+	struct page *in_page;
+	struct page *out_page;
+	unsigned long bytes_left = len;
+	unsigned long cur_page_len;
+	unsigned long cr_page;
+
+	for (cr_page = 0; cr_page < nr_page; cr_page++) {
+
+		WARN_ON(!bytes_left);
+
+		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+		out_page = pages[cr_page];
+		cur_page_len = min(bytes_left, PAGE_SIZE);
+
+		in = kmap(in_page);
+		out = kmap(out_page);
+		ret = memcmp(out, in, cur_page_len);
+		kunmap(out_page);
+		kunmap(in_page);
+		if (ret)
+			return false;
+
+		start += cur_page_len;
+		bytes_left = bytes_left - cur_page_len;
+	}
+
+	return true;
+}
+
+void test_key(char *keytag)
+{
+	int ret;
+	unsigned char key_payload[16];
+
+	printk("_BTRFS_: ---- test_key() start -----\n");
+	ret = btrfs_request_key(keytag, key_payload);
+	if (ret == -ENOKEY) {
+		printk("_BTRFS_: NOKEY: keytag %s\n", keytag);
+		return;
+	}
+	if (ret) {
+		printk("_BTRFS_: request key failed !! %d\n", ret);
+		return;
+	}
+	printk("_BTRFS_: ------ test_key() end -----\n");
+}
+
+void test_print_data_v2(struct page *page, int endec)
+{
+	char *data;
+	char tmp[80];
+
+	data = kmap(page);
+	strncpy(tmp, data, 80);
+	kunmap(page);
+
+	printk("_BTRFS_: %s\n", tmp);
+}
+
+void test_open_key()
+{
+	btrfs_key = request_key(&key_type_user, "btrfs_test", NULL);
+	if (IS_ERR(btrfs_key)) {
+		printk("_BTRFS_: getting test key 'btrfs_test' failed\n");
+		btrfs_key = NULL;
+		return;
+	}
+
+	printk("_BTRFS_: Got test key serial %d\n", btrfs_key->serial);
+	down_write_nested(&btrfs_key->sem, 1);
+}
+
+void test_close_key()
+{
+	if (btrfs_key) {
+		up_write(&btrfs_key->sem);
+		key_put(btrfs_key);
+	}
+}
+
+int test_ablkciphear2(char *cipher_name, size_t test_size)
+{
+	u32 crc1 = ~(u32)0;
+	u32 crc2 = ~(u32)0;
+	u32 crc3 = ~(u32)0;
+	u32 seed;
+	struct page *page;
+	char *kaddr;
+	int ret = 0;
+	unsigned int page_nr;
+
+	page_nr = test_size/PAGE_SIZE;
+	page = alloc_pages(GFP_KERNEL, page_nr);
+	if (unlikely(!page)) {
+		printk("_BTRFS_: FAILED to alloc page\n");
+		return -ENOMEM;
+	}
+	kaddr = kmap(page);
+
+	get_random_bytes(&seed, 4);
+	crc1 = btrfs_crc32c(seed, kaddr, test_size);
+
+	/* Encrypt */
+	ret = __ablkcipher(1, cipher_name, page, test_size);
+	if (ret) {
+		printk("BTRFS_TEST: Encrypt '%s' size '%lu' Failed\n",
+			cipher_name, test_size);
+		return ret;
+	}
+
+	crc2 = btrfs_crc32c(seed, kaddr, test_size);
+
+	/* Decrypt */
+	ret = __ablkcipher(0, cipher_name, page, test_size);
+	if (ret) {
+		printk("BTRFS_TEST: Decrypt '%s' size '%lu' Failed\n",
+			cipher_name, test_size);
+		return ret;
+	}
+
+	crc3 = btrfs_crc32c(seed, kaddr, test_size);
+
+	if (crc1 == crc2) {
+		printk("BTRFS_TEST: %u:%u:%u\n", crc1,crc2,crc3);
+		printk("!!! BTRFS: ERROR: Encrypt failed !!! \n");
+		ret = -EINVAL;
+	}
+	if (!ret && (crc1 != crc3)) {
+		printk("BTRFS_TEST: %u:%u:%u\n", crc1,crc2,crc3);
+		printk("!!! BTRFS: ERROR: Decrypt failed !!!\n");
+		ret = -EINVAL;
+	}
+
+	kunmap(page);
+	__free_pages(page, page_nr);
+
+	return ret;
+}
+
+void workout(char *cipher_name)
+{
+	if (test_ablkciphear2(cipher_name, 16))
+		return;
+	if (test_ablkciphear2(cipher_name, 2024))
+		return;
+	if (test_ablkciphear2(cipher_name, 4096))
+		return;
+	if (test_ablkciphear2(cipher_name, 8192))
+		return;
+	if (test_ablkciphear2(cipher_name, 8333))
+		return;
+
+	test_ablkciphear2(cipher_name, 4097);
+	test_ablkciphear2(cipher_name, 1);
+	test_ablkciphear2(cipher_name, 15);
+}
+
+void btrfs_selftest_crypto(void)
+{
+	char cipher_name[17];
+
+	strcpy(cipher_name, "ctr(aes)");
+	workout(cipher_name);
+	/*
+	strcpy(cipher_name, "cts(cbc(aes))");
+	workout(cipher_name);
+	*/
+}
diff --git a/fs/btrfs/tests/crypto-tests.h b/fs/btrfs/tests/crypto-tests.h
new file mode 100755
index 000000000000..d51e78fb239e
--- /dev/null
+++ b/fs/btrfs/tests/crypto-tests.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2016 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#define BTRFS_CONFIG_TEST_ABLKCIPHER	1
+#define BTRFS_CONFIG_ZLIB_AS_ENCRYPT	1
+#define BTRFS_CONFIG_COMP_INT		1
+#define BTRFS_TEST_KEY			0
+
+//#define TEST_DATA_SIZE	16
+//#define TEST_DATA_SIZE	PAGE_CACHE_SIZE
+//#define TEST_DATA_SIZE	1024
+#define TEST_DATA_SIZE		2024
+
+void test_ablkcipher(void);
+void test_blkcipher(void);
+void test_print_data(const char *str, char *prefix, size_t sz, int print_str);
+void test_key(char *keytag);
+void test_pr_result(struct page *page_in, int for_encrypt);
+struct page *test_alloc_page_cpy_known_data(void);
+void test_fini(void);
+void test_open_key(void);
+void test_close_key(void);
+void btrfs_selftest_crypto(void);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index 88d274e8ecf2..5d007ec4ffbf 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -79,7 +79,7 @@ static int zlib_compress_pages(struct list_head *ws,
 			       unsigned long *out_pages,
 			       unsigned long *total_in,
 			       unsigned long *total_out,
-			       unsigned long max_out)
+			       unsigned long max_out, int flags)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
 	int ret;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index d5ad15a106a7..fb91acc7260e 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -593,6 +593,7 @@ struct btrfs_dir_item {
  * still visible as a directory
  */
 #define BTRFS_ROOT_SUBVOL_DEAD		(1ULL << 48)
+#define BTRFS_ROOT_SUBVOL_ENCRYPT	(1ULL << 49)
 
 struct btrfs_root_item {
 	struct btrfs_inode_item inode;
@@ -636,7 +637,10 @@ struct btrfs_root_item {
 	struct btrfs_timespec otime;
 	struct btrfs_timespec stime;
 	struct btrfs_timespec rtime;
-	__le64 reserved[8]; /* for future */
+	char encrypt_algo[16];
+	__le32 crypto_keylen;
+	__le32 crypto_keyhash;
+	__le64 reserved[3]; /* for future */
 } __attribute__ ((__packed__));
 
 /*
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 1/2] btrfs-progs: make wait_for_commit non static
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
  2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
@ 2016-09-13 13:39 ` Anand Jain
  2016-09-13 13:39 ` [PATCH 2/2] btrfs-progs: add encryption support Anand Jain
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

wait_for_commit() is needed by encrypt patch set so this patch makes
it non static.

Also as utils.h is included twice deletes one of it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 btrfs-list.c     | 10 ++++++++++
 cmds-subvolume.c | 11 -----------
 utils.h          |  1 +
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 4cc2ed498536..4e67fe28b9b5 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -1913,3 +1913,13 @@ int btrfs_list_get_path_rootid(int fd, u64 *treeid)
 	*treeid = args.treeid;
 	return 0;
 }
+
+int wait_for_commit(int fd)
+{
+	int ret;
+
+	ret = ioctl(fd, BTRFS_IOC_START_SYNC, NULL);
+	if (ret < 0)
+		return ret;
+	return ioctl(fd, BTRFS_IOC_WAIT_SYNC, NULL);
+}
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index e7ef67d3449b..5df7af56c7f8 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -36,7 +36,6 @@
 #include "commands.h"
 #include "utils.h"
 #include "btrfs-list.h"
-#include "utils.h"
 
 static int is_subvolume_cleaned(int fd, u64 subvolid)
 {
@@ -223,16 +222,6 @@ out:
 	return retval;
 }
 
-static int wait_for_commit(int fd)
-{
-	int ret;
-
-	ret = ioctl(fd, BTRFS_IOC_START_SYNC, NULL);
-	if (ret < 0)
-		return ret;
-	return ioctl(fd, BTRFS_IOC_WAIT_SYNC, NULL);
-}
-
 static const char * const cmd_subvol_delete_usage[] = {
 	"btrfs subvolume delete [options] <subvolume> [<subvolume>...]",
 	"Delete subvolume(s)",
diff --git a/utils.h b/utils.h
index da23bfcc9166..729e50a113a2 100644
--- a/utils.h
+++ b/utils.h
@@ -225,6 +225,7 @@ int test_isdir(const char *path);
 
 const char *subvol_strip_mountpoint(const char *mnt, const char *full_path);
 int get_subvol_info(const char *fullpath, struct root_info *get_ri);
+int wait_for_commit(int fd);
 
 /*
  * Btrfs minimum size calculation is complicated, it should include at least:
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 2/2] btrfs-progs: add encryption support
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
  2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
  2016-09-13 13:39 ` [PATCH 1/2] btrfs-progs: make wait_for_commit non static Anand Jain
@ 2016-09-13 13:39 ` Anand Jain
  2016-09-13 13:39 ` [PATCH] fstests: btrfs: support encryption Anand Jain
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Based on v4.7.2

Depends on keyctl-utils and libscrypt packages.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 Makefile.in       |   5 +-
 btrfs-list.c      |  23 +++
 cmds-filesystem.c |   4 +-
 cmds-restore.c    |  16 ++
 cmds-subvolume.c  | 101 +++++++++++-
 commands.h        |   1 +
 ctree.h           |   5 +-
 encrypt.c         | 455 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 encrypt.h         |  46 ++++++
 props.c           |   4 +
 utils.h           |   1 +
 11 files changed, 654 insertions(+), 7 deletions(-)
 create mode 100644 encrypt.c
 create mode 100644 encrypt.h

diff --git a/Makefile.in b/Makefile.in
index fd68b3eeeba7..6e857b763213 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -44,7 +44,8 @@ DISABLE_BTRFSCONVERT = @DISABLE_BTRFSCONVERT@
 BTRFSCONVERT_EXT2 = @BTRFSCONVERT_EXT2@
 
 EXTRA_CFLAGS :=
-EXTRA_LDFLAGS :=
+# asj fixme, remove path hardcode
+EXTRA_LDFLAGS := /usr/lib/libscrypt.so.0 /usr/lib/libkeyutils.so
 
 DEBUG_CFLAGS_DEFAULT = -O0 -U_FORTIFY_SOURCE -ggdb3
 DEBUG_CFLAGS_INTERNAL =
@@ -87,7 +88,7 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
 	  extent-cache.o extent_io.o volumes.o utils.o repair.o \
 	  qgroup.o raid6.o free-space-cache.o list_sort.o props.o \
 	  ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \
-	  inode.o file.o find-root.o free-space-tree.o help.o
+	  inode.o file.o find-root.o free-space-tree.o help.o encrypt.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
diff --git a/btrfs-list.c b/btrfs-list.c
index 4e67fe28b9b5..34722731b652 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -1923,3 +1923,26 @@ int wait_for_commit(int fd)
 		return ret;
 	return ioctl(fd, BTRFS_IOC_WAIT_SYNC, NULL);
 }
+
+/*
+ * Fixme: A kind of workaround as of now, actual fix needs
+ * per subvol sync instead of entire FS.
+ */
+int wait_for_commit_subvol(char *subvol)
+{
+	int fd;
+	int ret;
+	DIR *ds;
+
+	fd = open_file_or_dir3(subvol, &ds, O_RDWR);
+	if (fd == -1) {
+		ret = -errno;
+		fprintf(stderr, "ERROR: open '%s' failed: %s\n",
+					subvol, strerror(-ret));
+		return ret;
+	}
+
+	ret = wait_for_commit(fd);
+	close_file_or_dir(fd, ds);
+	return ret;
+}
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 76ea82edf23c..4a866c28420a 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -952,6 +952,8 @@ static int parse_compress_type(char *s)
 		return BTRFS_COMPRESS_ZLIB;
 	else if (strcmp(optarg, "lzo") == 0)
 		return BTRFS_COMPRESS_LZO;
+	else if (strcmp(optarg, "ctr(aes)") == 0)
+		return BTRFS_ENCRYPT_AES;
 	else {
 		error("unknown compression type %s", s);
 		exit(1);
@@ -964,7 +966,7 @@ static const char * const cmd_filesystem_defrag_usage[] = {
 	"",
 	"-v             be verbose",
 	"-r             defragment files recursively",
-	"-c[zlib,lzo]   compress the file while defragmenting",
+	"-c[zlib,lzo,ctr(aes)]  compress/encrypt the file while defragmenting if it wasn't",
 	"-f             flush data to disk immediately after defragmenting",
 	"-s start       defragment only from byte onward",
 	"-l len         defragment only up to len bytes",
diff --git a/cmds-restore.c b/cmds-restore.c
index b491f083b72b..bed0aba77740 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -155,6 +155,19 @@ static int decompress_lzo(struct btrfs_root *root, unsigned char *inbuf,
 	return 0;
 }
 
+static int decrypt_ctr_aes(struct btrfs_root *root, unsigned char *inbuf,
+			char *outbuf, u64 compress_len, u64 *decompress_len)
+{
+	/*
+	 * fixme: This is only for testing, which works only with
+	 * kernel option BTRFS_CRYPTO_TEST_BYDUMMYENC, where
+	 * ciphertext == plaintext
+	 */
+	memcpy(outbuf, inbuf, compress_len);
+	*decompress_len = compress_len;
+	return 0;
+}
+
 static int decompress(struct btrfs_root *root, char *inbuf, char *outbuf,
 			u64 compress_len, u64 *decompress_len, int compress)
 {
@@ -165,6 +178,9 @@ static int decompress(struct btrfs_root *root, char *inbuf, char *outbuf,
 	case BTRFS_COMPRESS_LZO:
 		return decompress_lzo(root, (unsigned char *)inbuf, outbuf,
 					compress_len, decompress_len);
+	case BTRFS_ENCRYPT_AES:
+		return decrypt_ctr_aes(root, (unsigned char *)inbuf, outbuf,
+					compress_len, decompress_len);
 	default:
 		break;
 	}
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 5df7af56c7f8..452e71755d39 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -27,6 +27,8 @@
 #include <getopt.h>
 #include <uuid/uuid.h>
 #include <linux/magic.h>
+#include <keyutils.h>
+#include <fcntl.h>
 
 #include "kerncompat.h"
 #include "ioctl.h"
@@ -36,6 +38,7 @@
 #include "commands.h"
 #include "utils.h"
 #include "btrfs-list.h"
+#include "encrypt.h"
 
 static int is_subvolume_cleaned(int fd, u64 subvolid)
 {
@@ -104,13 +107,14 @@ static const char * const subvolume_cmd_group_usage[] = {
 };
 
 static const char * const cmd_subvol_create_usage[] = {
-	"btrfs subvolume create [-i <qgroupid>] [<dest>/]<name>",
+	"btrfs subvolume create [-i <qgroupid>] [-e <cipher>] [<dest>/]<name>",
 	"Create a subvolume",
 	"Create a subvolume <name> in <dest>.  If <dest> is not given",
 	"subvolume <name> will be created in the current directory.",
 	"",
 	"-i <qgroupid>  add the newly created subvolume to a qgroup. This",
 	"               option can be given multiple times.",
+	"-e             enable encryption",
 	NULL
 };
 
@@ -125,9 +129,11 @@ static int cmd_subvol_create(int argc, char **argv)
 	char	*dst;
 	struct btrfs_qgroup_inherit *inherit = NULL;
 	DIR	*dirstream = NULL;
+	int	encrypt = 0;
+	char	*cipher_name = NULL;
 
 	while (1) {
-		int c = getopt(argc, argv, "c:i:v");
+		int c = getopt(argc, argv, "e:c:i:v");
 		if (c < 0)
 			break;
 
@@ -146,6 +152,16 @@ static int cmd_subvol_create(int argc, char **argv)
 				goto out;
 			}
 			break;
+		case 'e':
+			encrypt = 1;
+			if (!is_encryption_type_supported(optarg)) {
+				error("Unsupported cipher '%s', check '/proc/crypto'\n",
+					optarg);
+				retval = -EPROTONOSUPPORT;
+				goto out;
+			}
+			cipher_name = strdup(optarg);
+			break;
 		default:
 			usage(cmd_subvol_create_usage);
 		}
@@ -212,6 +228,13 @@ static int cmd_subvol_create(int argc, char **argv)
 		goto out;
 	}
 
+	if (encrypt) {
+		res = btrfs_set_subvol_encrypt(dst, cipher_name);
+		if (res)
+		warning("Subvol is created, but failed to enable encryption: %s\n",
+							strerror(-res));
+	}
+
 	retval = 0;	/* success */
 out:
 	close_file_or_dir(fddst, dirstream);
@@ -901,6 +924,9 @@ static int cmd_subvol_show(int argc, char **argv)
 	int fd = -1;
 	int ret = 1;
 	DIR *dirstream1 = NULL;
+	key_serial_t key_serial;
+	char key_algo[BTRFS_KEY_ALGO_MAX_LEN + 1];
+	char key_tag[BTRFS_KEY_TAG_MAX_LEN + 1];
 
 	clean_args_no_options(argc, argv, cmd_subvol_show_usage);
 
@@ -978,6 +1004,23 @@ static int cmd_subvol_show(int argc, char **argv)
 	else
 		printf("\tFlags: \t\t\t-\n");
 
+	key_serial = 0;
+	memset(key_algo, '\0', BTRFS_KEY_ALGO_MAX_LEN + 1);
+	memset(key_tag, '\0', BTRFS_KEY_TAG_MAX_LEN + 1);
+
+	ret = btrfs_subvol_key_info(fullpath, key_algo, key_tag, &key_serial);
+	if (strlen(key_tag)) {
+		char key_state[256] = {0};
+		if (key_serial == -1)
+			snprintf(key_state, 256, "(%s)", strerror(-ret));
+		else
+			snprintf(key_state, 256, "(%d)", key_serial);
+
+		printf("\tEncryption: \t\t%s@%s %s\n", key_algo, key_tag, key_state);
+	} else {
+		printf("\tEncryption: \t\t%s\n", "none");
+	}
+
 	/* print the snapshots of the given subvol if any*/
 	printf("\tSnapshot(s):\n");
 	filter_set = btrfs_list_alloc_filter_set();
@@ -1005,6 +1048,59 @@ out:
 	return !!ret;
 }
 
+static const char * const cmd_subvol_encrypt_usage[] = {
+	"btrfs subvolume encrypt <option> <subvol-path>",
+	"Encryption key login / logout",
+	"-k|--key <in|out>     Key login or logout",
+	NULL
+};
+
+static int cmd_subvol_encrypt(int argc, char **argv)
+{
+	int ret;
+	int login;
+	optind = 1;
+
+	login = 1;
+	while (1) {
+		int c;
+		static const struct option long_options[] = {
+			{ "key", required_argument, NULL, 'k'},
+			{ NULL, 0, NULL, 0}
+		};
+
+		c = getopt_long(argc, argv, "k:", long_options, NULL);
+		if (c < 0)
+			break;
+
+		switch (c) {
+		case 'k':
+			if (!strcmp("in", optarg))
+				login = 1;
+			else if (!strcmp("out", optarg))
+				login = 0;
+			else
+				usage(cmd_subvol_encrypt_usage);
+			break;
+		default:
+			usage(cmd_subvol_encrypt_usage);
+		}
+	}
+
+	if (check_argc_exact(argc - optind, 1))
+		usage(cmd_subvol_encrypt_usage);
+
+	if (login)
+		ret = cmd_encrypt_login(argc - 1, &argv[1]);
+	else
+		ret = cmd_encrypt_logout(argc - 1, &argv[1]);
+
+	if (ret == -EAGAIN)
+		usage(cmd_subvol_encrypt_usage);
+
+	return ret;
+}
+
 static const char * const cmd_subvol_sync_usage[] = {
 	"btrfs subvolume sync <path> [<subvol-id>...]",
 	"Wait until given subvolume(s) are completely removed from the filesystem.",
@@ -1272,6 +1368,7 @@ const struct cmd_group subvolume_cmd_group = {
 		{ "find-new", cmd_subvol_find_new, cmd_subvol_find_new_usage,
 			NULL, 0 },
 		{ "show", cmd_subvol_show, cmd_subvol_show_usage, NULL, 0 },
+		{ "encrypt", cmd_subvol_encrypt, cmd_subvol_encrypt_usage, NULL, 0 },
 		{ "sync", cmd_subvol_sync, cmd_subvol_sync_usage, NULL, 0 },
 		NULL_CMD_STRUCT
 	}
diff --git a/commands.h b/commands.h
index 94229c112bc0..ece721da37a1 100644
--- a/commands.h
+++ b/commands.h
@@ -88,6 +88,7 @@ extern const struct cmd_group subvolume_cmd_group;
 extern const struct cmd_group filesystem_cmd_group;
 extern const struct cmd_group balance_cmd_group;
 extern const struct cmd_group device_cmd_group;
+extern const struct cmd_group encrypt_cmd_group;
 extern const struct cmd_group scrub_cmd_group;
 extern const struct cmd_group inspect_cmd_group;
 extern const struct cmd_group property_cmd_group;
diff --git a/ctree.h b/ctree.h
index 1d153ec5c784..67ed34bbb28a 100644
--- a/ctree.h
+++ b/ctree.h
@@ -656,8 +656,9 @@ typedef enum {
 	BTRFS_COMPRESS_NONE  = 0,
 	BTRFS_COMPRESS_ZLIB  = 1,
 	BTRFS_COMPRESS_LZO   = 2,
-	BTRFS_COMPRESS_TYPES = 2,
-	BTRFS_COMPRESS_LAST  = 3,
+	BTRFS_ENCRYPT_AES    = 3,
+	BTRFS_COMPRESS_TYPES = 3,
+	BTRFS_COMPRESS_LAST  = 4,
 } btrfs_compression_type;
 
 /* we don't understand any encryption methods right now */
diff --git a/encrypt.c b/encrypt.c
new file mode 100644
index 000000000000..2ceeb5b395b2
--- /dev/null
+++ b/encrypt.c
@@ -0,0 +1,455 @@
+/*
+ * Copyright (C) 2016 Oracle.  All rights reserved.
+ * Author: Anand Jain (anand.jain@oracle.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include <stdio.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/xattr.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <uuid/uuid.h>
+#include <keyutils.h>
+#include <libscrypt.h>
+#include <termios.h>
+#include <keyutils.h>
+
+#include "ctree.h"
+#include "commands.h"
+#include "utils.h"
+#include "props.h"
+#include "encrypt.h"
+
+#ifndef XATTR_BTRFS_PREFIX
+#define XATTR_BTRFS_PREFIX     "btrfs."
+#define XATTR_BTRFS_PREFIX_LEN (sizeof(XATTR_BTRFS_PREFIX) - 1)
+#endif
+
+/*
+ * Defined as synonyms in attr/xattr.h
+ */
+#ifndef ENOATTR
+#define ENOATTR ENODATA
+#endif
+
+ssize_t __get_pass(char *prompt, char **lineptr, size_t *n)
+{
+	struct termios old, new;
+	int nread;
+
+	fprintf(stderr, "%s", prompt);
+	fflush(stderr);
+
+	/* Turn echoing off and fail if we can’t. */
+	if (tcgetattr(fileno(stdin), &old) != 0)
+		return -1;
+
+	new = old;
+	new.c_lflag &= ~ECHO;
+	if (tcsetattr(fileno(stdin), TCSAFLUSH, &new) != 0)
+		return -1;
+
+	/* Read the password. */
+	nread = getline(lineptr, n, stdin);
+
+	/* Restore terminal. */
+	tcsetattr(fileno(stdin), TCSAFLUSH, &old);
+
+	return nread;
+}
+
+/*
+ * If key is set, returns its key_serial, otherwise -1
+ */
+int get_key(char *keytag, key_serial_t *keyserial)
+{
+	size_t sz;
+	int retry;
+	int retry_again;
+	char pass_try1[100];
+	char pass_try2[100];
+	unsigned char pass_key[16];
+	size_t in_sz;
+	char *pass;
+	const unsigned char iv[100] = {"btrfs"};
+	int ret = 0;
+	int not_same = 0;
+
+	retry_again = 3;
+again:
+	pass = pass_try1;
+	in_sz = sizeof(pass_try1);
+	retry = 4;
+
+	while (--retry > 0) {
+		sz = __get_pass("Passphrase: ", &pass, &in_sz);
+		if (!sz || sz == 1) {
+			printf("\n");
+			error(" Password can not be empty, pls try again");
+			continue;
+		}
+		break;
+	}
+	if (retry == 0)
+		return -ECANCELED;
+
+	pass = pass_try2;
+	in_sz = sizeof(pass_try1);
+
+	printf("\n");
+	sz = __get_pass("Again passphrase: ", &pass, &in_sz);
+	printf("\n");
+	not_same = strncmp(pass_try1, pass_try2, sz);
+	if (not_same) {
+		error("Password does not match\n");
+		if (! --retry_again)
+			return -ECANCELED;
+		goto again;
+	}
+
+	ret = libscrypt_scrypt((uint8_t *)pass_try1, sz, iv, sizeof(iv),
+				SCRYPT_N, SCRYPT_r, SCRYPT_p, pass_key, 16);
+	if (ret) {
+		error("scrypt failed, cannot derive passphrase: %d\n", ret);
+		return -EFAULT;
+	}
+
+	*keyserial = add_key(BTRFS_CRYPTO_KEY_TYPE, keytag, pass_key,
+				BTRFS_CRYPTO_KEY_SIZE, KEY_SPEC_USER_KEYRING);
+	if (*keyserial == -1) {
+		ret = -errno;
+		return ret;
+	}
+
+	return 0;
+}
+
+static void generate_keytag(char *keytag, char *subvol)
+{
+	struct root_info get_ri;
+	char uuidparse[BTRFS_UUID_UNPARSED_SIZE];
+
+	get_subvol_info(subvol, &get_ri);
+	uuid_unparse(get_ri.uuid, uuidparse);
+	uuidparse[8] = '\0';
+	sprintf(keytag, "btrfs:%s", uuidparse);
+}
+
+void prefix_cipher_name(char *keystr,
+			const char *encrypt_type, char *keytag)
+{
+	int sz = BTRFS_KEY_ALGO_TAG_MAX_LEN + 1;
+
+	snprintf(keystr, sz, "%s@%s", encrypt_type, keytag);
+}
+
+int is_encryption_type_supported(const char *type)
+{
+	FILE *f;
+	char tmp[512];
+	const char *known_str = "name         : ";
+	int klen = 15;
+
+	if (strlen(type) > BTRFS_KEY_ALGO_MAX_LEN)
+		return -EINVAL;
+
+	if ((f = fopen("/proc/crypto", "r")) == NULL) {
+		error("Failed to open '/proc/crypto': %s\n",
+				strerror(errno));
+		return -errno;
+	}
+
+	while (fgets(tmp, sizeof(tmp), f) != NULL) {
+		if (!strncmp(known_str, tmp, klen)) {
+			if (!strncmp(tmp+klen, type, strlen(type) )) {
+				return 1;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * This probably should be as a property, however the property interface
+ * needs redesign, so as of now its part of subvolume create
+ */
+static int handle_prop_encrypt(enum prop_object_type type, const char *object,
+			const char *name, const char *value, char *value_out)
+{
+	int ret;
+	ssize_t sret;
+	int fd = -1;
+	DIR *dirstream = NULL;
+	char buf[BTRFS_KEY_ALGO_TAG_MAX_LEN];
+	char *xattr_name = NULL;
+	int open_flags = value ? O_RDWR : O_RDONLY;
+	char keytag[BTRFS_KEY_TAG_MAX_LEN + 1] = {0};
+	char *subvol_object = strdup(object);
+	key_serial_t keyserial;
+	char keystr[BTRFS_KEY_ALGO_TAG_MAX_LEN];
+	memset(keystr, '\0', BTRFS_KEY_ALGO_TAG_MAX_LEN);
+
+	ret = 0;
+	fd = open_file_or_dir3(object, &dirstream, open_flags);
+	if (fd == -1) {
+		ret = -errno;
+		error("open %s failed. %s\n", object, strerror(-ret));
+		goto out;
+	}
+
+	xattr_name = malloc(XATTR_BTRFS_PREFIX_LEN + strlen(name) + 1);
+	if (!xattr_name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	memcpy(xattr_name, XATTR_BTRFS_PREFIX, XATTR_BTRFS_PREFIX_LEN);
+	memcpy(xattr_name + XATTR_BTRFS_PREFIX_LEN, name, strlen(name));
+	xattr_name[XATTR_BTRFS_PREFIX_LEN + strlen(name)] = '\0';
+
+	if (value_out) {
+		sret = fgetxattr(fd, xattr_name, buf, BTRFS_KEY_ALGO_TAG_MAX_LEN);
+		ret = -errno;
+		if (sret < 0 && errno == ENOATTR)
+			goto out;
+
+		if (sret < 0)
+			goto out;
+
+		ret = 0;
+		buf[sret] = '\0';
+
+		strncpy(value_out, buf, BTRFS_KEY_ALGO_TAG_MAX_LEN);
+
+		goto out;
+	}
+
+	if (value && !is_encryption_type_supported(value)) {
+		error("Cipher '%s' is not supported\n", value);
+		ret = -EPROTONOSUPPORT;
+		goto out;
+	}
+
+	generate_keytag(keytag, subvol_object);
+	ret = get_key(keytag, &keyserial);
+	if (ret) {
+		error("Failed to create a key: %s\n",
+					strerror(-ret));
+		goto out;
+	}
+
+	prefix_cipher_name(keystr, value, keytag);
+
+	sret = fsetxattr(fd, xattr_name, keystr, strlen(keystr), 0);
+	if (sret) {
+		ret = -errno;
+		error("failed to set attribute '%s' to '%s' : %s\n",
+				xattr_name, keystr, strerror(-ret));
+		keyctl(KEYCTL_REVOKE, keyserial);
+		goto out;
+	}
+
+out:
+	kfree(subvol_object);
+	kfree(xattr_name);
+	if (fd >= 0)
+		close_file_or_dir(fd, dirstream);
+
+	return ret;
+}
+
+int prop_encrypt(enum prop_object_type type, const char *object,
+				const char *name, const char *value)
+{
+	int ret;
+
+	if (value) {
+		ret = handle_prop_encrypt(type, object, name, value, NULL);
+	} else {
+		char val_out[256] = {0};
+		ret = handle_prop_encrypt(type, object, name, NULL, val_out);
+		if (!ret)
+			fprintf(stdout, "%s\n", val_out);
+	}
+	return ret;
+}
+
+int btrfs_set_subvol_encrypt(char *subvol, char *cipher_name)
+{
+	int ret;
+
+	ret = handle_prop_encrypt(prop_object_subvol, subvol,
+					"encrypt", cipher_name, NULL);
+
+	return ret;
+}
+
+int btrfs_get_subvol_encrypt(char *subvol, char *value_out)
+{
+	int ret;
+
+	ret = handle_prop_encrypt(prop_object_subvol, subvol,
+					"encrypt", NULL, value_out);
+
+	return ret;
+}
+
+static int split_key_alog_tag(const char *val, size_t len,
+                                        char *keyalgo, char *keytag)
+{
+	char *tmp;
+	char *tmp1;
+	char *tmp2;
+
+	tmp1 = tmp = strdup(val);
+	tmp[len] = '\0';
+
+	tmp2 = strsep(&tmp, "@");
+	if (!tmp2) {
+		kfree(tmp1);
+		return -EINVAL;
+	}
+
+	if (strlen(tmp2) > BTRFS_KEY_ALGO_MAX_LEN ||
+		strlen(tmp) > BTRFS_KEY_TAG_MAX_LEN) {
+		kfree(tmp1);
+		return -EINVAL;
+	}
+
+	if (keyalgo)
+		strcpy(keyalgo, tmp2);
+	if (keytag)
+		strcpy(keytag, tmp);
+
+	kfree(tmp1);
+	return 0;
+}
+
+int btrfs_subvol_key_info(char *subvol, char *key_algo, char *key_tag,
+						key_serial_t *key_serial)
+{
+	int ret;
+	char key_algo_tag[BTRFS_KEY_ALGO_TAG_MAX_LEN];
+
+	ret = btrfs_get_subvol_encrypt(subvol, key_algo_tag);
+	if (ret) {
+		#if 0
+		error("non encrypted subvolume %s: %s\n",
+						subvol, strerror(-ret));
+		error( "use 'btrfs subvolume create -e 'cipher' <subvol>'\
+					to create an encrypted subvolume\n");
+		#endif
+		return ret;
+	}
+
+	ret = split_key_alog_tag(key_algo_tag, strlen(key_algo_tag),
+						key_algo, key_tag);
+	if (ret) {
+		error("failed to parse key_tag in %s: %d\n",
+			key_algo_tag, ret);
+		return ret;
+	}
+
+	*key_serial = request_key(BTRFS_CRYPTO_KEY_TYPE, key_tag, NULL, 0);
+	if (*key_serial == -1) {
+		ret = -errno;
+		if (ret == -ENOKEY || ret == -EKEYEXPIRED || ret == -EKEYREVOKED)
+			ret = -ENOKEY;
+		return ret;
+	}
+
+	return 0;
+}
+
+int cmd_encrypt_login(int argc, char **argv)
+{
+	int ret;
+	char pr[10];
+	char *subvol;
+	key_serial_t keyserial;
+	char key_algo[BTRFS_KEY_ALGO_MAX_LEN + 1] = "";
+	char key_tag[BTRFS_KEY_TAG_MAX_LEN + 1] = "";
+
+	ret = 0;
+	keyserial = 0;
+	strcpy(pr, "already");
+
+#if 0
+	if (check_argc_exact(argc - optind, 1))
+		usage(cmd_encrypt_login_usage);
+#endif
+
+	subvol = argv[argc - 1];
+
+	ret = btrfs_subvol_key_info(subvol, key_algo, key_tag, &keyserial);
+	if (ret && ret != -ENOKEY) {
+		error("%s\n", strerror(-ret));
+		return ret;
+	}
+
+	if (keyserial == -1) {
+		pr[0] = '\0';
+
+		wait_for_commit_subvol(subvol);
+
+		ret = btrfs_set_subvol_encrypt(subvol, key_algo);
+		if (ret) {
+			error("key set failed: %s\n", strerror(-ret));
+			return ret;
+		}
+	}
+
+	fprintf(stdout,
+		"key for '%s' has %s logged in with keytag '%s'\n",
+		subvol, pr, key_tag);
+
+	return 0;
+}
+
+int cmd_encrypt_logout(int argc, char **argv)
+{
+	int ret;
+	char *subvol;
+	key_serial_t keyserial;
+	char key_tag[BTRFS_KEY_TAG_MAX_LEN + 1];
+	char key_algo[BTRFS_KEY_ALGO_MAX_LEN + 1];
+
+#if 0
+	if (check_argc_exact(argc - optind, 1))
+		usage(cmd_encrypt_login_usage);
+#endif
+
+	subvol = argv[argc - 1];
+
+	ret = btrfs_subvol_key_info(subvol, key_algo, key_tag, &keyserial);
+	if (ret) {
+		fprintf(stderr, "ERROR: %s\n", strerror(-ret));
+		return ret;
+	}
+
+	/*
+	 * Bit loosely coupled as of now, fixme
+	 * ask kernel to revoke, but user could use keyctl in the userspace
+	 * not too sure if using this
+	 *    down_write_nested(&btrfs_subvol_key->sem, 1)
+	 * in the kernel so that user spce can't revoke is a good idea.
+	 */
+	wait_for_commit_subvol(subvol);
+	keyctl(KEYCTL_REVOKE, keyserial);
+	return 0;
+}
diff --git a/encrypt.h b/encrypt.h
new file mode 100644
index 000000000000..6dbbd1e34d00
--- /dev/null
+++ b/encrypt.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2016 Oracle.  All rights reserved.
+ * Author: Anand Jain (anand.jain@oracle.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include "props.h"
+
+#define BTRFS_CRYPTO_KEY_TYPE_LOGON	1
+#if BTRFS_CRYPTO_KEY_TYPE_LOGON
+#define BTRFS_CRYPTO_KEY_TYPE "logon"
+#else
+#define BTRFS_CRYPTO_KEY_TYPE "user"
+#endif
+
+#define BTRFS_CRYPTO_KEY_SIZE		16
+#define	BTRFS_KEY_TAG_MAX_LEN		16
+#define	BTRFS_KEY_ALGO_MAX_LEN		16
+#define BTRFS_KEY_ALGO_TAG_MAX_LEN	(BTRFS_KEY_TAG_MAX_LEN + BTRFS_KEY_ALGO_MAX_LEN)
+
+void btrfs_create_keytag(char *keytag, char *subvol);
+void btrfs_create_encrypt_keytag_tuplet(char *keystr,
+			const char *encrypt_type, char *keytag);
+int btrfs_set_subvol_encrypt(char *subvol, char *val_in);
+int btrfs_get_subvol_encrypt(char *subvol, char *val_out);
+int prop_encrypt(enum prop_object_type type, const char *object,
+			const char *name, const char *value);
+int ask_key_for_keytag(char *keytag, key_serial_t *keyserial);
+int btrfs_subvol_key_info(char *subvol, char *key_algo, char *key_tag,
+						key_serial_t *key_serial);
+int cmd_encrypt_login(int argc, char **argv);
+int cmd_encrypt_logout(int argc, char **argv);
+int is_encryption_type_supported(const char *type);
+int btrfs_set_subvol_encrypt(char *subvol, char *cipher_name);
diff --git a/props.c b/props.c
index a7e3e96bc92e..c08c71944d01 100644
--- a/props.c
+++ b/props.c
@@ -20,11 +20,13 @@
 #include <sys/xattr.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <keyutils.h>
 
 #include "ctree.h"
 #include "commands.h"
 #include "utils.h"
 #include "props.h"
+#include "encrypt.h"
 
 #define XATTR_BTRFS_PREFIX     "btrfs."
 #define XATTR_BTRFS_PREFIX_LEN (sizeof(XATTR_BTRFS_PREFIX) - 1)
@@ -190,5 +192,7 @@ const struct prop_handler prop_handlers[] = {
 	 prop_object_dev | prop_object_root, prop_label},
 	{"compression", "Set/get compression for a file or directory", 0,
 	 prop_object_inode, prop_compression},
+	{"encrypt", "set/get encrypt property value", 0,
+	prop_object_subvol, prop_encrypt},
 	{NULL, NULL, 0, 0, NULL}
 };
diff --git a/utils.h b/utils.h
index 729e50a113a2..dcc911f9a615 100644
--- a/utils.h
+++ b/utils.h
@@ -226,6 +226,7 @@ int test_isdir(const char *path);
 const char *subvol_strip_mountpoint(const char *mnt, const char *full_path);
 int get_subvol_info(const char *fullpath, struct root_info *get_ri);
 int wait_for_commit(int fd);
+int wait_for_commit_subvol(char *subvol);
 
 /*
  * Btrfs minimum size calculation is complicated, it should include at least:
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH] fstests: btrfs: support encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (2 preceding siblings ...)
  2016-09-13 13:39 ` [PATCH 2/2] btrfs-progs: add encryption support Anand Jain
@ 2016-09-13 13:39 ` Anand Jain
  2016-09-13 16:42 ` [RFC] Preliminary BTRFS Encryption Wilson Meier
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-13 13:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

This will help to test kernel encryption patch, and
when compiled with the below defines. So to use the
existing fstests test-cases on top of encryption.

diff --git a/fs/btrfs/encrypt.h b/fs/btrfs/encrypt.h
index 8e794da9d8f5..1ae6840d0742 100644
--- a/fs/btrfs/encrypt.h
+++ b/fs/btrfs/encrypt.h
@@ -25,9 +25,9 @@
 #ifndef BTRFS_CRYPT_SUB_FEATURES
 //testing
        //enable method
-       #define BTRFS_CRYPTO_TEST_ENABLE_BYMNTOPT       0
+       #define BTRFS_CRYPTO_TEST_ENABLE_BYMNTOPT       1
        //key choice
-       #define BTRFS_CRYPTO_TEST_BYDUMMYKEY            0 //off rest
+       #define BTRFS_CRYPTO_TEST_BYDUMMYKEY            1 //off rest
        #define BTRFS_CRYPTO_TEST_BYDUMMYENC            0 //off rest

Now use the following mount option during fstests to
exercise the extents with encryption.
  MOUNT_OPTIONS="-o compress=ctr(aes)"

As of now this mount option isn't for the end users but
for the testing only, but inspired by ecryptfs, we could
provide such an interface if useful. 

(Not sending this patch to fstests community as of now, but
it would be in the long run).

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 common/filter.btrfs |   2 +-
 common/rc           |   2 +-
 tests/btrfs/041     |   2 +
 tests/btrfs/041.out |  13 ++++
 tests/btrfs/052     |  12 +++
 tests/btrfs/052.out | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/079     |   2 +
 tests/btrfs/125     |   2 +-
 tests/generic/297   |   6 +-
 tests/generic/298   |   2 +-
 10 files changed, 251 insertions(+), 6 deletions(-)

diff --git a/common/filter.btrfs b/common/filter.btrfs
index 9970f4d42fce..cf93f6156247 100644
--- a/common/filter.btrfs
+++ b/common/filter.btrfs
@@ -69,7 +69,7 @@ _filter_btrfs_subvol_delete()
 
 _filter_btrfs_compress_property()
 {
-	sed -e "s/compression=\(lzo\|zlib\)/COMPRESSION=XXX/g"
+	sed -e "s/compression=\(lzo\|zlib\|ctr(aes)\)/COMPRESSION=XXX/g"
 }
 
 # filter name of the property from the output, optionally verify against $1
diff --git a/common/rc b/common/rc
index 67762a7fc834..a0e486bf55d2 100644
--- a/common/rc
+++ b/common/rc
@@ -3481,7 +3481,7 @@ _btrfs_stress_remount_compress()
 {
 	local btrfs_mnt=$1
 	while true; do
-		for algo in no zlib lzo; do
+		for algo in no zlib lzo 'ctr(aes)'; do
 			$MOUNT_PROG -o remount,compress=$algo $btrfs_mnt
 		done
 	done
diff --git a/tests/btrfs/041 b/tests/btrfs/041
index 8bb74cd2a241..be4a10fb3746 100755
--- a/tests/btrfs/041
+++ b/tests/btrfs/041
@@ -106,6 +106,8 @@ echo "Testing restore of file compressed with lzo"
 test_btrfs_restore "lzo"
 echo "Testing restore of file compressed with zlib"
 test_btrfs_restore "zlib"
+echo "Testing restore of file encrypted with ctr(aes)"
+test_btrfs_restore "ctr(aes)"
 echo "Testing restore of file without any compression"
 test_btrfs_restore
 
diff --git a/tests/btrfs/041.out b/tests/btrfs/041.out
index 9f4e53dec979..b8d5234649ef 100644
--- a/tests/btrfs/041.out
+++ b/tests/btrfs/041.out
@@ -25,6 +25,19 @@ wrote 100/100 bytes at offset 99000
 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 67edd038aaa42adb5a1aa78f2eb1d2b6  SCRATCH_MNT/foo
 67edd038aaa42adb5a1aa78f2eb1d2b6
+Testing restore of file encrypted with ctr(aes)
+wrote 100000/100000 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 100000/100000 bytes at offset 100000
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2/2 bytes at offset 10000
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 11/11 bytes at offset 33000
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 100/100 bytes at offset 99000
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+67edd038aaa42adb5a1aa78f2eb1d2b6  SCRATCH_MNT/foo
+67edd038aaa42adb5a1aa78f2eb1d2b6
 Testing restore of file without any compression
 wrote 100000/100000 bytes at offset 0
 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/052 b/tests/btrfs/052
index 599d2616f92f..94b555c4f422 100755
--- a/tests/btrfs/052
+++ b/tests/btrfs/052
@@ -186,5 +186,17 @@ _scratch_unmount
 echo "Testing with a nocow file and zlib compression"
 test_btrfs_clone_same_file "nodatacow,compress-force=zlib"
 
+_scratch_unmount
+
+echo "Testing with a cow file and ctr(aes) encryption"
+test_btrfs_clone_same_file "compress-force=ctr(aes)"
+
+_scratch_unmount
+
+echo "Testing with a nocow file and ctr(aes) encryption"
+test_btrfs_clone_same_file "nodatacow,compress-force=ctr(aes)"
+
+_scratch_unmount
+
 status=0
 exit
diff --git a/tests/btrfs/052.out b/tests/btrfs/052.out
index ac5924ecfa04..034d54fa7248 100644
--- a/tests/btrfs/052.out
+++ b/tests/btrfs/052.out
@@ -641,3 +641,217 @@ Blocks modified: [0 - 1]
 23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
 *
 30
+Testing with a cow file and ctr(aes) encryption
+Blocks modified: [0 - 1]
+Blocks modified: [2 - 3]
+Blocks modified: [4 - 5]
+Blocks modified: [6 - 7]
+Blocks modified: [8 - 23]
+0 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+clone failed: Invalid argument
+0 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+clone failed: Invalid argument
+0 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+20 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+21 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+Blocks modified: [0 - 1]
+0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+20 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+21 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+20 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+21 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+Testing with a nocow file and ctr(aes) encryption
+Blocks modified: [0 - 1]
+Blocks modified: [2 - 3]
+Blocks modified: [4 - 5]
+Blocks modified: [6 - 7]
+Blocks modified: [8 - 23]
+0 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+clone failed: Invalid argument
+0 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+clone failed: Invalid argument
+0 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+4 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+20 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+21 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+Blocks modified: [0 - 1]
+0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+20 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+21 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
+0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+2 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+6 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+10 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+20 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
+*
+21 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
+*
+23 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
+*
+30
diff --git a/tests/btrfs/079 b/tests/btrfs/079
index 6aee3a373f91..489cbb174c9a 100755
--- a/tests/btrfs/079
+++ b/tests/btrfs/079
@@ -48,6 +48,8 @@ _cleanup()
 	wait
 	rm -fr $testfile
 	rm -fr $tmp.* $tmp
+	# to avoid umount getting failed with error busy
+	sleep 30
 }
 
 # get standard environment, filters and checks
diff --git a/tests/btrfs/125 b/tests/btrfs/125
index 1062b87b3eb9..eccb77b6a99e 100755
--- a/tests/btrfs/125
+++ b/tests/btrfs/125
@@ -138,7 +138,7 @@ _run_btrfs_util_prog device scan
 _scratch_mount >> $seqres.full 2>&1
 
 echo >> $seqres.full
-_run_btrfs_util_prog balance start ${SCRATCH_MNT}
+_run_btrfs_util_prog balance start --full-balance ${SCRATCH_MNT}
 
 _run_btrfs_util_prog filesystem show
 _run_btrfs_util_prog filesystem df ${SCRATCH_MNT}
diff --git a/tests/generic/297 b/tests/generic/297
index 4ae2b9c634c7..43d314710206 100755
--- a/tests/generic/297
+++ b/tests/generic/297
@@ -33,6 +33,8 @@ _cleanup()
 {
     cd /
     rm -rf $tmp.* $TEST_DIR/before $TEST_DIR/after
+    sync
+    sleep 40
 }
 
 # get standard environment, filters and checks
@@ -63,7 +65,7 @@ blksz="$(stat -f $testdir -c '%S')"
 _pwrite_byte 0x61 0 $blksz $testdir/file1 >> $seqres.full
 
 fnr=26		# 2^26 reflink extents should be enough to find a slow op?
-timeout=8	# guarantee a good long run...
+timeout=40	# guarantee a good long run...
 echo "Find a reflink size that takes a long time"
 truncate -s $(( (2 ** i) * blksz)) $testdir/file1
 for i in $(seq 0 $fnr); do
@@ -92,7 +94,7 @@ echo "reflink of $n bytes took $delta seconds" >> $seqres.full
 test $delta -gt $timeout && _fail "reflink didn't stop in time, n=$n t=$delta"
 
 echo "Check scratch fs"
-sleep 2		# give it a few seconds to actually die...
+sleep 40		# give it a few seconds to actually die...
 
 # success, all done
 status=0
diff --git a/tests/generic/298 b/tests/generic/298
index e85db1266fa9..4092efa6b961 100755
--- a/tests/generic/298
+++ b/tests/generic/298
@@ -92,7 +92,7 @@ echo "reflink of $n bytes took $delta seconds" >> $seqres.full
 test $delta -gt $timeout && _fail "reflink didn't stop in time, n=$n t=$delta"
 
 echo "Check scratch fs"
-sleep 2		# give it a few seconds to actually die...
+sleep 40		# give it a few seconds to actually die...
 
 # success, all done
 status=0
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH] btrfs: Encryption: Add btrfs encryption support
  2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
@ 2016-09-13 14:12   ` kbuild test robot
  2016-09-13 14:24   ` kbuild test robot
  2016-09-13 16:10   ` kbuild test robot
  2 siblings, 0 replies; 66+ messages in thread
From: kbuild test robot @ 2016-09-13 14:12 UTC (permalink / raw)
  To: Anand Jain; +Cc: kbuild-all, linux-btrfs, clm, dsterba

[-- Attachment #1: Type: text/plain, Size: 4327 bytes --]

Hi Anand,

[auto build test WARNING on btrfs/next]
[cannot apply to v4.8-rc6 next-20160913]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-Encryption-Add-btrfs-encryption-support/20160913-214237
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: i386-randconfig-x009-201637 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:6:0,
                    from include/linux/kernel.h:13,
                    from include/linux/list.h:8,
                    from include/linux/hashtable.h:9,
                    from fs/btrfs/props.c:19:
   fs/btrfs/props.c: In function '__btrfs_set_prop':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:264:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> fs/btrfs/props.c:177:3: note: in expansion of macro 'pr_err'
      pr_err("BTRFS: property apply failed %s %d %s %lu\n",
      ^~~~~~
   fs/btrfs/props.c: In function 'prop_encrypt_validate':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:264:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   fs/btrfs/props.c:567:3: note: in expansion of macro 'pr_err'
      pr_err("BTRFS: %lu mal formed value '%s' %lu\n",
      ^~~~~~
   fs/btrfs/props.c: In function 'prop_cryptoiv_apply':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:264:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   fs/btrfs/props.c:696:3: note: in expansion of macro 'pr_err'
      pr_err("BTRFS: %lu prop_cryptoiv_apply failed ret %d len %lu\n",
      ^~~~~~

vim +/pr_err +177 fs/btrfs/props.c

   161			ASSERT(ret == 0);
   162	
   163			return ret;
   164		}
   165	
   166		ret = handler->validate(inode, value, value_len);
   167		if (ret) {
   168			return ret;
   169		}
   170		ret = __btrfs_setxattr(trans, inode, handler->xattr_name,
   171				       value, value_len, flags);
   172		if (ret) {
   173			return ret;
   174		}
   175		ret = handler->apply(inode, value, value_len);
   176		if (ret && ret != -EKEYREJECTED) {
 > 177			pr_err("BTRFS: property apply failed %s %d %s %lu\n",
   178						name, ret, value, value_len);
   179			__btrfs_setxattr(trans, inode, handler->xattr_name,
   180					 NULL, 0, flags);
   181			return ret;
   182		}
   183	
   184		set_bit(BTRFS_INODE_HAS_PROPS, &BTRFS_I(inode)->runtime_flags);
   185	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 20191 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] btrfs: Encryption: Add btrfs encryption support
  2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
  2016-09-13 14:12   ` kbuild test robot
@ 2016-09-13 14:24   ` kbuild test robot
  2016-09-13 16:10   ` kbuild test robot
  2 siblings, 0 replies; 66+ messages in thread
From: kbuild test robot @ 2016-09-13 14:24 UTC (permalink / raw)
  To: Anand Jain; +Cc: kbuild-all, linux-btrfs, clm, dsterba

[-- Attachment #1: Type: text/plain, Size: 3690 bytes --]

Hi Anand,

[auto build test WARNING on btrfs/next]
[cannot apply to v4.8-rc6 next-20160913]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-Encryption-Add-btrfs-encryption-support/20160913-214237
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: i386-randconfig-s0-201637 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:6:0,
                    from include/linux/kernel.h:13,
                    from include/linux/crypto.h:21,
                    from fs/btrfs/encrypt.c:20:
   fs/btrfs/encrypt.c: In function 'btrfs_blkcipher':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:264:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> fs/btrfs/encrypt.c:327:3: note: in expansion of macro 'pr_err'
      pr_err("BTRFS: crypto, blk can't work with len %lu\n", len);
      ^~~~~~
   fs/btrfs/encrypt.c: In function 'btrfs_decrypt_pages_bio':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:264:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   fs/btrfs/encrypt.c:757:3: note: in expansion of macro 'pr_err'
      pr_err("BTRFS: crypto, untested: pages to be decrypted is less than expected, "\
      ^~~~~~

vim +/pr_err +327 fs/btrfs/encrypt.c

   311		int ret = -EFAULT;
   312		struct scatterlist sg;
   313		unsigned int ivsize = 0;
   314		unsigned int blksize = 0;
   315		char *cipher = "cbc(aes)";
   316		struct blkcipher_desc desc;
   317		struct crypto_blkcipher *blkcipher = NULL;
   318	
   319		blkcipher = crypto_alloc_blkcipher(cipher, 0, 0);
   320		if (IS_ERR(blkcipher)) {
   321			pr_err("BTRFS: crypto, allocate blkcipher handle for %s\n", cipher);
   322			return -PTR_ERR(blkcipher);
   323		}
   324	
   325		blksize = crypto_blkcipher_blocksize(blkcipher);
   326		if (len < blksize) {
 > 327			pr_err("BTRFS: crypto, blk can't work with len %lu\n", len);
   328			ret = -EINVAL;
   329			goto out;
   330		}
   331	
   332		if (crypto_blkcipher_setkey(blkcipher, btrfs_req->key,
   333						btrfs_req->key_len)) {
   334			pr_err("BTRFS: crypto, key could not be set\n");
   335			ret = -EAGAIN;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 26257 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] btrfs: Encryption: Add btrfs encryption support
  2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
  2016-09-13 14:12   ` kbuild test robot
  2016-09-13 14:24   ` kbuild test robot
@ 2016-09-13 16:10   ` kbuild test robot
  2 siblings, 0 replies; 66+ messages in thread
From: kbuild test robot @ 2016-09-13 16:10 UTC (permalink / raw)
  To: Anand Jain; +Cc: kbuild-all, linux-btrfs, clm, dsterba

[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]

Hi Anand,

[auto build test WARNING on btrfs/next]
[cannot apply to v4.8-rc6 next-20160913]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-Encryption-Add-btrfs-encryption-support/20160913-214237
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   fs/btrfs/tests/crypto-tests.c: In function 'test_print_data':
>> fs/btrfs/tests/crypto-tests.c:110:28: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
     printk("_BTRFS_: %s: sz %lu: ", prefix, sz);
                               ^
   fs/btrfs/tests/crypto-tests.c: In function 'test_ablkciphear2':
   fs/btrfs/tests/crypto-tests.c:314:44: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      printk("BTRFS_TEST: Encrypt '%s' size '%lu' Failed\n",
                                               ^
   fs/btrfs/tests/crypto-tests.c:324:44: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      printk("BTRFS_TEST: Decrypt '%s' size '%lu' Failed\n",
                                               ^

vim +110 fs/btrfs/tests/crypto-tests.c

    94		for (offset = 0; offset < TEST_DATA_SIZE; offset = offset + dlen)
    95			memcpy(kaddr + offset, str, dlen);
    96	
    97		flush_kernel_dcache_page(known_data_page);
    98	}
    99	
   100	void test_fini(void)
   101	{
   102		if (known_data_page)
   103			__free_page(known_data_page);
   104	}
   105	
   106	
   107	void test_print_data(const char *str, char *prefix, size_t sz, int print_as_str)
   108	{
   109		int i;
 > 110		printk("_BTRFS_: %s: sz %lu: ", prefix, sz);
   111	
   112		if (print_as_str)
   113			for (i = 0; i < sz; i++) printk("%c", str[i]);
   114		else
   115			for (i = 0; i < sz; i++) printk("%02x ", 0xF & str[i]);
   116	
   117		printk("\n");
   118	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 55046 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (3 preceding siblings ...)
  2016-09-13 13:39 ` [PATCH] fstests: btrfs: support encryption Anand Jain
@ 2016-09-13 16:42 ` Wilson Meier
  2016-09-14  7:02   ` Anand Jain
  2016-09-15  4:53 ` Alex Elsayed
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Wilson Meier @ 2016-09-13 16:42 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, clm, dsterba

Hi Anand,

these are great news! Thanks for yor work. I'm looking forward to use the encryption.

I would like to ask a few question regarding the feature set.

1. is encryption of an existing, filled and unencrypted subvolume without manually moving the data possible?

2. What about encrypting the root and boot subvolume? Will it work with grub2?

3. How does btrfs rescue handle the encrypted subvolume to recover data in case of an emergency? 

4. Is it possible to unlock a subvolume using a keyfile?

Thanks in advance,

Wilson


> Am 13.09.2016 um 15:39 schrieb Anand Jain <anand.jain@oracle.com>:
> 
> 
> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next, however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.
> 
> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.
> 
> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.
> 
> As of now these patch set supports encryption on per subvolume, as
> managing properties on per subvolume is a kind of core to btrfs, which is
> easier for data center solution-ing, seamlessly persistent and easy to
> manage.
> 
> 
> Steps:
> -----
> 
> Make sure following kernel TFMs are compiled in.
> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
> name         : ctr(aes)
> name         : cbc(aes)
> 
> Create encrypted subvolume.
> # btrfs su create -e 'ctr(aes)' /btrfs/e1
> Create subvolume '/btrfs/e1'
> Passphrase: 
> Again passphrase: 
> 
> A key is created and its hash is updated into the subvolume item,
> and then added to the system keyctl.
> # btrfs su show /btrfs/e1 | egrep -i encrypt
>    Encryption:        ctr(aes)@btrfs:75197c8e (594790215)
> 
> # keyctl show 594790215
> Keyring
> 594790215 --alsw-v      0     0  logon: btrfs:75197c8e
> 
> 
> Now any file data extents under the subvol /btrfs/e1 will be
> encrypted.
> 
> You may revoke key using keyctl or btrfs(8) as below.
> # btrfs su encrypt -k out /btrfs/e1
> 
> # btrfs su show /btrfs/e1 | egrep -i encrypt
>    Encryption:        ctr(aes)@btrfs:75197c8e (Required key not available)
> 
> # keyctl show 594790215
> Keyring
> Unable to dump key: Key has been revoked
> 
> As the key hash is updated, If you provide wrong passphrase in the next
> key in, it won't add key to the system. So we have key verification
> from the day1.
> 
> # btrfs su encrypt -k in /btrfs/e1
> Passphrase: 
> Again passphrase: 
> ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
> 
> ERROR: key set failed: Key was rejected by service
> 
> # btrfs su encrypt -k in /btrfs/e1
> Passphrase: 
> Again passphrase: 
> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
> 
> Now if you revoke the key the read / write fails with key error.
> 
> # md5sum /btrfs/e1/2k-test-file 
> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
> 
> # btrfs su encrypt -k out /btrfs/e1
> 
> # md5sum /btrfs/e1/2k-test-file 
> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
> 
> # cp /tfs/1k-test-file /btrfs/e1/
> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked
> 
> Plain text memory scratches for security reason is pending. As there are some
> key revoke notification challenges to coincide with encryption context switch,
> which I do believe should be fixed in the due course, but is not a roadblock
> at this stage.
> 
> Thanks, Anand
> 
> 
> Anand Jain (1):
>  btrfs: Encryption: Add btrfs encryption support
> 
> fs/btrfs/Makefile               |   4 +-
> fs/btrfs/btrfs_inode.h          |   6 +
> fs/btrfs/compression.c          |  30 +-
> fs/btrfs/compression.h          |  10 +-
> fs/btrfs/ctree.h                |   4 +
> fs/btrfs/disk-io.c              |   3 +
> fs/btrfs/encrypt.c              | 807 ++++++++++++++++++++++++++++++++++++++++
> fs/btrfs/encrypt.h              |  94 +++++
> fs/btrfs/inode.c                | 255 ++++++++++++-
> fs/btrfs/ioctl.c                |  67 ++++
> fs/btrfs/lzo.c                  |   2 +-
> fs/btrfs/props.c                | 331 +++++++++++++++-
> fs/btrfs/super.c                |  27 +-
> fs/btrfs/tests/crypto-tests.c   | 376 +++++++++++++++++++
> fs/btrfs/tests/crypto-tests.h   |  38 ++
> fs/btrfs/zlib.c                 |   2 +-
> include/uapi/linux/btrfs_tree.h |   6 +-
> 17 files changed, 2027 insertions(+), 35 deletions(-)
> create mode 100644 fs/btrfs/encrypt.c
> create mode 100644 fs/btrfs/encrypt.h
> create mode 100755 fs/btrfs/tests/crypto-tests.c
> create mode 100755 fs/btrfs/tests/crypto-tests.h
> 
> Anand Jain (2):
>  btrfs-progs: make wait_for_commit non static
>  btrfs-progs: add encryption support
> 
> Makefile.in       |   5 +-
> btrfs-list.c      |  33 ++++
> cmds-filesystem.c |   4 +-
> cmds-restore.c    |  16 ++
> cmds-subvolume.c  | 112 ++++++++++++--
> commands.h        |   1 +
> ctree.h           |   5 +-
> encrypt.c         | 455 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> encrypt.h         |  46 ++++++
> props.c           |   4 +
> utils.h           |   2 +
> 11 files changed, 665 insertions(+), 18 deletions(-)
> create mode 100644 encrypt.c
> create mode 100644 encrypt.h
> 
> Anand Jain (1):
>  fstests: btrfs: support encryption
> 
> common/filter.btrfs |   2 +-
> common/rc           |   2 +-
> tests/btrfs/041     |   2 +
> tests/btrfs/041.out |  13 ++++
> tests/btrfs/052     |  12 +++
> tests/btrfs/052.out | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> tests/btrfs/079     |   2 +
> tests/btrfs/125     |   2 +-
> tests/generic/297   |   6 +-
> tests/generic/298   |   2 +-
> 10 files changed, 251 insertions(+), 6 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 16:42 ` [RFC] Preliminary BTRFS Encryption Wilson Meier
@ 2016-09-14  7:02   ` Anand Jain
  2016-09-14 18:26     ` Wilson Meier
  0 siblings, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-14  7:02 UTC (permalink / raw)
  To: Wilson Meier; +Cc: linux-btrfs, clm, dsterba



Wilson,

Thanks for commenting. Pls see inline below..

On 09/14/2016 12:42 AM, Wilson Meier wrote:
> Hi Anand,
>
> these are great news! Thanks for yor work. I'm looking forward to use the encryption.
>
> I would like to ask a few question regarding the feature set.
>
> 1. is encryption of an existing, filled and unencrypted subvolume without manually moving the data possible?

   Encrypt contexts are set only on newly created files. However you can
   create empty encrypted subvol and move files and dir into it. In short
   you can't set encrypt property on non-empty subvolume as of now.

> 2. What about encrypting the root and boot subvolume? Will it work with grub2?

   Keys are only in-memory, which does not persist or prompt
   for it across boot. I had keyctl code written to prompt
   for key but it isn't successful yet. Probably once we support
   keyfile, then root/boot support is possible.

> 3. How does btrfs rescue handle the encrypted subvolume to recover data in case of an emergency?

   btrfs rescue / btrfsck works as usual. btrfs restore which
   needs decrypt isn't supported.

> 4. Is it possible to unlock a subvolume using a keyfile?

   keyfile support is on top of the list to be supported, it helps
   testing as well.

Thanks, Anand


> Thanks in advance,
>
> Wilson
>
>
>> Am 13.09.2016 um 15:39 schrieb Anand Jain <anand.jain@oracle.com>:
>>
>>
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next, however here below is the quick example
>> on the cli usage. Please try out, let me know if I have missed something.
>>
>> Also would like to mention that a review from the security experts is due,
>> which is important and I believe those review comments can be accommodated
>> without major changes from here.
>>
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in the
>> list.
>>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs, which is
>> easier for data center solution-ing, seamlessly persistent and easy to
>> manage.
>>
>>
>> Steps:
>> -----
>>
>> Make sure following kernel TFMs are compiled in.
>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>> name         : ctr(aes)
>> name         : cbc(aes)
>>
>> Create encrypted subvolume.
>> # btrfs su create -e 'ctr(aes)' /btrfs/e1
>> Create subvolume '/btrfs/e1'
>> Passphrase:
>> Again passphrase:
>>
>> A key is created and its hash is updated into the subvolume item,
>> and then added to the system keyctl.
>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>    Encryption:        ctr(aes)@btrfs:75197c8e (594790215)
>>
>> # keyctl show 594790215
>> Keyring
>> 594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>>
>>
>> Now any file data extents under the subvol /btrfs/e1 will be
>> encrypted.
>>
>> You may revoke key using keyctl or btrfs(8) as below.
>> # btrfs su encrypt -k out /btrfs/e1
>>
>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>    Encryption:        ctr(aes)@btrfs:75197c8e (Required key not available)
>>
>> # keyctl show 594790215
>> Keyring
>> Unable to dump key: Key has been revoked
>>
>> As the key hash is updated, If you provide wrong passphrase in the next
>> key in, it won't add key to the system. So we have key verification
>> from the day1.
>>
>> # btrfs su encrypt -k in /btrfs/e1
>> Passphrase:
>> Again passphrase:
>> ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
>>
>> ERROR: key set failed: Key was rejected by service
>>
>> # btrfs su encrypt -k in /btrfs/e1
>> Passphrase:
>> Again passphrase:
>> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
>>
>> Now if you revoke the key the read / write fails with key error.
>>
>> # md5sum /btrfs/e1/2k-test-file
>> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
>>
>> # btrfs su encrypt -k out /btrfs/e1
>>
>> # md5sum /btrfs/e1/2k-test-file
>> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
>>
>> # cp /tfs/1k-test-file /btrfs/e1/
>> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked
>>
>> Plain text memory scratches for security reason is pending. As there are some
>> key revoke notification challenges to coincide with encryption context switch,
>> which I do believe should be fixed in the due course, but is not a roadblock
>> at this stage.
>>
>> Thanks, Anand
>>
>>
>> Anand Jain (1):
>>  btrfs: Encryption: Add btrfs encryption support
>>
>> fs/btrfs/Makefile               |   4 +-
>> fs/btrfs/btrfs_inode.h          |   6 +
>> fs/btrfs/compression.c          |  30 +-
>> fs/btrfs/compression.h          |  10 +-
>> fs/btrfs/ctree.h                |   4 +
>> fs/btrfs/disk-io.c              |   3 +
>> fs/btrfs/encrypt.c              | 807 ++++++++++++++++++++++++++++++++++++++++
>> fs/btrfs/encrypt.h              |  94 +++++
>> fs/btrfs/inode.c                | 255 ++++++++++++-
>> fs/btrfs/ioctl.c                |  67 ++++
>> fs/btrfs/lzo.c                  |   2 +-
>> fs/btrfs/props.c                | 331 +++++++++++++++-
>> fs/btrfs/super.c                |  27 +-
>> fs/btrfs/tests/crypto-tests.c   | 376 +++++++++++++++++++
>> fs/btrfs/tests/crypto-tests.h   |  38 ++
>> fs/btrfs/zlib.c                 |   2 +-
>> include/uapi/linux/btrfs_tree.h |   6 +-
>> 17 files changed, 2027 insertions(+), 35 deletions(-)
>> create mode 100644 fs/btrfs/encrypt.c
>> create mode 100644 fs/btrfs/encrypt.h
>> create mode 100755 fs/btrfs/tests/crypto-tests.c
>> create mode 100755 fs/btrfs/tests/crypto-tests.h
>>
>> Anand Jain (2):
>>  btrfs-progs: make wait_for_commit non static
>>  btrfs-progs: add encryption support
>>
>> Makefile.in       |   5 +-
>> btrfs-list.c      |  33 ++++
>> cmds-filesystem.c |   4 +-
>> cmds-restore.c    |  16 ++
>> cmds-subvolume.c  | 112 ++++++++++++--
>> commands.h        |   1 +
>> ctree.h           |   5 +-
>> encrypt.c         | 455 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> encrypt.h         |  46 ++++++
>> props.c           |   4 +
>> utils.h           |   2 +
>> 11 files changed, 665 insertions(+), 18 deletions(-)
>> create mode 100644 encrypt.c
>> create mode 100644 encrypt.h
>>
>> Anand Jain (1):
>>  fstests: btrfs: support encryption
>>
>> common/filter.btrfs |   2 +-
>> common/rc           |   2 +-
>> tests/btrfs/041     |   2 +
>> tests/btrfs/041.out |  13 ++++
>> tests/btrfs/052     |  12 +++
>> tests/btrfs/052.out | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> tests/btrfs/079     |   2 +
>> tests/btrfs/125     |   2 +-
>> tests/generic/297   |   6 +-
>> tests/generic/298   |   2 +-
>> 10 files changed, 251 insertions(+), 6 deletions(-)
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-14  7:02   ` Anand Jain
@ 2016-09-14 18:26     ` Wilson Meier
  0 siblings, 0 replies; 66+ messages in thread
From: Wilson Meier @ 2016-09-14 18:26 UTC (permalink / raw)
  To: Anand Jain; +Cc: Wilson Meier, linux-btrfs, clm, dsterba



> Am 14.09.2016 um 09:02 schrieb Anand Jain <anand.jain@oracle.com>:
> 
> 
> 
> Wilson,
> 
> Thanks for commenting. Pls see inline below..
> 
>> On 09/14/2016 12:42 AM, Wilson Meier wrote:
>> Hi Anand,
>> 
>> these are great news! Thanks for yor work. I'm looking forward to use the encryption.
>> 
>> I would like to ask a few question regarding the feature set.
>> 
>> 1. is encryption of an existing, filled and unencrypted subvolume without manually moving the data possible?
> 
>  Encrypt contexts are set only on newly created files. However you can
>  create empty encrypted subvol and move files and dir into it. In short
>  you can't set encrypt property on non-empty subvolume as of now.

Ok, so manually moving to an new encrypted subvolume is the only possibility. Maybe there will be a possibility in the feature. ;)

> 
>> 2. What about encrypting the root and boot subvolume? Will it work with grub2?
> 
>  Keys are only in-memory, which does not persist or prompt
>  for it across boot. I had keyctl code written to prompt
>  for key but it isn't successful yet. Probably once we support
>  keyfile, then root/boot support is possible.
> 

Currently i'm using dm-crypt and btrfs to achieve a fully encrypted system. I'm looking forward to switch to a pure btrfs solution. Hopefully this will be possible soon.

>> 3. How does btrfs rescue handle the encrypted subvolume to recover data in case of an emergency?
> 
>  btrfs rescue / btrfsck works as usual. btrfs restore which
>  needs decrypt isn't supported.
> 

Don't get me wrong, but not being able to use btrfs restore is a showstopper as i already had a case where i could only rescue my data using the restore command. In my opinion in the current state of btrfs such recover options are key.

>> 4. Is it possible to unlock a subvolume using a keyfile?
> 
>  keyfile support is on top of the list to be supported, it helps
>  testing as well.
> 

This goes hand in hand with my question about boot/root unlocking. 

> Thanks, Anand

Thanks for your feedback. I really appreciate.
Wilson

> 
>> Thanks in advance,
>> 
>> Wilson
>> 
>> 
>>> Am 13.09.2016 um 15:39 schrieb Anand Jain <anand.jain@oracle.com>:
>>> 
>>> 
>>> This patchset adds btrfs encryption support.
>>> 
>>> The main objective of this series is to have bugs fixed and stability.
>>> I have verified with fstests to confirm that there is no regression.
>>> 
>>> A design write-up is coming next, however here below is the quick example
>>> on the cli usage. Please try out, let me know if I have missed something.
>>> 
>>> Also would like to mention that a review from the security experts is due,
>>> which is important and I believe those review comments can be accommodated
>>> without major changes from here.
>>> 
>>> Also yes, thanks for the emails, I hear, per file encryption and inline
>>> with vfs layer is also important, which is wip among other things in the
>>> list.
>>> 
>>> As of now these patch set supports encryption on per subvolume, as
>>> managing properties on per subvolume is a kind of core to btrfs, which is
>>> easier for data center solution-ing, seamlessly persistent and easy to
>>> manage.
>>> 
>>> 
>>> Steps:
>>> -----
>>> 
>>> Make sure following kernel TFMs are compiled in.
>>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>> name         : ctr(aes)
>>> name         : cbc(aes)
>>> 
>>> Create encrypted subvolume.
>>> # btrfs su create -e 'ctr(aes)' /btrfs/e1
>>> Create subvolume '/btrfs/e1'
>>> Passphrase:
>>> Again passphrase:
>>> 
>>> A key is created and its hash is updated into the subvolume item,
>>> and then added to the system keyctl.
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>   Encryption:        ctr(aes)@btrfs:75197c8e (594790215)
>>> 
>>> # keyctl show 594790215
>>> Keyring
>>> 594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>>> 
>>> 
>>> Now any file data extents under the subvol /btrfs/e1 will be
>>> encrypted.
>>> 
>>> You may revoke key using keyctl or btrfs(8) as below.
>>> # btrfs su encrypt -k out /btrfs/e1
>>> 
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>   Encryption:        ctr(aes)@btrfs:75197c8e (Required key not available)
>>> 
>>> # keyctl show 594790215
>>> Keyring
>>> Unable to dump key: Key has been revoked
>>> 
>>> As the key hash is updated, If you provide wrong passphrase in the next
>>> key in, it won't add key to the system. So we have key verification
>>> from the day1.
>>> 
>>> # btrfs su encrypt -k in /btrfs/e1
>>> Passphrase:
>>> Again passphrase:
>>> ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
>>> 
>>> ERROR: key set failed: Key was rejected by service
>>> 
>>> # btrfs su encrypt -k in /btrfs/e1
>>> Passphrase:
>>> Again passphrase:
>>> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
>>> 
>>> Now if you revoke the key the read / write fails with key error.
>>> 
>>> # md5sum /btrfs/e1/2k-test-file
>>> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
>>> 
>>> # btrfs su encrypt -k out /btrfs/e1
>>> 
>>> # md5sum /btrfs/e1/2k-test-file
>>> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
>>> 
>>> # cp /tfs/1k-test-file /btrfs/e1/
>>> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked
>>> 
>>> Plain text memory scratches for security reason is pending. As there are some
>>> key revoke notification challenges to coincide with encryption context switch,
>>> which I do believe should be fixed in the due course, but is not a roadblock
>>> at this stage.
>>> 
>>> Thanks, Anand
>>> 
>>> 
>>> Anand Jain (1):
>>> btrfs: Encryption: Add btrfs encryption support
>>> 
>>> fs/btrfs/Makefile               |   4 +-
>>> fs/btrfs/btrfs_inode.h          |   6 +
>>> fs/btrfs/compression.c          |  30 +-
>>> fs/btrfs/compression.h          |  10 +-
>>> fs/btrfs/ctree.h                |   4 +
>>> fs/btrfs/disk-io.c              |   3 +
>>> fs/btrfs/encrypt.c              | 807 ++++++++++++++++++++++++++++++++++++++++
>>> fs/btrfs/encrypt.h              |  94 +++++
>>> fs/btrfs/inode.c                | 255 ++++++++++++-
>>> fs/btrfs/ioctl.c                |  67 ++++
>>> fs/btrfs/lzo.c                  |   2 +-
>>> fs/btrfs/props.c                | 331 +++++++++++++++-
>>> fs/btrfs/super.c                |  27 +-
>>> fs/btrfs/tests/crypto-tests.c   | 376 +++++++++++++++++++
>>> fs/btrfs/tests/crypto-tests.h   |  38 ++
>>> fs/btrfs/zlib.c                 |   2 +-
>>> include/uapi/linux/btrfs_tree.h |   6 +-
>>> 17 files changed, 2027 insertions(+), 35 deletions(-)
>>> create mode 100644 fs/btrfs/encrypt.c
>>> create mode 100644 fs/btrfs/encrypt.h
>>> create mode 100755 fs/btrfs/tests/crypto-tests.c
>>> create mode 100755 fs/btrfs/tests/crypto-tests.h
>>> 
>>> Anand Jain (2):
>>> btrfs-progs: make wait_for_commit non static
>>> btrfs-progs: add encryption support
>>> 
>>> Makefile.in       |   5 +-
>>> btrfs-list.c      |  33 ++++
>>> cmds-filesystem.c |   4 +-
>>> cmds-restore.c    |  16 ++
>>> cmds-subvolume.c  | 112 ++++++++++++--
>>> commands.h        |   1 +
>>> ctree.h           |   5 +-
>>> encrypt.c         | 455 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> encrypt.h         |  46 ++++++
>>> props.c           |   4 +
>>> utils.h           |   2 +
>>> 11 files changed, 665 insertions(+), 18 deletions(-)
>>> create mode 100644 encrypt.c
>>> create mode 100644 encrypt.h
>>> 
>>> Anand Jain (1):
>>> fstests: btrfs: support encryption
>>> 
>>> common/filter.btrfs |   2 +-
>>> common/rc           |   2 +-
>>> tests/btrfs/041     |   2 +
>>> tests/btrfs/041.out |  13 ++++
>>> tests/btrfs/052     |  12 +++
>>> tests/btrfs/052.out | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> tests/btrfs/079     |   2 +
>>> tests/btrfs/125     |   2 +-
>>> tests/generic/297   |   6 +-
>>> tests/generic/298   |   2 +-
>>> 10 files changed, 251 insertions(+), 6 deletions(-)
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (4 preceding siblings ...)
  2016-09-13 16:42 ` [RFC] Preliminary BTRFS Encryption Wilson Meier
@ 2016-09-15  4:53 ` Alex Elsayed
  2016-09-15 11:33   ` Anand Jain
  2016-09-15  5:38 ` Chris Murphy
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-15  4:53 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:

> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next, however here below is the quick
> example on the cli usage. Please try out, let me know if I have missed
> something.
> 
> Also would like to mention that a review from the security experts is
> due,
> which is important and I believe those review comments can be
> accommodated without major changes from here.
> 
> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.
> 
> As of now these patch set supports encryption on per subvolume, as
> managing properties on per subvolume is a kind of core to btrfs, which
> is easier for data center solution-ing, seamlessly persistent and easy
> to manage.
> 
> 
> Steps:
> -----
> 
> Make sure following kernel TFMs are compiled in.
> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
> name         : ctr(aes)
> name         : cbc(aes)

First problem: These are purely encryption algorithms, rather than AE 
(Authenticated Encryption) or AEAD (Authenticated Encryption with 
Associated Data). As a result, they are necessarily vulnerable to 
adaptive chosen-ciphertext attacks, and CBC has historically had other 
issues. I highly recommend using a well-reviewed AE or AEAD mode, such as 
AES-GCM (as ecryptfs does), as long as the code can handle the ciphertext 
being longer than the plaintext.

If it _cannot_ handle the ciphertext being longer than the plaintext, 
please consider that a very serious red flag: It means that you cannot 
provide better security than block-level encryption, which greatly 
reduces the benefit of filesystem-integrated encryption. Being at the 
extent level _should_ permit using AEAD - if it does not, something is 
wrong.

If at all possible, I'd suggest _only_ permitting AEAD cipher modes to be 
used.

Anyway, even for block-level encryption, CTR and CBC have been considered 
obsolete and potentially dangerous to use in disk encryption for quite a 
while - current recommendations for block-level encryption are to use 
either a narrow-block tweakable cipher mode (such as XTS), or a wide-
block one (such as EME or CMC), with the latter providing slightly better 
security, but worse performance.

> Create encrypted subvolume.
> # btrfs su create -e 'ctr(aes)' /btrfs/e1 Create subvolume '/btrfs/e1'
> Passphrase:
> Again passphrase:

I presume the command first creates a key, then creates a subvolume 
referencing that key? If so, that seems sensible.

> A key is created and its hash is updated into the subvolume item,
> and then added to the system keyctl.
> # btrfs su show /btrfs/e1 | egrep -i encrypt
> 	Encryption: 		ctr(aes)@btrfs:75197c8e (594790215)
> 
> # keyctl show 594790215 Keyring
>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e

That's entirely reasonable, though you may want to support "trusted and 
encrypted keys" (Documentation/security/keys-trusted-encrypted.txt)

> Now any file data extents under the subvol /btrfs/e1 will be encrypted.
> 
> You may revoke key using keyctl or btrfs(8) as below.
> # btrfs su encrypt -k out /btrfs/e1
> 
> # btrfs su show /btrfs/e1 | egrep -i encrypt
> 	Encryption: 		ctr(aes)@btrfs:75197c8e (Required key not 
available)
> 
> # keyctl show 594790215 Keyring Unable to dump key: Key has been revoked
> 
> As the key hash is updated, If you provide wrong passphrase in the next
> key in, it won't add key to the system. So we have key verification from
> the day1.

This is good.

> # btrfs su encrypt -k in /btrfs/e1 Passphrase:
> Again passphrase:
> ERROR: failed to set attribute 'btrfs.encrypt' to
> 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
> 
> ERROR: key set failed: Key was rejected by service
> 
> # btrfs su encrypt -k in /btrfs/e1 Passphrase:
> Again passphrase:
> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
> 
> Now if you revoke the key the read / write fails with key error.
> 
> # md5sum /btrfs/e1/2k-test-file 8c9fbc69125ebe84569a5c1ca088cb14 
> /btrfs/e1/2k-test-file
> 
> # btrfs su encrypt -k out /btrfs/e1
> 
> # md5sum /btrfs/e1/2k-test-file md5sum: /btrfs/e1/2k-test-file: Key has
> been revoked
> 
> # cp /tfs/1k-test-file /btrfs/e1/
> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
> revoked
> 
> Plain text memory scratches for security reason is pending. As there are
> some key revoke notification challenges to coincide with encryption
> context switch,
> which I do believe should be fixed in the due course, but is not a
> roadblock at this stage.
> 
> Thanks, Anand


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (5 preceding siblings ...)
  2016-09-15  4:53 ` Alex Elsayed
@ 2016-09-15  5:38 ` Chris Murphy
  2016-09-15 11:32   ` Anand Jain
  2016-09-15 11:37 ` Austin S. Hemmelgarn
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Chris Murphy @ 2016-09-15  5:38 UTC (permalink / raw)
  To: Anand Jain; +Cc: Btrfs BTRFS, Chris Mason, David Sterba

On Tue, Sep 13, 2016 at 7:39 AM, Anand Jain <anand.jain@oracle.com> wrote:
>
> This patchset adds btrfs encryption support.
>
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
>
> A design write-up is coming next, however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.

What's the behavior with nested subvolumes having different keys?

subvolume A (encrypted with key A)
    |
    - subvolume B (encrypted with key B)

Without encryption I can discover either A or B whether top-level, A,
or B are mounted.

With encryption, must A be opened [1] for B to be discovered? Must A
be opened before B can be opened? Or is the subvolume metadata always
non-encrypted, and it's just file extents that are encrypted? Are
filenames in those subvolumes discoverable (e.g. btrfs-debug-tree,
btrfs-image) if the subvolume is not opened? And reflink handling
between subvolumes behaves how?


[1] open in the cryptsetup open/luksOpen sense


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15  5:38 ` Chris Murphy
@ 2016-09-15 11:32   ` Anand Jain
  0 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-15 11:32 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Chris Mason, David Sterba


Thanks for the comments. Pls see inline below..

On 09/15/2016 01:38 PM, Chris Murphy wrote:
> On Tue, Sep 13, 2016 at 7:39 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next, however here below is the quick example
>> on the cli usage. Please try out, let me know if I have missed something.
>
> What's the behavior with nested subvolumes having different keys?
>
> subvolume A (encrypted with key A)
>     |
>     - subvolume B (encrypted with key B)
>
> Without encryption I can discover either A or B whether top-level, A,
> or B are mounted.
>
> With encryption, must A be opened [1] for B to be discovered? Must A
> be opened before B can be opened? Or is the subvolume metadata always
> non-encrypted, and it's just file extents that are encrypted? Are
> filenames in those subvolumes discoverable (e.g. btrfs-debug-tree,
> btrfs-image) if the subvolume is not opened? And reflink handling
> between subvolumes behaves how?

   nested encrypting subvolume isn't supported, its just that it wasn't
   in my mind or the use case analysis review which I did, didn't tell
   me that. However I did a bit of code changes, its not that tough to
   get that in the current setup though. Yes only extent encrypted.

Thanks, Anand

>
> [1] open in the cryptsetup open/luksOpen sense
>
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15  4:53 ` Alex Elsayed
@ 2016-09-15 11:33   ` Anand Jain
  2016-09-15 11:47     ` Alex Elsayed
  0 siblings, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-15 11:33 UTC (permalink / raw)
  To: Alex Elsayed, linux-btrfs


Thanks for commenting. pls see inline below.

On 09/15/2016 12:53 PM, Alex Elsayed wrote:
> On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:
>
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next, however here below is the quick
>> example on the cli usage. Please try out, let me know if I have missed
>> something.
>>
>> Also would like to mention that a review from the security experts is
>> due,
>> which is important and I believe those review comments can be
>> accommodated without major changes from here.
>>
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in the
>> list.
>>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs, which
>> is easier for data center solution-ing, seamlessly persistent and easy
>> to manage.
>>
>>
>> Steps:
>> -----
>>
>> Make sure following kernel TFMs are compiled in.
>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>> name         : ctr(aes)
>> name         : cbc(aes)
>
> First problem: These are purely encryption algorithms, rather than AE
> (Authenticated Encryption) or AEAD (Authenticated Encryption with
> Associated Data). As a result, they are necessarily vulnerable to
> adaptive chosen-ciphertext attacks, and CBC has historically had other
> issues. I highly recommend using a well-reviewed AE or AEAD mode, such as
> AES-GCM (as ecryptfs does), as long as the code can handle the ciphertext
> being longer than the plaintext.
>
> If it _cannot_ handle the ciphertext being longer than the plaintext,
> please consider that a very serious red flag: It means that you cannot
> provide better security than block-level encryption, which greatly
> reduces the benefit of filesystem-integrated encryption. Being at the
> extent level _should_ permit using AEAD - if it does not, something is
> wrong.
>
> If at all possible, I'd suggest _only_ permitting AEAD cipher modes to be
> used.
>
> Anyway, even for block-level encryption, CTR and CBC have been considered
> obsolete and potentially dangerous to use in disk encryption for quite a
> while - current recommendations for block-level encryption are to use
> either a narrow-block tweakable cipher mode (such as XTS), or a wide-
> block one (such as EME or CMC), with the latter providing slightly better
> security, but worse performance.

   Yes. CTR should be changed, so I have kept it as a cli option. And
   with the current internal design, hope we can plugin more algorithms
   as suggested/if-its-outdated and yes code can handle (or with a little
   tweak) bigger ciphertext (than plaintext) as well.

   encryption + keyhash (as below) + Btrfs-data-checksum provides
   similar to AE,  right ?


>> Create encrypted subvolume.
>> # btrfs su create -e 'ctr(aes)' /btrfs/e1 Create subvolume '/btrfs/e1'
>> Passphrase:
>> Again passphrase:
>
> I presume the command first creates a key, then creates a subvolume
> referencing that key? If so, that seems sensible.

  Hmm I didn't get the why part, any help ? (this doesn't encrypt
  metadata part).

>> A key is created and its hash is updated into the subvolume item,
>> and then added to the system keyctl.
>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>> 	Encryption: 		ctr(aes)@btrfs:75197c8e (594790215)
>>
>> # keyctl show 594790215 Keyring
>>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>
> That's entirely reasonable, though you may want to support "trusted and
> encrypted keys" (Documentation/security/keys-trusted-encrypted.txt)

   Yes. that's in the list.

>> Now any file data extents under the subvol /btrfs/e1 will be encrypted.
>>
>> You may revoke key using keyctl or btrfs(8) as below.
>> # btrfs su encrypt -k out /btrfs/e1
>>
>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>> 	Encryption: 		ctr(aes)@btrfs:75197c8e (Required key not
> available)
>>
>> # keyctl show 594790215 Keyring Unable to dump key: Key has been revoked
>>
>> As the key hash is updated, If you provide wrong passphrase in the next
>> key in, it won't add key to the system. So we have key verification from
>> the day1.
>
> This is good.

   Thanks.

Thanks, Anand


>> # btrfs su encrypt -k in /btrfs/e1 Passphrase:
>> Again passphrase:
>> ERROR: failed to set attribute 'btrfs.encrypt' to
>> 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
>>
>> ERROR: key set failed: Key was rejected by service
>>
>> # btrfs su encrypt -k in /btrfs/e1 Passphrase:
>> Again passphrase:
>> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
>>
>> Now if you revoke the key the read / write fails with key error.
>>
>> # md5sum /btrfs/e1/2k-test-file 8c9fbc69125ebe84569a5c1ca088cb14
>> /btrfs/e1/2k-test-file
>>
>> # btrfs su encrypt -k out /btrfs/e1
>>
>> # md5sum /btrfs/e1/2k-test-file md5sum: /btrfs/e1/2k-test-file: Key has
>> been revoked
>>
>> # cp /tfs/1k-test-file /btrfs/e1/
>> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
>> revoked
>>
>> Plain text memory scratches for security reason is pending. As there are
>> some key revoke notification challenges to coincide with encryption
>> context switch,
>> which I do believe should be fixed in the due course, but is not a
>> roadblock at this stage.
>>
>> Thanks, Anand
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (6 preceding siblings ...)
  2016-09-15  5:38 ` Chris Murphy
@ 2016-09-15 11:37 ` Austin S. Hemmelgarn
  2016-09-15 14:06   ` Anand Jain
  2016-09-16  1:12 ` Dave Chinner
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-15 11:37 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: clm, dsterba

On 2016-09-13 09:39, Anand Jain wrote:
>
> This patchset adds btrfs encryption support.
>
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
>
> A design write-up is coming next, however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.
>
> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.
>
> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.
>
> As of now these patch set supports encryption on per subvolume, as
> managing properties on per subvolume is a kind of core to btrfs, which is
> easier for data center solution-ing, seamlessly persistent and easy to
> manage.
>
>
> Steps:
> -----
>
> Make sure following kernel TFMs are compiled in.
> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
> name         : ctr(aes)
> name         : cbc(aes)
>
> Create encrypted subvolume.
> # btrfs su create -e 'ctr(aes)' /btrfs/e1
> Create subvolume '/btrfs/e1'
> Passphrase:
> Again passphrase:
>
> A key is created and its hash is updated into the subvolume item,
> and then added to the system keyctl.
> # btrfs su show /btrfs/e1 | egrep -i encrypt
> 	Encryption: 		ctr(aes)@btrfs:75197c8e (594790215)
>
> # keyctl show 594790215
> Keyring
>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>
>
> Now any file data extents under the subvol /btrfs/e1 will be
> encrypted.
>
> You may revoke key using keyctl or btrfs(8) as below.
> # btrfs su encrypt -k out /btrfs/e1
>
> # btrfs su show /btrfs/e1 | egrep -i encrypt
> 	Encryption: 		ctr(aes)@btrfs:75197c8e (Required key not available)
>
> # keyctl show 594790215
> Keyring
> Unable to dump key: Key has been revoked
>
> As the key hash is updated, If you provide wrong passphrase in the next
> key in, it won't add key to the system. So we have key verification
> from the day1.
>
> # btrfs su encrypt -k in /btrfs/e1
> Passphrase:
> Again passphrase:
> ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
>
> ERROR: key set failed: Key was rejected by service
>
> # btrfs su encrypt -k in /btrfs/e1
> Passphrase:
> Again passphrase:
> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
>
> Now if you revoke the key the read / write fails with key error.
>
> # md5sum /btrfs/e1/2k-test-file
> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
>
> # btrfs su encrypt -k out /btrfs/e1
>
> # md5sum /btrfs/e1/2k-test-file
> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
>
> # cp /tfs/1k-test-file /btrfs/e1/
> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked
>
> Plain text memory scratches for security reason is pending. As there are some
> key revoke notification challenges to coincide with encryption context switch,
> which I do believe should be fixed in the due course, but is not a roadblock
> at this stage.
>
Before I make any other comments, I should state that I asbolutely agree 
with Alex Elsayed about the issues with using CBC or CTR mode, and not 
supporting AE or AEAD modes.  If that's going to be the case, then 
there's essentially no point in merging this as is, as it has worse 
security than other filesystem level encryption options in the kernel by 
a pretty significant margin.  This absolutely _needs_ to be done right 
the first time, otherwise the reputation of BTRFS will suffer further, 
and nobody sane is going to use subvolume encryption for years after 
it's 'fixed' to be properly secure.

Now, the other thing I wanted to comment about:
How does this handle cloning of extents?  Can extents be cloned across 
subvolume boundaries when one of the subvolumes is encrypted?  Can they 
be cloned within an encrypted subvolume?  What happens when you try to 
clone them in either case if it isn't supported?


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15 11:33   ` Anand Jain
@ 2016-09-15 11:47     ` Alex Elsayed
  2016-09-16 11:35       ` Anand Jain
  0 siblings, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-15 11:47 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 15 Sep 2016 19:33:48 +0800, Anand Jain wrote:

> Thanks for commenting. pls see inline below.
> 
> On 09/15/2016 12:53 PM, Alex Elsayed wrote:
>> On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:
>>
>>> This patchset adds btrfs encryption support.
>>>
>>> The main objective of this series is to have bugs fixed and stability.
>>> I have verified with fstests to confirm that there is no regression.
>>>
>>> A design write-up is coming next, however here below is the quick
>>> example on the cli usage. Please try out, let me know if I have missed
>>> something.
>>>
>>> Also would like to mention that a review from the security experts is
>>> due,
>>> which is important and I believe those review comments can be
>>> accommodated without major changes from here.
>>>
>>> Also yes, thanks for the emails, I hear, per file encryption and
>>> inline with vfs layer is also important, which is wip among other
>>> things in the list.
>>>
>>> As of now these patch set supports encryption on per subvolume, as
>>> managing properties on per subvolume is a kind of core to btrfs, which
>>> is easier for data center solution-ing, seamlessly persistent and easy
>>> to manage.
>>>
>>>
>>> Steps:
>>> -----
>>>
>>> Make sure following kernel TFMs are compiled in.
>>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>> name         : ctr(aes)
>>> name         : cbc(aes)
>>
>> First problem: These are purely encryption algorithms, rather than AE
>> (Authenticated Encryption) or AEAD (Authenticated Encryption with
>> Associated Data). As a result, they are necessarily vulnerable to
>> adaptive chosen-ciphertext attacks, and CBC has historically had other
>> issues. I highly recommend using a well-reviewed AE or AEAD mode, such
>> as AES-GCM (as ecryptfs does), as long as the code can handle the
>> ciphertext being longer than the plaintext.
>>
>> If it _cannot_ handle the ciphertext being longer than the plaintext,
>> please consider that a very serious red flag: It means that you cannot
>> provide better security than block-level encryption, which greatly
>> reduces the benefit of filesystem-integrated encryption. Being at the
>> extent level _should_ permit using AEAD - if it does not, something is
>> wrong.
>>
>> If at all possible, I'd suggest _only_ permitting AEAD cipher modes to
>> be used.
>>
>> Anyway, even for block-level encryption, CTR and CBC have been
>> considered obsolete and potentially dangerous to use in disk encryption
>> for quite a while - current recommendations for block-level encryption
>> are to use either a narrow-block tweakable cipher mode (such as XTS),
>> or a wide- block one (such as EME or CMC), with the latter providing
>> slightly better security, but worse performance.
> 
>    Yes. CTR should be changed, so I have kept it as a cli option. And
>    with the current internal design, hope we can plugin more algorithms
>    as suggested/if-its-outdated and yes code can handle (or with a
>    little tweak) bigger ciphertext (than plaintext) as well.
> 
>    encryption + keyhash (as below) + Btrfs-data-checksum provides
>    similar to AE,  right ?

No, it does not provide anything remotely similar to AE. AE requires 
_cryptographic_ authentication of the data. Not only is a CRC (as Btrfs 
uses for the data checksum) not enough, a _cryptographic hash_ (such as 
SHA256) isn't even enough. A MAC (message authentication code) is 
necessary.

Moreover, combining an encryption algorithm and a MAC is very easy to get 
wrong, in ways that absolutely ruin security - as an example, see the 
Vaudenay/Lucky13 padding oracle attacks on TLS.

In order for this to be secure, you need to use a secure encryption 
system that also authenticates the data in a cryptographically secure 
manner. Certain schemes are well-studied and believed to be secure - AES-
GCM and ChaCha20-Poly1305 are common and well-regarded, and there's a 
generic security reduction for Encrypt-then-MAC constructions (using CTR 
together with HMAC in such a construction is generally acceptable).

The Btrfs data checksum is wholly inadequate, and the keyhash is a non-
sequitur - it prevents accidentally opening the subvolume with the wrong 
key, but neither it (nor the btrfs data checksum, which is a CRC rather 
than a cryptographic MAC) protect adequately against malicious corruption 
of the ciphertext.

I'd suggest pulling in Herbert Xu, as he'd likely be able to tell you 
what of the Crypto API is actually sane to use for this.

>>> Create encrypted subvolume.
>>> # btrfs su create -e 'ctr(aes)' /btrfs/e1 Create subvolume '/btrfs/e1'
>>> Passphrase:
>>> Again passphrase:
>>
>> I presume the command first creates a key, then creates a subvolume
>> referencing that key? If so, that seems sensible.
> 
>   Hmm I didn't get the why part, any help ? (this doesn't encrypt
>   metadata part).

Basically, if your tool merely sets up an entry in the kernel keyring, 
then calls the subvolume creation interface (passing in the key ID), then 
it can be composed with more advanced tooling that generates the key in a 
different manner.

If, instead, you call the subvolume creation API with a flag saying 
"please also create a key", then it does not compose and is inflexible.

That then becomes an obstacle to later extensions, such as trusted & 
encrypted keys.

>>> A key is created and its hash is updated into the subvolume item,
>>> and then added to the system keyctl.
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>> 	Encryption: 		ctr(aes)@btrfs:75197c8e (594790215)
>>>
>>> # keyctl show 594790215 Keyring
>>>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>>
>> That's entirely reasonable, though you may want to support "trusted and
>> encrypted keys" (Documentation/security/keys-trusted-encrypted.txt)
> 
>    Yes. that's in the list.
> 

Okay, good to hear!

<snip>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15 11:37 ` Austin S. Hemmelgarn
@ 2016-09-15 14:06   ` Anand Jain
  2016-09-15 14:24     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-15 14:06 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, linux-btrfs; +Cc: clm, dsterba


Thanks for comments.
Pls see inline as below.

On 09/15/2016 07:37 PM, Austin S. Hemmelgarn wrote:
> On 2016-09-13 09:39, Anand Jain wrote:
>>
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next, however here below is the quick example
>> on the cli usage. Please try out, let me know if I have missed something.
>>
>> Also would like to mention that a review from the security experts is
>> due,
>> which is important and I believe those review comments can be
>> accommodated
>> without major changes from here.
>>
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in the
>> list.
>>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs, which is
>> easier for data center solution-ing, seamlessly persistent and easy to
>> manage.
>>
>>
>> Steps:
>> -----
>>
>> Make sure following kernel TFMs are compiled in.
>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>> name         : ctr(aes)
>> name         : cbc(aes)
>>
>> Create encrypted subvolume.
>> # btrfs su create -e 'ctr(aes)' /btrfs/e1
>> Create subvolume '/btrfs/e1'
>> Passphrase:
>> Again passphrase:
>>
>> A key is created and its hash is updated into the subvolume item,
>> and then added to the system keyctl.
>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>     Encryption:         ctr(aes)@btrfs:75197c8e (594790215)
>>
>> # keyctl show 594790215
>> Keyring
>>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>>
>>
>> Now any file data extents under the subvol /btrfs/e1 will be
>> encrypted.
>>
>> You may revoke key using keyctl or btrfs(8) as below.
>> # btrfs su encrypt -k out /btrfs/e1
>>
>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>     Encryption:         ctr(aes)@btrfs:75197c8e (Required key not
>> available)
>>
>> # keyctl show 594790215
>> Keyring
>> Unable to dump key: Key has been revoked
>>
>> As the key hash is updated, If you provide wrong passphrase in the next
>> key in, it won't add key to the system. So we have key verification
>> from the day1.
>>
>> # btrfs su encrypt -k in /btrfs/e1
>> Passphrase:
>> Again passphrase:
>> ERROR: failed to set attribute 'btrfs.encrypt' to
>> 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
>>
>> ERROR: key set failed: Key was rejected by service
>>
>> # btrfs su encrypt -k in /btrfs/e1
>> Passphrase:
>> Again passphrase:
>> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
>>
>> Now if you revoke the key the read / write fails with key error.
>>
>> # md5sum /btrfs/e1/2k-test-file
>> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
>>
>> # btrfs su encrypt -k out /btrfs/e1
>>
>> # md5sum /btrfs/e1/2k-test-file
>> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
>>
>> # cp /tfs/1k-test-file /btrfs/e1/
>> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
>> revoked
>>
>> Plain text memory scratches for security reason is pending. As there
>> are some
>> key revoke notification challenges to coincide with encryption context
>> switch,
>> which I do believe should be fixed in the due course, but is not a
>> roadblock
>> at this stage.
>>



> Before I make any other comments, I should state that I asbolutely agree
> with Alex Elsayed about the issues with using CBC or CTR mode, and not
> supporting AE or AEAD modes.

   Alex comments was quite detailed, I did reply to it.
   Looks like you missed my reply to Alex's comments ?

> How does this handle cloning of extents?  Can extents be cloned across
> subvolume boundaries when one of the subvolumes is encrypted?

  Yes only if both the subvol keys match.

> Can they
> be cloned within an encrypted subvolume?

  Yes. That's things as usual.

> What happens when you try to
> clone them in either case if it isn't supported?

  Gets -EOPNOTSUPP.

Thanks, Anand


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15 14:06   ` Anand Jain
@ 2016-09-15 14:24     ` Austin S. Hemmelgarn
  2016-09-16  8:58       ` David Sterba
  2016-09-17  2:18       ` Zygo Blaxell
  0 siblings, 2 replies; 66+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-15 14:24 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: clm, dsterba

On 2016-09-15 10:06, Anand Jain wrote:
>
> Thanks for comments.
> Pls see inline as below.
>
> On 09/15/2016 07:37 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-13 09:39, Anand Jain wrote:
>>>
>>> This patchset adds btrfs encryption support.
>>>
>>> The main objective of this series is to have bugs fixed and stability.
>>> I have verified with fstests to confirm that there is no regression.
>>>
>>> A design write-up is coming next, however here below is the quick
>>> example
>>> on the cli usage. Please try out, let me know if I have missed
>>> something.
>>>
>>> Also would like to mention that a review from the security experts is
>>> due,
>>> which is important and I believe those review comments can be
>>> accommodated
>>> without major changes from here.
>>>
>>> Also yes, thanks for the emails, I hear, per file encryption and inline
>>> with vfs layer is also important, which is wip among other things in the
>>> list.
>>>
>>> As of now these patch set supports encryption on per subvolume, as
>>> managing properties on per subvolume is a kind of core to btrfs,
>>> which is
>>> easier for data center solution-ing, seamlessly persistent and easy to
>>> manage.
>>>
>>>
>>> Steps:
>>> -----
>>>
>>> Make sure following kernel TFMs are compiled in.
>>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>> name         : ctr(aes)
>>> name         : cbc(aes)
>>>
>>> Create encrypted subvolume.
>>> # btrfs su create -e 'ctr(aes)' /btrfs/e1
>>> Create subvolume '/btrfs/e1'
>>> Passphrase:
>>> Again passphrase:
>>>
>>> A key is created and its hash is updated into the subvolume item,
>>> and then added to the system keyctl.
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>     Encryption:         ctr(aes)@btrfs:75197c8e (594790215)
>>>
>>> # keyctl show 594790215
>>> Keyring
>>>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>>>
>>>
>>> Now any file data extents under the subvol /btrfs/e1 will be
>>> encrypted.
>>>
>>> You may revoke key using keyctl or btrfs(8) as below.
>>> # btrfs su encrypt -k out /btrfs/e1
>>>
>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>     Encryption:         ctr(aes)@btrfs:75197c8e (Required key not
>>> available)
>>>
>>> # keyctl show 594790215
>>> Keyring
>>> Unable to dump key: Key has been revoked
>>>
>>> As the key hash is updated, If you provide wrong passphrase in the next
>>> key in, it won't add key to the system. So we have key verification
>>> from the day1.
>>>
>>> # btrfs su encrypt -k in /btrfs/e1
>>> Passphrase:
>>> Again passphrase:
>>> ERROR: failed to set attribute 'btrfs.encrypt' to
>>> 'ctr(aes)@btrfs:75197c8e' : Key was rejected by service
>>>
>>> ERROR: key set failed: Key was rejected by service
>>>
>>> # btrfs su encrypt -k in /btrfs/e1
>>> Passphrase:
>>> Again passphrase:
>>> key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'
>>>
>>> Now if you revoke the key the read / write fails with key error.
>>>
>>> # md5sum /btrfs/e1/2k-test-file
>>> 8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file
>>>
>>> # btrfs su encrypt -k out /btrfs/e1
>>>
>>> # md5sum /btrfs/e1/2k-test-file
>>> md5sum: /btrfs/e1/2k-test-file: Key has been revoked
>>>
>>> # cp /tfs/1k-test-file /btrfs/e1/
>>> cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
>>> revoked
>>>
>>> Plain text memory scratches for security reason is pending. As there
>>> are some
>>> key revoke notification challenges to coincide with encryption context
>>> switch,
>>> which I do believe should be fixed in the due course, but is not a
>>> roadblock
>>> at this stage.
>>>
>
>
>
>> Before I make any other comments, I should state that I asbolutely agree
>> with Alex Elsayed about the issues with using CBC or CTR mode, and not
>> supporting AE or AEAD modes.
>
>   Alex comments was quite detailed, I did reply to it.
>   Looks like you missed my reply to Alex's comments ?
I've been having issues with GMail delaying random e-mails for excessive 
amounts of time (hours sometimes), so I didn't see your reply before 
sending this.  Even so, I do want it on the record that I agree with him 
completely.
>
>> How does this handle cloning of extents?  Can extents be cloned across
>> subvolume boundaries when one of the subvolumes is encrypted?
>
>  Yes only if both the subvol keys match.
OK, that makes sense.
>
>> Can they
>> be cloned within an encrypted subvolume?
>
>  Yes. That's things as usual.
Glad to see that that still works.  Most people I know who do batch 
deduplication do so within subvolumes but not across them, so that still 
working with encrypted subvolumes is a good thing.
>
>> What happens when you try to
>> clone them in either case if it isn't supported?
>
>  Gets -EOPNOTSUPP.
That actually makes more sense than what my first thought for a return 
code was (-EINVAL).

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (7 preceding siblings ...)
  2016-09-15 11:37 ` Austin S. Hemmelgarn
@ 2016-09-16  1:12 ` Dave Chinner
  2016-09-16  5:47   ` Roman Mamedov
                     ` (3 more replies)
  2016-09-16  8:49 ` David Sterba
                   ` (2 subsequent siblings)
  11 siblings, 4 replies; 66+ messages in thread
From: Dave Chinner @ 2016-09-16  1:12 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, clm, dsterba

On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
> 
> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next, however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.

Yup, that best practices say "do not roll your own encryption
infrastructure".

This is just my 2c worth - take it or leave it, don't other flaming.
Keep in mind that I'm not picking on btrfs here - I asked similar
hard questions about the proposed f2fs encryption implementation.
That was a "copy and snowflake" version of the ext4 encryption code -
they made changes and now we have generic code and common
functionality between ext4 and f2fs.

> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.

That's a fairly significant red flag to me - security reviews need
to be done at the design phase against specific threat models -
security review is not a code/implementation review...

The ext4 developers got this right by publishing threat models and
design docs, which got quite a lot of review and feedback before
code was published for review.

https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew

[small reorder of comments]

> As of now these patch set supports encryption on per subvolume, as
> managing properties on per subvolume is a kind of core to btrfs, which is
> easier for data center solution-ing, seamlessly persistent and easy to
> manage.

We've got dmcrypt for this sort of transparent "device level"
encryption. Do we really need another btrfs layer that re-implements
generic, robust, widely deployed, stable functionality?

What concerns me the most here is that it seems like that nothing
has been learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
reimplementation of existing robust, stable, widely deployed
infrastructure was fatally flawed and despite regular corruption
reports they were ignored for, what, 2 years? And then a /user/
spent the time to isolate the problem, and now several months later
it still hasn't been fixed. I haven't seen any developer interest in
fixing it, either.

This meets the definition of unmaintained software, and it sets a
poor example for how complex new btrfs features might be maintained
in the long term. Encryption simply cannot be treated like this - it
has to be right, and it has to be well maintained.

So what is being done differently ito the RAID5/6 review process
this time that will make the new btrfs-specific encryption
implementation solid and have minimal risk of zero-day fatal flaws?
And how are you going to guarantee that it will be adequately
maintained several years down the track?

> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.

The generic file encryption code is solid, reviewed, tested and
already widely deployed via two separate filesystems. There is a
much wider pool of developers who will maintain it, reveiw changes
and know all the traps that a new implementation might fall into.
There's a much bigger safety net here, which significantly lowers
the risk of zero-day fatal flaws in a new implementation and of
flaws in future modifications and enhancements.

Hence, IMO, the first thing to do is implement and make the generic
file encryption support solid and robust, not tack it on as an
afterthought for the magic btrfs encryption pixies to take care of.

Indeed, with the generic file encryption, btrfs may not even need
the special subvolume encryption pixies. i.e. you can effectively
implement subvolume encryption via configuration of a multi-user
encryption key for each subvolume and apply it to the subvolume tree
root at creation time. Then only users with permission to unlock the
subvolume key can access it.

Once the generic file encryption is solid and fulfils the needs of
most users, then you can look to solving the less common threat
models that neither dmcrypt or per-file encryption address. Only if
the generic code cannot be expanded to address specific threat
models should you then implement something that is unique to
btrfs....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  1:12 ` Dave Chinner
@ 2016-09-16  5:47   ` Roman Mamedov
  2016-09-16  6:49   ` Alex Elsayed
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 66+ messages in thread
From: Roman Mamedov @ 2016-09-16  5:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Anand Jain, linux-btrfs, clm, dsterba

[-- Attachment #1: Type: text/plain, Size: 1253 bytes --]

On Fri, 16 Sep 2016 11:12:13 +1000
Dave Chinner <david@fromorbit.com> wrote:

> > As of now these patch set supports encryption on per subvolume, as
> > managing properties on per subvolume is a kind of core to btrfs, which is
> > easier for data center solution-ing, seamlessly persistent and easy to
> > manage.
> 
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?

"Btrfs subvolume-level" is far from "device-level", subvolumes are so
lightweight and dynamic that they are akin to regular directories for most
intents and purposes, not devices or partitions.

And yes I'd say (effectively) a directory-level encryption in an FS can be
useful; for example encrypting /home, but not the rest of the filesystem, or
any other scenarios where only some of the stored data needs to be encrypted,
and it's not known in advance what proportion, so it's not convenient to have
any static partition or LVM based bounds.

Currently this can be achieved with tools like encfs or ecryptfs -- so it's
those you'd want to measure Btrfs encryption against, not dmcrypt.

-- 
With respect,
Roman

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  1:12 ` Dave Chinner
  2016-09-16  5:47   ` Roman Mamedov
@ 2016-09-16  6:49   ` Alex Elsayed
  2016-09-17  4:38     ` Zygo Blaxell
  2016-09-16 10:45   ` Brendan Hide
  2016-09-16 11:46   ` Anand Jain
  3 siblings, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-16  6:49 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 16 Sep 2016 11:12:13 +1000, Dave Chinner wrote:

> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>> 
>> This patchset adds btrfs encryption support.
>> 
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>> 
>> A design write-up is coming next, however here below is the quick
>> example on the cli usage. Please try out, let me know if I have missed
>> something.
> 
> Yup, that best practices say "do not roll your own encryption
> infrastructure".

IMO, (some of) this _is_ substantively justified by subvolumes being a 
meaningful unit of isolation/separation. However, yes, other parts really 
should be using Things That Have Already Been Figured Out, such as AEAD.

> This is just my 2c worth - take it or leave it, don't other flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar hard
> questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common functionality
> between ext4 and f2fs.
> 
>> Also would like to mention that a review from the security experts is
>> due,
>> which is important and I believe those review comments can be
>> accommodated without major changes from here.
> 
> That's a fairly significant red flag to me - security reviews need to be
> done at the design phase against specific threat models -
> security review is not a code/implementation review...
> 
> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before code
> was published for review.
> 
> https://docs.google.com/document/
d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew
> 
> [small reorder of comments]
> 
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs, which
>> is easier for data center solution-ing, seamlessly persistent and easy
>> to manage.
> 
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?

The reason we do, in four words: dmcrypt cannot use AEAD. Because it 
operates on blocks rather than extents, it is _incapable_ of providing 
the security advantages of AEAD, as those intrinsically cause ciphertext 
expansion.

> What concerns me the most here is that it seems like that nothing has
> been learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
> reimplementation of existing robust, stable, widely deployed
> infrastructure was fatally flawed and despite regular corruption reports
> they were ignored for, what, 2 years? And then a /user/
> spent the time to isolate the problem, and now several months later it
> still hasn't been fixed. I haven't seen any developer interest in fixing
> it, either.

This is, fundamentally, not comparable to dmcrypt - this is not a 
reimplementation of the same tool, but a substantively different tool 
despite a similar goal in the _specific_ domain of "composability".

Because dm-crypt cannot use AEAD, it is incapable (as in, there's a 
nonexistence proof) of meeting the IND-CCA2 security notion. By operating 
on extents, this can.

> This meets the definition of unmaintained software, and it sets a poor
> example for how complex new btrfs features might be maintained in the
> long term. Encryption simply cannot be treated like this - it has to be
> right, and it has to be well maintained.

Entirely agreed - but dmcrypt does not do the job this aims to do, so the 
conversation needs to be reframed. This is, honestly, more like 
integrating a vastly more efficient ecryptfs, keyed on a per-subvolume 
basis, than dmcrypt - and needs to be evaluated as such.

> So what is being done differently ito the RAID5/6 review process this
> time that will make the new btrfs-specific encryption implementation
> solid and have minimal risk of zero-day fatal flaws?
> And how are you going to guarantee that it will be adequately maintained
> several years down the track?
> 
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in
>> the list.
> 
> The generic file encryption code is solid, reviewed, tested and already
> widely deployed via two separate filesystems. There is a much wider pool
> of developers who will maintain it, reveiw changes and know all the
> traps that a new implementation might fall into.
> There's a much bigger safety net here, which significantly lowers the
> risk of zero-day fatal flaws in a new implementation and of flaws in
> future modifications and enhancements.

This, I do agree with - I think it would be a good idea to start from the 
generic file encryption code. However it's fallacious to think that 
simply applying the generic file encryption code to btrfs automatically 
dodges the trap of "rolling your own".

The main issue I see is that subvolumes as btrfs has them _do_ introduce 
novel concerns - in particular, how should snapshots interact with keying 
(and nonces)? None of the AEADs currently in the kernel are nonce-misuse 
resistant, which means that if different data is encrypted under the same 
key and nonce, things go _very_ badly wrong. With writable snapshots, I'd 
consider that a nontrivial risk.

> Hence, IMO, the first thing to do is implement and make the generic file
> encryption support solid and robust, not tack it on as an afterthought
> for the magic btrfs encryption pixies to take care of.

Sure, though that will require very carefully examining how its 
invariants hold up under writable snapshots of subvolumes with encrypted 
files.

> Indeed, with the generic file encryption, btrfs may not even need the
> special subvolume encryption pixies. i.e. you can effectively implement
> subvolume encryption via configuration of a multi-user encryption key
> for each subvolume and apply it to the subvolume tree root at creation
> time. Then only users with permission to unlock the subvolume key can
> access it.

See above; I 100% disagree with this. Any difference between the context 
crypto was designed for, and the context it is applied in, requires 
exceedingly careful validation.

> Once the generic file encryption is solid and fulfils the needs of most
> users, then you can look to solving the less common threat models that
> neither dmcrypt or per-file encryption address. Only if the generic code
> cannot be expanded to address specific threat models should you then
> implement something that is unique to btrfs....

No argument here. (One that particularly interests me is the possibility 
of encrypting the btrees themselves, using the same extent-level 
primitives used for file encryption, and thus enabling AEAD there as 
well).


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (8 preceding siblings ...)
  2016-09-16  1:12 ` Dave Chinner
@ 2016-09-16  8:49 ` David Sterba
  2016-09-16 11:56   ` Anand Jain
  2016-09-20  0:12   ` Chris Mason
  2016-09-17  6:58 ` Eric Biggers
  2016-09-19 15:15 ` Experimental btrfs encryption Theodore Ts'o
  11 siblings, 2 replies; 66+ messages in thread
From: David Sterba @ 2016-09-16  8:49 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, clm, dsterba

On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next,

You're approaching it from the wrong side. The detailed specification
must come first. Don't bother to send the code again.

> however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.
> 
> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.

I disagree. Others commented on the crypto stuff, I see enough points to
address that would lead to major changes.

> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.

Implementing the recent vfs encryption in btrfs is ok, it's just feature
parity using an existing API.

And a note from me with maintainer's hat on, there are enough pending
patches and patchsets that need review, and bugs to fix, I'm not going
to spend time on something that we don't need at the moment if there are
alternatives.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15 14:24     ` Austin S. Hemmelgarn
@ 2016-09-16  8:58       ` David Sterba
  2016-09-17  2:18       ` Zygo Blaxell
  1 sibling, 0 replies; 66+ messages in thread
From: David Sterba @ 2016-09-16  8:58 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Anand Jain, linux-btrfs, clm, dsterba

On Thu, Sep 15, 2016 at 10:24:02AM -0400, Austin S. Hemmelgarn wrote:
> >> What happens when you try to
> >> clone them in either case if it isn't supported?
> >
> >  Gets -EOPNOTSUPP.
> That actually makes more sense than what my first thought for a return 
> code was (-EINVAL).

Should be -EXDEV, as we do already.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  1:12 ` Dave Chinner
  2016-09-16  5:47   ` Roman Mamedov
  2016-09-16  6:49   ` Alex Elsayed
@ 2016-09-16 10:45   ` Brendan Hide
  2016-09-16 11:46   ` Anand Jain
  3 siblings, 0 replies; 66+ messages in thread
From: Brendan Hide @ 2016-09-16 10:45 UTC (permalink / raw)
  To: Dave Chinner, Anand Jain; +Cc: linux-btrfs, clm, dsterba

For the most part, I agree with you, especially about the strategy being 
backward - and file encryption being a viable more-easily-implementable 
direction.

However, you are doing yourself a disservice to compare btrfs' features 
as a "re-implementation" of existing tools. The existing tools cannot do 
what btrfs' devs want to implement. See below inline.

On 09/16/2016 03:12 AM, Dave Chinner wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>>
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next, however here below is the quick example
>> on the cli usage. Please try out, let me know if I have missed something.
>
> Yup, that best practices say "do not roll your own encryption
> infrastructure".

100% agreed

>
> This is just my 2c worth - take it or leave it, don't other flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar
> hard questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common
> functionality between ext4 and f2fs.
>
>> Also would like to mention that a review from the security experts is due,
>> which is important and I believe those review comments can be accommodated
>> without major changes from here.
>
> That's a fairly significant red flag to me - security reviews need
> to be done at the design phase against specific threat models -
> security review is not a code/implementation review...

Also agreed. This is a bit backward.

>
> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before
> code was published for review.
>
> https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew
>
> [small reorder of comments]
>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs, which is
>> easier for data center solution-ing, seamlessly persistent and easy to
>> manage.
>
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements ...

[snip]
Woah, woah. This is partly addressed by Roman's reply - but ...

Subvolumes:
Subvolumes are not comparable to block devices. This thinking is flawed 
at best; cancerous at worst.

As a user I tend to think of subvolumes simply as directly-mountable 
folders.

As a sysadmin I also think of them as snapshottable/send-receiveable 
folders.

And as a dev I know they're actually not that different from regular 
folders. They have some extra metadata so aren't as lightweight - but of 
course they expose very useful flexibility not available in a regular 
folder.

MD/raid comparison:
In much the same way, comparing btrfs' raid features to md directly is 
also flawed. Btrfs even re-uses code in md to implement raid-type 
features in ways that md cannot.

I can't answer for the current raid5/6 stability issues - but I am 
confident that the overall design is good, and that it will be fixed.

>
> The generic file encryption code is solid, reviewed, tested and
> already widely deployed via two separate filesystems. There is a
> much wider pool of developers who will maintain it, reveiw changes
> and know all the traps that a new implementation might fall into.
> There's a much bigger safety net here, which significantly lowers
> the risk of zero-day fatal flaws in a new implementation and of
> flaws in future modifications and enhancements.
>
> Hence, IMO, the first thing to do is implement and make the generic
> file encryption support solid and robust, not tack it on as an
> afterthought for the magic btrfs encryption pixies to take care of.
>
> Indeed, with the generic file encryption, btrfs may not even need
> the special subvolume encryption pixies. i.e. you can effectively
> implement subvolume encryption via configuration of a multi-user
> encryption key for each subvolume and apply it to the subvolume tree
> root at creation time. Then only users with permission to unlock the
> subvolume key can access it.
>
> Once the generic file encryption is solid and fulfils the needs of
> most users, then you can look to solving the less common threat
> models that neither dmcrypt or per-file encryption address. Only if
> the generic code cannot be expanded to address specific threat
> models should you then implement something that is unique to
> btrfs....
>

Agreed, this sounds like a far safer and achievable implementation process.

> Cheers,
>
> Dave.
>

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15 11:47     ` Alex Elsayed
@ 2016-09-16 11:35       ` Anand Jain
  0 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-16 11:35 UTC (permalink / raw)
  To: Alex Elsayed, linux-btrfs



On 09/15/2016 07:47 PM, Alex Elsayed wrote:
> On Thu, 15 Sep 2016 19:33:48 +0800, Anand Jain wrote:
>
>> Thanks for commenting. pls see inline below.
>>
>> On 09/15/2016 12:53 PM, Alex Elsayed wrote:
>>> On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:
>>>
>>>> This patchset adds btrfs encryption support.
>>>>
>>>> The main objective of this series is to have bugs fixed and stability.
>>>> I have verified with fstests to confirm that there is no regression.
>>>>
>>>> A design write-up is coming next, however here below is the quick
>>>> example on the cli usage. Please try out, let me know if I have missed
>>>> something.
>>>>
>>>> Also would like to mention that a review from the security experts is
>>>> due,
>>>> which is important and I believe those review comments can be
>>>> accommodated without major changes from here.
>>>>
>>>> Also yes, thanks for the emails, I hear, per file encryption and
>>>> inline with vfs layer is also important, which is wip among other
>>>> things in the list.
>>>>
>>>> As of now these patch set supports encryption on per subvolume, as
>>>> managing properties on per subvolume is a kind of core to btrfs, which
>>>> is easier for data center solution-ing, seamlessly persistent and easy
>>>> to manage.
>>>>
>>>>
>>>> Steps:
>>>> -----
>>>>
>>>> Make sure following kernel TFMs are compiled in.
>>>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>>> name         : ctr(aes)
>>>> name         : cbc(aes)
>>>
>>> First problem: These are purely encryption algorithms, rather than AE
>>> (Authenticated Encryption) or AEAD (Authenticated Encryption with
>>> Associated Data). As a result, they are necessarily vulnerable to
>>> adaptive chosen-ciphertext attacks, and CBC has historically had other
>>> issues. I highly recommend using a well-reviewed AE or AEAD mode, such
>>> as AES-GCM (as ecryptfs does), as long as the code can handle the
>>> ciphertext being longer than the plaintext.
>>>
>>> If it _cannot_ handle the ciphertext being longer than the plaintext,
>>> please consider that a very serious red flag: It means that you cannot
>>> provide better security than block-level encryption, which greatly
>>> reduces the benefit of filesystem-integrated encryption. Being at the
>>> extent level _should_ permit using AEAD - if it does not, something is
>>> wrong.
>>>
>>> If at all possible, I'd suggest _only_ permitting AEAD cipher modes to
>>> be used.
>>>
>>> Anyway, even for block-level encryption, CTR and CBC have been
>>> considered obsolete and potentially dangerous to use in disk encryption
>>> for quite a while - current recommendations for block-level encryption
>>> are to use either a narrow-block tweakable cipher mode (such as XTS),
>>> or a wide- block one (such as EME or CMC), with the latter providing
>>> slightly better security, but worse performance.
>>
>>    Yes. CTR should be changed, so I have kept it as a cli option. And
>>    with the current internal design, hope we can plugin more algorithms
>>    as suggested/if-its-outdated and yes code can handle (or with a
>>    little tweak) bigger ciphertext (than plaintext) as well.
>>
>>    encryption + keyhash (as below) + Btrfs-data-checksum provides
>>    similar to AE,  right ?
>
> No, it does not provide anything remotely similar to AE. AE requires
> _cryptographic_ authentication of the data. Not only is a CRC (as Btrfs
> uses for the data checksum) not enough, a _cryptographic hash_ (such as
> SHA256) isn't even enough. A MAC (message authentication code) is
> necessary.
>
> Moreover, combining an encryption algorithm and a MAC is very easy to get
> wrong, in ways that absolutely ruin security - as an example, see the
> Vaudenay/Lucky13 padding oracle attacks on TLS.
>
> In order for this to be secure, you need to use a secure encryption
> system that also authenticates the data in a cryptographically secure
> manner. Certain schemes are well-studied and believed to be secure - AES-
> GCM and ChaCha20-Poly1305 are common and well-regarded, and there's a
> generic security reduction for Encrypt-then-MAC constructions (using CTR
> together with HMAC in such a construction is generally acceptable).
>
> The Btrfs data checksum is wholly inadequate, and the keyhash is a non-
> sequitur - it prevents accidentally opening the subvolume with the wrong
> key, but neither it (nor the btrfs data checksum, which is a CRC rather
> than a cryptographic MAC) protect adequately against malicious corruption
> of the ciphertext.
>
> I'd suggest pulling in Herbert Xu, as he'd likely be able to tell you
> what of the Crypto API is actually sane to use for this.


  As mentioned 'inline with vfs layer' I mean to say to use
  fs/crypto KPIs. Which I haven't seen what parts of the code
  was made as generic KPIs from ext4. If that's solving the
  problem, then it would here as well.


>>>> Create encrypted subvolume.
>>>> # btrfs su create -e 'ctr(aes)' /btrfs/e1 Create subvolume '/btrfs/e1'
>>>> Passphrase:
>>>> Again passphrase:
>>>
>>> I presume the command first creates a key, then creates a subvolume
>>> referencing that key? If so, that seems sensible.
>>
>>   Hmm I didn't get the why part, any help ? (this doesn't encrypt
>>   metadata part).
>
> Basically, if your tool merely sets up an entry in the kernel keyring,
> then calls the subvolume creation interface (passing in the key ID), then
> it can be composed with more advanced tooling that generates the key in a
> different manner.
>
> If, instead, you call the subvolume creation API with a flag saying
> "please also create a key", then it does not compose and is inflexible.
>
> That then becomes an obstacle to later extensions, such as trusted &
> encrypted keys.

   Yes key creation and subvol create are separate and independent.

Thanks, Anand


>>>> A key is created and its hash is updated into the subvolume item,
>>>> and then added to the system keyctl.
>>>> # btrfs su show /btrfs/e1 | egrep -i encrypt
>>>> 	Encryption: 		ctr(aes)@btrfs:75197c8e (594790215)
>>>>
>>>> # keyctl show 594790215 Keyring
>>>>  594790215 --alsw-v      0     0  logon: btrfs:75197c8e
>>>
>>> That's entirely reasonable, though you may want to support "trusted and
>>> encrypted keys" (Documentation/security/keys-trusted-encrypted.txt)
>>
>>    Yes. that's in the list.
>>
>
> Okay, good to hear!
>
> <snip>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  1:12 ` Dave Chinner
                     ` (2 preceding siblings ...)
  2016-09-16 10:45   ` Brendan Hide
@ 2016-09-16 11:46   ` Anand Jain
  3 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-16 11:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-btrfs, clm, dsterba



On 09/16/2016 09:12 AM, Dave Chinner wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>>
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next, however here below is the quick example
>> on the cli usage. Please try out, let me know if I have missed something.
>
> Yup, that best practices say "do not roll your own encryption
> infrastructure".
>
> This is just my 2c worth - take it or leave it, don't other flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar
> hard questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common
> functionality between ext4 and f2fs.
>
>> Also would like to mention that a review from the security experts is due,
>> which is important and I believe those review comments can be accommodated
>> without major changes from here.
>
> That's a fairly significant red flag to me - security reviews need
> to be done at the design phase against specific threat models -
> security review is not a code/implementation review...
>
> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before
> code was published for review.
>
> https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew


  As mentioned 'inline with vfs layer' I mean to say to use
  fs/crypto KPIs. Which I haven't seen what parts of the code
  from ext4 was made as generic KPIs. If that's getting stuff
  correct in the encryption related, I think it would here as well.

  Internal to btrfs - I had challenges to get the extents encoding
  done properly without bailout, and the test plan. Which I think
  is addressed here in this code.


Thanks, Anand


> [small reorder of comments]
>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs, which is
>> easier for data center solution-ing, seamlessly persistent and easy to
>> manage.
>
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?
>
> What concerns me the most here is that it seems like that nothing
> has been learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
> reimplementation of existing robust, stable, widely deployed
> infrastructure was fatally flawed and despite regular corruption
> reports they were ignored for, what, 2 years? And then a /user/
> spent the time to isolate the problem, and now several months later
> it still hasn't been fixed. I haven't seen any developer interest in
> fixing it, either.
>
> This meets the definition of unmaintained software, and it sets a
> poor example for how complex new btrfs features might be maintained
> in the long term. Encryption simply cannot be treated like this - it
> has to be right, and it has to be well maintained.
>
> So what is being done differently ito the RAID5/6 review process
> this time that will make the new btrfs-specific encryption
> implementation solid and have minimal risk of zero-day fatal flaws?
> And how are you going to guarantee that it will be adequately
> maintained several years down the track?
>
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in the
>> list.
>
> The generic file encryption code is solid, reviewed, tested and
> already widely deployed via two separate filesystems. There is a
> much wider pool of developers who will maintain it, reveiw changes
> and know all the traps that a new implementation might fall into.
> There's a much bigger safety net here, which significantly lowers
> the risk of zero-day fatal flaws in a new implementation and of
> flaws in future modifications and enhancements.
>
> Hence, IMO, the first thing to do is implement and make the generic
> file encryption support solid and robust, not tack it on as an
> afterthought for the magic btrfs encryption pixies to take care of.
>
> Indeed, with the generic file encryption, btrfs may not even need
> the special subvolume encryption pixies. i.e. you can effectively
> implement subvolume encryption via configuration of a multi-user
> encryption key for each subvolume and apply it to the subvolume tree
> root at creation time. Then only users with permission to unlock the
> subvolume key can access it.
>
> Once the generic file encryption is solid and fulfils the needs of
> most users, then you can look to solving the less common threat
> models that neither dmcrypt or per-file encryption address. Only if
> the generic code cannot be expanded to address specific threat
> models should you then implement something that is unique to
> btrfs....
>
> Cheers,
>
> Dave.
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  8:49 ` David Sterba
@ 2016-09-16 11:56   ` Anand Jain
  2016-09-17 20:35     ` David Sterba
  2016-09-20  0:12   ` Chris Mason
  1 sibling, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-16 11:56 UTC (permalink / raw)
  To: linux-btrfs, clm, dsterba



>> however here below is the quick example
>> on the cli usage. Please try out, let me know if I have missed something.
>>
>> Also would like to mention that a review from the security experts is due,
>> which is important and I believe those review comments can be accommodated
>> without major changes from here.
>
> I disagree. Others commented on the crypto stuff, I see enough points to
> address that would lead to major changes.
>
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in the
>> list.
>
> Implementing the recent vfs encryption in btrfs is ok, it's just feature
> parity using an existing API.


  As mentioned 'inline with vfs layer' I mean to say to use
  fs/crypto KPIs. Which I haven't seen what parts of the code
  from ext4 was made as generic KPIs. If that's getting stuff
  correct in the encryption related, I think it would here as well.

  Internal to btrfs - I had challenges to get the extents encoding
  done properly without bailout, and the test plan. Which I think
  is addressed here in this code. as mentioned.



> And a note from me with maintainer's hat on, there are enough pending
> patches and patchsets that need review, and bugs to fix, I'm not going
> to spend time on something that we don't need at the moment if there are
> alternatives.

  Honestly I agree. I even suggested but I had no choice.


PS:
  Pls, feel free to flame on the (raid) patches if its not correct,
  because its rather more productive than no reply.


Thanks, Anand

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-15 14:24     ` Austin S. Hemmelgarn
  2016-09-16  8:58       ` David Sterba
@ 2016-09-17  2:18       ` Zygo Blaxell
  1 sibling, 0 replies; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-17  2:18 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Anand Jain, linux-btrfs, clm, dsterba

[-- Attachment #1: Type: text/plain, Size: 1638 bytes --]

On Thu, Sep 15, 2016 at 10:24:02AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-15 10:06, Anand Jain wrote:
> >>How does this handle cloning of extents?  Can extents be cloned across
> >>subvolume boundaries when one of the subvolumes is encrypted?
> >
> > Yes only if both the subvol keys match.
> OK, that makes sense.
> >
> >>Can they
> >>be cloned within an encrypted subvolume?
> >
> > Yes. That's things as usual.
> Glad to see that that still works.  Most people I know who do batch
> deduplication do so within subvolumes but not across them, so that still
> working with encrypted subvolumes is a good thing.

I do continual filesystem-wide deduplication across subvolumes, but I
don't think this is a problem.

There are already a number of conditions when IOC_FILE_EXTENT_SAME might
fail and deduplicators must tolerate those failures.  Cross-subvol dedup
has to loop over all duplicate block references (including those in
other subvols) until all references to one of the blocks are eliminated.
So dedup should still work by sheer brute force, banging extents together
until they stick, but it would be noisy and slower if it was not aware
of encrypted subvols.

If there's a way to look at the subvolume properties and figure out
whether the extents are clonable (e.g. equal key IDs == clonable) then
it should be easy to avoid submitting FILE_EXTENT_SAME extent pairs
belonging to incompatibly encrypted subvols.  They can also be stored in
separate DDT entries (e.g. by extending the hash field) so that blocks
from incompatibly encrypted subvols won't have matching extended hashes.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  6:49   ` Alex Elsayed
@ 2016-09-17  4:38     ` Zygo Blaxell
  2016-09-17  6:37       ` Alex Elsayed
  2016-09-17 18:45       ` David Sterba
  0 siblings, 2 replies; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-17  4:38 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3558 bytes --]

On Fri, Sep 16, 2016 at 06:49:53AM +0000, Alex Elsayed wrote:
> The main issue I see is that subvolumes as btrfs has them _do_ introduce 
> novel concerns - in particular, how should snapshots interact with keying 
> (and nonces)? None of the AEADs currently in the kernel are nonce-misuse 
> resistant, which means that if different data is encrypted under the same 
> key and nonce, things go _very_ badly wrong. With writable snapshots, I'd 
> consider that a nontrivial risk.

Snapshots should copy subvolume keys (or key UUIDs, since the keys aren't
stored in the filesystem), i.e. an ioctl could say "create a new subvol
'foo' with the same key as existing subvol 'bar'".  This could also
handle nested subvols (child copies key of parent) if the nested
subvols weren't created with their own separate keys.  For snapshots,
we wouldn't even ask--the snapshot and its origin subvol would share a
key unconditionally. (*)

I don't see how snapshots could work, writable or otherwise, without
separating the key identity from the subvol identity and having a
many-to-one relationship between subvols and keys.  The extents in each
subvol would be shared, and they'd be encrypted with a single secret,
so there's not really another way to do this.

If the key is immutable (which it probably is, given that it's used to
encrypt at the extent level, and extents are (mostly) immutable) then just
giving each subvol a copy of the key ID is sufficient.

(*) OK, we could ask, but if the answer was "no, please do not use the
origin subvol's key", then btrfs would return EINVAL and not create
the snapshot, since there would be no way to read any data contained
within it without the key.

> > Indeed, with the generic file encryption, btrfs may not even need the
> > special subvolume encryption pixies. i.e. you can effectively implement
> > subvolume encryption via configuration of a multi-user encryption key
> > for each subvolume and apply it to the subvolume tree root at creation
> > time. Then only users with permission to unlock the subvolume key can
> > access it.

Life is pretty easy when we're only encrypting data extents.

Encrypted subvol trees cause quite a few problems for btrfs when it needs
to relocate extents (e.g. to shrink a filesystem or change RAID profile)
or validate data integrity.  Ideally it would still be able to do these
operations without decrypting the data; otherwise, there are bad cases,
e.g. if a disk fails, all of the subvolumes would have to be unlocked
in order to replace a disk.

Still, there could be a half way point here.  If btrfs could tie
block groups to subvol encryption keys, it could arrange for all of
the extents in a metadata block group to use the same encryption key.
Then it would be possible to relocate the entire metadata block group
without decrypting its contents.  It would only be necessary to copy
the block group's encrypted data, then update the virtual-to-physical
address mappings in the chunk tree.  Something would have to be done
about checksums during the copy but that's a larger question (are there
two sets of checksums, one authenticated for the encrypted data, and
the crc32 check for device-level data corruption?).

There's also a nasty problem with the extent tree--there's only one per
filesystem, it's shared between all subvols and block groups, and every
extent in that tree has back references to the (possibly encrypted) subvol
trees.  I'll leave that problem as an exercise for other readers.  ;)


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17  4:38     ` Zygo Blaxell
@ 2016-09-17  6:37       ` Alex Elsayed
  2016-09-19 18:08         ` Zygo Blaxell
  2016-09-17 18:45       ` David Sterba
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-17  6:37 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 17 Sep 2016 00:38:30 -0400, Zygo Blaxell wrote:

> On Fri, Sep 16, 2016 at 06:49:53AM +0000, Alex Elsayed wrote:
>> The main issue I see is that subvolumes as btrfs has them _do_
>> introduce novel concerns - in particular, how should snapshots interact
>> with keying (and nonces)? None of the AEADs currently in the kernel are
>> nonce-misuse resistant, which means that if different data is encrypted
>> under the same key and nonce, things go _very_ badly wrong. With
>> writable snapshots, I'd consider that a nontrivial risk.
> 
> Snapshots should copy subvolume keys (or key UUIDs, since the keys
> aren't stored in the filesystem), i.e. an ioctl could say "create a new
> subvol 'foo' with the same key as existing subvol 'bar'".  This could
> also handle nested subvols (child copies key of parent) if the nested
> subvols weren't created with their own separate keys.  For snapshots,
> we wouldn't even ask--the snapshot and its origin subvol would share a
> key unconditionally. (*)

I'll quote the LWN article on the way EXT4 (and VFS) encryption works 
(https://lwn.net/Articles/639427/):

> Encryption in ext4 is a per-directory-tree affair. One starts by
> setting an encryption policy (using an ioctl() call) for a given
> directory, which must be empty at the time; that policy includes a
> master key used for all files and directories stored below the target
> directory. Each individual file is encrypted with its own key, which is
> derived from the master key and a per-file random nonce value (which is
> stored in an extended attribute attached to the file's inode). File
> names and symbolic links are also encrypted.

So there isn't quite a "subvol key" in the VFS approach - each directory 
has a key, and there are derived keys for the entries below it. (I'll 
note that this framing does not address shared extents _at all_, and 
would love to have clarification on that).

> I don't see how snapshots could work, writable or otherwise, without
> separating the key identity from the subvol identity and having a
> many-to-one relationship between subvols and keys.  The extents in each
> subvol would be shared, and they'd be encrypted with a single secret,
> so there's not really another way to do this.

That's not the issue. The issue is that, assuming the key stays the same, 
then a user could quite possibly create a snapshot, write into both the 
original and the snapshot, causing encryption to occur twice with the 
same key, same nonce, and different data.

This invalidates both the integrity and confidentialiyy of AES-GCM (and 
any other AEAD that is not nonce-misuse resistant), allowing them to 
effectively mount offline decryption attacks against things they could 
not ordinarily read, or replace files without being caught.

> If the key is immutable (which it probably is, given that it's used to
> encrypt at the extent level, and extents are (mostly) immutable) then
> just giving each subvol a copy of the key ID is sufficient.

Sufficient for reading data, yes. Sufficient for really nasty nonce-reuse 
attacks, also yes.

> (*) OK, we could ask, but if the answer was "no, please do not use the
> origin subvol's key", then btrfs would return EINVAL and not create the
> snapshot, since there would be no way to read any data contained within
> it without the key.

It _might_ be possible to get away with only allowing RO snapshots for 
encrypted subvols, but this really requires much more careful thought 
than it's getting.

>> > Indeed, with the generic file encryption, btrfs may not even need the
>> > special subvolume encryption pixies. i.e. you can effectively
>> > implement subvolume encryption via configuration of a multi-user
>> > encryption key for each subvolume and apply it to the subvolume tree
>> > root at creation time. Then only users with permission to unlock the
>> > subvolume key can access it.
> 
> Life is pretty easy when we're only encrypting data extents.

Agreed.

> Encrypted subvol trees cause quite a few problems for btrfs when it
> needs to relocate extents (e.g. to shrink a filesystem or change RAID
> profile) or validate data integrity.  Ideally it would still be able to
> do these operations without decrypting the data; otherwise, there are
> bad cases, e.g. if a disk fails, all of the subvolumes would have to be
> unlocked in order to replace a disk.

Sure; there are certainly caveats. At very least, the free space map 
would need to be either unencrypted, or encrypted under a "global" key 
(no worse in security properties than dmcrypt, though, with the added 
boon of AEAD). That would at least make operation with some subvolumes 
locked safe.

One could also just encrypt all of the subvolume trees under one global 
key. Encrypting _all_ the metadata in _some fashion_ is _required_ if we 
ever want dmcrypt to be unnecessary, but if that's the goal, per-
subvolume keys for that aren't needed (though they would be nice).

(If making dmcrypt unnecessary is a non-goal, then we'll always need it, 
even if we use FS encryption - meaning if we use FS encryption, we do 
twice the work for only marginal gain. This is suboptimal.)

> Still, there could be a half way point here.  If btrfs could tie block
> groups to subvol encryption keys, it could arrange for all of the
> extents in a metadata block group to use the same encryption key. Then
> it would be possible to relocate the entire metadata block group without
> decrypting its contents.  It would only be necessary to copy the block
> group's encrypted data, then update the virtual-to-physical address
> mappings in the chunk tree.  Something would have to be done about
> checksums during the copy but that's a larger question (are there two
> sets of checksums, one authenticated for the encrypted data, and the
> crc32 check for device-level data corruption?).

That is definitely a neat possibility.

> There's also a nasty problem with the extent tree--there's only one per
> filesystem, it's shared between all subvols and block groups, and every
> extent in that tree has back references to the (possibly encrypted)
> subvol trees.  I'll leave that problem as an exercise for other readers.
>  ;)

See above - a different model is in play there (it can be encrypted under 
a global key), and it may be possible to treat backrefs to encrypted 
subvols as "immutable" - i.e. they count towards refcount, and cannot be 
walked, and so the extent cannot be (re)moved unless all backrefs are to 
unlocked subvols.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (9 preceding siblings ...)
  2016-09-16  8:49 ` David Sterba
@ 2016-09-17  6:58 ` Eric Biggers
  2016-09-17  7:13   ` Alex Elsayed
  2016-09-17 16:12   ` Anand Jain
  2016-09-19 15:15 ` Experimental btrfs encryption Theodore Ts'o
  11 siblings, 2 replies; 66+ messages in thread
From: Eric Biggers @ 2016-09-17  6:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: anand.jain, eternaleye

On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>
> This patchset adds btrfs encryption support.
>

Hi Anand,

I'm part of a team that will be maintaining and improving ext4 encryption.
Because f2fs now shares much of the code, it will benefit from the ext4
encryption work too.  It would be really nice if btrfs could be in the same
boat, since that would avoid redundant work and help prevent bugs and design
flaws.  So I strongly suggest that you explore how btrfs encryption might reuse
the existing code (and maybe extend it if there is very good reason to).

There also needs to be a proper design document for btrfs encryption.  This is
especially true if for some (very, very good) reason you can't reuse the
infrastructure from ext4 and f2fs.  There also could be unique challenges for
btrfs such as encryption keys and/or IVs being reused in reflinked extents.

You will also not get a proper review without a proper design document which
details things like the threat model and the security properties provided.  But
I did take a short look at the code anyway because I was interested.  The
results were not pretty.  As far as I can see the current proposal is fatally
flawed as it does not provide confidentiality of file contents against a basic
attack.

The main two flaws are:

1. Use of CTR mode of operation
2. Reuse of same (key, IV) pair for all pages of a given inode

CTR mode is well known to fall over completely when used with repeated (key, IV)
pairs.  This makes the encryption nearly worthless.  In more detail: suppose I
am an attacker who has access to a file's ciphertext.  By the definition of CTR
mode, each ciphertext block is given by: C = E(ctr) XOR P, where C and P denote
the ciphertext and plaintext blocks respectively, E denotes encryption with the
block cipher using the secret key, and 'ctr' denotes the counter value.  Due to
flaw (2) the ctr values repeat every page.  Consequently, if I can correctly
guess the plaintext P1 of *any* page in the file and I want to know the
plaintext P2 of some other page, I can trivially compute P2 = P1 XOR C1 XOR C2.
No secret key needed.

Essentially: if there is any part of a file which is easily guessable, such as
a header or even a zeroed region, then the whole file is revealed.

The solution is to use a less brittle mode of operation such as XTS in
combination with per-page IVs (or "tweaks") and derived per-file keys.  This is
already done in ext4 and f2fs, where the per-page IV is just the page offset.
Note that per-file keys were needed to prevent the same (key, IV) pair from
being used in multiple places.  So if you could reuse the fs/crypto
infrastructure, you could take advantage of the fact that this problem was
already solved.

Note: even better would be an authenticated encryption mode.  That isn't yet
done by ext4 or f2fs --- I think because there wasn't a good place to store a
per-page authentication tag.  It would be interesting to know whether this would
be possible for btrfs.

I also noticed that unlike ext4 and f2fs, filenames and symlinks are not being
encrypted in btrfs.  I know it may seem somewhat ad-hoc that filenames are
encrypted but not other metadata, but apparently filenames were considered
quite important and a lot of work went into making it possible to encrypt them
in ext4.

(Apologies if I misunderstood anything.  The proposal would be easier to review
with a design document, as mentioned.)

Hope this is helpful,

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17  6:58 ` Eric Biggers
@ 2016-09-17  7:13   ` Alex Elsayed
  2016-09-19 18:57     ` Zygo Blaxell
  2016-09-17 16:12   ` Anand Jain
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-17  7:13 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 16 Sep 2016 23:58:31 -0700, Eric Biggers wrote:

> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>>
>> This patchset adds btrfs encryption support.
>>
>>
> Hi Anand,

<snip>

> Note: even better would be an authenticated encryption mode.  That isn't
> yet done by ext4 or f2fs --- I think because there wasn't a good place
> to store a per-page authentication tag.  It would be interesting to know
> whether this would be possible for btrfs.

IMO, this is already a flawed framing - in particular, if encrypting at 
the extent level, one _should not_ be encrypting (or authenticating) 
individual pages. The meaningful unit is the extent, and encrypting at 
page granularity puts you right back where dmcrypt is: dealing with fixed-
size space, and needing to find somewhere else to put the auth tag.

This is not a good place to be, and I strongly suspect it motivated 
choosing XTS in the first place - something I feel is an _error_ in the 
long run, and a dangerous one. (IMO, anything _but_ AEAD should be 
forbidden in FS-level encryption.)

In a nonce-misuse-resistent AEAD, there _is_ no auth tag: There's some 
amount of inherent ciphertext expansion, and the ciphertext _cannot be 
decrypted at all_ unless all of it is present. In essence, a built-in all-
or-nothing transform.

You could, potentially, chop off part of that and store it elsewhere, but 
now you're dealing with significant added complexity, for absolutely zero 
gain.

If you're _not_ using a nonce-misuse-resistant AEAD, it's even worse: 
keeping the tag out-of-band makes it far too easy to fail to verify it, 
or verify it only after decrypting the ciphertext to plaintext. Bluntly: 
that is an immediate security vulnerability.

tl;dr: Don't encrypt pages, encrypt extents. They grow a little for the 
auth tag, and that's fine.

Btrfs already handles needing to read the full extent in order to get a 
page out of it with compression, anyway.

<snip>



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17  6:58 ` Eric Biggers
  2016-09-17  7:13   ` Alex Elsayed
@ 2016-09-17 16:12   ` Anand Jain
  2016-09-17 18:57     ` Chris Murphy
  1 sibling, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-17 16:12 UTC (permalink / raw)
  To: Eric Biggers, linux-btrfs; +Cc: eternaleye



Hi Eric,

  Thanks for the constructive feedback, pls see inline below.


On 09/17/2016 02:58 PM, Eric Biggers wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>>
>> This patchset adds btrfs encryption support.
>>
>
> Hi Anand,
>
> I'm part of a team that will be maintaining and improving ext4 encryption.
> Because f2fs now shares much of the code, it will benefit from the ext4
> encryption work too.  It would be really nice if btrfs could be in the same
> boat, since that would avoid redundant work and help prevent bugs and design
> flaws.  So I strongly suggest that you explore how btrfs encryption might reuse
> the existing code (and maybe extend it if there is very good reason to).

In fact my first attempt was using f2fs/ext4, found its too complicated,
further couldn't stable it, so re-wrote completely to a version where I
won't worry too much on the cipher mode _at the moment_, so this version
came with a caveat as mentioned.
Now looking to integrate with fs/crypto, however have the following 
concerns,

fs/crypto:
.
   fscrypt_context:master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
   should it go to the disk ? its just a descriptor which might
   change at the user end and still may contain the right key
   in the payload.

   btrfs keeps it only in-memory and key hash goes to the disk.
   Further in the long we need an integration with key management
   system as well.

.
   ext4/f2fs allows per file keys, when we are talking about large
   data center FS for cloud services, it would need key services
   to scale along with the FS. And there will be a lot of resources
   which is allocated but not used.

.
  system keyring already defines struct user_key_payload
  probably we should have used it instead of

  71 struct fscrypt_key {
  72         u32 mode;
  73         u8 raw[FS_MAX_KEY_SIZE];
  74         u32 size;
  75 } __packed;


.
  Some of key derivation functions should have been part of
  crypto library rather.

.
  page->index based IV won't suite btrfs, so it uses a random, but yes
  it needs crypto hardened. I am kind of opinion that, for a real need
  of retrievable random number we are replacing it with sector(truecrypt)
  /page-offset number, I think we aren't addressing problem in a right
  way ? If that that's the best solution we could achieve, yet I am not
  sure how to solve the need of FS independent decrypt ? I am yet to
  look at gpg.

.
  Yes needs AEAD. But at the same time we need
    - MAC to be separated from the ciphertext and ciphertext-size
      == plaintext-size.
    To make sure for sync,dio we do create extents and IO which matches
    with the application IO. So that performance tuning will be things
    as usual.


> There also needs to be a proper design document for btrfs encryption.  This is
> especially true if for some (very, very good) reason you can't reuse the
> infrastructure from ext4 and f2fs.  There also could be unique challenges for
> btrfs such as encryption keys and/or IVs being reused in reflinked extents.
>
> You will also not get a proper review without a proper design document which
> details things like the threat model and the security properties provided.  But
> I did take a short look at the code anyway because I was interested.

  Thank You !!  I should have sent the code when doc is ready rather.
  Sorry about that.


>  The
> results were not pretty.  As far as I can see the current proposal is fatally
> flawed as it does not provide confidentiality of file contents against a basic
> attack.
>
> The main two flaws are:
>
> 1. Use of CTR mode of operation
> 2. Reuse of same (key, IV) pair for all pages of a given inode
>
> CTR mode is well known to fall over completely when used with repeated (key, IV)
> pairs.  This makes the encryption nearly worthless.  In more detail: suppose I
> am an attacker who has access to a file's ciphertext.  By the definition of CTR
> mode, each ciphertext block is given by: C = E(ctr) XOR P, where C and P denote
> the ciphertext and plaintext blocks respectively, E denotes encryption with the
> block cipher using the secret key, and 'ctr' denotes the counter value.  Due to
> flaw (2) the ctr values repeat every page.  Consequently, if I can correctly
> guess the plaintext P1 of *any* page in the file and I want to know the
> plaintext P2 of some other page, I can trivially compute P2 = P1 XOR C1 XOR C2.
> No secret key needed.
>
> Essentially: if there is any part of a file which is easily guessable, such as
> a header or even a zeroed region, then the whole file is revealed.

  Yes this will be fixed. No TFM is claimed to be btrfs default
  as of now.

> The solution is to use a less brittle mode of operation such as XTS in
> combination with per-page IVs (or "tweaks") and derived per-file keys.  This is
> already done in ext4 and f2fs, where the per-page IV is just the page offset.
> Note that per-file keys were needed to prevent the same (key, IV) pair from
> being used in multiple places.  So if you could reuse the fs/crypto
> infrastructure, you could take advantage of the fact that this problem was
> already solved.

> Note: even better would be an authenticated encryption mode.  That isn't yet
> done by ext4 or f2fs --- I think because there wasn't a good place to store a
> per-page authentication tag.  It would be interesting to know whether this would
> be possible for btrfs.

  Yes. That's possible in btrfs. Encoders integration was planned quite
  early in the design. There are reserved spaces in the extent items.
  I should attempt straight to GCM AEAD then.


> I also noticed that unlike ext4 and f2fs, filenames and symlinks are not being
> encrypted in btrfs.  I know it may seem somewhat ad-hoc that filenames are
> encrypted but not other metadata, but apparently filenames were considered
> quite important and a lot of work went into making it possible to encrypt them
> in ext4.

  Can we use inode number as file name when there is no key ?
  And save real file name as encrypted attribute, a bit neater
  though.


> (Apologies if I misunderstood anything.  The proposal would be easier to review
> with a design document, as mentioned.)

  Nope your understanding is correct. Apologies I should have
  sent code later when doc is ready.

Thanks, Anand

> Hope this is helpful,
>
> Eric
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17  4:38     ` Zygo Blaxell
  2016-09-17  6:37       ` Alex Elsayed
@ 2016-09-17 18:45       ` David Sterba
  2016-09-20 14:26         ` Anand Jain
  1 sibling, 1 reply; 66+ messages in thread
From: David Sterba @ 2016-09-17 18:45 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Alex Elsayed, linux-btrfs

On Sat, Sep 17, 2016 at 12:38:30AM -0400, Zygo Blaxell wrote:
> There's also a nasty problem with the extent tree--there's only one per
> filesystem, it's shared between all subvols and block groups, and every
> extent in that tree has back references to the (possibly encrypted) subvol
> trees.  I'll leave that problem as an exercise for other readers.  ;)

A design point that I'm not mentioning for the first time: there would
be per-subvolume group extent trees, ie. a set of subvolumes with
attached extent tree where similar to what we have now. So, encrypted
and unencrypted extent metadata will never be mixed.
(the crypto key questions are not addressed here)

This hasn't been implemented but I'm making sure this will be possible
when somebody mentions changes to the extent tree or blockgroup reworks
(to actually solve other problems).

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17 16:12   ` Anand Jain
@ 2016-09-17 18:57     ` Chris Murphy
  0 siblings, 0 replies; 66+ messages in thread
From: Chris Murphy @ 2016-09-17 18:57 UTC (permalink / raw)
  To: Anand Jain; +Cc: Eric Biggers, Btrfs BTRFS, eternaleye

On Sat, Sep 17, 2016 at 10:12 AM, Anand Jain <anand.jain@oracle.com> wrote:

>   btrfs keeps it only in-memory and key hash goes to the disk.
>   Further in the long we need an integration with key management
>   system as well.

Maybe LUKS2 is usable for this part, and still adaptable since it's
not finished yet? It looks to me like essentially unlimited keyslots
compared to the current 8. You don't really care about the dm-crypt
part of it, but the key management part of it, perhaps.

Both the original and new subvolumes initially share one DEK that go
with the shared encrypted extents, but upon snapshot happening the new
extents in each subvolume need their own DEK. Policy wise these DEKs
can be wrapped in the same or separate passphrases or KEKs, as there
could be hundreds or thousands of DEKs that apply to the many possible
shared encrypted extents in a subvolume. If that's true, then it's an
explosive number of keys per subvolume potentially. It doesn't depend
on space as much as it depends on fs lifetime

Otherwise I don't see how this is different than using a single DEK
across all company hard drives. Compromise one, you've compromised
them all.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16 11:56   ` Anand Jain
@ 2016-09-17 20:35     ` David Sterba
  2016-09-18  8:34       ` RAID1 availability issue[2], Hot-spare and auto-replace Anand Jain
  2016-09-18  9:54       ` [RFC] Preliminary BTRFS Encryption Anand Jain
  0 siblings, 2 replies; 66+ messages in thread
From: David Sterba @ 2016-09-17 20:35 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, clm, dsterba

On Fri, Sep 16, 2016 at 07:56:02PM +0800, Anand Jain wrote:
> 
> 
> >> however here below is the quick example
> >> on the cli usage. Please try out, let me know if I have missed something.
> >>
> >> Also would like to mention that a review from the security experts is due,
> >> which is important and I believe those review comments can be accommodated
> >> without major changes from here.
> >
> > I disagree. Others commented on the crypto stuff, I see enough points to
> > address that would lead to major changes.
> >
> >> Also yes, thanks for the emails, I hear, per file encryption and inline
> >> with vfs layer is also important, which is wip among other things in the
> >> list.
> >
> > Implementing the recent vfs encryption in btrfs is ok, it's just feature
> > parity using an existing API.
> 
> 
>   As mentioned 'inline with vfs layer' I mean to say to use
>   fs/crypto KPIs. Which I haven't seen what parts of the code
>   from ext4 was made as generic KPIs. If that's getting stuff
>   correct in the encryption related, I think it would here as well.

So you were not talking about the 'fs/crypto' that was merged in 4.6?

>   Internal to btrfs - I had challenges to get the extents encoding
>   done properly without bailout, and the test plan. Which I think
>   is addressed here in this code. as mentioned.

Sorry, I don't understand what you mean.

> > And a note from me with maintainer's hat on, there are enough pending
> > patches and patchsets that need review, and bugs to fix, I'm not going
> > to spend time on something that we don't need at the moment if there are
> > alternatives.
> 
>   Honestly I agree. I even suggested but I had no choice.
> 
> 
> PS:
>   Pls, feel free to flame on the (raid) patches if its not correct,
>   because its rather more productive than no reply.

If it's the hot-spare and auto-replace feature, I've expressed my stance
in http://marc.info/?l=linux-btrfs&m=146252575330106, there hasn't
been any change. IMO the hot-spare feature makes most sense with the
raid56, which is stuck where it is, so we need to get it working first.

Lack of reply usually means lack of time (which I would not spend on
flaming but evaluation and review).

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RAID1 availability issue[2], Hot-spare and auto-replace
  2016-09-17 20:35     ` David Sterba
@ 2016-09-18  8:34       ` Anand Jain
  2016-09-18 17:28         ` Chris Murphy
  2016-09-18  9:54       ` [RFC] Preliminary BTRFS Encryption Anand Jain
  1 sibling, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-18  8:34 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, clm


(updated the subject, was [1])

> IMO the hot-spare feature makes most sense with the raid56,

   Why. ?

> which is stuck where it is, so we need to get it working first.


   We need at least one RAID which does not have the availability
   issue. We could achieve that with raid1, there are patches
   which needs maintainer time.


-Anand

[1]
Re: [RFC] Preliminary BTRFS Encryption

[2]
References:
btrfs: Do per-chunk check for mount time check
  OR
btrfs: create degraded-RAID1 chunks
(needs review).

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17 20:35     ` David Sterba
  2016-09-18  8:34       ` RAID1 availability issue[2], Hot-spare and auto-replace Anand Jain
@ 2016-09-18  9:54       ` Anand Jain
  1 sibling, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-18  9:54 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, clm



On 09/18/2016 04:35 AM, David Sterba wrote:
> On Fri, Sep 16, 2016 at 07:56:02PM +0800, Anand Jain wrote:
>>
>>
>>>> however here below is the quick example
>>>> on the cli usage. Please try out, let me know if I have missed something.
>>>>
>>>> Also would like to mention that a review from the security experts is due,
>>>> which is important and I believe those review comments can be accommodated
>>>> without major changes from here.
>>>
>>> I disagree. Others commented on the crypto stuff, I see enough points to
>>> address that would lead to major changes.
>>>
>>>> Also yes, thanks for the emails, I hear, per file encryption and inline
>>>> with vfs layer is also important, which is wip among other things in the
>>>> list.
>>>
>>> Implementing the recent vfs encryption in btrfs is ok, it's just feature
>>> parity using an existing API.
>>
>>
>>   As mentioned 'inline with vfs layer' I mean to say to use
>>   fs/crypto KPIs. Which I haven't seen what parts of the code
>>   from ext4 was made as generic KPIs. If that's getting stuff
>>   correct in the encryption related, I think it would here as well.
>
> So you were not talking about the 'fs/crypto' that was merged in 4.6?

  Looks like I am out of sync here, looks like I miss understood,
   'Implementing the recent vfs encryption in btrfs is ok'
  I was ref to fs/crypto

>>   Internal to btrfs - I had challenges to get the extents encoding
>>   done properly without bailout, and the test plan. Which I think
>>   is addressed here in this code. as mentioned.
>
> Sorry, I don't understand what you mean.

    basically making sure all the extents are really encoded, does not
    matter which crypto (unless like in compress where extents may not
    be encoded,in some situation) and having a test plan, now the test
    plan is same as
      mount option -o 'compress=ctr(aes)' with dummykey or dummy encrypt.
    for encryption.


  Thanks for integrating  most of the patches in the ML.


-Anand

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: RAID1 availability issue[2], Hot-spare and auto-replace
  2016-09-18  8:34       ` RAID1 availability issue[2], Hot-spare and auto-replace Anand Jain
@ 2016-09-18 17:28         ` Chris Murphy
  2016-09-18 17:34           ` Chris Murphy
                             ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Chris Murphy @ 2016-09-18 17:28 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, Btrfs BTRFS, Chris Mason

On Sun, Sep 18, 2016 at 2:34 AM, Anand Jain <anand.jain@oracle.com> wrote:
>
> (updated the subject, was [1])
>
>> IMO the hot-spare feature makes most sense with the raid56,
>
>
>   Why. ?

Raid56 is not scalable, has less redundancy in most all
configurations, rebuild impacts the entire array performance, and in
the case of raid6 two drives lost means incredibly slow rebuild. All
of that adds up to more disk for raid56 to be mitigated with a hot
spare being available for immediate rebuild.

Who currently would use hot spare right now? Problem 1 is Btrfs raid10
is not scalable like other raid10 implementations (mdadm, lvm,
hardware). Problem 2 is Btrfs the raid56 parity scrub bug; and
arguably also partial stripe writes not being CoW. I think hot spare
is pointless with those two problems still being true, and the way to
mitigate them right now is a clusterfs. Hot spare doesn't mitigate
these Btrfs weaknesses.


>
>> which is stuck where it is, so we need to get it working first.
>
>
>
>   We need at least one RAID which does not have the availability
>   issue. We could achieve that with raid1, there are patches
>   which needs maintainer time.

I agree with the idea of degraded raid1 chunks. It's a nasty surprise
to realize this only once it's too late and there's data loss. That
there is a user space work around, maybe makes it less of a big deal?
But I don't think it's documented on gotchas page with the soft
conversion work around to do the rebuild properly: scrub/balance alone
is not correct.

I kinda think we need a list of priorities for multiple device stuff,
and honestly hot spare while important I think is bottom of the list.

1. multiple fs UUID dev UUID corruption problem (the cloned device problem)
2. degraded volumes new bg's are single profile (Anand's April patchset)
3. raid56 bad parity created during scrub when data strip is bad and gets fixed
4. better faulty device tolerance (no crashing)
5. raid10 scaling, needs a way for even number block devices of the
same size to get fixed mirroring so it can tolerate multiple drive
failures so long as a mirrored pair don't fail
6. raid56 partial stripe RMW need to be CoW, doesn't matter if it
slows things down, if you don't like it, use raid10
7. raid1 threaded/async reads (whatever the correct term is to read
from all raid1 drives rather than PID based)
8. better faulty device notifications
9. raid56 parity needs to be checksummed
10. hotspare


2 and 3 might seem tied. Both can result in data loss, both have user
space work arounds (undocumented); but 2 has a greater chance of
happening than 3.

4 is probably worse than 3, but 4 is much more nebulous and 3 produces
a big negative perception.

I'm sure someone could argue hotspare could get squeezed in between 4
and 5; but that's really my one bias in the list, I don't care about
hot spare. I think it's more scalable to take advantage of Btrfs
uniqueness to shrink the file system to drop the bad drive to regain
full redundancy, rather than do hot spares, this is faster, and
doesn't waste a drive that's not doing any work.

I see shrink as more scalable with hard drives than hot spares,
especially in the case of data single profile with clusterfs's: drop
the bad device and its data, autodelete the lost files, rebuild
metadata to regain complete fs redundancy,  inform the cluster of
partial data loss - boom the array is completely fixed, let the
cluster figure out what to do next. Plus each brick isn't spinning an
unused hot spare. There is in effect a hot spare *somewhere* partially
used somewhere else in a cluster fs anyway. I see hot spare as an edge
case need, especially with hard drives. It's not a general purpose
need.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: RAID1 availability issue[2], Hot-spare and auto-replace
  2016-09-18 17:28         ` Chris Murphy
@ 2016-09-18 17:34           ` Chris Murphy
  2016-09-19  2:25           ` Anand Jain
  2016-09-19 12:25           ` Austin S. Hemmelgarn
  2 siblings, 0 replies; 66+ messages in thread
From: Chris Murphy @ 2016-09-18 17:34 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Sep 18, 2016 at 11:28 AM, Chris Murphy <lists@colorremedies.com> wrote:
> On Sun, Sep 18, 2016 at 2:34 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>> (updated the subject, was [1])
>>
>>> IMO the hot-spare feature makes most sense with the raid56,
>>
>>
>>   Why. ?
>
> Raid56 is not scalable, has less redundancy in most all
> configurations, rebuild impacts the entire array performance, and in
> the case of raid6 two drives lost means incredibly slow rebuild. All
> of that adds up to more disk for raid56 to be mitigated with a hot
> spare being available for immediate rebuild.

s/disk/risk


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: RAID1 availability issue[2], Hot-spare and auto-replace
  2016-09-18 17:28         ` Chris Murphy
  2016-09-18 17:34           ` Chris Murphy
@ 2016-09-19  2:25           ` Anand Jain
  2016-09-19 12:07             ` Austin S. Hemmelgarn
  2016-09-19 12:25           ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 66+ messages in thread
From: Anand Jain @ 2016-09-19  2:25 UTC (permalink / raw)
  To: Chris Murphy; +Cc: David Sterba, Btrfs BTRFS, Chris Mason


Chris Murphy,

  Thanks for writing in detail, it makes sense..

  Generally hot spare is to reduce the risk of double disk failures
  leading to the data lose at the data centers before the data is
  reconstructed again for redundancy.

On 09/19/2016 01:28 AM, Chris Murphy wrote:
> On Sun, Sep 18, 2016 at 2:34 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>> (updated the subject, was [1])
>>
>>> IMO the hot-spare feature makes most sense with the raid56,
>>
>>
>>   Why. ?
>
> Raid56 is not scalable, has less redundancy in most all
> configurations, rebuild impacts the entire array performance, and in
> the case of raid6 two drives lost means incredibly slow rebuild. All
> of that adds up to more disk for raid56 to be mitigated with a hot
> spare being available for immediate rebuild.
>
> Who currently would use hot spare right now?

  Probably you mean to say hot spare is not P1 right now, looking at
  other things to fix, I agree.  raid1 availability issue is p1.
  I do get ping-ed on it once in a while.

  I am curious what do you recommend as a btrfs vm data solution for
  the enterprise production ?

Thanks, Anand

> Problem 1 is Btrfs raid10
> is not scalable like other raid10 implementations (mdadm, lvm,
> hardware). Problem 2 is Btrfs the raid56 parity scrub bug; and
> arguably also partial stripe writes not being CoW. I think hot spare
> is pointless with those two problems still being true, and the way to
> mitigate them right now is a clusterfs. Hot spare doesn't mitigate
> these Btrfs weaknesses.
>
>
>>
>>> which is stuck where it is, so we need to get it working first.
>>
>>
>>
>>   We need at least one RAID which does not have the availability
>>   issue. We could achieve that with raid1, there are patches
>>   which needs maintainer time.
>
> I agree with the idea of degraded raid1 chunks. It's a nasty surprise
> to realize this only once it's too late and there's data loss. That
> there is a user space work around, maybe makes it less of a big deal?
> But I don't think it's documented on gotchas page with the soft
> conversion work around to do the rebuild properly: scrub/balance alone
> is not correct.
>
> I kinda think we need a list of priorities for multiple device stuff,
> and honestly hot spare while important I think is bottom of the list.
>
> 1. multiple fs UUID dev UUID corruption problem (the cloned device problem)
> 2. degraded volumes new bg's are single profile (Anand's April patchset)
> 3. raid56 bad parity created during scrub when data strip is bad and gets fixed
> 4. better faulty device tolerance (no crashing)
> 5. raid10 scaling, needs a way for even number block devices of the
> same size to get fixed mirroring so it can tolerate multiple drive
> failures so long as a mirrored pair don't fail
> 6. raid56 partial stripe RMW need to be CoW, doesn't matter if it
> slows things down, if you don't like it, use raid10
> 7. raid1 threaded/async reads (whatever the correct term is to read
> from all raid1 drives rather than PID based)
> 8. better faulty device notifications
> 9. raid56 parity needs to be checksummed
> 10. hotspare
>
>
> 2 and 3 might seem tied. Both can result in data loss, both have user
> space work arounds (undocumented); but 2 has a greater chance of
> happening than 3.
>
> 4 is probably worse than 3, but 4 is much more nebulous and 3 produces
> a big negative perception.
>
> I'm sure someone could argue hotspare could get squeezed in between 4
> and 5; but that's really my one bias in the list, I don't care about
> hot spare. I think it's more scalable to take advantage of Btrfs
> uniqueness to shrink the file system to drop the bad drive to regain
> full redundancy, rather than do hot spares, this is faster, and
> doesn't waste a drive that's not doing any work.
>
> I see shrink as more scalable with hard drives than hot spares,
> especially in the case of data single profile with clusterfs's: drop
> the bad device and its data, autodelete the lost files, rebuild
> metadata to regain complete fs redundancy,  inform the cluster of
> partial data loss - boom the array is completely fixed, let the
> cluster figure out what to do next. Plus each brick isn't spinning an
> unused hot spare. There is in effect a hot spare *somewhere* partially
> used somewhere else in a cluster fs anyway. I see hot spare as an edge
> case need, especially with hard drives. It's not a general purpose
> need.
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: RAID1 availability issue[2], Hot-spare and auto-replace
  2016-09-19  2:25           ` Anand Jain
@ 2016-09-19 12:07             ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 66+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-19 12:07 UTC (permalink / raw)
  To: Anand Jain, Chris Murphy; +Cc: David Sterba, Btrfs BTRFS, Chris Mason

On 2016-09-18 22:25, Anand Jain wrote:
>
> Chris Murphy,
>
>  Thanks for writing in detail, it makes sense..
>
>  Generally hot spare is to reduce the risk of double disk failures
>  leading to the data lose at the data centers before the data is
>  reconstructed again for redundancy.
>
> On 09/19/2016 01:28 AM, Chris Murphy wrote:
>> On Sun, Sep 18, 2016 at 2:34 AM, Anand Jain <anand.jain@oracle.com>
>> wrote:
>>>
>>> (updated the subject, was [1])
>>>
>>>> IMO the hot-spare feature makes most sense with the raid56,
>>>
>>>
>>>   Why. ?
>>
>> Raid56 is not scalable, has less redundancy in most all
>> configurations, rebuild impacts the entire array performance, and in
>> the case of raid6 two drives lost means incredibly slow rebuild. All
>> of that adds up to more disk for raid56 to be mitigated with a hot
>> spare being available for immediate rebuild.
>>
>> Who currently would use hot spare right now?
>
>  Probably you mean to say hot spare is not P1 right now, looking at
>  other things to fix, I agree.  raid1 availability issue is p1.
>  I do get ping-ed on it once in a while.
>
>  I am curious what do you recommend as a btrfs vm data solution for
>  the enterprise production ?
I have no idea what Chris would recommend, but in my case, it depends on 
what you want to do.  For use inside a VM, I'd say it's entirely up to 
your requirements, but I'd only trust it for catching corruption, not 
preventing data loss (that's the job of the storage host anyway).  For 
use for storing VM images, there are much better options.  For a single 
user system or a small single server without HA requirements you should 
be using LVM (or something similar) and setting proper ACL's on the LV's 
so you don't need to run the VM's as root (and easy portability is a 
bogus argument against this, it's trivial to generate image files from 
block devices on Linux).  For HA setups, I'd probably set up a SAN using 
GlusterFS+iSCSI (possibly with BTRFS as a back-end for Gluster) or Ceph.
>
> Thanks, Anand
>
>> Problem 1 is Btrfs raid10
>> is not scalable like other raid10 implementations (mdadm, lvm,
>> hardware). Problem 2 is Btrfs the raid56 parity scrub bug; and
>> arguably also partial stripe writes not being CoW. I think hot spare
>> is pointless with those two problems still being true, and the way to
>> mitigate them right now is a clusterfs. Hot spare doesn't mitigate
>> these Btrfs weaknesses.
>>
>>
>>>
>>>> which is stuck where it is, so we need to get it working first.
>>>
>>>
>>>
>>>   We need at least one RAID which does not have the availability
>>>   issue. We could achieve that with raid1, there are patches
>>>   which needs maintainer time.
>>
>> I agree with the idea of degraded raid1 chunks. It's a nasty surprise
>> to realize this only once it's too late and there's data loss. That
>> there is a user space work around, maybe makes it less of a big deal?
>> But I don't think it's documented on gotchas page with the soft
>> conversion work around to do the rebuild properly: scrub/balance alone
>> is not correct.
>>
>> I kinda think we need a list of priorities for multiple device stuff,
>> and honestly hot spare while important I think is bottom of the list.
>>
>> 1. multiple fs UUID dev UUID corruption problem (the cloned device
>> problem)
>> 2. degraded volumes new bg's are single profile (Anand's April patchset)
>> 3. raid56 bad parity created during scrub when data strip is bad and
>> gets fixed
>> 4. better faulty device tolerance (no crashing)
>> 5. raid10 scaling, needs a way for even number block devices of the
>> same size to get fixed mirroring so it can tolerate multiple drive
>> failures so long as a mirrored pair don't fail
>> 6. raid56 partial stripe RMW need to be CoW, doesn't matter if it
>> slows things down, if you don't like it, use raid10
>> 7. raid1 threaded/async reads (whatever the correct term is to read
>> from all raid1 drives rather than PID based)
>> 8. better faulty device notifications
>> 9. raid56 parity needs to be checksummed
>> 10. hotspare
>>
>>
>> 2 and 3 might seem tied. Both can result in data loss, both have user
>> space work arounds (undocumented); but 2 has a greater chance of
>> happening than 3.
>>
>> 4 is probably worse than 3, but 4 is much more nebulous and 3 produces
>> a big negative perception.
>>
>> I'm sure someone could argue hotspare could get squeezed in between 4
>> and 5; but that's really my one bias in the list, I don't care about
>> hot spare. I think it's more scalable to take advantage of Btrfs
>> uniqueness to shrink the file system to drop the bad drive to regain
>> full redundancy, rather than do hot spares, this is faster, and
>> doesn't waste a drive that's not doing any work.
>>
>> I see shrink as more scalable with hard drives than hot spares,
>> especially in the case of data single profile with clusterfs's: drop
>> the bad device and its data, autodelete the lost files, rebuild
>> metadata to regain complete fs redundancy,  inform the cluster of
>> partial data loss - boom the array is completely fixed, let the
>> cluster figure out what to do next. Plus each brick isn't spinning an
>> unused hot spare. There is in effect a hot spare *somewhere* partially
>> used somewhere else in a cluster fs anyway. I see hot spare as an edge
>> case need, especially with hard drives. It's not a general purpose
>> need.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: RAID1 availability issue[2], Hot-spare and auto-replace
  2016-09-18 17:28         ` Chris Murphy
  2016-09-18 17:34           ` Chris Murphy
  2016-09-19  2:25           ` Anand Jain
@ 2016-09-19 12:25           ` Austin S. Hemmelgarn
  2 siblings, 0 replies; 66+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-19 12:25 UTC (permalink / raw)
  To: Chris Murphy, Anand Jain; +Cc: David Sterba, Btrfs BTRFS, Chris Mason

On 2016-09-18 13:28, Chris Murphy wrote:
> On Sun, Sep 18, 2016 at 2:34 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>> (updated the subject, was [1])
>>
>>> IMO the hot-spare feature makes most sense with the raid56,
>>
>>
>>   Why. ?
>
> Raid56 is not scalable, has less redundancy in most all
> configurations, rebuild impacts the entire array performance, and in
> the case of raid6 two drives lost means incredibly slow rebuild. All
> of that adds up to more disk for raid56 to be mitigated with a hot
> spare being available for immediate rebuild.
>
> Who currently would use hot spare right now? Problem 1 is Btrfs raid10
> is not scalable like other raid10 implementations (mdadm, lvm,
> hardware). Problem 2 is Btrfs the raid56 parity scrub bug; and
> arguably also partial stripe writes not being CoW. I think hot spare
> is pointless with those two problems still being true, and the way to
> mitigate them right now is a clusterfs. Hot spare doesn't mitigate
> these Btrfs weaknesses.
>
>
>>
>>> which is stuck where it is, so we need to get it working first.
>>
>>
>>
>>   We need at least one RAID which does not have the availability
>>   issue. We could achieve that with raid1, there are patches
>>   which needs maintainer time.
>
> I agree with the idea of degraded raid1 chunks. It's a nasty surprise
> to realize this only once it's too late and there's data loss. That
> there is a user space work around, maybe makes it less of a big deal?
> But I don't think it's documented on gotchas page with the soft
> conversion work around to do the rebuild properly: scrub/balance alone
> is not correct.
>
> I kinda think we need a list of priorities for multiple device stuff,
> and honestly hot spare while important I think is bottom of the list.
>
> 1. multiple fs UUID dev UUID corruption problem (the cloned device problem)
> 2. degraded volumes new bg's are single profile (Anand's April patchset)
> 3. raid56 bad parity created during scrub when data strip is bad and gets fixed
> 4. better faulty device tolerance (no crashing)
> 5. raid10 scaling, needs a way for even number block devices of the
> same size to get fixed mirroring so it can tolerate multiple drive
> failures so long as a mirrored pair don't fail
> 6. raid56 partial stripe RMW need to be CoW, doesn't matter if it
> slows things down, if you don't like it, use raid10
> 7. raid1 threaded/async reads (whatever the correct term is to read
> from all raid1 drives rather than PID based)
> 8. better faulty device notifications
> 9. raid56 parity needs to be checksummed
> 10. hotspare
FWIW, I'd probably list the faulty device tolerance and notifications 
(in that order) immediately after the first two items, put the raid1 
threaded reads at the end of the list (after hot-spares), and put the 
raid10 scaling after raid1 threading (anyone who's actually concerned 
with performance and has done their homework is more likely to be using 
BTRFS in raid1 mode on top of a pair of RAID0 arrays (most likely MD or 
LVM based) instead of BTRFS raid10 mode, not only because of the 
reliability factor, but also because it gets significantly better 
performance than BTRFS raid10 mode, and will continue to do so until we 
get proper load-balancing of reads on raid1 and raid10 profiles).  I'd 
also add that we should be parallelizing reads of stripe components in 
raid0, raid10, raid5, and raid6 modes (ie, if we're using raid10 mode 
and need to read both halves of a stripe, both reads should get 
dispatched at the same time), but that would likely go in with the raid1 
performance stuff.
>
>
> 2 and 3 might seem tied. Both can result in data loss, both have user
> space work arounds (undocumented); but 2 has a greater chance of
> happening than 3.
2 also impacts things other than raid5/6, which means (at least IMO) it 
should be higher priority.
>
> 4 is probably worse than 3, but 4 is much more nebulous and 3 produces
> a big negative perception.
>
> I'm sure someone could argue hotspare could get squeezed in between 4
> and 5; but that's really my one bias in the list, I don't care about
> hot spare. I think it's more scalable to take advantage of Btrfs
> uniqueness to shrink the file system to drop the bad drive to regain
> full redundancy, rather than do hot spares, this is faster, and
> doesn't waste a drive that's not doing any work.
This isn't just you, I'm pretty much of the same opinion on this 
particular item.
>
> I see shrink as more scalable with hard drives than hot spares,
> especially in the case of data single profile with clusterfs's: drop
> the bad device and its data, autodelete the lost files, rebuild
> metadata to regain complete fs redundancy,  inform the cluster of
> partial data loss - boom the array is completely fixed, let the
> cluster figure out what to do next. Plus each brick isn't spinning an
> unused hot spare. There is in effect a hot spare *somewhere* partially
> used somewhere else in a cluster fs anyway. I see hot spare as an edge
> case need, especially with hard drives. It's not a general purpose
> need.
I agree on this too to a certain extent, except:
1. There aren't any clustered filesystems that have this functionality. 
As far as the big three, I have zero personal experience with Hadoop, it 
would be impractical to try with Ceph (they're moving to using just flat 
block devices instead of filesystem backed storage), and while it may be 
doable on Gluster, I'm not 100% certain (I don't have a very concrete 
understanding of how exactly GlusterFS works at a low level, which 
ideally needs to change considering that we're using it where I work and 
I'm using it for storing backups for my personal systems).
2. There is one very specific use case where a hot-spare has an 
advantage over shrinking the FS: automatic repair.  If I have to set 
something up to automatically repair a storage array, I almost certainly 
want a hot-spare, not something that will reduce the size of the array, 
and I'd be willing to bet almost anyone else you ask will give the same 
opinion on that.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
                   ` (10 preceding siblings ...)
  2016-09-17  6:58 ` Eric Biggers
@ 2016-09-19 15:15 ` Theodore Ts'o
  2016-09-19 20:58   ` Alex Elsayed
  2016-09-20  4:05   ` Anand Jain
  11 siblings, 2 replies; 66+ messages in thread
From: Theodore Ts'o @ 2016-09-19 15:15 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba, Anand Jain

(I'm not on linux-btrfs@, so please keep me on the cc list.  Or
perhpas better yet, maybe we can move discussion to the linux-fsdevel@
list.)

Hi Anand,

After reading this thread on the web archives, and seeing that some
folks seem to be a bit confused about "vfs level crypto", fs/crypto,
and ext4/f2fs encryption, I thought I would give a few comments.

First of all, these are all the same thing.  Initially ext4 encryption
was implemented targetting ChromeOS as the initial customer, and as a
replacement for ecryptfs.  Folks have already pointed you at the
design document[1].  Also of interest is the is the 2015 Linux
Security Symposium slides set[2].  The first deployed use of this was
for Android N's File-based Encryption and Direct boot[3]; a technical
description which left out some of the product details (since LSS 2016
was before the Android N release) can be found at the 2016 LSS
slides[4].

[1] https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/preview
[2] http://kernsec.org/files/lss2014/Halcrow_EXT4_Encryption.pdf
[3] https://android.googleblog.com/2016/08/android-70-nougat-more-powerful-os-made.html
[4] http://kernsec.org/files/lss2015/halcrow.pdf

The other thing that perhaps would be worth noting is that Michael
Halcrow started this as an encryption/security expert who had dabbled
in file systems, while I was someone for whom encryption/security is a
hobby (although in a previous life I was the tech lead for Kerberos
and chaired the IPSEC working group) who was a file system expert.  In
order to do file system security well, you need people who are well
versed in both discplines working together.

With all due respect, the fact that you chose counter mode and how use
used it pretty clearly demonstrates that you would be well advised to
find someone who is a crypto expert to collaborate with you --- or use
the fs/crypto framework since it was designed and vetted by multiple
crypto experts as well as file system experts.

Having someone who is a product manager who can discuss with you
specific goals is also important, because there are lots of tradeoffs
and lots of design choices ---- and so what you chose to do is (or at
least should be!)  very much dependent on your threat model, who is
planning on using the feature, what you can and can not upon via-a-vis
hardware support, performance requirements, and so on.


Secondly, in terms of how it all works.  Each user as a "master key"
which is stored on a keyring.  We use a hash of the key to serve as
the key identifier, and associated with each inode we store a nonce (a
random unique string) and the key identifier.  We use the nonce and
the user's master key to generate a unique key for that inode.

That key is used to protect the contents of the data file, and to
encrypt filenames and symlink targets --- since filenames can leak
significant information about what the user is doing.  (For example,
in the downloads directory of their web browser, leaking filenames is
just as good as leaking part of their browsing history.)

As far as using the fs/crypto infrastructure, it's actually pretty
simple.  The file system needs to provide a flag indicating whether or
not the file is encrypted, and support extended attributes.  When you
create an inode in an encrypted directory, you call
fscrypt_inherit_context() and the fscrypto layer will take care of
creating the necessary xattr for the per-inode key.  When you need
open a encrypted file, or operate on an encrypted inode, you call
fscrypt_get_encryption_info() on the inode.  The per-inode encryption
key is cached in the i_crypt_info structure, which hangs off of the
struct inode.

When you write to an encrypted file, you call fscrypt_encrypt_page(),
which returns a struct page with the encrypted contents to be written.
After the write is completed (or in the error case), you call
fscrypt_restore_control_page() to release encrypted page.

To read from an encrypted page, you call fscrypt_get_ctx() to get an
encryption context, which gets stashed in the bio's bi_private
pointer.  (If btrfs is already using bi_private, then you'll need to
add a field in the structure which hangs off of bi_private to stash
the encryption context.)  After the read completes, you call
fscrypt_decrypt_bio_pages() to decrypt all of the pages read as part
of the read/write operation.

It's actually relatively straightforward to use.  If you have any
questions please feel free to ask on linux-fsdevel.


As far as poeple commenting that it might be better to encrypt on the
extent level --- the reason why we didn't chose that path is because
while it does make it easier to do authenticated encryption modes, the
downside is that you can only do the data integrity check if you read
in the entire extent.  This has obvious memory utilization impacts and
will also impact your 4k random read/write performance.

We do have a solution in mind to solve the authenticated encryption
problem; in fact, an intern has recently finished a prototype using
Authenticated Skip Lists[5][6].  Hopefully we'll be able to get some
patches for review in the near future.

[5] http://cs.brown.edu/cgc/stms/papers/hashskip.pdf
[6] http://cs.brown.edu/cgc/stms/papers/discex2001.pdf

One of the challenges with data integrity is that you need to be able
to update authentication data and the data blocks atomically, or else
you could end up breaking the file on a crash.  For ext4, we're going
to simply only support data integrity for those files which are
written and then closed, and if you crash, the file which is being
written may not have valid data integrity checksums.  This is good
enough for many use cases, since most files are not updated after they
are initially written.  Obviously, btrfs would be able to do much
better since it has COW properties.

Cheers,

							- Ted

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17  6:37       ` Alex Elsayed
@ 2016-09-19 18:08         ` Zygo Blaxell
  2016-09-19 20:01           ` Alex Elsayed
  0 siblings, 1 reply; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-19 18:08 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3959 bytes --]

On Sat, Sep 17, 2016 at 06:37:16AM +0000, Alex Elsayed wrote:
> > Encryption in ext4 is a per-directory-tree affair. One starts by
> > setting an encryption policy (using an ioctl() call) for a given
> > directory, which must be empty at the time; that policy includes a
> > master key used for all files and directories stored below the target
> > directory. Each individual file is encrypted with its own key, which is
> > derived from the master key and a per-file random nonce value (which is
> > stored in an extended attribute attached to the file's inode). File
> > names and symbolic links are also encrypted.

Probably the simplest way to map this to btrfs is to move the nonce from
the inode to the extent.

Inodes aren't unique within a btrfs filesystem, extents can be shared
by multiple inodes, and a single extent can appear multiple times in the
same inode at different offsets.  Attaching the nonce to the inode would
not be sufficient to read the extent in all but the special case of a
single reference at the original offset where it was written, and it
also leads to the replay problems with duplicate inodes you pointed out.

Extents in a btrfs filesystem are unique and carry their own attributes
(e.g. compression format, checksums) and reference count.  They can easily
carry a reference to an encryption policy object and a nonce attribute.

Nonces within metadata are more complicated.  btrfs doesn't have directory
files like ext4 does, so it doesn't get directory filename encryption
for free with file encryption.  Encryption could be done per-item in the
metadata trees, but in the special case of directories that happen to
the the roots of subvols, it would be possible to encrypt entire pages
of metadata at a time (with the caveat that a snapshot would require
shared encryption policy between the origin and snapshot subvols).
This is what makes keys at the subvol root level so attractive.

> So there isn't quite a "subvol key" in the VFS approach - each directory 
> has a key, and there are derived keys for the entries below it. (I'll 
> note that this framing does not address shared extents _at all_, and 
> would love to have clarification on that).

Files are modified by creating new extents (using parameters inherited
from the inode to fill in the extent attributes) and updating the inode to
refer to the new extent instead of the old one at the modified offset.
Cloned extents are references to existing extents associated with a
different inode or at a different place within the same inode (if the
extent is not compatible with the destination inode, clone fails with
an error).  A snapshot is an efficient way to clone an entire subvol
tree at once, including all inodes and attributes.

Inode attributes and extent attributes can sometimes conflict, especially
during a clone operation.  Encryption attributes could become one of
these cases (i.e. to prevent an extent from one encryption policy from
being cloned to an inode under a different encryption policy).

> > I don't see how snapshots could work, writable or otherwise, without
> > separating the key identity from the subvol identity and having a
> > many-to-one relationship between subvols and keys.  The extents in each
> > subvol would be shared, and they'd be encrypted with a single secret,
> > so there's not really another way to do this.
> 
> That's not the issue. The issue is that, assuming the key stays the same, 
> then a user could quite possibly create a snapshot, write into both the 
> original and the snapshot, causing encryption to occur twice with the 
> same key, same nonce, and different data.

If the extents have nonces (and inodes do not) then this doesn't happen.
A write to either snapshot necessarily creates new extents in all cases
(the nodatacow feature, the only way to modify a data extent in-place,
is disabled when the extent is shared).


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17  7:13   ` Alex Elsayed
@ 2016-09-19 18:57     ` Zygo Blaxell
  2016-09-19 19:50       ` Alex Elsayed
  0 siblings, 1 reply; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-19 18:57 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2019 bytes --]

On Sat, Sep 17, 2016 at 07:13:45AM +0000, Alex Elsayed wrote:
> IMO, this is already a flawed framing - in particular, if encrypting at 
> the extent level, one _should not_ be encrypting (or authenticating) 
> individual pages. The meaningful unit is the extent, and encrypting at 
> page granularity puts you right back where dmcrypt is: dealing with fixed-
> size space, and needing to find somewhere else to put the auth tag.
> 
> This is not a good place to be, and I strongly suspect it motivated 
> choosing XTS in the first place - something I feel is an _error_ in the 
> long run, and a dangerous one. (IMO, anything _but_ AEAD should be 
> forbidden in FS-level encryption.)
> 
> In a nonce-misuse-resistent AEAD, there _is_ no auth tag: There's some 
> amount of inherent ciphertext expansion, and the ciphertext _cannot be 
> decrypted at all_ unless all of it is present. In essence, a built-in all-
> or-nothing transform.
> 
> You could, potentially, chop off part of that and store it elsewhere, but 
> now you're dealing with significant added complexity, for absolutely zero 
> gain.

That would be true if the problem were not already long solved in btrfs.
The 32-bit CRC tree stores 4 bytes per block separately and efficiently.
With minor changes it can store a 32-byte HMAC for each block.

> If you're _not_ using a nonce-misuse-resistant AEAD, it's even worse: 
> keeping the tag out-of-band makes it far too easy to fail to verify it, 
> or verify it only after decrypting the ciphertext to plaintext. Bluntly: 
> that is an immediate security vulnerability.
> 
> tl;dr: Don't encrypt pages, encrypt extents. They grow a little for the 
> auth tag, and that's fine.
> 
> Btrfs already handles needing to read the full extent in order to get a 
> page out of it with compression, anyway.

It does, but compressed extents are limited to 128K.  Uncompressed extents
come in sizes up to 128M, far too large to read in their entirety for
many applications.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 18:57     ` Zygo Blaxell
@ 2016-09-19 19:50       ` Alex Elsayed
  2016-09-19 22:12         ` Zygo Blaxell
  0 siblings, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-19 19:50 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 19 Sep 2016 14:57:33 -0400, Zygo Blaxell wrote:

> On Sat, Sep 17, 2016 at 07:13:45AM +0000, Alex Elsayed wrote:
>> IMO, this is already a flawed framing - in particular, if encrypting at
>> the extent level, one _should not_ be encrypting (or authenticating)
>> individual pages. The meaningful unit is the extent, and encrypting at
>> page granularity puts you right back where dmcrypt is: dealing with
>> fixed-
>> size space, and needing to find somewhere else to put the auth tag.
>> 
>> This is not a good place to be, and I strongly suspect it motivated
>> choosing XTS in the first place - something I feel is an _error_ in the
>> long run, and a dangerous one. (IMO, anything _but_ AEAD should be
>> forbidden in FS-level encryption.)
>> 
>> In a nonce-misuse-resistent AEAD, there _is_ no auth tag: There's some
>> amount of inherent ciphertext expansion, and the ciphertext _cannot be
>> decrypted at all_ unless all of it is present. In essence, a built-in
>> all-
>> or-nothing transform.
>> 
>> You could, potentially, chop off part of that and store it elsewhere,
>> but now you're dealing with significant added complexity, for
>> absolutely zero gain.
> 
> That would be true if the problem were not already long solved in btrfs.
> The 32-bit CRC tree stores 4 bytes per block separately and efficiently.
> With minor changes it can store a 32-byte HMAC for each block.

I disagree that this "solves" it - in particular, the fact that the fsck 
tool support dropping/regenerating the extent tree is wildly unsafe in 
the face of this.

For an AEAD that lacks nonce-misuse-resistance, it's "merely" downgrading 
security from AEAD to simple encryption (GCM, for instance, becomes 
exactly CTR). This would be almost okay (it's a fsck tool, after all), 
but the fact that it's a fsck tool makes the next part worse.

In the case of nonce-misuse-resistant AEAD, it's much worse: Dropping the 
checksum tree would permanently and irrevocably corrupt every single 
extent, with no data recoverable at all. This is the _exact_ opposite of 
_anything_ you would _ever_ want a fsck tool to do.

This is, fundamentally, the problem with treating an "auth tag" as a 
separate thing: It's only separate at all in weaker systems, and the act 
of separating the data induces incredibly nasty failure modes.

It gets even worse if you consider _why_ that option exists for the fsck 
tool: Because of the possibility that the _structure_ of the checksum 
tree becomes corrupted. As a result, two bit-flips (one for each 
duplicate of the metadata) would be entirely capable of irrevocably 
destroying _all encrypted data on the FS_.

Separating the "auth tag" - simply considering an "auth tag" a separate 
thing from the overall ciphertext - is a dangerous thing to do.

>> If you're _not_ using a nonce-misuse-resistant AEAD, it's even worse:
>> keeping the tag out-of-band makes it far too easy to fail to verify it,
>> or verify it only after decrypting the ciphertext to plaintext.
>> Bluntly: that is an immediate security vulnerability.
>> 
>> tl;dr: Don't encrypt pages, encrypt extents. They grow a little for the
>> auth tag, and that's fine.
>> 
>> Btrfs already handles needing to read the full extent in order to get a
>> page out of it with compression, anyway.
> 
> It does, but compressed extents are limited to 128K.  Uncompressed
> extents come in sizes up to 128M, far too large to read in their
> entirety for many applications.

Er, yes, and? Just as compressed extents have a different cap for reasons 
of practicality, so too can encrypted extents.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 18:08         ` Zygo Blaxell
@ 2016-09-19 20:01           ` Alex Elsayed
  2016-09-19 22:22             ` Zygo Blaxell
  2016-09-19 22:25             ` Chris Murphy
  0 siblings, 2 replies; 66+ messages in thread
From: Alex Elsayed @ 2016-09-19 20:01 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 19 Sep 2016 14:08:06 -0400, Zygo Blaxell wrote:

> On Sat, Sep 17, 2016 at 06:37:16AM +0000, Alex Elsayed wrote:
>> > Encryption in ext4 is a per-directory-tree affair. One starts by
>> > setting an encryption policy (using an ioctl() call) for a given
>> > directory, which must be empty at the time; that policy includes a
>> > master key used for all files and directories stored below the target
>> > directory. Each individual file is encrypted with its own key, which
>> > is derived from the master key and a per-file random nonce value
>> > (which is stored in an extended attribute attached to the file's
>> > inode). File names and symbolic links are also encrypted.
> 
> Probably the simplest way to map this to btrfs is to move the nonce from
> the inode to the extent.

I agree. Mostly, I was making a point about how the ext4/VFS code (which 
_does_ put it on the inode) can't just be transported over to btrfs 
unchanged, which is what I read Dave Chinner as advocating.

> Inodes aren't unique within a btrfs filesystem, extents can be shared by
> multiple inodes, and a single extent can appear multiple times in the
> same inode at different offsets.  Attaching the nonce to the inode would
> not be sufficient to read the extent in all but the special case of a
> single reference at the original offset where it was written, and it
> also leads to the replay problems with duplicate inodes you pointed out.

Yup.

> Extents in a btrfs filesystem are unique and carry their own attributes
> (e.g. compression format, checksums) and reference count.  They can
> easily carry a reference to an encryption policy object and a nonce
> attribute.

Definitely agreed.

> Nonces within metadata are more complicated.  btrfs doesn't have
> directory files like ext4 does, so it doesn't get directory filename
> encryption for free with file encryption.  Encryption could be done
> per-item in the metadata trees, but in the special case of directories
> that happen to the the roots of subvols, it would be possible to encrypt
> entire pages of metadata at a time (with the caveat that a snapshot
> would require shared encryption policy between the origin and snapshot
> subvols).

Encrypting tree values per-item is actually one of the best arguments in 
_favor_ of nonce-misuse-resistant AEAD. Its security notion is very, very 
strong:

If a (key, nonce, associated data, message) tuple is repeated, the only 
data an attacker can discover is the fact that the two ciphertexts have 
the same value (a one-bit leak).

In other words, if you encrypt each value in the b-tree with some key, 
some nonce, use the b-tree key as the associated data, and use the value 
as the message, you get a _very_ secure system against a _very_ wide 
variety of attacks - essentially for free. And all _without_ sacrificing 
flexibility, as one could use distinct (crypto) keys for distinct (b-
tree) keys.

(You still need something for protecting the _structure_ of the B-tree, 
but that's a different issue).

> This is what makes keys at the subvol root level so attractive.

Pretty much.

>> So there isn't quite a "subvol key" in the VFS approach - each
>> directory has a key, and there are derived keys for the entries below
>> it. (I'll note that this framing does not address shared extents _at
>> all_, and would love to have clarification on that).
> 
> Files are modified by creating new extents (using parameters inherited
> from the inode to fill in the extent attributes) and updating the inode
> to refer to the new extent instead of the old one at the modified
> offset. Cloned extents are references to existing extents associated
> with a different inode or at a different place within the same inode (if
> the extent is not compatible with the destination inode, clone fails
> with an error).  A snapshot is an efficient way to clone an entire
> subvol tree at once, including all inodes and attributes.

There is the caveat of chattr +C, which would need hard-disabled for 
extent-level encryption (vs block level).

> Inode attributes and extent attributes can sometimes conflict,
> especially during a clone operation.  Encryption attributes could become
> one of these cases (i.e. to prevent an extent from one encryption policy
> from being cloned to an inode under a different encryption policy).

That is a good approach.

>> > I don't see how snapshots could work, writable or otherwise, without
>> > separating the key identity from the subvol identity and having a
>> > many-to-one relationship between subvols and keys.  The extents in
>> > each subvol would be shared, and they'd be encrypted with a single
>> > secret, so there's not really another way to do this.
>> 
>> That's not the issue. The issue is that, assuming the key stays the
>> same,
>> then a user could quite possibly create a snapshot, write into both the
>> original and the snapshot, causing encryption to occur twice with the
>> same key, same nonce, and different data.
> 
> If the extents have nonces (and inodes do not) then this doesn't happen.
> A write to either snapshot necessarily creates new extents in all cases
> (the nodatacow feature, the only way to modify a data extent in-place,
> is disabled when the extent is shared).

As above, note that if encryption is applied to extents rather than 
blocks, nodatacow becomes a data loss vector (partial write -> AEAD 
verify failure).


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-19 15:15 ` Experimental btrfs encryption Theodore Ts'o
@ 2016-09-19 20:58   ` Alex Elsayed
  2016-09-20  0:32     ` Chris Mason
  2016-09-20  4:05   ` Anand Jain
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Elsayed @ 2016-09-19 20:58 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 19 Sep 2016 11:15:18 -0400, Theodore Ts'o wrote:

> (I'm not on linux-btrfs@, so please keep me on the cc list.  Or perhpas
> better yet, maybe we can move discussion to the linux-fsdevel@
> list.)

I apologize if this doesn't keep you in the CC, as I'm posting via gmane.

> Hi Anand,
> 
> After reading this thread on the web archives, and seeing that some
> folks seem to be a bit confused about "vfs level crypto", fs/crypto,
> and ext4/f2fs encryption, I thought I would give a few comments.
> 
> First of all, these are all the same thing.  Initially ext4 encryption
> was implemented targetting ChromeOS as the initial customer, and as a
> replacement for ecryptfs.  Folks have already pointed you at the design
> document[1].  Also of interest is the is the 2015 Linux Security
> Symposium slides set[2].  The first deployed use of this was for Android
> N's File-based Encryption and Direct boot[3]; a technical description
> which left out some of the product details (since LSS 2016 was before
> the Android N release) can be found at the 2016 LSS slides[4].
> 
> [1]
> https://docs.google.com/document/
d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/preview
> [2] http://kernsec.org/files/lss2014/Halcrow_EXT4_Encryption.pdf [3]
> https://android.googleblog.com/2016/08/android-70-nougat-more-powerful-
os-made.html
> [4] http://kernsec.org/files/lss2015/halcrow.pdf
> 
> The other thing that perhaps would be worth noting is that Michael
> Halcrow started this as an encryption/security expert who had dabbled in
> file systems, while I was someone for whom encryption/security is a
> hobby (although in a previous life I was the tech lead for Kerberos and
> chaired the IPSEC working group) who was a file system expert.  In order
> to do file system security well, you need people who are well versed in
> both discplines working together.
> 
> With all due respect, the fact that you chose counter mode and how use
> used it pretty clearly demonstrates that you would be well advised to
> find someone who is a crypto expert to collaborate with you --- or use
> the fs/crypto framework since it was designed and vetted by multiple
> crypto experts as well as file system experts.

100% agreed on the former, and mostly agreed on the latter (though I feel 
that even applying fs/crypto to btrfs includes sufficient novelty as to 
require very careful review by crypto experts).

> Having someone who is a product manager who can discuss with you
> specific goals is also important, because there are lots of tradeoffs
> and lots of design choices ---- and so what you chose to do is (or at
> least should be!)  very much dependent on your threat model, who is
> planning on using the feature, what you can and can not upon via-a-vis
> hardware support, performance requirements, and so on.
> 
> 
> Secondly, in terms of how it all works.  Each user as a "master key"
> which is stored on a keyring.  We use a hash of the key to serve as the
> key identifier, and associated with each inode we store a nonce (a
> random unique string) and the key identifier.  We use the nonce and the
> user's master key to generate a unique key for that inode.

As noted in my discussions with Zygo Blaxell, this is one of the places 
where applying fs/crypto to btrfs without careful reexamination would 
fail badly - using the inode will not work.

> That key is used to protect the contents of the data file, and to
> encrypt filenames and symlink targets --- since filenames can leak
> significant information about what the user is doing.  (For example,
> in the downloads directory of their web browser, leaking filenames is
> just as good as leaking part of their browsing history.)
>
> As far as using the fs/crypto infrastructure, it's actually pretty
> simple.  The file system needs to provide a flag indicating whether or
> not the file is encrypted, and support extended attributes.  When you
> create an inode in an encrypted directory, you call
> fscrypt_inherit_context() and the fscrypto layer will take care of
> creating the necessary xattr for the per-inode key.  When you need open
> a encrypted file, or operate on an encrypted inode, you call
> fscrypt_get_encryption_info() on the inode.  The per-inode encryption
> key is cached in the i_crypt_info structure, which hangs off of the
> struct inode.

When someone says "pretty simple" regarding cryptography, it's often 
neither pretty nor simple :P

The issue, here, is that inodes are fundamentally not a safe scope to 
attach that information to in btrfs. As extents can be shared between 
inodes (and thus both will need to decrypt them), and inodes can be 
duplicated unmodified (snapshots), attaching keys and nonces to inodes 
opens up a whole host of (possibly insoluble) issues, including 
catastrophic nonce reuse via writable snapshots.

> When you write to an encrypted file, you call fscrypt_encrypt_page(),
> which returns a struct page with the encrypted contents to be written.
> After the write is completed (or in the error case), you call
> fscrypt_restore_control_page() to release encrypted page.
> 
> To read from an encrypted page, you call fscrypt_get_ctx() to get an
> encryption context, which gets stashed in the bio's bi_private pointer. 
> (If btrfs is already using bi_private, then you'll need to add a field
> in the structure which hangs off of bi_private to stash the encryption
> context.)  After the read completes, you call
> fscrypt_decrypt_bio_pages() to decrypt all of the pages read as part of
> the read/write operation.
> 
> It's actually relatively straightforward to use.  If you have any
> questions please feel free to ask on linux-fsdevel.

Straightforward to use, yes. Straightforward to use _securely_ with btrfs 
in particular, undetermined.

> As far as poeple commenting that it might be better to encrypt on the
> extent level --- the reason why we didn't chose that path is because
> while it does make it easier to do authenticated encryption modes, the
> downside is that you can only do the data integrity check if you read in
> the entire extent.  This has obvious memory utilization impacts and will
> also impact your 4k random read/write performance.

It does, yes, but btrfs already does this with compression. I'd suggest 
that's a pretty good argument  for it being at least potentially 
acceptable.

> We do have a solution in mind to solve the authenticated encryption
> problem; in fact, an intern has recently finished a prototype using
> Authenticated Skip Lists[5][6].  Hopefully we'll be able to get some
> patches for review in the near future.
> 
> [5] http://cs.brown.edu/cgc/stms/papers/hashskip.pdf [6]
> http://cs.brown.edu/cgc/stms/papers/discex2001.pdf

That's fascinating, and I'll have to read it in-depth later - thank you 
for the links!

> One of the challenges with data integrity is that you need to be able to
> update authentication data and the data blocks atomically, or else you
> could end up breaking the file on a crash.  For ext4, we're going to
> simply only support data integrity for those files which are written and
> then closed, and if you crash, the file which is being written may not
> have valid data integrity checksums.  This is good enough for many use
> cases, since most files are not updated after they are initially
> written.  Obviously, btrfs would be able to do much better since it has
> COW properties.

Makes sense.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 19:50       ` Alex Elsayed
@ 2016-09-19 22:12         ` Zygo Blaxell
  0 siblings, 0 replies; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-19 22:12 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3972 bytes --]

On Mon, Sep 19, 2016 at 07:50:07PM +0000, Alex Elsayed wrote:
> > That would be true if the problem were not already long solved in btrfs.
> > The 32-bit CRC tree stores 4 bytes per block separately and efficiently.
> > With minor changes it can store a 32-byte HMAC for each block.
> 
> I disagree that this "solves" it - in particular, the fact that the fsck 
> tool support dropping/regenerating the extent tree is wildly unsafe in 
> the face of this.

Those fsck features should no longer work on the AEAD tree (or would
require the keys to work if there was enough filesystem left to salvage).

> For an AEAD that lacks nonce-misuse-resistance, it's "merely" downgrading 
> security from AEAD to simple encryption (GCM, for instance, becomes 
> exactly CTR). This would be almost okay (it's a fsck tool, after all), 
> but the fact that it's a fsck tool makes the next part worse.
> 
> In the case of nonce-misuse-resistant AEAD, it's much worse: Dropping the 
> checksum tree would permanently and irrevocably corrupt every single 
> extent, with no data recoverable at all. This is the _exact_ opposite of 
> _anything_ you would _ever_ want a fsck tool to do.

So...don't put those features in fsck?

In my experience, if you're dropping the checksum or especially the
extent tree, your filesystem is already so badly damaged you might as
well mkfs+restore the filesystem.  It'll take longer to reverify the
data at the application level or compare with the last backup.

An AEAD tree would just be like that, except there's no point in even
offering the option.  It would just be "rebuilding the AEAD tree will
erase all your encrypted data, leaving only plaintext data on the
filesystem if you had any, are you very sure about this y/N"

> This is, fundamentally, the problem with treating an "auth tag" as a 
> separate thing: It's only separate at all in weaker systems, and the act 
> of separating the data induces incredibly nasty failure modes.
> 
> It gets even worse if you consider _why_ that option exists for the fsck 
> tool: Because of the possibility that the _structure_ of the checksum 
> tree becomes corrupted. As a result, two bit-flips (one for each 
> duplicate of the metadata) would be entirely capable of irrevocably 
> destroying _all encrypted data on the FS_.

That event already destroys a btrfs filesystem, even without encryption.
btrfs already includes much of the verification process of a Merkle tree,
with weak checksums and no auth.  Currently, if you lose both copies of an
interior tree node, it is only possible to recover the filesystem offline
by brute-force search of the metadata.  It's one of the reasons why it's
so important to have duplicate metadata even on a single disk.

The only difference with encryption is that recovery would be
theoretically impossible instead of just practically infeasible.

> Separating the "auth tag" - simply considering an "auth tag" a separate 
> thing from the overall ciphertext - is a dangerous thing to do.
> 
> >> If you're _not_ using a nonce-misuse-resistant AEAD, it's even worse:
> >> keeping the tag out-of-band makes it far too easy to fail to verify it,
> >> or verify it only after decrypting the ciphertext to plaintext.
> >> Bluntly: that is an immediate security vulnerability.
> >> 
> >> tl;dr: Don't encrypt pages, encrypt extents. They grow a little for the
> >> auth tag, and that's fine.
> >> 
> >> Btrfs already handles needing to read the full extent in order to get a
> >> page out of it with compression, anyway.
> > 
> > It does, but compressed extents are limited to 128K.  Uncompressed
> > extents come in sizes up to 128M, far too large to read in their
> > entirety for many applications.
> 
> Er, yes, and? Just as compressed extents have a different cap for reasons 
> of practicality, so too can encrypted extents.

...which very inefficient space usage for short extents.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 20:01           ` Alex Elsayed
@ 2016-09-19 22:22             ` Zygo Blaxell
  2016-09-19 22:25             ` Chris Murphy
  1 sibling, 0 replies; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-19 22:22 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2079 bytes --]

On Mon, Sep 19, 2016 at 08:01:06PM +0000, Alex Elsayed wrote:
> >> > I don't see how snapshots could work, writable or otherwise, without
> >> > separating the key identity from the subvol identity and having a
> >> > many-to-one relationship between subvols and keys.  The extents in
> >> > each subvol would be shared, and they'd be encrypted with a single
> >> > secret, so there's not really another way to do this.
> >> 
> >> That's not the issue. The issue is that, assuming the key stays the
> >> same,
> >> then a user could quite possibly create a snapshot, write into both the
> >> original and the snapshot, causing encryption to occur twice with the
> >> same key, same nonce, and different data.
> > 
> > If the extents have nonces (and inodes do not) then this doesn't happen.
> > A write to either snapshot necessarily creates new extents in all cases
> > (the nodatacow feature, the only way to modify a data extent in-place,
> > is disabled when the extent is shared).
> 
> As above, note that if encryption is applied to extents rather than 
> blocks, nodatacow becomes a data loss vector (partial write -> AEAD 
> verify failure).

nodatacow is already a data loss vector.  There is already no atomic
update guarantee and no checksums (not even CRC32) on nodatacow extents.
Silent data corruption is an expected outcome.

They are supposed to be used for applications where performance needs
trump data integrity needs on specific files (presumably applications
that provide their own checking or can reconstruct the data).

They could support an unauthenticated dm-crypt-style block level
encryption...if we like writing long security manual pages full of
caveats that nobody ever reads.  It's certainly safer to just prohibit
the combination of nodatacow and encryption.

Someone could implement metadata logging/journalling to make nodatacow
work with checksums, which would make AEAD work with them, but the
overhead would probably negate the performance benefit that makes
nodatacow attractive in the first place.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 20:01           ` Alex Elsayed
  2016-09-19 22:22             ` Zygo Blaxell
@ 2016-09-19 22:25             ` Chris Murphy
  2016-09-19 22:31               ` Zygo Blaxell
  1 sibling, 1 reply; 66+ messages in thread
From: Chris Murphy @ 2016-09-19 22:25 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: Btrfs BTRFS

On Mon, Sep 19, 2016 at 2:01 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> On Mon, 19 Sep 2016 14:08:06 -0400, Zygo Blaxell wrote:
>
>> On Sat, Sep 17, 2016 at 06:37:16AM +0000, Alex Elsayed wrote:
>>> > Encryption in ext4 is a per-directory-tree affair. One starts by
>>> > setting an encryption policy (using an ioctl() call) for a given
>>> > directory, which must be empty at the time; that policy includes a
>>> > master key used for all files and directories stored below the target
>>> > directory. Each individual file is encrypted with its own key, which
>>> > is derived from the master key and a per-file random nonce value
>>> > (which is stored in an extended attribute attached to the file's
>>> > inode). File names and symbolic links are also encrypted.
>>
>> Probably the simplest way to map this to btrfs is to move the nonce from
>> the inode to the extent.
>
> I agree. Mostly, I was making a point about how the ext4/VFS code (which
> _does_ put it on the inode) can't just be transported over to btrfs
> unchanged, which is what I read Dave Chinner as advocating.
>
>> Inodes aren't unique within a btrfs filesystem, extents can be shared by
>> multiple inodes, and a single extent can appear multiple times in the
>> same inode at different offsets.  Attaching the nonce to the inode would
>> not be sufficient to read the extent in all but the special case of a
>> single reference at the original offset where it was written, and it
>> also leads to the replay problems with duplicate inodes you pointed out.
>
> Yup.
>
>> Extents in a btrfs filesystem are unique and carry their own attributes
>> (e.g. compression format, checksums) and reference count.  They can
>> easily carry a reference to an encryption policy object and a nonce
>> attribute.
>
> Definitely agreed.
>
>> Nonces within metadata are more complicated.  btrfs doesn't have
>> directory files like ext4 does, so it doesn't get directory filename
>> encryption for free with file encryption.  Encryption could be done
>> per-item in the metadata trees, but in the special case of directories
>> that happen to the the roots of subvols, it would be possible to encrypt
>> entire pages of metadata at a time (with the caveat that a snapshot
>> would require shared encryption policy between the origin and snapshot
>> subvols).
>
> Encrypting tree values per-item is actually one of the best arguments in
> _favor_ of nonce-misuse-resistant AEAD. Its security notion is very, very
> strong:
>
> If a (key, nonce, associated data, message) tuple is repeated, the only
> data an attacker can discover is the fact that the two ciphertexts have
> the same value (a one-bit leak).
>
> In other words, if you encrypt each value in the b-tree with some key,
> some nonce, use the b-tree key as the associated data, and use the value
> as the message, you get a _very_ secure system against a _very_ wide
> variety of attacks - essentially for free. And all _without_ sacrificing
> flexibility, as one could use distinct (crypto) keys for distinct (b-
> tree) keys.
>
> (You still need something for protecting the _structure_ of the B-tree,
> but that's a different issue).
>
>> This is what makes keys at the subvol root level so attractive.
>
> Pretty much.
>
>>> So there isn't quite a "subvol key" in the VFS approach - each
>>> directory has a key, and there are derived keys for the entries below
>>> it. (I'll note that this framing does not address shared extents _at
>>> all_, and would love to have clarification on that).
>>
>> Files are modified by creating new extents (using parameters inherited
>> from the inode to fill in the extent attributes) and updating the inode
>> to refer to the new extent instead of the old one at the modified
>> offset. Cloned extents are references to existing extents associated
>> with a different inode or at a different place within the same inode (if
>> the extent is not compatible with the destination inode, clone fails
>> with an error).  A snapshot is an efficient way to clone an entire
>> subvol tree at once, including all inodes and attributes.
>
> There is the caveat of chattr +C, which would need hard-disabled for
> extent-level encryption (vs block level).

What about raid56 partial stripe writes? Aren't these effectively nocow?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 22:25             ` Chris Murphy
@ 2016-09-19 22:31               ` Zygo Blaxell
  2016-09-20  1:10                 ` Zygo Blaxell
  0 siblings, 1 reply; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-19 22:31 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Alex Elsayed, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]

On Mon, Sep 19, 2016 at 04:25:55PM -0600, Chris Murphy wrote:
> >> Files are modified by creating new extents (using parameters inherited
> >> from the inode to fill in the extent attributes) and updating the inode
> >> to refer to the new extent instead of the old one at the modified
> >> offset. Cloned extents are references to existing extents associated
> >> with a different inode or at a different place within the same inode (if
> >> the extent is not compatible with the destination inode, clone fails
> >> with an error).  A snapshot is an efficient way to clone an entire
> >> subvol tree at once, including all inodes and attributes.
> >
> > There is the caveat of chattr +C, which would need hard-disabled for
> > extent-level encryption (vs block level).
> 
> What about raid56 partial stripe writes? Aren't these effectively nocow?

Those are a straight-up bug that should be fixed.  They are mixing committed
data with uncommitted data from two different transactions, and the stripe
temporarily contains garbage.  Combine that with unclean shutdown in degraded
mode and the data is gone.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-16  8:49 ` David Sterba
  2016-09-16 11:56   ` Anand Jain
@ 2016-09-20  0:12   ` Chris Mason
  2016-09-20  0:55     ` Anand Jain
  1 sibling, 1 reply; 66+ messages in thread
From: Chris Mason @ 2016-09-20  0:12 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs, dsterba



On 09/16/2016 04:49 AM, David Sterba wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>>
>> A design write-up is coming next,
>
> You're approaching it from the wrong side. The detailed specification
> must come first.

Hi Anand,

Thanks for sending these out.  It's an important feature and I'm glad to 
see it getting some love.

Right now, the design doc is really the most important part.  The 
encryption side of things requires a layer of extra verification that we 
can't get from reviewing the code alone.  We need to sit down with the 
use cases and the workflow and make sure it meets all the requirements 
of a secure system.

I'm the first to admit this needs a broader consensus than just 
linux-btrfs@, and we want our reviewers to be able to go through things 
without also having to understand every line of btrfs.

I do want to make sure that we can send/recv signed and encrypted 
subvolumes, and be able to send them again unmodified.  This will end up 
needing a revision of the send/recv protocol, but it adds a use case 
wrinkle that we need to iron out in the design docs.

I still prefer encryption at the subvolume level, but it's going to need 
to be a superset of what is already possible with the vfs interfaces. 
If we can do it with the existing interfaces, great.  If not we should 
be adding features into the generic code where it makes sense.

-chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-19 20:58   ` Alex Elsayed
@ 2016-09-20  0:32     ` Chris Mason
  2016-09-20  2:47       ` Alex Elsayed
  2016-09-20  2:50       ` Theodore Ts'o
  0 siblings, 2 replies; 66+ messages in thread
From: Chris Mason @ 2016-09-20  0:32 UTC (permalink / raw)
  To: Alex Elsayed, linux-btrfs, Theodore Ts'o

On 09/19/2016 04:58 PM, Alex Elsayed wrote:
> On Mon, 19 Sep 2016 11:15:18 -0400, Theodore Ts'o wrote:
>
>> (I'm not on linux-btrfs@, so please keep me on the cc list.  Or perhpas
>> better yet, maybe we can move discussion to the linux-fsdevel@
>> list.)
>
> I apologize if this doesn't keep you in the CC, as I'm posting via gmane.

Full quote of Alex's reply below, since Ted wasn't cc'd.

>
>> Hi Anand,
>>
>> After reading this thread on the web archives, and seeing that some
>> folks seem to be a bit confused about "vfs level crypto", fs/crypto,
>> and ext4/f2fs encryption, I thought I would give a few comments.
>>
>> First of all, these are all the same thing.  Initially ext4 encryption
>> was implemented targetting ChromeOS as the initial customer, and as a
>> replacement for ecryptfs.  Folks have already pointed you at the design
>> document[1].  Also of interest is the is the 2015 Linux Security
>> Symposium slides set[2].  The first deployed use of this was for Android
>> N's File-based Encryption and Direct boot[3]; a technical description
>> which left out some of the product details (since LSS 2016 was before
>> the Android N release) can be found at the 2016 LSS slides[4].
>>
>> [1]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_&d=DQIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P4Q4NcJ-vo0OUlmzW07AtoaRs7dVgu2TKtX-DObpPNo&s=8bEfnpR0pLttZyN53gVIpJhkE9iLw-tnz6nIwLh0beA&e=
> d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/preview
>> [2] https://urldefense.proofpoint.com/v2/url?u=http-3A__kernsec.org_files_lss2014_Halcrow-5FEXT4-5FEncryption.pdf&d=DQIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P4Q4NcJ-vo0OUlmzW07AtoaRs7dVgu2TKtX-DObpPNo&s=aG-njxU2NUhRSaiSyHhCGgobFxaic8FuvthgQ6UNUJY&e=  [3]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__android.googleblog.com_2016_08_android-2D70-2Dnougat-2Dmore-2Dpowerful-2D&d=DQIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P4Q4NcJ-vo0OUlmzW07AtoaRs7dVgu2TKtX-DObpPNo&s=hOddIo8N_AqvwA76Nb-6G_shkpTOHsc5oVnQhdq1Cb4&e=
> os-made.html
>> [4] https://urldefense.proofpoint.com/v2/url?u=http-3A__kernsec.org_files_lss2015_halcrow.pdf&d=DQIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P4Q4NcJ-vo0OUlmzW07AtoaRs7dVgu2TKtX-DObpPNo&s=s0IIXFVRxz6MPKK2Y4lfoIy0XRl3gTRObXbC75CJtL4&e=

[ Sorry, the facebook email servers are here to protect us all. 
Clicking on these links will get Ted's original link, I promise ]

>>
>> The other thing that perhaps would be worth noting is that Michael
>> Halcrow started this as an encryption/security expert who had dabbled in
>> file systems, while I was someone for whom encryption/security is a
>> hobby (although in a previous life I was the tech lead for Kerberos and
>> chaired the IPSEC working group) who was a file system expert.  In order
>> to do file system security well, you need people who are well versed in
>> both discplines working together.
>>
>> With all due respect, the fact that you chose counter mode and how use
>> used it pretty clearly demonstrates that you would be well advised to
>> find someone who is a crypto expert to collaborate with you --- or use
>> the fs/crypto framework since it was designed and vetted by multiple
>> crypto experts as well as file system experts.
>
> 100% agreed on the former, and mostly agreed on the latter (though I feel
> that even applying fs/crypto to btrfs includes sufficient novelty as to
> require very careful review by crypto experts).

Definitely agree about the extra review.

>
>> Having someone who is a product manager who can discuss with you
>> specific goals is also important, because there are lots of tradeoffs
>> and lots of design choices ---- and so what you chose to do is (or at
>> least should be!)  very much dependent on your threat model, who is
>> planning on using the feature, what you can and can not upon via-a-vis
>> hardware support, performance requirements, and so on.
>>
>>
>> Secondly, in terms of how it all works.  Each user as a "master key"
>> which is stored on a keyring.  We use a hash of the key to serve as the
>> key identifier, and associated with each inode we store a nonce (a
>> random unique string) and the key identifier.  We use the nonce and the
>> user's master key to generate a unique key for that inode.
>
> As noted in my discussions with Zygo Blaxell, this is one of the places
> where applying fs/crypto to btrfs without careful reexamination would
> fail badly - using the inode will not work.
>
>> That key is used to protect the contents of the data file, and to
>> encrypt filenames and symlink targets --- since filenames can leak
>> significant information about what the user is doing.  (For example,
>> in the downloads directory of their web browser, leaking filenames is
>> just as good as leaking part of their browsing history.)

One of the things that makes per-subvolume encryption attractive to me 
is that we're able to enforce the idea that an entire directory tree is 
encrypted by one key.  It can't be snapshotted again without the key, 
and it just fits with the rest of the btrfs management code.  I do want 
to support the existing vfs interfaces as well too though.

>>
>> As far as using the fs/crypto infrastructure, it's actually pretty
>> simple.  The file system needs to provide a flag indicating whether or
>> not the file is encrypted, and support extended attributes.  When you
>> create an inode in an encrypted directory, you call
>> fscrypt_inherit_context() and the fscrypto layer will take care of
>> creating the necessary xattr for the per-inode key.  When you need open
>> a encrypted file, or operate on an encrypted inode, you call
>> fscrypt_get_encryption_info() on the inode.  The per-inode encryption
>> key is cached in the i_crypt_info structure, which hangs off of the
>> struct inode.
>
> When someone says "pretty simple" regarding cryptography, it's often
> neither pretty nor simple :P
>
> The issue, here, is that inodes are fundamentally not a safe scope to
> attach that information to in btrfs. As extents can be shared between
> inodes (and thus both will need to decrypt them), and inodes can be
> duplicated unmodified (snapshots), attaching keys and nonces to inodes
> opens up a whole host of (possibly insoluble) issues, including
> catastrophic nonce reuse via writable snapshots.

I'm going to have to read harder about nonce reuse.  In btrfs an inode 
is really a pair [ root id, inode number ], so strictly speaking two 
writable snapshots won't have the same inode in memory and when a 
snapshot is modified we'd end up with a different nonce for the new 
modifications.

This would lead to a chain, where reading an single modified file in a 
snapshot might require multiple different keys.  The btrfs metadata has 
what it needs to look these things up in the readpage call, but it ends 
up being much closer to per-extent encryption.

>
>> When you write to an encrypted file, you call fscrypt_encrypt_page(),
>> which returns a struct page with the encrypted contents to be written.
>> After the write is completed (or in the error case), you call
>> fscrypt_restore_control_page() to release encrypted page.
>>
>> To read from an encrypted page, you call fscrypt_get_ctx() to get an
>> encryption context, which gets stashed in the bio's bi_private pointer.
>> (If btrfs is already using bi_private, then you'll need to add a field
>> in the structure which hangs off of bi_private to stash the encryption
>> context.)  After the read completes, you call
>> fscrypt_decrypt_bio_pages() to decrypt all of the pages read as part of
>> the read/write operation.
>>
>> It's actually relatively straightforward to use.  If you have any
>> questions please feel free to ask on linux-fsdevel.
>
> Straightforward to use, yes. Straightforward to use _securely_ with btrfs
> in particular, undetermined.
>
>> As far as poeple commenting that it might be better to encrypt on the
>> extent level --- the reason why we didn't chose that path is because
>> while it does make it easier to do authenticated encryption modes, the
>> downside is that you can only do the data integrity check if you read in
>> the entire extent.  This has obvious memory utilization impacts and will
>> also impact your 4k random read/write performance.
>
> It does, yes, but btrfs already does this with compression. I'd suggest
> that's a pretty good argument  for it being at least potentially
> acceptable.
>

For compression we solve this by limiting the size of the uncompressed 
result that may need to be read.  It's not perfect but it does work.

>> We do have a solution in mind to solve the authenticated encryption
>> problem; in fact, an intern has recently finished a prototype using
>> Authenticated Skip Lists[5][6].  Hopefully we'll be able to get some
>> patches for review in the near future.
>>
>> [5] https://urldefense.proofpoint.com/v2/url?u=http-3A__cs.brown.edu_cgc_stms_papers_hashskip.pdf&d=DQIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P4Q4NcJ-vo0OUlmzW07AtoaRs7dVgu2TKtX-DObpPNo&s=Rl42YmL7hS10ZRrW-yrRqpxs5FQNGmhKpGK0Z7nK2aM&e=  [6]
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__cs.brown.edu_cgc_stms_papers_discex2001.pdf&d=DQIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P4Q4NcJ-vo0OUlmzW07AtoaRs7dVgu2TKtX-DObpPNo&s=D6mXpj1ruLfq227rrhhjNns1gGsygMSOfKsjJGdE8lM&e=
>
> That's fascinating, and I'll have to read it in-depth later - thank you
> for the links!
>
>> One of the challenges with data integrity is that you need to be able to
>> update authentication data and the data blocks atomically, or else you
>> could end up breaking the file on a crash.  For ext4, we're going to
>> simply only support data integrity for those files which are written and
>> then closed, and if you crash, the file which is being written may not
>> have valid data integrity checksums.  This is good enough for many use
>> cases, since most files are not updated after they are initially
>> written.  Obviously, btrfs would be able to do much better since it has
>> COW properties.
>
> Makes sense.

-chris


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-20  0:12   ` Chris Mason
@ 2016-09-20  0:55     ` Anand Jain
  0 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-20  0:55 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs, dsterba



Hi Chris,

> Hi Anand,
>
> Thanks for sending these out.  It's an important feature and I'm glad to
> see it getting some love.
>
> Right now, the design doc is really the most important part.  The
> encryption side of things requires a layer of extra verification that we
> can't get from reviewing the code alone.  We need to sit down with the
> use cases and the workflow and make sure it meets all the requirements
> of a secure system.

  yes.

> I'm the first to admit this needs a broader consensus than just
> linux-btrfs@, and we want our reviewers to be able to go through things
> without also having to understand every line of btrfs.

  right.

> I do want to make sure that we can send/recv signed and encrypted
> subvolumes, and be able to send them again unmodified.  This will end up
> needing a revision of the send/recv protocol, but it adds a use case
> wrinkle that we need to iron out in the design docs.

  Exactly. I have this in the requirement list. So I was hesitant
  to use any page(ext4)/sector(truecrypt) related IV. BTRFS just
  uses random IV which can be accessed through xattributes. But
  certainly has crypto issues. Anyway I will cover this in the document
  for the review by crypto experts.

> I still prefer encryption at the subvolume level, but it's going to need
> to be a superset of what is already possible with the vfs interfaces. If
> we can do it with the existing interfaces, great.  If not we should be
> adding features into the generic code where it makes sense.

  Sure Chris. Thanks for your comments.

- Anand

> -chris




In this situation I was almost to drop the encrypted send/recv from my 
list, thanks for mentioning I shall keep that in the design so as of now 
as I did not use anything which is



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-19 22:31               ` Zygo Blaxell
@ 2016-09-20  1:10                 ` Zygo Blaxell
  0 siblings, 0 replies; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-20  1:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Alex Elsayed, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3196 bytes --]

On Mon, Sep 19, 2016 at 06:31:22PM -0400, Zygo Blaxell wrote:
> On Mon, Sep 19, 2016 at 04:25:55PM -0600, Chris Murphy wrote:
> > >> Files are modified by creating new extents (using parameters inherited
> > >> from the inode to fill in the extent attributes) and updating the inode
> > >> to refer to the new extent instead of the old one at the modified
> > >> offset. Cloned extents are references to existing extents associated
> > >> with a different inode or at a different place within the same inode (if
> > >> the extent is not compatible with the destination inode, clone fails
> > >> with an error).  A snapshot is an efficient way to clone an entire
> > >> subvol tree at once, including all inodes and attributes.
> > >
> > > There is the caveat of chattr +C, which would need hard-disabled for
> > > extent-level encryption (vs block level).
> > 
> > What about raid56 partial stripe writes? Aren't these effectively nocow?
> 
> Those are a straight-up bug that should be fixed.  They are mixing committed
> data with uncommitted data from two different transactions, and the stripe
> temporarily contains garbage.  Combine that with unclean shutdown in degraded
> mode and the data is gone.

A slightly more detailed answer:

nocow and raid56 partial stripe writes are different because nocow writes
won't corrupt unrelated extents, while raid56 partial stripe writes will.
They are entirely different classes of problem.

Even in non-degraded mode, an interrupted write to a modified stripe
is not recoverable from parity until after the parity is reconstructed
(e.g. by scrub or a later write to the stripe in non-degraded mode).

If one of the disks is significantly slower or has deeper queues than
the others, this could affect many extents, as btrfs could submit a
lot of writes to each disk and then wait for all the disks to finish
asynchronously, leaving a large time window for interruption.

If a disk fails after an unclean shutdown but before a scrub is complete,
data in all of the uncorrected stripes will be lost.  If the array enters
or is already in degraded mode during a write when an unclean shutdown
occurs, data will be lost immediately.

Users who don't scrub immediately after unclean shutdowns are sitting on
a ticking time bomb of corruption that explodes when a disk fails.

If this happens to data extents, only file data is lost.  If it happens
to metadata extents, the filesystem is severely damaged or destroyed
(more likely destroyed as the roots of the metadata trees are usually
the most recently written blocks).

mdadm avoids this by scrubbing immediately after an unclean shutdown
to minimize the vulnerable window (or using the new stripe journalling
feature), but it fails (causing severe filesystem damage) when there
are crashes in degraded mode.  ZFS avoids this using a combination of
dynamic stripe width to avoid failed devices and the ZIL journal.

The best thing to do is rework the raid56 layer (and probably some
higher layers in btrfs) until there are no further references to
raid56_rmw_stripe or async_rmw_stripe, then remove those functions and
never put them back.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-20  0:32     ` Chris Mason
@ 2016-09-20  2:47       ` Alex Elsayed
  2016-09-20  2:50       ` Theodore Ts'o
  1 sibling, 0 replies; 66+ messages in thread
From: Alex Elsayed @ 2016-09-20  2:47 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 19 Sep 2016 20:32:34 -0400, Chris Mason wrote:

> On 09/19/2016 04:58 PM, Alex Elsayed wrote:

<snip>

>> When someone says "pretty simple" regarding cryptography, it's often
>> neither pretty nor simple :P
>>
>> The issue, here, is that inodes are fundamentally not a safe scope to
>> attach that information to in btrfs. As extents can be shared between
>> inodes (and thus both will need to decrypt them), and inodes can be
>> duplicated unmodified (snapshots), attaching keys and nonces to inodes
>> opens up a whole host of (possibly insoluble) issues, including
>> catastrophic nonce reuse via writable snapshots.
> 
> I'm going to have to read harder about nonce reuse.  In btrfs an inode
> is really a pair [ root id, inode number ], so strictly speaking two
> writable snapshots won't have the same inode in memory and when a
> snapshot is modified we'd end up with a different nonce for the new
> modifications.
> 
> This would lead to a chain, where reading an single modified file in a
> snapshot might require multiple different keys.  The btrfs metadata has
> what it needs to look these things up in the readpage call, but it ends
> up being much closer to per-extent encryption.

For reading about nonce reuse (and nonce-misuse-resistant AEAD), the best 
option to start with is likely Hoang, Krovetz, and Rogaway's "Robust 
Authenticated Encryption: AEZ and the problem it solves"

https://eprint.iacr.org/2014/793

For one of the first such schemes, it's likely of interest to read about 
SIV (Rogaway and Shrimpton):

http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/siv/
siv.pdf

A variant of SIV that can be efficiently realized using the same hardware 
acceleration as AES-GCM is AES-GCM-SIV (Gueron, Lindell):

https://eprint.iacr.org/2015/102

And for information on how catastrophic _ever_ reusing the same (nonce, 
key) pair is with plain GCM:

Nonce-Disrespecting Adversaries: Practical Forgery Attacks on GCM in TLS
(Böck, Zauner, Devlin, Somorovsky, Jovanovic)
https://eprint.iacr.org/2016/475

(The same applies to ChaCha20-Poly1305, and the vast majority of other 
AEADs that lack nonce-misuse-resistance).


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-20  0:32     ` Chris Mason
  2016-09-20  2:47       ` Alex Elsayed
@ 2016-09-20  2:50       ` Theodore Ts'o
  2016-09-20  3:05         ` Alex Elsayed
                           ` (2 more replies)
  1 sibling, 3 replies; 66+ messages in thread
From: Theodore Ts'o @ 2016-09-20  2:50 UTC (permalink / raw)
  To: Chris Mason; +Cc: Alex Elsayed, linux-btrfs

On Mon, Sep 19, 2016 at 08:32:34PM -0400, Chris Mason wrote:
> > > That key is used to protect the contents of the data file, and to
> > > encrypt filenames and symlink targets --- since filenames can leak
> > > significant information about what the user is doing.  (For example,
> > > in the downloads directory of their web browser, leaking filenames is
> > > just as good as leaking part of their browsing history.)
> 
> One of the things that makes per-subvolume encryption attractive to me is
> that we're able to enforce the idea that an entire directory tree is
> encrypted by one key.  It can't be snapshotted again without the key, and it
> just fits with the rest of the btrfs management code.  I do want to support
> the existing vfs interfaces as well too though.

One of the main reasons for doing fs-level encryption is so you can
allow multiple users to have different keys.  In some cases you can
assume that different users will be in different distinct subvolumes
(e.g., each user has their own home directory), but that's not always
going to be possible.

One of the other things that was in the original design, but which got
dropped in our initial implementation, was the concept of having the
per-inode key wrapped by multiple user keys.  This would allow a file
to be accessible by more than one user.  So something to consider is
that there may very well be situations where you *want* to have more
than one key associated with a directory hierarchy.

> > The issue, here, is that inodes are fundamentally not a safe scope to
> > attach that information to in btrfs. As extents can be shared between
> > inodes (and thus both will need to decrypt them), and inodes can be
> > duplicated unmodified (snapshots), attaching keys and nonces to inodes
> > opens up a whole host of (possibly insoluble) issues, including
> > catastrophic nonce reuse via writable snapshots.
> 
> I'm going to have to read harder about nonce reuse.  In btrfs an inode is
> really a pair [ root id, inode number ], so strictly speaking two writable
> snapshots won't have the same inode in memory and when a snapshot is
> modified we'd end up with a different nonce for the new modifications.

Nonce reuse is not necessrily catastrophic.  It all depends on the
context.  In the case of Counter or GCM mode, nonce (or IV) reuse is
absolutely catastrophic.  It must *never* be done or you completely
lose all security.  As the Soviets discovered the hard way courtesy of
the Venona project (well, they didn't discover it until after they
lost the cold war, but...) one time pads are completely secure.
Two-time pads, are most emphatically _not_.  :-)

In the case of the nonces used in fscrypt's key derivation, reuse of
the nonce basically means that two files share the same key.  Assuming
you're using a competently designed block cipher (e.g., AES), reuse of
the key is not necessarily a problem.  What it would mean is that two
files which are are reflinked would share the same key.  And if you
have writable snapshots, that's definitely not a problem, since with
AES we use the a fixed key and a fixed IV given a logical block
number, and we can do block overwrites without having to guarantee
unique nonces (which you *do* need to worry about if you use counter
mode or some other stream cipher such as ChaCha20 --- Kent Overstreet
had some clever tricks to avoid IV reuse since he used a stream cipher
in his proposed bcachefs encryption).

The main issue is if you want to reflink a file and then have the two
files have different permissions / ownerships.  In that case, you
really want to use different keys for user A and for user B --- but if
you are assuming a single key per subvolume, you can't support
different keys for different users anyway, so you're kind of toast for
that use case in any case.

So in any case, assuming you're using block encryption (which is what
fscrypt uses) there really isn't a problem with nonce reuse, although
in some cases if you really do want to reflink a file and have it be
protected by different user keys, this would have to force copy of the
duplicated blocks at that point.  But arguably, that is a feature, not
a bug.  If the two users are mutually suspicious, you don't _want_ to
leak information about who much of a particular file had been changed
by a particular user.  So you would want to break the reflink and have
separate copies for both users anyway.


One final thought --- something which is really going to be a factor
in many use cases is going to be hardware accelerated encryption.  For
example, Qualcomm is already shipping an SOC where the encryption can
be done in the data path between the CPU and the eMMC storage device.
If you want to support certain applications that try to read megabytes
and megabytes of data before painting a single pixel, in-line hardware
crypto at line speeds is going to be critical if you don't want to
sacrifice performance, and keep users from being cranky because it
took extra seconds before they could start reading their news feed (or
saving bird eggs from voracious porcine predators, or whatever).

This may very well be an issue in the future not just for mobile
devices, but I could imagine this potentially being an issue for other
form factors as well.  Yes, Skylake can encrypt multiple bytes per
clock cycle using the miracles of hardware acceleration and
pipelining.  But in-line encryption will still have the advantage of
avoiding the memory bandwidth costs.  So while it is fun to talk about
exotic encryption modes, it would be wise to have file system
encryption architectures to have modes which are compatible with
hardware in-line encryption schemes.

This is also why I'm not all that excited by Kent's work trying to
implement fast encryption using a stream cipher such as Chacha20.
Technically, it's interesting, sure.  But on most modern systems, you
will either have really really good AES acceleration (any recent x86
system), or you will probably have at your disposal a hardware in-line
cyptographic engine (ICE) that is going to be way faster than Chacha20
implemented in software, and it means you don't have to go to extreme
lengths to avoid never reusing a nonce or risk losing all security
guarantees.  Block ciphers are much safer, and with hardware support,
any speed advantage of using a stream cipher disappears; indeed, a
stream cipher in software will be slower than a hardware accelerated
block cipher.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-20  2:50       ` Theodore Ts'o
@ 2016-09-20  3:05         ` Alex Elsayed
  2016-09-20  4:09         ` Zygo Blaxell
  2016-09-20 15:44         ` Chris Mason
  2 siblings, 0 replies; 66+ messages in thread
From: Alex Elsayed @ 2016-09-20  3:05 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Chris Mason, linux-btrfs

Taking a stab at a different way of replying, to try and keep Ted in the loop.

On Monday, 19 September 2016 22:50:41 PDT Theodore Ts'o wrote:
> On Mon, Sep 19, 2016 at 08:32:34PM -0400, Chris Mason wrote:
> > > > That key is used to protect the contents of the data file, and to
> > > > encrypt filenames and symlink targets --- since filenames can leak
> > > > significant information about what the user is doing.  (For example,
> > > > in the downloads directory of their web browser, leaking filenames is
> > > > just as good as leaking part of their browsing history.)
> > 
> > One of the things that makes per-subvolume encryption attractive to me is
> > that we're able to enforce the idea that an entire directory tree is
> > encrypted by one key.  It can't be snapshotted again without the key, and
> > it just fits with the rest of the btrfs management code.  I do want to
> > support the existing vfs interfaces as well too though.
> 
> One of the main reasons for doing fs-level encryption is so you can
> allow multiple users to have different keys.  In some cases you can
> assume that different users will be in different distinct subvolumes
> (e.g., each user has their own home directory), but that's not always
> going to be possible.

Mm, that's definitely something to keep in mind.

> One of the other things that was in the original design, but which got
> dropped in our initial implementation, was the concept of having the
> per-inode key wrapped by multiple user keys.  This would allow a file
> to be accessible by more than one user.  So something to consider is
> that there may very well be situations where you *want* to have more
> than one key associated with a directory hierarchy.

Makes sense.

> > > The issue, here, is that inodes are fundamentally not a safe scope to
> > > attach that information to in btrfs. As extents can be shared between
> > > inodes (and thus both will need to decrypt them), and inodes can be
> > > duplicated unmodified (snapshots), attaching keys and nonces to inodes
> > > opens up a whole host of (possibly insoluble) issues, including
> > > catastrophic nonce reuse via writable snapshots.
> > 
> > I'm going to have to read harder about nonce reuse.  In btrfs an inode is
> > really a pair [ root id, inode number ], so strictly speaking two writable
> > snapshots won't have the same inode in memory and when a snapshot is
> > modified we'd end up with a different nonce for the new modifications.
> 
> Nonce reuse is not necessrily catastrophic.  It all depends on the
> context.  In the case of Counter or GCM mode, nonce (or IV) reuse is
> absolutely catastrophic.  It must *never* be done or you completely
> lose all security.  As the Soviets discovered the hard way courtesy of
> the Venona project (well, they didn't discover it until after they
> lost the cold war, but...) one time pads are completely secure.
> Two-time pads, are most emphatically _not_.  :-)

Aaaand now I wish I'd seen this before I sent my Big Ol' Mail Full of 
References to Chris, so I could have tried this and kept you on CC.

> In the case of the nonces used in fscrypt's key derivation, reuse of
> the nonce basically means that two files share the same key.  Assuming
> you're using a competently designed block cipher (e.g., AES), reuse of
> the key is not necessarily a problem.  What it would mean is that two
> files which are are reflinked would share the same key.  And if you
> have writable snapshots, that's definitely not a problem, since with
> AES we use the a fixed key and a fixed IV given a logical block
> number, and we can do block overwrites without having to guarantee
> unique nonces (which you *do* need to worry about if you use counter
> mode or some other stream cipher such as ChaCha20 --- Kent Overstreet
> had some clever tricks to avoid IV reuse since he used a stream cipher
> in his proposed bcachefs encryption).

Er, not quite on the "safe" bit - part of the problem is that without going 
AEAD, you lose out on a good bit of security relative to GCM without reusing 
nonces. The reason (say) EME or CMC are safe for block-overwrite is actually 
_not_ that they're block ciphers - it's that they implement a security notion 
called SPRP, Strong Pseudorandom Permutation, which has a direct equivalence 
with misuse-resistant AEAD. XTS _not_ meeting that is in fact exactly why it's 
not as strong.

If you take an SPRP, reserve `p` bits at the end for zeroes, fill the rest with 
your message, and encrypt it, the result is _exactly_ a misuse-resistant AEAD 
with `p`-bit integrity. Modern misuse-resistant AEADs differ from EME and CMC 
only in 1.) efficiency and 2.) supporting variable-length messages.

> The main issue is if you want to reflink a file and then have the two
> files have different permissions / ownerships.  In that case, you
> really want to use different keys for user A and for user B --- but if
> you are assuming a single key per subvolume, you can't support
> different keys for different users anyway, so you're kind of toast for
> that use case in any case.

Mm.

> So in any case, assuming you're using block encryption (which is what
> fscrypt uses) there really isn't a problem with nonce reuse, although
> in some cases if you really do want to reflink a file and have it be
> protected by different user keys, this would have to force copy of the
> duplicated blocks at that point.  But arguably, that is a feature, not
> a bug.  If the two users are mutually suspicious, you don't _want_ to
> leak information about who much of a particular file had been changed
> by a particular user.  So you would want to break the reflink and have
> separate copies for both users anyway.

Agreed.

> One final thought --- something which is really going to be a factor
> in many use cases is going to be hardware accelerated encryption.  For
> example, Qualcomm is already shipping an SOC where the encryption can
> be done in the data path between the CPU and the eMMC storage device.
> If you want to support certain applications that try to read megabytes
> and megabytes of data before painting a single pixel, in-line hardware
> crypto at line speeds is going to be critical if you don't want to
> sacrifice performance, and keep users from being cranky because it
> took extra seconds before they could start reading their news feed (or
> saving bird eggs from voracious porcine predators, or whatever).

I heavily recommend reading the AES-GCM-SIV paper from my response to Chris - 
it uses exactly the same hardware acceleration as GCM, but achieves nonce-
misuse-resistance. Less than one cycle per byte, too.

> This may very well be an issue in the future not just for mobile
> devices, but I could imagine this potentially being an issue for other
> form factors as well.  Yes, Skylake can encrypt multiple bytes per
> clock cycle using the miracles of hardware acceleration and
> pipelining.  But in-line encryption will still have the advantage of
> avoiding the memory bandwidth costs.  So while it is fun to talk about
> exotic encryption modes, it would be wise to have file system
> encryption architectures to have modes which are compatible with
> hardware in-line encryption schemes.

Considering AES-GCM-SIV is being heavily considered for use in TLS 1.3, that 
may well be viable.

> This is also why I'm not all that excited by Kent's work trying to
> implement fast encryption using a stream cipher such as Chacha20.
> Technically, it's interesting, sure.  But on most modern systems, you
> will either have really really good AES acceleration (any recent x86
> system), or you will probably have at your disposal a hardware in-line
> cyptographic engine (ICE) that is going to be way faster than Chacha20
> implemented in software, and it means you don't have to go to extreme
> lengths to avoid never reusing a nonce or risk losing all security
> guarantees.  Block ciphers are much safer, and with hardware support,
> any speed advantage of using a stream cipher disappears; indeed, a
> stream cipher in software will be slower than a hardware accelerated
> block cipher.

I agree regarding ChaCha20 (The cases it's good for - devices without AES - 
are already fading, with deep embedded using AES-CTR-CCM and mobile gaining 
AES-GCM accel), but I really think that nonce-misuse-resistant AEAD is going 
to be incredibly important to keep in mind.

In crypto, it's far more often a subtle tool misapplied that causes problems 
than anything else - and both non-AEAD (due to CCA2) and nonce-dependent AEAD 
(due to nonce misuse catastrophes) are subtle tools indeed. Nonce-misuse-
resistant AEAD is a much less subtle tool.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-19 15:15 ` Experimental btrfs encryption Theodore Ts'o
  2016-09-19 20:58   ` Alex Elsayed
@ 2016-09-20  4:05   ` Anand Jain
  1 sibling, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-20  4:05 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-btrfs, clm, dsterba


Hi Ted,

  I appreciate your email, thanks for taking time to review.


  Before I wrote the current version, I had a version which used
  fs/crypto. BTRFS needs a highly scalable solution, I am experimenting
  and evaluating, integration with fs/crypto is needed, I will discuss
  that in the fsdevel ML when its appropriate.


  Data center storage requisites and Scalability is very important
  to BTRFS. Based on my experiences with the engineered storage(s),
  of course I do have the requirement list in the background for
  this project. Chris mentioned one of it.


  Yes, we need crypto experts to review the document or in other words
  its not integration-ready unless crypto experts has reviewed/approved.

  Thanks. I shall cc you for further development on this.

Regds, -Anand


On 09/19/2016 11:15 PM, Theodore Ts'o wrote:
> (I'm not on linux-btrfs@, so please keep me on the cc list.  Or
> perhpas better yet, maybe we can move discussion to the linux-fsdevel@
> list.)
>
> Hi Anand,
>
> After reading this thread on the web archives, and seeing that some
> folks seem to be a bit confused about "vfs level crypto", fs/crypto,
> and ext4/f2fs encryption, I thought I would give a few comments.
>
> First of all, these are all the same thing.  Initially ext4 encryption
> was implemented targetting ChromeOS as the initial customer, and as a
> replacement for ecryptfs.  Folks have already pointed you at the
> design document[1].  Also of interest is the is the 2015 Linux
> Security Symposium slides set[2].  The first deployed use of this was
> for Android N's File-based Encryption and Direct boot[3]; a technical
> description which left out some of the product details (since LSS 2016
> was before the Android N release) can be found at the 2016 LSS
> slides[4].
>
> [1] https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/preview
> [2] http://kernsec.org/files/lss2014/Halcrow_EXT4_Encryption.pdf
> [3] https://android.googleblog.com/2016/08/android-70-nougat-more-powerful-os-made.html
> [4] http://kernsec.org/files/lss2015/halcrow.pdf
>
> The other thing that perhaps would be worth noting is that Michael
> Halcrow started this as an encryption/security expert who had dabbled
> in file systems, while I was someone for whom encryption/security is a
> hobby (although in a previous life I was the tech lead for Kerberos
> and chaired the IPSEC working group) who was a file system expert.  In
> order to do file system security well, you need people who are well
> versed in both discplines working together.
>
> With all due respect, the fact that you chose counter mode and how use
> used it pretty clearly demonstrates that you would be well advised to
> find someone who is a crypto expert to collaborate with you --- or use
> the fs/crypto framework since it was designed and vetted by multiple
> crypto experts as well as file system experts.
>
> Having someone who is a product manager who can discuss with you
> specific goals is also important, because there are lots of tradeoffs
> and lots of design choices ---- and so what you chose to do is (or at
> least should be!)  very much dependent on your threat model, who is
> planning on using the feature, what you can and can not upon via-a-vis
> hardware support, performance requirements, and so on.
>
>
> Secondly, in terms of how it all works.  Each user as a "master key"
> which is stored on a keyring.  We use a hash of the key to serve as
> the key identifier, and associated with each inode we store a nonce (a
> random unique string) and the key identifier.  We use the nonce and
> the user's master key to generate a unique key for that inode.
>
> That key is used to protect the contents of the data file, and to
> encrypt filenames and symlink targets --- since filenames can leak
> significant information about what the user is doing.  (For example,
> in the downloads directory of their web browser, leaking filenames is
> just as good as leaking part of their browsing history.)
>
> As far as using the fs/crypto infrastructure, it's actually pretty
> simple.  The file system needs to provide a flag indicating whether or
> not the file is encrypted, and support extended attributes.  When you
> create an inode in an encrypted directory, you call
> fscrypt_inherit_context() and the fscrypto layer will take care of
> creating the necessary xattr for the per-inode key.  When you need
> open a encrypted file, or operate on an encrypted inode, you call
> fscrypt_get_encryption_info() on the inode.  The per-inode encryption
> key is cached in the i_crypt_info structure, which hangs off of the
> struct inode.
>
> When you write to an encrypted file, you call fscrypt_encrypt_page(),
> which returns a struct page with the encrypted contents to be written.
> After the write is completed (or in the error case), you call
> fscrypt_restore_control_page() to release encrypted page.
>
> To read from an encrypted page, you call fscrypt_get_ctx() to get an
> encryption context, which gets stashed in the bio's bi_private
> pointer.  (If btrfs is already using bi_private, then you'll need to
> add a field in the structure which hangs off of bi_private to stash
> the encryption context.)  After the read completes, you call
> fscrypt_decrypt_bio_pages() to decrypt all of the pages read as part
> of the read/write operation.
>
> It's actually relatively straightforward to use.  If you have any
> questions please feel free to ask on linux-fsdevel.
>
>
> As far as poeple commenting that it might be better to encrypt on the
> extent level --- the reason why we didn't chose that path is because
> while it does make it easier to do authenticated encryption modes, the
> downside is that you can only do the data integrity check if you read
> in the entire extent.  This has obvious memory utilization impacts and
> will also impact your 4k random read/write performance.
>
> We do have a solution in mind to solve the authenticated encryption
> problem; in fact, an intern has recently finished a prototype using
> Authenticated Skip Lists[5][6].  Hopefully we'll be able to get some
> patches for review in the near future.
>
> [5] http://cs.brown.edu/cgc/stms/papers/hashskip.pdf
> [6] http://cs.brown.edu/cgc/stms/papers/discex2001.pdf
>
> One of the challenges with data integrity is that you need to be able
> to update authentication data and the data blocks atomically, or else
> you could end up breaking the file on a crash.  For ext4, we're going
> to simply only support data integrity for those files which are
> written and then closed, and if you crash, the file which is being
> written may not have valid data integrity checksums.  This is good
> enough for many use cases, since most files are not updated after they
> are initially written.  Obviously, btrfs would be able to do much
> better since it has COW properties.
>
> Cheers,
>
> 							- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-20  2:50       ` Theodore Ts'o
  2016-09-20  3:05         ` Alex Elsayed
@ 2016-09-20  4:09         ` Zygo Blaxell
  2016-09-20 15:44         ` Chris Mason
  2 siblings, 0 replies; 66+ messages in thread
From: Zygo Blaxell @ 2016-09-20  4:09 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Chris Mason, Alex Elsayed, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3956 bytes --]

On Mon, Sep 19, 2016 at 10:50:41PM -0400, Theodore Ts'o wrote:
> On Mon, Sep 19, 2016 at 08:32:34PM -0400, Chris Mason wrote:
> One of the other things that was in the original design, but which got
> dropped in our initial implementation, was the concept of having the
> per-inode key wrapped by multiple user keys.  This would allow a file
> to be accessible by more than one user.  So something to consider is
> that there may very well be situations where you *want* to have more
> than one key associated with a directory hierarchy.

That can get very complicated very quickly, unless you push the keys
off to one side and make them objects with separate lifetimes from the
files and subvols that use them by reference.  Any problem can be made
different by another layer of indirection.

> The main issue is if you want to reflink a file and then have the two
> files have different permissions / ownerships.  In that case, you
> really want to use different keys for user A and for user B --- but if
> you are assuming a single key per subvolume, you can't support
> different keys for different users anyway, so you're kind of toast for
> that use case in any case.

The gotcha there is that reflink file copies are just a special case
of shared extent refs in which all the individual extents in a file are
reflinked at once, but that's not the only case (or even a common one).

Currently any extent in the filesystem can be shared by any inode in
the filesystem (assuming the two inodes have compatible attributes,
which could include encryption policy), including multiple references
from the same inode to the same extent at different logical offsets.
This is the basis of the deduplication and copy_file_range features.

This confuses the VFS caching layer when dealing with deduped reflinked,
or snapshotted files.  It's not surprising that VFS crypto has problems
coping with it as well.

It's much more natural for btrfs to attach nonces to the extents rather
than the inodes, and even put references to keys on the extents as well.
Key references could be inherited from the inode (directory, parent,
subvol, wherever you want to put them) that was used to create the extent,
the same way extents inherit their other attributes from inodes now.

> So in any case, assuming you're using block encryption (which is what
> fscrypt uses) there really isn't a problem with nonce reuse, although
> in some cases if you really do want to reflink a file and have it be
> protected by different user keys, this would have to force copy of the
> duplicated blocks at that point.  But arguably, that is a feature, not
> a bug.  If the two users are mutually suspicious, you don't _want_ to
> leak information about who much of a particular file had been changed
> by a particular user.  So you would want to break the reflink and have
> separate copies for both users anyway.

It would probably be most naturally implemented as not allowing the
reflink in the first place, or not allowing a key change on a non-empty
file (the same way that attributes like nodatasum and nodatacow are
implemented).  'cp' would then have to fall back to a brute-force copy.
Cloning reflinks after the fact would be a radical change of direction
for btrfs.

This does create all sorts of interesting interactions with snapshots.
What happens if you remove a user's key to a file in a snapshot?  If the
key is embedded in an inode, only one snapshot is affected, but if the
key is stored separately by reference, it could revoke access to files
in all the snapshots at once.

The only information I know of that one (non-root) user gets about
modifications to the other user's reflink file is the SHARED bit in
FIEMAP, which goes from 1 to 0 when the user holds the last reference
to the file.  That could simply be forced to always be 1 if the extent
is encrypted so it doesn't leak information.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] Preliminary BTRFS Encryption
  2016-09-17 18:45       ` David Sterba
@ 2016-09-20 14:26         ` Anand Jain
  0 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-20 14:26 UTC (permalink / raw)
  To: David Sterba; +Cc: Zygo Blaxell, Alex Elsayed, linux-btrfs




Hi David,

On 09/18/2016 02:45 AM, David Sterba wrote:
> On Sat, Sep 17, 2016 at 12:38:30AM -0400, Zygo Blaxell wrote:
>> There's also a nasty problem with the extent tree--there's only one per
>> filesystem, it's shared between all subvols and block groups, and every
>> extent in that tree has back references to the (possibly encrypted) subvol
>> trees.  I'll leave that problem as an exercise for other readers.  ;)
>
> A design point that I'm not mentioning for the first time: there would
> be per-subvolume group extent trees, ie. a set of subvolumes with
> attached extent tree where similar to what we have now. So, encrypted
> and unencrypted extent metadata will never be mixed.
> (the crypto key questions are not addressed here)
>
> This hasn't been implemented but I'm making sure this will be possible
> when somebody mentions changes to the extent tree or blockgroup reworks
> (to actually solve other problems).

  Now I remember this was told before, sorry this slipped my mind.

Thanks, Anand


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-20  2:50       ` Theodore Ts'o
  2016-09-20  3:05         ` Alex Elsayed
  2016-09-20  4:09         ` Zygo Blaxell
@ 2016-09-20 15:44         ` Chris Mason
  2016-09-21 13:52           ` Anand Jain
  2 siblings, 1 reply; 66+ messages in thread
From: Chris Mason @ 2016-09-20 15:44 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Alex Elsayed, linux-btrfs



On 09/19/2016 10:50 PM, Theodore Ts'o wrote:
> On Mon, Sep 19, 2016 at 08:32:34PM -0400, Chris Mason wrote:
>>>> That key is used to protect the contents of the data file, and to
>>>> encrypt filenames and symlink targets --- since filenames can leak
>>>> significant information about what the user is doing.  (For example,
>>>> in the downloads directory of their web browser, leaking filenames is
>>>> just as good as leaking part of their browsing history.)
>>
>> One of the things that makes per-subvolume encryption attractive to me is
>> that we're able to enforce the idea that an entire directory tree is
>> encrypted by one key.  It can't be snapshotted again without the key, and it
>> just fits with the rest of the btrfs management code.  I do want to support
>> the existing vfs interfaces as well too though.
>
> One of the main reasons for doing fs-level encryption is so you can
> allow multiple users to have different keys.  In some cases you can
> assume that different users will be in different distinct subvolumes
> (e.g., each user has their own home directory), but that's not always
> going to be possible.
>

Agreed, they are just different use cases.  I think both are important, 
and btrfs won't do encryption without the file-level option.

> One of the other things that was in the original design, but which got
> dropped in our initial implementation, was the concept of having the
> per-inode key wrapped by multiple user keys.  This would allow a file
> to be accessible by more than one user.  So something to consider is
> that there may very well be situations where you *want* to have more
> than one key associated with a directory hierarchy.
>
>>> The issue, here, is that inodes are fundamentally not a safe scope to
>>> attach that information to in btrfs. As extents can be shared between
>>> inodes (and thus both will need to decrypt them), and inodes can be
>>> duplicated unmodified (snapshots), attaching keys and nonces to inodes
>>> opens up a whole host of (possibly insoluble) issues, including
>>> catastrophic nonce reuse via writable snapshots.
>>
>> I'm going to have to read harder about nonce reuse.  In btrfs an inode is
>> really a pair [ root id, inode number ], so strictly speaking two writable
>> snapshots won't have the same inode in memory and when a snapshot is
>> modified we'd end up with a different nonce for the new modifications.
>
> Nonce reuse is not necessrily catastrophic.  It all depends on the
> context.  In the case of Counter or GCM mode, nonce (or IV) reuse is
> absolutely catastrophic.  It must *never* be done or you completely
> lose all security.  As the Soviets discovered the hard way courtesy of
> the Venona project (well, they didn't discover it until after they
> lost the cold war, but...) one time pads are completely secure.
> Two-time pads, are most emphatically _not_.  :-)
>
> In the case of the nonces used in fscrypt's key derivation, reuse of
> the nonce basically means that two files share the same key.  Assuming
> you're using a competently designed block cipher (e.g., AES), reuse of
> the key is not necessarily a problem.  What it would mean is that two
> files which are are reflinked would share the same key.  And if you
> have writable snapshots, that's definitely not a problem, since with
> AES we use the a fixed key and a fixed IV given a logical block
> number, and we can do block overwrites without having to guarantee
> unique nonces (which you *do* need to worry about if you use counter
> mode or some other stream cipher such as ChaCha20 --- Kent Overstreet
> had some clever tricks to avoid IV reuse since he used a stream cipher
> in his proposed bcachefs encryption).
>
> The main issue is if you want to reflink a file and then have the two
> files have different permissions / ownerships.  In that case, you
> really want to use different keys for user A and for user B --- but if
> you are assuming a single key per subvolume, you can't support
> different keys for different users anyway, so you're kind of toast for
> that use case in any case.

So there's a matrix of possible configurations.  If you're doing a 
reflink between subvolumes and you're doing a subvolume granular 
encryption and you don't have keys to the source subvolume, the reflink 
shouldn't be allowed.  If you do have keys, any new writes are happening 
into a different inode, and will be encrypted with a different key.

If you're doing a file level encryption and you do have access to the 
source file, the destination file is a new inode.  Thanks to COW any 
changes are going to go into new extents and will end up with different 
keys/nonces.

Either way, we degrade down into extent based encryption.  I'd take that 
hit to maintain sane semantics in the face of snapshots and reflinks. 
The btrfs extent structures on disk already have an encryption type field.

>
> So in any case, assuming you're using block encryption (which is what
> fscrypt uses) there really isn't a problem with nonce reuse, although
> in some cases if you really do want to reflink a file and have it be
> protected by different user keys, this would have to force copy of the
> duplicated blocks at that point.  But arguably, that is a feature, not
> a bug.  If the two users are mutually suspicious, you don't _want_ to
> leak information about who much of a particular file had been changed
> by a particular user.  So you would want to break the reflink and have
> separate copies for both users anyway.
>
>
> One final thought --- something which is really going to be a factor
> in many use cases is going to be hardware accelerated encryption.  For
> example, Qualcomm is already shipping an SOC where the encryption can
> be done in the data path between the CPU and the eMMC storage device.
> If you want to support certain applications that try to read megabytes
> and megabytes of data before painting a single pixel, in-line hardware
> crypto at line speeds is going to be critical if you don't want to
> sacrifice performance, and keep users from being cranky because it
> took extra seconds before they could start reading their news feed (or
> saving bird eggs from voracious porcine predators, or whatever).
>
> This may very well be an issue in the future not just for mobile
> devices, but I could imagine this potentially being an issue for other
> form factors as well.  Yes, Skylake can encrypt multiple bytes per
> clock cycle using the miracles of hardware acceleration and
> pipelining.  But in-line encryption will still have the advantage of
> avoiding the memory bandwidth costs.  So while it is fun to talk about
> exotic encryption modes, it would be wise to have file system
> encryption architectures to have modes which are compatible with
> hardware in-line encryption schemes.
>

Strongly agree here.  This is the whole reason btrfs used crc32c, but 
times 100 (or maybe 1000).  I love that  Kent and others are 
experimenting in bcachefs and elsewhere.  Btrfs can always bring in new 
schemes that work well once the framework is in place, but its not an 
area where I have enough expertise to get exotic on the first try.

-chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Experimental btrfs encryption
  2016-09-20 15:44         ` Chris Mason
@ 2016-09-21 13:52           ` Anand Jain
  0 siblings, 0 replies; 66+ messages in thread
From: Anand Jain @ 2016-09-21 13:52 UTC (permalink / raw)
  To: Chris Mason; +Cc: Theodore Ts'o, Alex Elsayed, linux-btrfs



> So there's a matrix of possible configurations.  If you're doing a
> reflink between subvolumes and you're doing a subvolume granular
> encryption and you don't have keys to the source subvolume, the reflink
> shouldn't be allowed.

  Right, this is working.

> If you do have keys, any new writes are happening
> into a different inode, and will be encrypted with a different key.

  As of now it returns -EXDEV. But I should change this.

> If you're doing a file level encryption and you do have access to the
> source file, the destination file is a new inode.  Thanks to COW any
> changes are going to go into new extents and will end up with different
> keys/nonces.

> Either way, we degrade down into extent based encryption.  I'd take that
> hit to maintain sane semantics in the face of snapshots and reflinks.
> The btrfs extent structures on disk already have an encryption type field.

  Agreed. About keys.. a file might need N number of keys (N = number of
  extents) to open.

Thanks, Anand


>>
>> So in any case, assuming you're using block encryption (which is what
>> fscrypt uses) there really isn't a problem with nonce reuse, although
>> in some cases if you really do want to reflink a file and have it be
>> protected by different user keys, this would have to force copy of the
>> duplicated blocks at that point.  But arguably, that is a feature, not
>> a bug.  If the two users are mutually suspicious, you don't _want_ to
>> leak information about who much of a particular file had been changed
>> by a particular user.  So you would want to break the reflink and have
>> separate copies for both users anyway.
>>
>>
>> One final thought --- something which is really going to be a factor
>> in many use cases is going to be hardware accelerated encryption.  For
>> example, Qualcomm is already shipping an SOC where the encryption can
>> be done in the data path between the CPU and the eMMC storage device.
>> If you want to support certain applications that try to read megabytes
>> and megabytes of data before painting a single pixel, in-line hardware
>> crypto at line speeds is going to be critical if you don't want to
>> sacrifice performance, and keep users from being cranky because it
>> took extra seconds before they could start reading their news feed (or
>> saving bird eggs from voracious porcine predators, or whatever).
>>
>> This may very well be an issue in the future not just for mobile
>> devices, but I could imagine this potentially being an issue for other
>> form factors as well.  Yes, Skylake can encrypt multiple bytes per
>> clock cycle using the miracles of hardware acceleration and
>> pipelining.  But in-line encryption will still have the advantage of
>> avoiding the memory bandwidth costs.  So while it is fun to talk about
>> exotic encryption modes, it would be wise to have file system
>> encryption architectures to have modes which are compatible with
>> hardware in-line encryption schemes.
>>
>
> Strongly agree here.  This is the whole reason btrfs used crc32c, but
> times 100 (or maybe 1000).  I love that  Kent and others are
> experimenting in bcachefs and elsewhere.  Btrfs can always bring in new
> schemes that work well once the framework is in place, but its not an
> area where I have enough expertise to get exotic on the first try.
>
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2016-09-21 13:50 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-13 13:39 [RFC] Preliminary BTRFS Encryption Anand Jain
2016-09-13 13:39 ` [PATCH] btrfs: Encryption: Add btrfs encryption support Anand Jain
2016-09-13 14:12   ` kbuild test robot
2016-09-13 14:24   ` kbuild test robot
2016-09-13 16:10   ` kbuild test robot
2016-09-13 13:39 ` [PATCH 1/2] btrfs-progs: make wait_for_commit non static Anand Jain
2016-09-13 13:39 ` [PATCH 2/2] btrfs-progs: add encryption support Anand Jain
2016-09-13 13:39 ` [PATCH] fstests: btrfs: support encryption Anand Jain
2016-09-13 16:42 ` [RFC] Preliminary BTRFS Encryption Wilson Meier
2016-09-14  7:02   ` Anand Jain
2016-09-14 18:26     ` Wilson Meier
2016-09-15  4:53 ` Alex Elsayed
2016-09-15 11:33   ` Anand Jain
2016-09-15 11:47     ` Alex Elsayed
2016-09-16 11:35       ` Anand Jain
2016-09-15  5:38 ` Chris Murphy
2016-09-15 11:32   ` Anand Jain
2016-09-15 11:37 ` Austin S. Hemmelgarn
2016-09-15 14:06   ` Anand Jain
2016-09-15 14:24     ` Austin S. Hemmelgarn
2016-09-16  8:58       ` David Sterba
2016-09-17  2:18       ` Zygo Blaxell
2016-09-16  1:12 ` Dave Chinner
2016-09-16  5:47   ` Roman Mamedov
2016-09-16  6:49   ` Alex Elsayed
2016-09-17  4:38     ` Zygo Blaxell
2016-09-17  6:37       ` Alex Elsayed
2016-09-19 18:08         ` Zygo Blaxell
2016-09-19 20:01           ` Alex Elsayed
2016-09-19 22:22             ` Zygo Blaxell
2016-09-19 22:25             ` Chris Murphy
2016-09-19 22:31               ` Zygo Blaxell
2016-09-20  1:10                 ` Zygo Blaxell
2016-09-17 18:45       ` David Sterba
2016-09-20 14:26         ` Anand Jain
2016-09-16 10:45   ` Brendan Hide
2016-09-16 11:46   ` Anand Jain
2016-09-16  8:49 ` David Sterba
2016-09-16 11:56   ` Anand Jain
2016-09-17 20:35     ` David Sterba
2016-09-18  8:34       ` RAID1 availability issue[2], Hot-spare and auto-replace Anand Jain
2016-09-18 17:28         ` Chris Murphy
2016-09-18 17:34           ` Chris Murphy
2016-09-19  2:25           ` Anand Jain
2016-09-19 12:07             ` Austin S. Hemmelgarn
2016-09-19 12:25           ` Austin S. Hemmelgarn
2016-09-18  9:54       ` [RFC] Preliminary BTRFS Encryption Anand Jain
2016-09-20  0:12   ` Chris Mason
2016-09-20  0:55     ` Anand Jain
2016-09-17  6:58 ` Eric Biggers
2016-09-17  7:13   ` Alex Elsayed
2016-09-19 18:57     ` Zygo Blaxell
2016-09-19 19:50       ` Alex Elsayed
2016-09-19 22:12         ` Zygo Blaxell
2016-09-17 16:12   ` Anand Jain
2016-09-17 18:57     ` Chris Murphy
2016-09-19 15:15 ` Experimental btrfs encryption Theodore Ts'o
2016-09-19 20:58   ` Alex Elsayed
2016-09-20  0:32     ` Chris Mason
2016-09-20  2:47       ` Alex Elsayed
2016-09-20  2:50       ` Theodore Ts'o
2016-09-20  3:05         ` Alex Elsayed
2016-09-20  4:09         ` Zygo Blaxell
2016-09-20 15:44         ` Chris Mason
2016-09-21 13:52           ` Anand Jain
2016-09-20  4:05   ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.