linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize
@ 2018-05-22 16:00 Chandan Rajendra
  2018-05-22 16:00 ` [RFC PATCH V3 01/12] ext4: Clear BH_Uptodate flag on decryption error Chandan Rajendra
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:00 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

This patchset implements code to support encryption of Ext4 filesystem
instances that have blocksize less than pagesize. Ext4 code with this
patchset has been tested on both ppc64 and x86_64 machines. F2FS and
UBIFS were tested on x86_64.

TODO:
1. generic/233 fails with bigalloc mkfs option. This will be fixed in
   the next version of the patchset.

Changelog:
"RFC V2" -> "RFC V3":
1. mpage_readpage[s]() now has arguments to function pointers which
   decrypt the data after the read I/O operation gets completed. Due
   to these changes the code in fs/ext4/readpage.c isn't required
   anymore. Hence this patchset deletes that file.
2. Revert fscrypt_{encrypt,decrypt}_block functions back to
   fscrypt_{encrypt,decrypt}_page i.e. These functions now accept a
   complete memory page as an argument. But internally these functions
   now iterate over all the blocks mapped by the page. Since there
   were no changes in prototypes of these fscrypt APIs, there were no
   changes made to either F2FS or UBIFS code.
3. Address all the review comments provided by Eric Biggers.

"RFC V1" -> "RFC V2":
1. Ext4's "encryption aware" functionality in fs/ext4/readpage.c has
   been moved to fs/crypto/.
2. fscrypt functions have now been renamed to indicate that they work
   on blocks rather than pages.
   Eric, I have renamed completion_pages() to fscrypt_complete_pages()
   rather than to fscrypt_complete_blocks(). This is because we have a
   new function fscrypt_complete_block() (which operates on a single
   block) and IMHO having the identifier fscrypt_complete_blocks()
   which differs from it by just one letter would confuse the reader.
3. ext4_block_write_begin() now clears BH_Uptodate flag when
   decryption of boundary blocks fail.
4. fscrypt_encrypt_page() (now renamed to fscrypt_encrypt_block()) is
   now split into two functions. fscrypt_prep_ciphertext_page()
   allocates and initializes the fscrypt context and the bounce
   page. fscrypt_encrypt_block() is limited to encrypting the
   filesystem's block.
5. fscrypt_zeroout_range() has been updated to work on blocksize <
   pagesize scenario.
6. Documentation/filesystems/fscrypt.rst has been updated to indicate
   encryption support for blocksize < pagesize.
   
Chandan Rajendra (12):
  ext4: Clear BH_Uptodate flag on decryption error
  Rename fscrypt_do_page_crypto to fscrypt_do_block_crypto
  fscrypt_decrypt_page: Decrypt all blocks in a page
  __fscrypt_decrypt_bio: Fix page offset and len args to
    fscrypt_decrypt_page
  ext4: Decrypt all boundary blocks when doing buffered write
  ext4: Decrypt the block that needs to be partially zeroed
  mpage_readpage[s]: Introduce post process callback parameters
  fscrypt_zeroout_range: Encrypt all zeroed out blocks of a page
  fscrypt_encrypt_page: Encrypt all blocks mapped by a page
  ext4: Fix block number passed to fscrypt_encrypt_page
  ext4: Move encryption code into its own function
  ext4: Enable encryption for blocksize less than page size

 Documentation/filesystems/fscrypt.rst |  14 +-
 fs/block_dev.c                        |   5 +-
 fs/buffer.c                           | 298 ++++++++++++++++++++--------------
 fs/crypto/bio.c                       | 141 ++++++++++++++--
 fs/crypto/crypto.c                    |  44 +++--
 fs/crypto/fscrypt_private.h           |   2 +-
 fs/ext2/inode.c                       |   4 +-
 fs/ext4/Makefile                      |   2 +-
 fs/ext4/inode.c                       |  55 +++++--
 fs/ext4/page-io.c                     |  43 +++--
 fs/ext4/readpage.c                    | 294 ---------------------------------
 fs/ext4/super.c                       |   7 -
 fs/fat/inode.c                        |   4 +-
 fs/isofs/inode.c                      |   5 +-
 fs/mpage.c                            |  48 +++++-
 fs/xfs/xfs_aops.c                     |   4 +-
 include/linux/buffer_head.h           |   2 +-
 include/linux/fs.h                    |   4 +
 include/linux/fscrypt_notsupp.h       |  37 ++++-
 include/linux/fscrypt_supp.h          |  13 +-
 include/linux/mpage.h                 |   6 +-
 21 files changed, 522 insertions(+), 510 deletions(-)
 delete mode 100644 fs/ext4/readpage.c

-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 01/12] ext4: Clear BH_Uptodate flag on decryption error
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
@ 2018-05-22 16:00 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 02/12] Rename fscrypt_do_page_crypto to fscrypt_do_block_crypto Chandan Rajendra
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:00 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

On an error return from fscrypt_decrypt_page(), ext4_block_write_begin()
can return with the page's buffer_head marked with BH_Uptodate
flag. This commit clears the BH_Uptodate flag in such cases.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/ext4/inode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 37a2f7a..5df4c7e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1231,9 +1231,13 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
 	}
 	if (unlikely(err))
 		page_zero_new_buffers(page, from, to);
-	else if (decrypt)
+	else if (decrypt) {
 		err = fscrypt_decrypt_page(page->mapping->host, page,
 				PAGE_SIZE, 0, page->index);
+		if (err)
+			clear_buffer_uptodate(*wait_bh);
+	}
+
 	return err;
 }
 #endif
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 02/12] Rename fscrypt_do_page_crypto to fscrypt_do_block_crypto
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
  2018-05-22 16:00 ` [RFC PATCH V3 01/12] ext4: Clear BH_Uptodate flag on decryption error Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 03/12] fscrypt_decrypt_page: Decrypt all blocks in a page Chandan Rajendra
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

fscrypt_do_page_crypto() encrypts one filesystem block rather than a
memory page. Hence this commit renames the function appropriately.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/crypto/bio.c             | 2 +-
 fs/crypto/crypto.c          | 8 ++++----
 fs/crypto/fscrypt_private.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 0959044..778b826 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -114,7 +114,7 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
 	}
 
 	while (len--) {
-		err = fscrypt_do_page_crypto(inode, FS_ENCRYPT, lblk,
+		err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk,
 					     ZERO_PAGE(0), ciphertext_page,
 					     PAGE_SIZE, 0, GFP_NOFS);
 		if (err)
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 0758d32..fac445d 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -133,7 +133,7 @@ struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *inode, gfp_t gfp_flags)
 }
 EXPORT_SYMBOL(fscrypt_get_ctx);
 
-int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
+int fscrypt_do_block_crypto(const struct inode *inode, fscrypt_direction_t rw,
 			   u64 lblk_num, struct page *src_page,
 			   struct page *dest_page, unsigned int len,
 			   unsigned int offs, gfp_t gfp_flags)
@@ -248,7 +248,7 @@ struct page *fscrypt_encrypt_page(const struct inode *inode,
 
 	if (inode->i_sb->s_cop->flags & FS_CFLG_OWN_PAGES) {
 		/* with inplace-encryption we just encrypt the page */
-		err = fscrypt_do_page_crypto(inode, FS_ENCRYPT, lblk_num, page,
+		err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk_num, page,
 					     ciphertext_page, len, offs,
 					     gfp_flags);
 		if (err)
@@ -269,7 +269,7 @@ struct page *fscrypt_encrypt_page(const struct inode *inode,
 		goto errout;
 
 	ctx->w.control_page = page;
-	err = fscrypt_do_page_crypto(inode, FS_ENCRYPT, lblk_num,
+	err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk_num,
 				     page, ciphertext_page, len, offs,
 				     gfp_flags);
 	if (err) {
@@ -308,7 +308,7 @@ int fscrypt_decrypt_page(const struct inode *inode, struct page *page,
 	if (!(inode->i_sb->s_cop->flags & FS_CFLG_OWN_PAGES))
 		BUG_ON(!PageLocked(page));
 
-	return fscrypt_do_page_crypto(inode, FS_DECRYPT, lblk_num, page, page,
+	return fscrypt_do_block_crypto(inode, FS_DECRYPT, lblk_num, page, page,
 				      len, offs, GFP_NOFS);
 }
 EXPORT_SYMBOL(fscrypt_decrypt_page);
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 4012558..71ac753 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -97,7 +97,7 @@ static inline bool fscrypt_valid_enc_modes(u32 contents_mode,
 /* crypto.c */
 extern struct kmem_cache *fscrypt_info_cachep;
 extern int fscrypt_initialize(unsigned int cop_flags);
-extern int fscrypt_do_page_crypto(const struct inode *inode,
+extern int fscrypt_do_block_crypto(const struct inode *inode,
 				  fscrypt_direction_t rw, u64 lblk_num,
 				  struct page *src_page,
 				  struct page *dest_page,
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 03/12] fscrypt_decrypt_page: Decrypt all blocks in a page
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
  2018-05-22 16:00 ` [RFC PATCH V3 01/12] ext4: Clear BH_Uptodate flag on decryption error Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 02/12] Rename fscrypt_do_page_crypto to fscrypt_do_block_crypto Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 04/12] __fscrypt_decrypt_bio: Fix page offset and len args to fscrypt_decrypt_page Chandan Rajendra
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

On BLOCKSIZE < PAGE SIZE systems, a page maps more than one filesystem
block. This commit changes fscrypt_decrypt_page() to iterate across and
decrypt all the blocks mapped by a page.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/crypto/crypto.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index fac445d..27509b1 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -305,11 +305,24 @@ EXPORT_SYMBOL(fscrypt_encrypt_page);
 int fscrypt_decrypt_page(const struct inode *inode, struct page *page,
 			unsigned int len, unsigned int offs, u64 lblk_num)
 {
+	unsigned long blocksize = inode->i_sb->s_blocksize;
+	unsigned int last = offs + len;
+	int ret;
+
 	if (!(inode->i_sb->s_cop->flags & FS_CFLG_OWN_PAGES))
 		BUG_ON(!PageLocked(page));
 
-	return fscrypt_do_block_crypto(inode, FS_DECRYPT, lblk_num, page, page,
-				      len, offs, GFP_NOFS);
+	while (offs < last) {
+		ret = fscrypt_do_block_crypto(inode, FS_DECRYPT, lblk_num, page,
+					page, blocksize, offs, GFP_NOFS);
+		if (ret)
+			return ret;
+
+		offs += blocksize;
+		++lblk_num;
+	}
+
+	return 0;
 }
 EXPORT_SYMBOL(fscrypt_decrypt_page);
 
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 04/12] __fscrypt_decrypt_bio: Fix page offset and len args to fscrypt_decrypt_page
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (2 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 03/12] fscrypt_decrypt_page: Decrypt all blocks in a page Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 05/12] ext4: Decrypt all boundary blocks when doing buffered write Chandan Rajendra
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

For block size < page size scenario, the block to be decrypted can be
mapped at a non-zero page offset and the length of the block is less
than page size. Hence this commit fixes the corresponding arguments sent
to fscrypt_decrypt_page().

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/crypto/bio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 778b826..32288c3 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -33,9 +33,15 @@ static void __fscrypt_decrypt_bio(struct bio *bio, bool done)
 
 	bio_for_each_segment_all(bv, bio, i) {
 		struct page *page = bv->bv_page;
-		int ret = fscrypt_decrypt_page(page->mapping->host, page,
-				PAGE_SIZE, 0, page->index);
+		struct inode *inode = page->mapping->host;
+		u64 blk;
+		int ret;
 
+		blk = page->index << (PAGE_SHIFT - inode->i_blkbits);
+		blk += bv->bv_offset >> inode->i_blkbits;
+
+		ret = fscrypt_decrypt_page(page->mapping->host, page,
+					bv->bv_len, bv->bv_offset, blk);
 		if (ret) {
 			WARN_ON_ONCE(1);
 			SetPageError(page);
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 05/12] ext4: Decrypt all boundary blocks when doing buffered write
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (3 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 04/12] __fscrypt_decrypt_bio: Fix page offset and len args to fscrypt_decrypt_page Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 06/12] ext4: Decrypt the block that needs to be partially zeroed Chandan Rajendra
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

With block size < page size, ext4_block_write_begin() can have up to two
blocks to decrypt. Hence this commit invokes fscrypt_decrypt_page() for
each of those blocks.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/ext4/inode.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5df4c7e..736d209b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1158,12 +1158,14 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
 	unsigned to = from + len;
 	struct inode *inode = page->mapping->host;
 	unsigned block_start, block_end;
-	sector_t block;
+	sector_t block, page_blk_nr;
 	int err = 0;
 	unsigned blocksize = inode->i_sb->s_blocksize;
 	unsigned bbits;
-	struct buffer_head *bh, *head, *wait[2], **wait_bh = wait;
+	struct buffer_head *bh, *head, *wait[2];
+	int nr_wait = 0;
 	bool decrypt = false;
+	int i;
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(from > PAGE_SIZE);
@@ -1216,7 +1218,7 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
 		    !buffer_unwritten(bh) &&
 		    (block_start < from || block_end > to)) {
 			ll_rw_block(REQ_OP_READ, 0, 1, &bh);
-			*wait_bh++ = bh;
+			wait[nr_wait++] = bh;
 			decrypt = ext4_encrypted_inode(inode) &&
 				S_ISREG(inode->i_mode);
 		}
@@ -1224,18 +1226,29 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
 	/*
 	 * If we issued read requests, let them complete.
 	 */
-	while (wait_bh > wait) {
-		wait_on_buffer(*--wait_bh);
-		if (!buffer_uptodate(*wait_bh))
+	for (i = 0; i < nr_wait; i++) {
+		wait_on_buffer(wait[i]);
+		if (!buffer_uptodate(wait[i]))
 			err = -EIO;
 	}
-	if (unlikely(err))
+	if (unlikely(err)) {
 		page_zero_new_buffers(page, from, to);
-	else if (decrypt) {
-		err = fscrypt_decrypt_page(page->mapping->host, page,
-				PAGE_SIZE, 0, page->index);
-		if (err)
-			clear_buffer_uptodate(*wait_bh);
+	} else if (decrypt) {
+		page_blk_nr = (sector_t)page->index << (PAGE_SHIFT - bbits);
+
+		for (i = 0; i < nr_wait; i++) {
+			int err2;
+
+			block = page_blk_nr + (bh_offset(wait[i]) >> bbits);
+			err2 = fscrypt_decrypt_page(page->mapping->host, page,
+						wait[i]->b_size,
+						bh_offset(wait[i]),
+						block);
+			if (err2) {
+				clear_buffer_uptodate(wait[i]);
+				err = err2;
+			}
+		}
 	}
 
 	return err;
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 06/12] ext4: Decrypt the block that needs to be partially zeroed
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (4 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 05/12] ext4: Decrypt all boundary blocks when doing buffered write Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters Chandan Rajendra
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

__ext4_block_zero_page_range decrypts the entire page. With block size <
page size this assumption is incorrect since the block to be decrypted
spans a sub-range of the page. This commit now passes the block's offset
within the page and also the length of the block as arguments to
fscrypt_decrypt_page().

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/ext4/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 736d209b..fbc89d9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4050,9 +4050,10 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 		    ext4_encrypted_inode(inode)) {
 			/* We expect the key to be set. */
 			BUG_ON(!fscrypt_has_encryption_key(inode));
-			BUG_ON(blocksize != PAGE_SIZE);
 			WARN_ON_ONCE(fscrypt_decrypt_page(page->mapping->host,
-						page, PAGE_SIZE, 0, page->index));
+						page, blocksize,
+						round_down(offset, blocksize),
+						iblock));
 		}
 	}
 	if (ext4_should_journal_data(inode)) {
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (5 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 06/12] ext4: Decrypt the block that needs to be partially zeroed Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-25 20:01   ` Theodore Y. Ts'o
  2018-05-22 16:01 ` [RFC PATCH V3 08/12] fscrypt_zeroout_range: Encrypt all zeroed out blocks of a page Chandan Rajendra
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

This commit introduces a new parameter to mpage_readpage[s]()
functions. This parameter contains pointers to functions that can be
used to decrypt data read from the backing device. These are stored in
the fscrypt_ctx structure and one of these functions is invoked after
the read operation is completed.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/block_dev.c                  |   5 +-
 fs/buffer.c                     | 298 ++++++++++++++++++++++++----------------
 fs/crypto/bio.c                 |  95 ++++++++++++-
 fs/crypto/crypto.c              |   2 +
 fs/ext2/inode.c                 |   4 +-
 fs/ext4/Makefile                |   2 +-
 fs/ext4/inode.c                 |  13 +-
 fs/ext4/readpage.c              | 294 ---------------------------------------
 fs/fat/inode.c                  |   4 +-
 fs/isofs/inode.c                |   5 +-
 fs/mpage.c                      |  48 +++++--
 fs/xfs/xfs_aops.c               |   4 +-
 include/linux/buffer_head.h     |   2 +-
 include/linux/fs.h              |   4 +
 include/linux/fscrypt_notsupp.h |  37 ++++-
 include/linux/fscrypt_supp.h    |  13 +-
 include/linux/mpage.h           |   6 +-
 17 files changed, 392 insertions(+), 444 deletions(-)
 delete mode 100644 fs/ext4/readpage.c

diff --git a/fs/block_dev.c b/fs/block_dev.c
index b549666..254af9a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -568,13 +568,14 @@ static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 
 static int blkdev_readpage(struct file * file, struct page * page)
 {
-	return block_read_full_page(page, blkdev_get_block);
+	return block_read_full_page(page, blkdev_get_block, NULL);
 }
 
 static int blkdev_readpages(struct file *file, struct address_space *mapping,
 			struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block,
+			NULL);
 }
 
 static int blkdev_write_begin(struct file *file, struct address_space *mapping,
diff --git a/fs/buffer.c b/fs/buffer.c
index fda7926..978a8b7 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -45,8 +45,12 @@
 #include <linux/mpage.h>
 #include <linux/bit_spinlock.h>
 #include <linux/pagevec.h>
+#include <linux/fs.h>
 #include <trace/events/block.h>
 
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
+
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 			 enum rw_hint hint, struct writeback_control *wbc);
@@ -2197,6 +2201,172 @@ int block_is_partially_uptodate(struct page *page, unsigned long from,
 }
 EXPORT_SYMBOL(block_is_partially_uptodate);
 
+static void end_bio_bh_io_sync(struct bio *bio)
+{
+	post_process_read_t *post_process;
+	struct fscrypt_ctx *ctx;
+	struct buffer_head *bh;
+
+	if (fscrypt_bio_encrypted(bio)) {
+		ctx = bio->bi_private;
+		post_process = fscrypt_get_post_process(ctx);
+
+		if (bio->bi_status || post_process->process_block == NULL) {
+			bh = fscrypt_get_bh(ctx);
+			fscrypt_release_ctx(ctx);
+		} else {
+			fscrypt_enqueue_decrypt_bio(ctx, bio,
+						post_process->process_block);
+			return;
+		}
+	} else {
+		bh = bio->bi_private;
+	}
+
+	if (unlikely(bio_flagged(bio, BIO_QUIET)))
+		set_bit(BH_Quiet, &bh->b_state);
+
+	bh->b_end_io(bh, !bio->bi_status);
+	bio_put(bio);
+}
+
+/*
+ * This allows us to do IO even on the odd last sectors
+ * of a device, even if the block size is some multiple
+ * of the physical sector size.
+ *
+ * We'll just truncate the bio to the size of the device,
+ * and clear the end of the buffer head manually.
+ *
+ * Truly out-of-range accesses will turn into actual IO
+ * errors, this only handles the "we need to be able to
+ * do IO at the final sector" case.
+ */
+void guard_bio_eod(int op, struct bio *bio)
+{
+	sector_t maxsector;
+	struct bio_vec *bvec = bio_last_bvec_all(bio);
+	unsigned truncated_bytes;
+	struct hd_struct *part;
+
+	rcu_read_lock();
+	part = __disk_get_part(bio->bi_disk, bio->bi_partno);
+	if (part)
+		maxsector = part_nr_sects_read(part);
+	else
+		maxsector = get_capacity(bio->bi_disk);
+	rcu_read_unlock();
+
+	if (!maxsector)
+		return;
+
+	/*
+	 * If the *whole* IO is past the end of the device,
+	 * let it through, and the IO layer will turn it into
+	 * an EIO.
+	 */
+	if (unlikely(bio->bi_iter.bi_sector >= maxsector))
+		return;
+
+	maxsector -= bio->bi_iter.bi_sector;
+	if (likely((bio->bi_iter.bi_size >> 9) <= maxsector))
+		return;
+
+	/* Uhhuh. We've got a bio that straddles the device size! */
+	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
+
+	/* Truncate the bio.. */
+	bio->bi_iter.bi_size -= truncated_bytes;
+	bvec->bv_len -= truncated_bytes;
+
+	/* ..and clear the end of the buffer for reads */
+	if (op == REQ_OP_READ) {
+		zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+				truncated_bytes);
+	}
+}
+
+struct bio *create_bh_bio(int op, int op_flags, struct buffer_head *bh,
+			enum rw_hint write_hint,
+			post_process_read_t *post_process)
+{
+	struct address_space *mapping;
+	struct fscrypt_ctx *ctx = NULL;
+	struct inode *inode;
+	struct page *page;
+	struct bio *bio;
+
+	BUG_ON(!buffer_locked(bh));
+	BUG_ON(!buffer_mapped(bh));
+	BUG_ON(!bh->b_end_io);
+	BUG_ON(buffer_delay(bh));
+	BUG_ON(buffer_unwritten(bh));
+
+	/*
+	 * Only clear out a write error when rewriting
+	 */
+	if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE))
+		clear_buffer_write_io_error(bh);
+
+	page = bh->b_page;
+
+	if (op == REQ_OP_READ) {
+		mapping = page_mapping(page);
+		if (mapping && !PageSwapCache(page)) {
+			inode = mapping->host;
+			if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
+				ctx = fscrypt_get_ctx(inode, GFP_NOFS);
+				BUG_ON(!ctx);
+				fscrypt_set_bh(ctx, bh);
+				if (post_process)
+					fscrypt_set_post_process(ctx,
+								post_process);
+			}
+		}
+	}
+
+	/*
+	 * from here on down, it's all bio -- do the initial mapping,
+	 * submit_bio -> generic_make_request may further map this bio around
+	 */
+	bio = bio_alloc(GFP_NOIO, 1);
+
+	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio_set_dev(bio, bh->b_bdev);
+	bio->bi_write_hint = write_hint;
+
+	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
+	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
+
+	bio->bi_end_io = end_bio_bh_io_sync;
+
+	if (ctx)
+		bio->bi_private = ctx;
+	else
+		bio->bi_private = bh;
+
+	/* Take care of bh's that straddle the end of the device */
+	guard_bio_eod(op, bio);
+
+	if (buffer_meta(bh))
+		op_flags |= REQ_META;
+	if (buffer_prio(bh))
+		op_flags |= REQ_PRIO;
+	bio_set_op_attrs(bio, op, op_flags);
+
+	return bio;
+}
+
+static int submit_bh_post_process(int op, int op_flags, struct buffer_head *bh,
+		post_process_read_t *post_process)
+{
+	struct bio *bio;
+
+	bio = create_bh_bio(op, op_flags, bh, 0, post_process);
+	submit_bio(bio);
+	return 0;
+}
+
 /*
  * Generic "read page" function for block devices that have the normal
  * get_block functionality. This is most of the block device filesystems.
@@ -2204,7 +2374,8 @@ EXPORT_SYMBOL(block_is_partially_uptodate);
  * set/clear_buffer_uptodate() functions propagate buffer state into the
  * page struct once IO has completed.
  */
-int block_read_full_page(struct page *page, get_block_t *get_block)
+int block_read_full_page(struct page *page, get_block_t *get_block,
+			post_process_read_t *post_process)
 {
 	struct inode *inode = page->mapping->host;
 	sector_t iblock, lblock;
@@ -2284,7 +2455,8 @@ int block_read_full_page(struct page *page, get_block_t *get_block)
 		if (buffer_uptodate(bh))
 			end_buffer_async_read(bh, 1);
 		else
-			submit_bh(REQ_OP_READ, 0, bh);
+			submit_bh_post_process(REQ_OP_READ, 0, bh,
+					post_process);
 	}
 	return 0;
 }
@@ -2959,124 +3131,12 @@ sector_t generic_block_bmap(struct address_space *mapping, sector_t block,
 }
 EXPORT_SYMBOL(generic_block_bmap);
 
-static void end_bio_bh_io_sync(struct bio *bio)
-{
-	struct buffer_head *bh = bio->bi_private;
-
-	if (unlikely(bio_flagged(bio, BIO_QUIET)))
-		set_bit(BH_Quiet, &bh->b_state);
-
-	bh->b_end_io(bh, !bio->bi_status);
-	bio_put(bio);
-}
-
-/*
- * This allows us to do IO even on the odd last sectors
- * of a device, even if the block size is some multiple
- * of the physical sector size.
- *
- * We'll just truncate the bio to the size of the device,
- * and clear the end of the buffer head manually.
- *
- * Truly out-of-range accesses will turn into actual IO
- * errors, this only handles the "we need to be able to
- * do IO at the final sector" case.
- */
-void guard_bio_eod(int op, struct bio *bio)
-{
-	sector_t maxsector;
-	struct bio_vec *bvec = bio_last_bvec_all(bio);
-	unsigned truncated_bytes;
-	struct hd_struct *part;
-
-	rcu_read_lock();
-	part = __disk_get_part(bio->bi_disk, bio->bi_partno);
-	if (part)
-		maxsector = part_nr_sects_read(part);
-	else
-		maxsector = get_capacity(bio->bi_disk);
-	rcu_read_unlock();
-
-	if (!maxsector)
-		return;
-
-	/*
-	 * If the *whole* IO is past the end of the device,
-	 * let it through, and the IO layer will turn it into
-	 * an EIO.
-	 */
-	if (unlikely(bio->bi_iter.bi_sector >= maxsector))
-		return;
-
-	maxsector -= bio->bi_iter.bi_sector;
-	if (likely((bio->bi_iter.bi_size >> 9) <= maxsector))
-		return;
-
-	/* Uhhuh. We've got a bio that straddles the device size! */
-	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
-
-	/* Truncate the bio.. */
-	bio->bi_iter.bi_size -= truncated_bytes;
-	bvec->bv_len -= truncated_bytes;
-
-	/* ..and clear the end of the buffer for reads */
-	if (op == REQ_OP_READ) {
-		zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
-				truncated_bytes);
-	}
-}
-
-struct bio *create_bh_bio(int op, int op_flags, struct buffer_head *bh,
-                          enum rw_hint write_hint)
-{
-	struct bio *bio;
-
-	BUG_ON(!buffer_locked(bh));
-	BUG_ON(!buffer_mapped(bh));
-	BUG_ON(!bh->b_end_io);
-	BUG_ON(buffer_delay(bh));
-	BUG_ON(buffer_unwritten(bh));
-
-	/*
-	 * Only clear out a write error when rewriting
-	 */
-	if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE))
-		clear_buffer_write_io_error(bh);
-
-	/*
-	 * from here on down, it's all bio -- do the initial mapping,
-	 * submit_bio -> generic_make_request may further map this bio around
-	 */
-	bio = bio_alloc(GFP_NOIO, 1);
-
-	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
-	bio->bi_write_hint = write_hint;
-
-	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
-	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
-
-	bio->bi_end_io = end_bio_bh_io_sync;
-	bio->bi_private = bh;
-
-	/* Take care of bh's that straddle the end of the device */
-	guard_bio_eod(op, bio);
-
-	if (buffer_meta(bh))
-		op_flags |= REQ_META;
-	if (buffer_prio(bh))
-		op_flags |= REQ_PRIO;
-	bio_set_op_attrs(bio, op, op_flags);
-
-	return bio;
-}
-
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 			 enum rw_hint write_hint, struct writeback_control *wbc)
 {
 	struct bio *bio;
 
-	bio = create_bh_bio(op, op_flags, bh, write_hint);
+	bio = create_bh_bio(op, op_flags, bh, write_hint, NULL);
 
 	if (wbc) {
 		wbc_init_bio(wbc, bio);
@@ -3092,7 +3152,7 @@ int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
 {
 	struct bio *bio;
 
-	bio = create_bh_bio(op, op_flags, bh, 0);
+	bio = create_bh_bio(op, op_flags, bh, 0, NULL);
 	bio_associate_blkcg(bio, blkcg_css);
 	submit_bio(bio);
 	return 0;
@@ -3101,11 +3161,7 @@ EXPORT_SYMBOL(submit_bh_blkcg_css);
 
 int submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
-	struct bio *bio;
-
-	bio = create_bh_bio(op, op_flags, bh, 0);
-	submit_bio(bio);
-	return 0;
+	return submit_bh_post_process(op, op_flags, bh, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 32288c3..aba22f7 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/bio.h>
 #include <linux/namei.h>
+#include <linux/buffer_head.h>
 #include "fscrypt_private.h"
 
 static void __fscrypt_decrypt_bio(struct bio *bio, bool done)
@@ -59,7 +60,7 @@ void fscrypt_decrypt_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(fscrypt_decrypt_bio);
 
-static void completion_pages(struct work_struct *work)
+void fscrypt_complete_pages(struct work_struct *work)
 {
 	struct fscrypt_ctx *ctx =
 		container_of(work, struct fscrypt_ctx, r.work);
@@ -69,15 +70,103 @@ static void completion_pages(struct work_struct *work)
 	fscrypt_release_ctx(ctx);
 	bio_put(bio);
 }
+EXPORT_SYMBOL(fscrypt_complete_pages);
 
-void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx, struct bio *bio)
+void fscrypt_complete_block(struct work_struct *work)
 {
-	INIT_WORK(&ctx->r.work, completion_pages);
+	struct fscrypt_ctx *ctx =
+		container_of(work, struct fscrypt_ctx, r.work);
+	struct buffer_head *bh;
+	struct bio *bio;
+	struct bio_vec *bv;
+	struct page *page;
+	struct inode *inode;
+	u64 blk_nr;
+	int ret;
+
+	bio = ctx->r.bio;
+	WARN_ON(bio->bi_vcnt != 1);
+
+	bv = bio->bi_io_vec;
+	page = bv->bv_page;
+	inode = page->mapping->host;
+
+	WARN_ON(bv->bv_len != i_blocksize(inode));
+
+	blk_nr = page->index << (PAGE_SHIFT - inode->i_blkbits);
+	blk_nr += bv->bv_offset >> inode->i_blkbits;
+
+	bh = ctx->r.bh;
+
+	ret = fscrypt_decrypt_page(inode, page, bv->bv_len,
+				bv->bv_offset, blk_nr);
+
+	bh->b_end_io(bh, !ret);
+
+	fscrypt_release_ctx(ctx);
+	bio_put(bio);
+}
+EXPORT_SYMBOL(fscrypt_complete_block);
+
+bool fscrypt_bio_encrypted(struct bio *bio)
+{
+	struct address_space *mapping;
+	struct inode *inode;
+	struct page *page;
+
+	if (bio_op(bio) == REQ_OP_READ && bio->bi_vcnt) {
+		page = bio->bi_io_vec->bv_page;
+
+		if (!PageSwapCache(page)) {
+			mapping = page_mapping(page);
+			if (mapping) {
+				inode = mapping->host;
+
+				if (IS_ENCRYPTED(inode) &&
+					S_ISREG(inode->i_mode))
+					return true;
+			}
+		}
+	}
+
+	return false;
+}
+EXPORT_SYMBOL(fscrypt_bio_encrypted);
+
+void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx, struct bio *bio,
+			void (*process_bio)(struct work_struct *))
+{
+	BUG_ON(!process_bio);
+	INIT_WORK(&ctx->r.work, process_bio);
 	ctx->r.bio = bio;
 	fscrypt_enqueue_decrypt_work(&ctx->r.work);
 }
 EXPORT_SYMBOL(fscrypt_enqueue_decrypt_bio);
 
+post_process_read_t *fscrypt_get_post_process(struct fscrypt_ctx *ctx)
+{
+	return &(ctx->r.post_process);
+}
+EXPORT_SYMBOL(fscrypt_get_post_process);
+
+void fscrypt_set_post_process(struct fscrypt_ctx *ctx,
+			post_process_read_t *post_process)
+{
+	ctx->r.post_process = *post_process;
+}
+
+struct buffer_head *fscrypt_get_bh(struct fscrypt_ctx *ctx)
+{
+	return ctx->r.bh;
+}
+EXPORT_SYMBOL(fscrypt_get_bh);
+
+void fscrypt_set_bh(struct fscrypt_ctx *ctx, struct buffer_head *bh)
+{
+	ctx->r.bh = bh;
+}
+EXPORT_SYMBOL(fscrypt_set_bh);
+
 void fscrypt_pullback_bio_page(struct page **page, bool restore)
 {
 	struct fscrypt_ctx *ctx;
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 27509b1..2148651 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -127,6 +127,8 @@ struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *inode, gfp_t gfp_flags)
 		ctx->flags |= FS_CTX_REQUIRES_FREE_ENCRYPT_FL;
 	} else {
 		ctx->flags &= ~FS_CTX_REQUIRES_FREE_ENCRYPT_FL;
+		ctx->r.post_process.process_block = NULL;
+		ctx->r.post_process.process_pages = NULL;
 	}
 	ctx->flags &= ~FS_CTX_HAS_BOUNCE_BUFFER_FL;
 	return ctx;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 1329b69..0a91f87 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -869,14 +869,14 @@ static int ext2_writepage(struct page *page, struct writeback_control *wbc)
 
 static int ext2_readpage(struct file *file, struct page *page)
 {
-	return mpage_readpage(page, ext2_get_block);
+	return mpage_readpage(page, ext2_get_block, NULL);
 }
 
 static int
 ext2_readpages(struct file *file, struct address_space *mapping,
 		struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, ext2_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, ext2_get_block, NULL);
 }
 
 static int
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 8fdfcd3..7c38803 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
 ext4-y	:= balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
 		extents_status.o file.o fsmap.o fsync.o hash.o ialloc.o \
 		indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
-		mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
+		mmp.o move_extent.o namei.o page-io.o resize.o \
 		super.o symlink.o sysfs.o xattr.o xattr_trusted.o xattr_user.o
 
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fbc89d9..5ae3c7b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3337,6 +3337,10 @@ static int ext4_readpage(struct file *file, struct page *page)
 {
 	int ret = -EAGAIN;
 	struct inode *inode = page->mapping->host;
+	post_process_read_t post_process = {
+		.process_block = fscrypt_complete_block,
+		.process_pages = fscrypt_complete_pages,
+	};
 
 	trace_ext4_readpage(page);
 
@@ -3344,7 +3348,7 @@ static int ext4_readpage(struct file *file, struct page *page)
 		ret = ext4_readpage_inline(inode, page);
 
 	if (ret == -EAGAIN)
-		return ext4_mpage_readpages(page->mapping, NULL, page, 1);
+		return mpage_readpage(page, ext4_get_block, &post_process);
 
 	return ret;
 }
@@ -3354,12 +3358,17 @@ ext4_readpages(struct file *file, struct address_space *mapping,
 		struct list_head *pages, unsigned nr_pages)
 {
 	struct inode *inode = mapping->host;
+	post_process_read_t post_process = {
+		.process_block = fscrypt_complete_block,
+		.process_pages = fscrypt_complete_pages,
+	};
 
 	/* If the file has inline data, no need to do readpages. */
 	if (ext4_has_inline_data(inode))
 		return 0;
 
-	return ext4_mpage_readpages(mapping, pages, NULL, nr_pages);
+	return mpage_readpages(mapping, pages, nr_pages, ext4_get_block,
+			&post_process);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
deleted file mode 100644
index 19b87a8..0000000
--- a/fs/ext4/readpage.c
+++ /dev/null
@@ -1,294 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/fs/ext4/readpage.c
- *
- * Copyright (C) 2002, Linus Torvalds.
- * Copyright (C) 2015, Google, Inc.
- *
- * This was originally taken from fs/mpage.c
- *
- * The intent is the ext4_mpage_readpages() function here is intended
- * to replace mpage_readpages() in the general case, not just for
- * encrypted files.  It has some limitations (see below), where it
- * will fall back to read_block_full_page(), but these limitations
- * should only be hit when page_size != block_size.
- *
- * This will allow us to attach a callback function to support ext4
- * encryption.
- *
- * If anything unusual happens, such as:
- *
- * - encountering a page which has buffers
- * - encountering a page which has a non-hole after a hole
- * - encountering a page with non-contiguous blocks
- *
- * then this code just gives up and calls the buffer_head-based read function.
- * It does handle a page which has holes at the end - that is a common case:
- * the end-of-file on blocksize < PAGE_SIZE setups.
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/export.h>
-#include <linux/mm.h>
-#include <linux/kdev_t.h>
-#include <linux/gfp.h>
-#include <linux/bio.h>
-#include <linux/fs.h>
-#include <linux/buffer_head.h>
-#include <linux/blkdev.h>
-#include <linux/highmem.h>
-#include <linux/prefetch.h>
-#include <linux/mpage.h>
-#include <linux/writeback.h>
-#include <linux/backing-dev.h>
-#include <linux/pagevec.h>
-#include <linux/cleancache.h>
-
-#include "ext4.h"
-
-static inline bool ext4_bio_encrypted(struct bio *bio)
-{
-#ifdef CONFIG_EXT4_FS_ENCRYPTION
-	return unlikely(bio->bi_private != NULL);
-#else
-	return false;
-#endif
-}
-
-/*
- * I/O completion handler for multipage BIOs.
- *
- * The mpage code never puts partial pages into a BIO (except for end-of-file).
- * If a page does not map to a contiguous run of blocks then it simply falls
- * back to block_read_full_page().
- *
- * Why is this?  If a page's completion depends on a number of different BIOs
- * which can complete in any order (or at the same time) then determining the
- * status of that page is hard.  See end_buffer_async_read() for the details.
- * There is no point in duplicating all that complexity.
- */
-static void mpage_end_io(struct bio *bio)
-{
-	struct bio_vec *bv;
-	int i;
-
-	if (ext4_bio_encrypted(bio)) {
-		if (bio->bi_status) {
-			fscrypt_release_ctx(bio->bi_private);
-		} else {
-			fscrypt_enqueue_decrypt_bio(bio->bi_private, bio);
-			return;
-		}
-	}
-	bio_for_each_segment_all(bv, bio, i) {
-		struct page *page = bv->bv_page;
-
-		if (!bio->bi_status) {
-			SetPageUptodate(page);
-		} else {
-			ClearPageUptodate(page);
-			SetPageError(page);
-		}
-		unlock_page(page);
-	}
-
-	bio_put(bio);
-}
-
-int ext4_mpage_readpages(struct address_space *mapping,
-			 struct list_head *pages, struct page *page,
-			 unsigned nr_pages)
-{
-	struct bio *bio = NULL;
-	sector_t last_block_in_bio = 0;
-
-	struct inode *inode = mapping->host;
-	const unsigned blkbits = inode->i_blkbits;
-	const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
-	const unsigned blocksize = 1 << blkbits;
-	sector_t block_in_file;
-	sector_t last_block;
-	sector_t last_block_in_file;
-	sector_t blocks[MAX_BUF_PER_PAGE];
-	unsigned page_block;
-	struct block_device *bdev = inode->i_sb->s_bdev;
-	int length;
-	unsigned relative_block = 0;
-	struct ext4_map_blocks map;
-
-	map.m_pblk = 0;
-	map.m_lblk = 0;
-	map.m_len = 0;
-	map.m_flags = 0;
-
-	for (; nr_pages; nr_pages--) {
-		int fully_mapped = 1;
-		unsigned first_hole = blocks_per_page;
-
-		prefetchw(&page->flags);
-		if (pages) {
-			page = list_entry(pages->prev, struct page, lru);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping, page->index,
-				  readahead_gfp_mask(mapping)))
-				goto next_page;
-		}
-
-		if (page_has_buffers(page))
-			goto confused;
-
-		block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
-		last_block = block_in_file + nr_pages * blocks_per_page;
-		last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
-		if (last_block > last_block_in_file)
-			last_block = last_block_in_file;
-		page_block = 0;
-
-		/*
-		 * Map blocks using the previous result first.
-		 */
-		if ((map.m_flags & EXT4_MAP_MAPPED) &&
-		    block_in_file > map.m_lblk &&
-		    block_in_file < (map.m_lblk + map.m_len)) {
-			unsigned map_offset = block_in_file - map.m_lblk;
-			unsigned last = map.m_len - map_offset;
-
-			for (relative_block = 0; ; relative_block++) {
-				if (relative_block == last) {
-					/* needed? */
-					map.m_flags &= ~EXT4_MAP_MAPPED;
-					break;
-				}
-				if (page_block == blocks_per_page)
-					break;
-				blocks[page_block] = map.m_pblk + map_offset +
-					relative_block;
-				page_block++;
-				block_in_file++;
-			}
-		}
-
-		/*
-		 * Then do more ext4_map_blocks() calls until we are
-		 * done with this page.
-		 */
-		while (page_block < blocks_per_page) {
-			if (block_in_file < last_block) {
-				map.m_lblk = block_in_file;
-				map.m_len = last_block - block_in_file;
-
-				if (ext4_map_blocks(NULL, inode, &map, 0) < 0) {
-				set_error_page:
-					SetPageError(page);
-					zero_user_segment(page, 0,
-							  PAGE_SIZE);
-					unlock_page(page);
-					goto next_page;
-				}
-			}
-			if ((map.m_flags & EXT4_MAP_MAPPED) == 0) {
-				fully_mapped = 0;
-				if (first_hole == blocks_per_page)
-					first_hole = page_block;
-				page_block++;
-				block_in_file++;
-				continue;
-			}
-			if (first_hole != blocks_per_page)
-				goto confused;		/* hole -> non-hole */
-
-			/* Contiguous blocks? */
-			if (page_block && blocks[page_block-1] != map.m_pblk-1)
-				goto confused;
-			for (relative_block = 0; ; relative_block++) {
-				if (relative_block == map.m_len) {
-					/* needed? */
-					map.m_flags &= ~EXT4_MAP_MAPPED;
-					break;
-				} else if (page_block == blocks_per_page)
-					break;
-				blocks[page_block] = map.m_pblk+relative_block;
-				page_block++;
-				block_in_file++;
-			}
-		}
-		if (first_hole != blocks_per_page) {
-			zero_user_segment(page, first_hole << blkbits,
-					  PAGE_SIZE);
-			if (first_hole == 0) {
-				SetPageUptodate(page);
-				unlock_page(page);
-				goto next_page;
-			}
-		} else if (fully_mapped) {
-			SetPageMappedToDisk(page);
-		}
-		if (fully_mapped && blocks_per_page == 1 &&
-		    !PageUptodate(page) && cleancache_get_page(page) == 0) {
-			SetPageUptodate(page);
-			goto confused;
-		}
-
-		/*
-		 * This page will go to BIO.  Do we need to send this
-		 * BIO off first?
-		 */
-		if (bio && (last_block_in_bio != blocks[0] - 1)) {
-		submit_and_realloc:
-			submit_bio(bio);
-			bio = NULL;
-		}
-		if (bio == NULL) {
-			struct fscrypt_ctx *ctx = NULL;
-
-			if (ext4_encrypted_inode(inode) &&
-			    S_ISREG(inode->i_mode)) {
-				ctx = fscrypt_get_ctx(inode, GFP_NOFS);
-				if (IS_ERR(ctx))
-					goto set_error_page;
-			}
-			bio = bio_alloc(GFP_KERNEL,
-				min_t(int, nr_pages, BIO_MAX_PAGES));
-			if (!bio) {
-				if (ctx)
-					fscrypt_release_ctx(ctx);
-				goto set_error_page;
-			}
-			bio_set_dev(bio, bdev);
-			bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
-			bio->bi_end_io = mpage_end_io;
-			bio->bi_private = ctx;
-			bio_set_op_attrs(bio, REQ_OP_READ, 0);
-		}
-
-		length = first_hole << blkbits;
-		if (bio_add_page(bio, page, length, 0) < length)
-			goto submit_and_realloc;
-
-		if (((map.m_flags & EXT4_MAP_BOUNDARY) &&
-		     (relative_block == map.m_len)) ||
-		    (first_hole != blocks_per_page)) {
-			submit_bio(bio);
-			bio = NULL;
-		} else
-			last_block_in_bio = blocks[blocks_per_page - 1];
-		goto next_page;
-	confused:
-		if (bio) {
-			submit_bio(bio);
-			bio = NULL;
-		}
-		if (!PageUptodate(page))
-			block_read_full_page(page, ext4_get_block);
-		else
-			unlock_page(page);
-	next_page:
-		if (pages)
-			put_page(page);
-	}
-	BUG_ON(pages && !list_empty(pages));
-	if (bio)
-		submit_bio(bio);
-	return 0;
-}
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index ffbbf05..ee1ddc4f 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -194,13 +194,13 @@ static int fat_writepages(struct address_space *mapping,
 
 static int fat_readpage(struct file *file, struct page *page)
 {
-	return mpage_readpage(page, fat_get_block);
+	return mpage_readpage(page, fat_get_block, NULL);
 }
 
 static int fat_readpages(struct file *file, struct address_space *mapping,
 			 struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, fat_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, fat_get_block, NULL);
 }
 
 static void fat_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index ec3fba7..60df56f 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1173,13 +1173,14 @@ struct buffer_head *isofs_bread(struct inode *inode, sector_t block)
 
 static int isofs_readpage(struct file *file, struct page *page)
 {
-	return mpage_readpage(page, isofs_get_block);
+	return mpage_readpage(page, isofs_get_block, NULL);
 }
 
 static int isofs_readpages(struct file *file, struct address_space *mapping,
 			struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, isofs_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, isofs_get_block,
+			NULL);
 }
 
 static sector_t _isofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/mpage.c b/fs/mpage.c
index b7e7f57..c88fdd4 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -30,6 +30,8 @@
 #include <linux/backing-dev.h>
 #include <linux/pagevec.h>
 #include <linux/cleancache.h>
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
 #include "internal.h"
 
 /*
@@ -46,9 +48,24 @@
  */
 static void mpage_end_io(struct bio *bio)
 {
+	post_process_read_t *post_process;
+	struct fscrypt_ctx *ctx;
 	struct bio_vec *bv;
 	int i;
 
+	if (fscrypt_bio_encrypted(bio)) {
+		ctx = bio->bi_private;
+		post_process = fscrypt_get_post_process(ctx);
+
+		if (bio->bi_status || post_process->process_pages == NULL) {
+			fscrypt_release_ctx(ctx);
+		} else {
+			fscrypt_enqueue_decrypt_bio(ctx, bio,
+						post_process->process_pages);
+			return;
+		}
+	}
+
 	bio_for_each_segment_all(bv, bio, i) {
 		struct page *page = bv->bv_page;
 		page_endio(page, op_is_write(bio_op(bio)),
@@ -146,7 +163,7 @@ static struct bio *
 do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 		sector_t *last_block_in_bio, struct buffer_head *map_bh,
 		unsigned long *first_logical_block, get_block_t get_block,
-		gfp_t gfp)
+		post_process_read_t *post_process, gfp_t gfp)
 {
 	struct inode *inode = page->mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
@@ -278,15 +295,26 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 
 alloc_new:
 	if (bio == NULL) {
-		if (first_hole == blocks_per_page) {
+		struct fscrypt_ctx *ctx = NULL;
+
+		if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
+			ctx = fscrypt_get_ctx(inode, gfp & GFP_KERNEL);
+			if (IS_ERR(ctx))
+				goto confused;
+			fscrypt_set_post_process(ctx, post_process);
+		} else if (first_hole == blocks_per_page) {
 			if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9),
 								page))
 				goto out;
 		}
 		bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
 				min_t(int, nr_pages, BIO_MAX_PAGES), gfp);
-		if (bio == NULL)
+		if (bio == NULL) {
+			if (ctx)
+				fscrypt_release_ctx(ctx);
 			goto confused;
+		}
+		bio->bi_private = ctx;
 	}
 
 	length = first_hole << blkbits;
@@ -309,7 +337,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 	if (bio)
 		bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
 	if (!PageUptodate(page))
-	        block_read_full_page(page, get_block);
+		block_read_full_page(page, get_block, post_process);
 	else
 		unlock_page(page);
 	goto out;
@@ -361,7 +389,8 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
  */
 int
 mpage_readpages(struct address_space *mapping, struct list_head *pages,
-				unsigned nr_pages, get_block_t get_block)
+		unsigned nr_pages, get_block_t get_block,
+		post_process_read_t *post_process)
 {
 	struct bio *bio = NULL;
 	unsigned page_idx;
@@ -384,7 +413,8 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
 					nr_pages - page_idx,
 					&last_block_in_bio, &map_bh,
 					&first_logical_block,
-					get_block, gfp);
+					get_block, post_process,
+					gfp);
 		}
 		put_page(page);
 	}
@@ -398,7 +428,8 @@ EXPORT_SYMBOL(mpage_readpages);
 /*
  * This isn't called much at all
  */
-int mpage_readpage(struct page *page, get_block_t get_block)
+int mpage_readpage(struct page *page, get_block_t get_block,
+		post_process_read_t *post_process)
 {
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
@@ -409,7 +440,8 @@ int mpage_readpage(struct page *page, get_block_t get_block)
 	map_bh.b_state = 0;
 	map_bh.b_size = 0;
 	bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio,
-			&map_bh, &first_logical_block, get_block, gfp);
+				&map_bh, &first_logical_block, get_block,
+				post_process, gfp);
 	if (bio)
 		mpage_bio_submit(REQ_OP_READ, 0, bio);
 	return 0;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0ab824f..74591b8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1405,7 +1405,7 @@ xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
-	return mpage_readpage(page, xfs_get_blocks);
+	return mpage_readpage(page, xfs_get_blocks, NULL);
 }
 
 STATIC int
@@ -1416,7 +1416,7 @@ xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
-	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
+	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks, NULL);
 }
 
 /*
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c2fbd97..3718c20 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -224,7 +224,7 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 int __block_write_full_page(struct inode *inode, struct page *page,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler);
-int block_read_full_page(struct page*, get_block_t*);
+int block_read_full_page(struct page*, get_block_t*, post_process_read_t*);
 int block_is_partially_uptodate(struct page *page, unsigned long from,
 				unsigned long count);
 int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0eedf74..40e3537 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -83,6 +83,10 @@ typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
 typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 			ssize_t bytes, void *private);
+typedef struct post_process_read {
+	void (*process_block)(struct work_struct *);
+	void (*process_pages)(struct work_struct *);
+} post_process_read_t;
 
 #define MAY_EXEC		0x00000001
 #define MAY_WRITE		0x00000002
diff --git a/include/linux/fscrypt_notsupp.h b/include/linux/fscrypt_notsupp.h
index 9770be37..ceac8c8 100644
--- a/include/linux/fscrypt_notsupp.h
+++ b/include/linux/fscrypt_notsupp.h
@@ -168,9 +168,44 @@ static inline void fscrypt_decrypt_bio(struct bio *bio)
 {
 }
 
+static inline void fscrypt_complete_block(struct work_struct *work)
+{
+}
+
+static inline void fscrypt_complete_pages(struct work_struct *work)
+{
+}
+
 static inline void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx,
-					       struct bio *bio)
+					struct bio *bio,
+					void (*process_bio)(struct work_struct *))
+{
+}
+
+static inline bool fscrypt_bio_encrypted(struct bio *bio)
+{
+	return false;
+}
+
+static inline post_process_read_t *
+fscrypt_get_post_process(struct fscrypt_ctx *ctx)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void
+fscrypt_set_post_process(struct fscrypt_ctx *ctx, post_process_read_t *post_process)
+{
+}
+
+static inline void
+fscrypt_set_bh(struct fscrypt_ctx *ctx, struct buffer_head *bh)
+{
+}
+
+static inline struct buffer_head *fscrypt_get_bh(struct fscrypt_ctx *ctx)
 {
+	return ERR_PTR(-EOPNOTSUPP);
 }
 
 static inline void fscrypt_pullback_bio_page(struct page **page, bool restore)
diff --git a/include/linux/fscrypt_supp.h b/include/linux/fscrypt_supp.h
index 2c9a86a..b946eca 100644
--- a/include/linux/fscrypt_supp.h
+++ b/include/linux/fscrypt_supp.h
@@ -39,8 +39,10 @@ struct fscrypt_ctx {
 			struct page *control_page;	/* Original page  */
 		} w;
 		struct {
+			struct buffer_head *bh;
 			struct bio *bio;
 			struct work_struct work;
+			post_process_read_t post_process;
 		} r;
 		struct list_head free_list;	/* Free list */
 	};
@@ -190,8 +192,17 @@ static inline bool fscrypt_match_name(const struct fscrypt_name *fname,
 
 /* bio.c */
 extern void fscrypt_decrypt_bio(struct bio *);
+extern void fscrypt_complete_pages(struct work_struct *work);
+extern void fscrypt_complete_block(struct work_struct *work);
 extern void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx,
-					struct bio *bio);
+					struct bio *bio,
+					void (*process_bio)(struct work_struct *));
+extern post_process_read_t *fscrypt_get_post_process(struct fscrypt_ctx *ctx);
+extern void fscrypt_set_post_process(struct fscrypt_ctx *ctx,
+				post_process_read_t *post_process);
+extern struct buffer_head *fscrypt_get_bh(struct fscrypt_ctx *ctx);
+extern void fscrypt_set_bh(struct fscrypt_ctx *ctx, struct buffer_head *bh);
+extern bool fscrypt_bio_encrypted(struct bio *bio);
 extern void fscrypt_pullback_bio_page(struct page **, bool);
 extern int fscrypt_zeroout_range(const struct inode *, pgoff_t, sector_t,
 				 unsigned int);
diff --git a/include/linux/mpage.h b/include/linux/mpage.h
index 001f1fc..da2526a 100644
--- a/include/linux/mpage.h
+++ b/include/linux/mpage.h
@@ -15,8 +15,10 @@
 struct writeback_control;
 
 int mpage_readpages(struct address_space *mapping, struct list_head *pages,
-				unsigned nr_pages, get_block_t get_block);
-int mpage_readpage(struct page *page, get_block_t get_block);
+		unsigned nr_pages, get_block_t get_block,
+		post_process_read_t *post_process);
+int mpage_readpage(struct page *page, get_block_t get_block,
+		post_process_read_t *post_process);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
 int mpage_writepage(struct page *page, get_block_t *get_block,
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 08/12] fscrypt_zeroout_range: Encrypt all zeroed out blocks of a page
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (6 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 09/12] fscrypt_encrypt_page: Encrypt all blocks mapped by " Chandan Rajendra
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

For block size < page size, a page can have more than one block mapped
and the range of blocks to be zeroed out by fscrypt_zeroout_range() can
span more than one block. Hence this commit adds code to encrypt all
zeroed out blocks of a page.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/crypto/bio.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index aba22f7..d8904c0 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -193,10 +193,11 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
 {
 	struct fscrypt_ctx *ctx;
 	struct page *ciphertext_page = NULL;
+	unsigned int page_nr_blks;
+	unsigned int offset;
 	struct bio *bio;
 	int ret, err = 0;
-
-	BUG_ON(inode->i_sb->s_blocksize != PAGE_SIZE);
+	int i;
 
 	ctx = fscrypt_get_ctx(inode, GFP_NOFS);
 	if (IS_ERR(ctx))
@@ -208,12 +209,22 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
 		goto errout;
 	}
 
-	while (len--) {
-		err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk,
-					     ZERO_PAGE(0), ciphertext_page,
-					     PAGE_SIZE, 0, GFP_NOFS);
-		if (err)
-			goto errout;
+	page_nr_blks = 1 << (PAGE_SHIFT - inode->i_blkbits);
+
+	while (len) {
+		page_nr_blks = min_t(unsigned int, page_nr_blks, len);
+		offset = 0;
+
+		for (i = 0; i < page_nr_blks; i++) {
+			err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk,
+						ZERO_PAGE(0), ciphertext_page,
+						inode->i_sb->s_blocksize,
+						offset, GFP_NOFS);
+			if (err)
+				goto errout;
+			lblk++;
+			offset += inode->i_sb->s_blocksize;
+		}
 
 		bio = bio_alloc(GFP_NOWAIT, 1);
 		if (!bio) {
@@ -224,9 +235,8 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
 		bio->bi_iter.bi_sector =
 			pblk << (inode->i_sb->s_blocksize_bits - 9);
 		bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-		ret = bio_add_page(bio, ciphertext_page,
-					inode->i_sb->s_blocksize, 0);
-		if (ret != inode->i_sb->s_blocksize) {
+		ret = bio_add_page(bio, ciphertext_page, offset, 0);
+		if (ret != offset) {
 			/* should never happen! */
 			WARN_ON(1);
 			bio_put(bio);
@@ -239,8 +249,8 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
 		bio_put(bio);
 		if (err)
 			goto errout;
-		lblk++;
-		pblk++;
+		pblk += page_nr_blks;
+		len -= page_nr_blks;
 	}
 	err = 0;
 errout:
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 09/12] fscrypt_encrypt_page: Encrypt all blocks mapped by a page
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (7 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 08/12] fscrypt_zeroout_range: Encrypt all zeroed out blocks of a page Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 10/12] ext4: Fix block number passed to fscrypt_encrypt_page Chandan Rajendra
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

This commit changes fscrypt_encrypt_page() to encrypt all the blocks
mapped by a page.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/crypto/crypto.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 2148651..fd2ebe8 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -242,6 +242,7 @@ struct page *fscrypt_encrypt_page(const struct inode *inode,
 				u64 lblk_num, gfp_t gfp_flags)
 
 {
+	unsigned int blocksize;
 	struct fscrypt_ctx *ctx;
 	struct page *ciphertext_page = page;
 	int err;
@@ -271,13 +272,21 @@ struct page *fscrypt_encrypt_page(const struct inode *inode,
 		goto errout;
 
 	ctx->w.control_page = page;
-	err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk_num,
-				     page, ciphertext_page, len, offs,
-				     gfp_flags);
-	if (err) {
-		ciphertext_page = ERR_PTR(err);
-		goto errout;
+
+	blocksize = i_blocksize(inode);
+	while (offs < len) {
+		err = fscrypt_do_block_crypto(inode, FS_ENCRYPT, lblk_num,
+					page, ciphertext_page, blocksize,
+					offs, gfp_flags);
+		if (err) {
+			ciphertext_page = ERR_PTR(err);
+			goto errout;
+		}
+
+		offs += blocksize;
+		++lblk_num;
 	}
+
 	SetPagePrivate(ciphertext_page);
 	set_page_private(ciphertext_page, (unsigned long)ctx);
 	lock_page(ciphertext_page);
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 10/12] ext4: Fix block number passed to fscrypt_encrypt_page
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (8 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 09/12] fscrypt_encrypt_page: Encrypt all blocks mapped by " Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 11/12] ext4: Move encryption code into its own function Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 12/12] ext4: Enable encryption for blocksize less than page size Chandan Rajendra
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

For the case of Block size < Page size, this commit computes the block
number of the first block mapped by the page. This is required by
fscrypt_encrypt_page() for encrypting the blocks mapped by the page.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/ext4/page-io.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index db75901..52f9218 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -480,10 +480,14 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode) &&
 	    nr_to_submit) {
 		gfp_t gfp_flags = GFP_NOFS;
+		u64 blk_nr;
+
+		blk_nr = page->index << (PAGE_SHIFT - inode->i_blkbits);
 
 	retry_encrypt:
-		data_page = fscrypt_encrypt_page(inode, page, PAGE_SIZE, 0,
-						page->index, gfp_flags);
+		len = roundup(len, i_blocksize(inode));
+		data_page = fscrypt_encrypt_page(inode, page, len, 0, blk_nr,
+						gfp_flags);
 		if (IS_ERR(data_page)) {
 			ret = PTR_ERR(data_page);
 			if (ret == -ENOMEM && wbc->sync_mode == WB_SYNC_ALL) {
@@ -520,7 +524,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	/* Error stopped previous loop? Clean up buffers... */
 	if (ret) {
 	out:
-		if (data_page)
+		if (data_page && bh == head)
 			fscrypt_restore_control_page(data_page);
 		printk_ratelimited(KERN_ERR "%s: ret = %d\n", __func__, ret);
 		redirty_page_for_writepage(wbc, page);
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 11/12] ext4: Move encryption code into its own function
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (9 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 10/12] ext4: Fix block number passed to fscrypt_encrypt_page Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  2018-05-22 16:01 ` [RFC PATCH V3 12/12] ext4: Enable encryption for blocksize less than page size Chandan Rajendra
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

This commit makes the encryption code in ext4 to be self-contained by
moving the invocation of fscrypt_encrypt_page() and the corresponding
"retry encryption on ENOMEM" code to a function of its own.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/ext4/page-io.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 52f9218..55e52d8 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -409,6 +409,33 @@ static int io_submit_add_bh(struct ext4_io_submit *io,
 	return 0;
 }
 
+static struct page *
+encrypt_page(struct ext4_io_submit *io, struct inode *inode,
+	struct page *page, int len, struct writeback_control *wbc)
+{
+	struct page *data_page;
+	gfp_t gfp_flags = GFP_NOFS;
+	u64 blk_nr;
+
+	blk_nr = page->index << (PAGE_SHIFT - inode->i_blkbits);
+
+retry_encrypt:
+	len = roundup(len, i_blocksize(inode));
+	data_page = fscrypt_encrypt_page(inode, page, len, 0, blk_nr,
+					gfp_flags);
+	if (IS_ERR(data_page) && PTR_ERR(data_page) == -ENOMEM
+		&& wbc->sync_mode == WB_SYNC_ALL) {
+		if (io->io_bio) {
+			ext4_io_submit(io);
+			congestion_wait(BLK_RW_ASYNC, HZ/50);
+		}
+		gfp_flags |= __GFP_NOFAIL;
+		goto retry_encrypt;
+	}
+
+	return data_page;
+}
+
 int ext4_bio_write_page(struct ext4_io_submit *io,
 			struct page *page,
 			int len,
@@ -479,25 +506,9 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 
 	if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode) &&
 	    nr_to_submit) {
-		gfp_t gfp_flags = GFP_NOFS;
-		u64 blk_nr;
-
-		blk_nr = page->index << (PAGE_SHIFT - inode->i_blkbits);
-
-	retry_encrypt:
-		len = roundup(len, i_blocksize(inode));
-		data_page = fscrypt_encrypt_page(inode, page, len, 0, blk_nr,
-						gfp_flags);
+		data_page = encrypt_page(io, inode, page, len, wbc);
 		if (IS_ERR(data_page)) {
 			ret = PTR_ERR(data_page);
-			if (ret == -ENOMEM && wbc->sync_mode == WB_SYNC_ALL) {
-				if (io->io_bio) {
-					ext4_io_submit(io);
-					congestion_wait(BLK_RW_ASYNC, HZ/50);
-				}
-				gfp_flags |= __GFP_NOFAIL;
-				goto retry_encrypt;
-			}
 			data_page = NULL;
 			goto out;
 		}
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH V3 12/12] ext4: Enable encryption for blocksize less than page size
  2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
                   ` (10 preceding siblings ...)
  2018-05-22 16:01 ` [RFC PATCH V3 11/12] ext4: Move encryption code into its own function Chandan Rajendra
@ 2018-05-22 16:01 ` Chandan Rajendra
  11 siblings, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-22 16:01 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: Chandan Rajendra, ebiggers3, tytso, linux-ext4, linux-fsdevel

Now that we have all the code to support encryption for block size less
than page size scenario, this commit removes the conditional check in
filesystem mount code.

The commit also changes the support statement in
Documentation/filesystems/fscrypt.rst to reflect the fact that
encryption of filesystems with blocksize less than page size now works.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 Documentation/filesystems/fscrypt.rst | 14 +++++++-------
 fs/ext4/super.c                       |  7 -------
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index cfbc18f..6f0827c 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -202,13 +202,13 @@ modes are not currently supported because of the difficulty of dealing
 with ciphertext expansion.
 
 For file contents, each filesystem block is encrypted independently.
-Currently, only the case where the filesystem block size is equal to
-the system's page size (usually 4096 bytes) is supported.  With the
-XTS mode of operation (recommended), the logical block number within
-the file is used as the IV.  With the CBC mode of operation (not
-recommended), ESSIV is used; specifically, the IV for CBC is the
-logical block number encrypted with AES-256, where the AES-256 key is
-the SHA-256 hash of the inode's data encryption key.
+Starting from Linux kernel 4.18, encryption of filesystems with block
+size less than system's page size is supported.  With the XTS mode of
+operation (recommended), the logical block number within the file is
+used as the IV.  With the CBC mode of operation (not recommended),
+ESSIV is used; specifically, the IV for CBC is the logical block
+number encrypted with AES-256, where the AES-256 key is the SHA-256
+hash of the inode's data encryption key.
 
 For filenames, the full filename is encrypted at once.  Because of the
 requirements to retain support for efficient directory lookups and
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c40dda6..ebfbeea 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4161,13 +4161,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		}
 	}
 
-	if ((DUMMY_ENCRYPTION_ENABLED(sbi) || ext4_has_feature_encrypt(sb)) &&
-	    (blocksize != PAGE_SIZE)) {
-		ext4_msg(sb, KERN_ERR,
-			 "Unsupported blocksize for fs encryption");
-		goto failed_mount_wq;
-	}
-
 	if (DUMMY_ENCRYPTION_ENABLED(sbi) && !sb_rdonly(sb) &&
 	    !ext4_has_feature_encrypt(sb)) {
 		ext4_set_feature_encrypt(sb);
-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-22 16:01 ` [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters Chandan Rajendra
@ 2018-05-25 20:01   ` Theodore Y. Ts'o
  2018-05-28  5:35     ` Chandan Rajendra
  0 siblings, 1 reply; 23+ messages in thread
From: Theodore Y. Ts'o @ 2018-05-25 20:01 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-fscrypt, ebiggers3, linux-ext4, linux-fsdevel

On Tue, May 22, 2018 at 09:31:05PM +0530, Chandan Rajendra wrote:
> This commit introduces a new parameter to mpage_readpage[s]()
> functions. This parameter contains pointers to functions that can be
> used to decrypt data read from the backing device. These are stored in
> the fscrypt_ctx structure and one of these functions is invoked after
> the read operation is completed.
> 
> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>

Can you describe more of what you are doing here; specifically, you
deleted all of fs/ext4/readpage.c --- was this because you moved
functionality back into fs/mpage.c?  Did you make sure all of the
local changes in fs/ext4/readpage was moved back to fs/mpage.c?

If the goal is to refactor code to remove the need for
fs/ext4/readpage.c, you should probably make that be the first patch
as a prerequisite patch.  And we then need to make sure we don't
accidentally break anyone else who might be using fs/mpage.c.  Saying
a bit more about why you think the refactor is a good thing would also
be useful.

	     	   	       	   	    	  - Ted

P.S.  What version of the kernel was these patches against?  I noticed
the patches weren't applying cleanly to the ext4 git tree.  Given how
invasive these patches are, it's not surprising that they are very
version-sensitive.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-25 20:01   ` Theodore Y. Ts'o
@ 2018-05-28  5:35     ` Chandan Rajendra
  2018-05-28 19:34       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-28  5:35 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: linux-fscrypt, ebiggers3, linux-ext4, linux-fsdevel

On Saturday, May 26, 2018 1:31:21 AM IST Theodore Y. Ts'o wrote:
> On Tue, May 22, 2018 at 09:31:05PM +0530, Chandan Rajendra wrote:
> > This commit introduces a new parameter to mpage_readpage[s]()
> > functions. This parameter contains pointers to functions that can be
> > used to decrypt data read from the backing device. These are stored in
> > the fscrypt_ctx structure and one of these functions is invoked after
> > the read operation is completed.
> > 
> > Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> 
> Can you describe more of what you are doing here; specifically, you
> deleted all of fs/ext4/readpage.c --- was this because you moved
> functionality back into fs/mpage.c?  Did you make sure all of the
> local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> 
> If the goal is to refactor code to remove the need for
> fs/ext4/readpage.c, you should probably make that be the first patch
> as a prerequisite patch.  And we then need to make sure we don't
> accidentally break anyone else who might be using fs/mpage.c.  Saying
> a bit more about why you think the refactor is a good thing would also
> be useful.

I will split this patch into two as suggested by you. Also, I will update 
the commit messages.

The patchset was based on next-20180503 tree. You can obtain it from
"https://github.com/chandanr/linux.git ext4-encryption"

-- 
chandan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-28  5:35     ` Chandan Rajendra
@ 2018-05-28 19:34       ` Theodore Y. Ts'o
  2018-05-29  3:04         ` Chandan Rajendra
  0 siblings, 1 reply; 23+ messages in thread
From: Theodore Y. Ts'o @ 2018-05-28 19:34 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-fscrypt, ebiggers3, linux-ext4, linux-fsdevel

On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > Can you describe more of what you are doing here; specifically, you
> > deleted all of fs/ext4/readpage.c --- was this because you moved
> > functionality back into fs/mpage.c?  Did you make sure all of the
> > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > 
> > If the goal is to refactor code to remove the need for
> > fs/ext4/readpage.c, you should probably make that be the first patch
> > as a prerequisite patch.  And we then need to make sure we don't
> > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > a bit more about why you think the refactor is a good thing would also
> > be useful.
> 
> I will split this patch into two as suggested by you. Also, I will update 
> the commit messages.

Note that I was planning on making changes to fs/ext4/readpage.c as
part of integrating fsverity[1][2] support into ext4.  Basically, I
need to do something like [3] to fs/ext4/readpage.c.

[1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
[2] https://www.youtube.com/watch?v=GlEWcVuRbNA
[3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2

Which is why I'm really interested in your reasoning for why you
propose to drop fs/ext4/readpage.c.  :-)

						- Ted

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-28 19:34       ` Theodore Y. Ts'o
@ 2018-05-29  3:04         ` Chandan Rajendra
  2018-05-29 17:53           ` Eric Biggers
  0 siblings, 1 reply; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-29  3:04 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: linux-fscrypt, ebiggers3, linux-ext4, linux-fsdevel

On Tuesday, May 29, 2018 1:04:37 AM IST Theodore Y. Ts'o wrote:
> On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > > Can you describe more of what you are doing here; specifically, you
> > > deleted all of fs/ext4/readpage.c --- was this because you moved
> > > functionality back into fs/mpage.c?  Did you make sure all of the
> > > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > > 
> > > If the goal is to refactor code to remove the need for
> > > fs/ext4/readpage.c, you should probably make that be the first patch
> > > as a prerequisite patch.  And we then need to make sure we don't
> > > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > > a bit more about why you think the refactor is a good thing would also
> > > be useful.
> > 
> > I will split this patch into two as suggested by you. Also, I will update 
> > the commit messages.
> 
> Note that I was planning on making changes to fs/ext4/readpage.c as
> part of integrating fsverity[1][2] support into ext4.  Basically, I
> need to do something like [3] to fs/ext4/readpage.c.
> 
> [1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
> [2] https://www.youtube.com/watch?v=GlEWcVuRbNA
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2
> 
> Which is why I'm really interested in your reasoning for why you
> propose to drop fs/ext4/readpage.c.  :-)
> 

The first patchset to support encryption in subpage-blocksize scenario copied
the block_read_full_page() from fs/buffer.c to ext4/readpage.c and had made
changes required to support encryption in that function. However, the
conclusion was to not create copies of existing code but rather add support
for decryption inside generic mpage_readpage[s] functions. Hence this patchset
implements the required decryption logic in the generic mpage_readpage[s]
functions. Since this makes the code in ext4/readpage.c redundant, I had
decided to delete the ext4/readpage.c.

-- 
chandan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-29  3:04         ` Chandan Rajendra
@ 2018-05-29 17:53           ` Eric Biggers
  2018-05-30  3:09             ` Chandan Rajendra
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Biggers @ 2018-05-29 17:53 UTC (permalink / raw)
  To: Chandan Rajendra
  Cc: Theodore Y. Ts'o, linux-fscrypt, linux-ext4, linux-fsdevel

Hi Chandan,

On Tue, May 29, 2018 at 08:34:21AM +0530, Chandan Rajendra wrote:
> On Tuesday, May 29, 2018 1:04:37 AM IST Theodore Y. Ts'o wrote:
> > On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > > > Can you describe more of what you are doing here; specifically, you
> > > > deleted all of fs/ext4/readpage.c --- was this because you moved
> > > > functionality back into fs/mpage.c?  Did you make sure all of the
> > > > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > > > 
> > > > If the goal is to refactor code to remove the need for
> > > > fs/ext4/readpage.c, you should probably make that be the first patch
> > > > as a prerequisite patch.  And we then need to make sure we don't
> > > > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > > > a bit more about why you think the refactor is a good thing would also
> > > > be useful.
> > > 
> > > I will split this patch into two as suggested by you. Also, I will update 
> > > the commit messages.
> > 
> > Note that I was planning on making changes to fs/ext4/readpage.c as
> > part of integrating fsverity[1][2] support into ext4.  Basically, I
> > need to do something like [3] to fs/ext4/readpage.c.
> > 
> > [1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
> > [2] https://www.youtube.com/watch?v=GlEWcVuRbNA
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2
> > 
> > Which is why I'm really interested in your reasoning for why you
> > propose to drop fs/ext4/readpage.c.  :-)
> > 
> 
> The first patchset to support encryption in subpage-blocksize scenario copied
> the block_read_full_page() from fs/buffer.c to ext4/readpage.c and had made
> changes required to support encryption in that function. However, the
> conclusion was to not create copies of existing code but rather add support
> for decryption inside generic mpage_readpage[s] functions. Hence this patchset
> implements the required decryption logic in the generic mpage_readpage[s]
> functions. Since this makes the code in ext4/readpage.c redundant, I had
> decided to delete the ext4/readpage.c.
> 

Strictly speaking, I don't think anything has been "concluded" yet.  The issue,
as I saw it, was that your original patchset just copy-and-pasted lots more
generic code from fs/buffer.c into ext4, without consideration of alternatives
that would allow the code to be shared, such as adding a postprocessing callback
to mpage_readpage{,s}().  My hope was that you would thoughtfully consider the
alternatives and make a decision of what was the best solution, and then explain
that decision as part of your patchset -- not just implement some solution
without much explanation, which makes it very difficult for people to decide
whether it's the best solution or not.

And yes, now that fs-verity is planned to be a thing too, we should stop
thinking of the problem as specifically "how to support decryption", but rather
how to support the ability to post-process the data using potentially multiple
length-preserving postprocessing steps such as decryption,
integrity/authenticity verification, etc.

I'll take a closer look at this patch when I have a chance, but as Ted pointed
out it really needs to be split out into multiple patches.  Just as a
preliminary comment, it looks like you are directly calling into fs/crypto/ from
fs/buffer.c, e.g. fscrypt_enqueue_decrypt_bio().  I don't understand that.  If
you're doing that (which would start requiring that fscrypt be built-in, not
modular) then there should be no need for the filesystem to pass a
postprocessing callback to the generic code, as you could just check
S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode) in generic code to tell whether
decryption needs to be done.  The whole point of the postprocessing callback
would be to allow the generic read code to be used without it having to be aware
of all the specific types of post-read processing that filesystems may want.

Thanks!

- Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-29 17:53           ` Eric Biggers
@ 2018-05-30  3:09             ` Chandan Rajendra
  2018-05-30  5:06               ` Theodore Y. Ts'o
  0 siblings, 1 reply; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-30  3:09 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Theodore Y. Ts'o, linux-fscrypt, linux-ext4, linux-fsdevel

On Tuesday, May 29, 2018 11:23:17 PM IST Eric Biggers wrote:
> Hi Chandan,
> 
> On Tue, May 29, 2018 at 08:34:21AM +0530, Chandan Rajendra wrote:
> > On Tuesday, May 29, 2018 1:04:37 AM IST Theodore Y. Ts'o wrote:
> > > On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > > > > Can you describe more of what you are doing here; specifically, you
> > > > > deleted all of fs/ext4/readpage.c --- was this because you moved
> > > > > functionality back into fs/mpage.c?  Did you make sure all of the
> > > > > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > > > > 
> > > > > If the goal is to refactor code to remove the need for
> > > > > fs/ext4/readpage.c, you should probably make that be the first patch
> > > > > as a prerequisite patch.  And we then need to make sure we don't
> > > > > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > > > > a bit more about why you think the refactor is a good thing would also
> > > > > be useful.
> > > > 
> > > > I will split this patch into two as suggested by you. Also, I will update 
> > > > the commit messages.
> > > 
> > > Note that I was planning on making changes to fs/ext4/readpage.c as
> > > part of integrating fsverity[1][2] support into ext4.  Basically, I
> > > need to do something like [3] to fs/ext4/readpage.c.
> > > 
> > > [1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
> > > [2] https://www.youtube.com/watch?v=GlEWcVuRbNA
> > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2
> > > 
> > > Which is why I'm really interested in your reasoning for why you
> > > propose to drop fs/ext4/readpage.c.  :-)
> > > 
> > 
> > The first patchset to support encryption in subpage-blocksize scenario copied
> > the block_read_full_page() from fs/buffer.c to ext4/readpage.c and had made
> > changes required to support encryption in that function. However, the
> > conclusion was to not create copies of existing code but rather add support
> > for decryption inside generic mpage_readpage[s] functions. Hence this patchset
> > implements the required decryption logic in the generic mpage_readpage[s]
> > functions. Since this makes the code in ext4/readpage.c redundant, I had
> > decided to delete the ext4/readpage.c.
> > 
> 
> Strictly speaking, I don't think anything has been "concluded" yet.  The issue,
> as I saw it, was that your original patchset just copy-and-pasted lots more
> generic code from fs/buffer.c into ext4, without consideration of alternatives
> that would allow the code to be shared, such as adding a postprocessing callback
> to mpage_readpage{,s}().  My hope was that you would thoughtfully consider the
> alternatives and make a decision of what was the best solution, and then explain
> that decision as part of your patchset -- not just implement some solution
> without much explanation, which makes it very difficult for people to decide
> whether it's the best solution or not.
> 
> And yes, now that fs-verity is planned to be a thing too, we should stop
> thinking of the problem as specifically "how to support decryption", but rather
> how to support the ability to post-process the data using potentially multiple
> length-preserving postprocessing steps such as decryption,
> integrity/authenticity verification, etc.
> 
> I'll take a closer look at this patch when I have a chance, but as Ted pointed
> out it really needs to be split out into multiple patches.  Just as a
> preliminary comment, it looks like you are directly calling into fs/crypto/ from
> fs/buffer.c, e.g. fscrypt_enqueue_decrypt_bio().  I don't understand that.  If
> you're doing that (which would start requiring that fscrypt be built-in, not
> modular) then there should be no need for the filesystem to pass a
> postprocessing callback to the generic code, as you could just check
> S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode) in generic code to tell whether
> decryption needs to be done.  The whole point of the postprocessing callback
> would be to allow the generic read code to be used without it having to be aware
> of all the specific types of post-read processing that filesystems may want.

Hi Eric,

I had misunderstood the requirement. Sorry about that. I had written the
patchset in its current form with the understanding that fs/buffer.c and
fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
at all. When the fscrypt module isn't selected for build in the kernel config,
calls to fscrypt_*() functions would end up calling the equivalent nop
functions in fscrypt_notsupp.h file.

For the generic code to be completely unaware of several stages of "post
processing" functionality, I would most likely have to add more callback
pointers into the newly introduced "struct post_process_read" structure. I
will work on this and post the results in the next version of the patchset.


-- 
chandan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-30  3:09             ` Chandan Rajendra
@ 2018-05-30  5:06               ` Theodore Y. Ts'o
  2018-05-30 11:33                 ` Chandan Rajendra
  2018-06-04 10:09                 ` Chandan Rajendra
  0 siblings, 2 replies; 23+ messages in thread
From: Theodore Y. Ts'o @ 2018-05-30  5:06 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: Eric Biggers, linux-fscrypt, linux-ext4, linux-fsdevel

Note: we may want to move this thread so that it's on linux-fscrypt
exclusively.  I'm thinking that we should consider using linux-fscrypt
for fscrypt and fsverity discussions.  That way we can avoid adding
extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?

On Wed, May 30, 2018 at 08:39:11AM +0530, Chandan Rajendra wrote:
> 
> I had misunderstood the requirement. Sorry about that. I had written the
> patchset in its current form with the understanding that fs/buffer.c and
> fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
> at all. When the fscrypt module isn't selected for build in the kernel config,
> calls to fscrypt_*() functions would end up calling the equivalent nop
> functions in fscrypt_notsupp.h file.
> 
> For the generic code to be completely unaware of several stages of "post
> processinhg" functionality, I would most likely have to add more callback
> pointers into the newly introduced "struct post_process_read" structure. 

It's still a work in progress, but I have an initial integration of
ext4 with fsverity.  See the fsverity branch on ext4.git, and in
particular, the changes made to fs/ext4/readpage.c.

Let's be clear that neither Eric and I are completely happy with how
the fsveirty post-read processing is being done.  What's there right
now is a place-holder so we can continue to development/debug the
other aspects of fsverity.  In particular, we're aware that there is
code duplication between code in fs/f2fs/data.c and fs/ext4/readpage.c

One of the things which is a bit tricky right now is that fscrypt and
fsverity can be enabled on a per-file system basis.  That is, there are
separate CONFIG options:

   * CONFIG_EXT4_FS_ENCRYPTION
   * CONFIG_F2FS_FS_ENCRYPTION
   * CONFIG_EXT4_FS_VERITY
   * CONFIG_F2FS_FS_VERITY

And in each file system, you have to do this before including the header files:

#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
#include <linux/fscrypt.h>

#define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
#include <linux/fsverity.ha>

That's because whether you get the function prototype or an inline
function which returns EOhPNOTSUPP for functions such as
fscrypt_decrypt_bio() depends on how __FS_HAS_ENCRYPTION is defined
before including linux/fscrypt.h.

I now think this was a mistake, and that we should handle this the
same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
should be enabled for all file sytems which support that feature.
Otherwise it becomes too hard to try to support this in a file system
independent way --- and we end up having code which is cut-and-pasted
for each file system (e.g., in fs/f2fs/data.c and fs/ext4/readpage.c)
but which may compile to something quite different thanks to the C
preprocessor magic which is going on at the moment.

> I will work on this and post the results in the next version of the patchset.

So in order to avoid your wasting time and energy, and perhaps getting
unnecessarily frustrated, I'd recommend that before you immediately
start diving into implementation, that we have a design discussion
about the best way to proceed.  And then when we have a common
agreement about how to proceed, let's get something upstream first
which changes the infrastructure used by the file systems and by
fscrypt first.  And let's get that working, *before* we start
integrating the changes for supporting fscrypt for 4k blocks for
systems with 32k pagesize, or before we start making final (e.g.,
non-prototype) changes to integrate fsverity.

I'll include my first attempt that I played around over the weekend
with in terms of generic infrastructure, before I realized that we
need to decide whether the current way we configure fscrypt and
fsverity makes sense or not.  It's an example of why we should have
design discussions first, and not just immediately start diving into
code.

Fortunately (perhaps because I've some experience with these sorts of
things) I only spent about an hour or so working on the prototype, and
none trying to integrate it and doing all of the testing and
debugging, before recognizing that we needed to have a meeting of the
minds about design and requirements first.  :-)

Cheers,

						- Ted

------------- include/linux/bio_post_read.h

#ifndef _LINUX_BIO_POST_READ_H
#define _LINUX_BIO_POST_READ_H

#include <linux/bio.h>

enum bio_post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
};

struct bio_post_read_ctx {
	struct bio *bio;
	struct work_struct work;
	unsigned int cur_step;
	unsigned int enabled_steps;
};

static inline bool bpr_required(struct bio *bio)
{
	return bio->bi_private && !bio->bi_status;
}

extern void __bpr_read_end_io(struct bio *bio);
void bpr_do_processing(struct bio_post_read_ctx *ctx);

#endif /* _LINUX_BIO_POST_READ_H */

------------- fs/bio_post_read.c

// SPDX-License-Identifier: GPL-2.0
/*
 * fs/bio_post_read.c
 *
 * Copyright (C) 2018, Google LLC
 *
 * Contains helper functions used by file systems which use fscrypt
 * and/or fsverity
 */

#include <linux/mempool.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/bio_post_read.h>
#include <linux/pagemap.h>
#include <linux/fscrypt.h>
#include <linux/fsverity.h>

static unsigned int num_prealloc_post_read_ctxs = 128;

module_param(num_prealloc_post_read_ctxs, uint, 0444);
MODULE_PARM_DESC(num_prealloc_post_read_ctxs,
		"Number of bio_post_read contexts to preallocate");

static struct kmem_cache *bpr_ctx_cache;
static mempool_t *bpr_ctx_pool;

void __bpr_read_end_io(struct bio *bio)
{
	struct page *page;
	struct bio_vec *bv;
	int i;

	bio_for_each_segment_all(bv, bio, i) {
		page = bv->bv_page;

		/* PG_error was set if any post_read step failed */
		if (bio->bi_status || PageError(page)) {
			ClearPageUptodate(page);
			SetPageError(page);
		} else {
			SetPageUptodate(page);
		}
		unlock_page(page);
	}
	if (bio->bi_private)
		mempool_free(bio->bi_private, bpr_ctx_pool);
	bio_put(bio);
}

static void decrypt_work(struct work_struct *work)
{
	struct bio_post_read_ctx *ctx =
		container_of(work, struct bio_post_read_ctx, work);

	fscrypt_decrypt_bio(ctx->bio);

	bpr_do_processing(ctx);
}

static void verity_work(struct work_struct *work)
{
	struct bio_post_read_ctx *ctx =
		container_of(work, struct bio_post_read_ctx, work);

	fsverity_verify_bio(ctx->bio);

	bpr_do_processing(ctx);
}

void bpr_do_processing(struct bio_post_read_ctx *ctx)
{
	/*
	 * We use different work queues for decryption and for verity because
	 * verity may require reading metadata pages that need decryption, and
	 * we shouldn't recurse to the same workqueue.
	 */
	switch (++ctx->cur_step) {
	case STEP_DECRYPT:
		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
			INIT_WORK(&ctx->work, decrypt_work);
			fscrypt_enqueue_decrypt_work(&ctx->work);
			return;
		}
		ctx->cur_step++;
		/* fall-through */
	case STEP_VERITY:
		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
			INIT_WORK(&ctx->work, verity_work);
			fsverity_enqueue_verify_work(&ctx->work);
			return;
		}
		ctx->cur_step++;
		/* fall-through */
	default:
		__bpr_read_end_io(ctx->bio);
	}
}

int __init bpr_init(void)
{
	bpr_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
	if (!bpr_ctx_cache)
		goto fail;
	bpr_ctx_pool = mempool_create_slab_pool(num_prealloc_post_read_ctxs,
						bpr_ctx_cache);
	if (!bpr_ctx_pool)
		goto fail_free_cache;
	return 0;

fail_free_cache:
	kmem_cache_destroy(bpr_ctx_cache);
fail:
	return -ENOMEM;
}

void __exit bpr_exit(void)
{
	mempool_destroy(bpr_ctx_pool);
	kmem_cache_destroy(bpr_ctx_cache);
}

module_init(bpr_init);
module_exit(bpr_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-30  5:06               ` Theodore Y. Ts'o
@ 2018-05-30 11:33                 ` Chandan Rajendra
  2018-05-30 16:02                   ` Theodore Y. Ts'o
  2018-06-04 10:09                 ` Chandan Rajendra
  1 sibling, 1 reply; 23+ messages in thread
From: Chandan Rajendra @ 2018-05-30 11:33 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Eric Biggers, linux-fscrypt, linux-ext4, linux-fsdevel

On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote:
> Note: we may want to move this thread so that it's on linux-fscrypt
> exclusively.  I'm thinking that we should consider using linux-fscrypt
> for fscrypt and fsverity discussions.  That way we can avoid adding
> extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?

I agree.

> 
> On Wed, May 30, 2018 at 08:39:11AM +0530, Chandan Rajendra wrote:
> > 
> > I had misunderstood the requirement. Sorry about that. I had written the
> > patchset in its current form with the understanding that fs/buffer.c and
> > fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
> > at all. When the fscrypt module isn't selected for build in the kernel config,
> > calls to fscrypt_*() functions would end up calling the equivalent nop
> > functions in fscrypt_notsupp.h file.
> > 
> > For the generic code to be completely unaware of several stages of "post
> > processinhg" functionality, I would most likely have to add more callback
> > pointers into the newly introduced "struct post_process_read" structure. 
> 
> It's still a work in progress, but I have an initial integration of
> ext4 with fsverity.  See the fsverity branch on ext4.git, and in
> particular, the changes made to fs/ext4/readpage.c.
> 
> Let's be clear that neither Eric and I are completely happy with how
> the fsveirty post-read processing is being done.  What's there right
> now is a place-holder so we can continue to development/debug the
> other aspects of fsverity.  In particular, we're aware that there is
> code duplication between code in fs/f2fs/data.c and fs/ext4/readpage.c
> 
> One of the things which is a bit tricky right now is that fscrypt and
> fsverity can be enabled on a per-file system basis.  That is, there are
> separate CONFIG options:
> 
>    * CONFIG_EXT4_FS_ENCRYPTION
>    * CONFIG_F2FS_FS_ENCRYPTION
>    * CONFIG_EXT4_FS_VERITY
>    * CONFIG_F2FS_FS_VERITY
> 
> And in each file system, you have to do this before including the header files:
> 
> #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> #include <linux/fscrypt.h>
> 
> #define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> #include <linux/fsverity.ha>
> 
> That's because whether you get the function prototype or an inline
> function which returns EOhPNOTSUPP for functions such as
> fscrypt_decrypt_bio() depends on how __FS_HAS_ENCRYPTION is defined
> before including linux/fscrypt.h.
> 
> I now think this was a mistake, and that we should handle this the
> same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
> should be enabled for all file sytems which support that feature.

Do we continue to retain ext4/readpage.c? I ask because if the logic in
ext4/readpage.c has to be moved to fs/buffer.c (to avoid code duplication) then 
we end up losing the ability to build fscrypt as a module even if one of
the filesystems using fs/bio_post_read.c is itself built as a module.


> Otherwise it becomes too hard to try to support this in a file system
> independent way --- and we end up having code which is cut-and-pasted
> for each file system (e.g., in fs/f2fs/data.c and fs/ext4/readpage.c)
> but which may compile to something quite different thanks to the C
> preprocessor magic which is going on at the moment.
> 
> > I will work on this and post the results in the next version of the patchset.
> 
> So in order to avoid your wasting time and energy, and perhaps getting
> unnecessarily frustrated, I'd recommend that before you immediately
> start diving into implementation, that we have a design discussion
> about the best way to proceed.  And then when we have a common
> agreement about how to proceed, let's get something upstream first
> which changes the infrastructure used by the file systems and by
> fscrypt first.  And let's get that working, *before* we start
> integrating the changes for supporting fscrypt for 4k blocks for
> systems with 32k pagesize, or before we start making final (e.g.,
> non-prototype) changes to integrate fsverity.
> 
> I'll include my first attempt that I played around over the weekend
> with in terms of generic infrastructure, before I realized that we
> need to decide whether the current way we configure fscrypt and
> fsverity makes sense or not.  It's an example of why we should have
> design discussions first, and not just immediately start diving into
> code.
> 
> Fortunately (perhaps because I've some experience with these sorts of
> things) I only spent about an hour or so working on the prototype, and
> none trying to integrate it and doing all of the testing and
> debugging, before recognizing that we needed to have a meeting of the
> minds about design and requirements first.  :-)
> 
> Cheers,
> 
> 						- Ted
> 
> ------------- include/linux/bio_post_read.h
> 
> #ifndef _LINUX_BIO_POST_READ_H
> #define _LINUX_BIO_POST_READ_H
> 
> #include <linux/bio.h>
> 
> enum bio_post_read_step {
> 	STEP_INITIAL = 0,
> 	STEP_DECRYPT,
> 	STEP_VERITY,
> };
> 
> struct bio_post_read_ctx {
> 	struct bio *bio;
> 	struct work_struct work;
> 	unsigned int cur_step;
> 	unsigned int enabled_steps;
> };
> 
> static inline bool bpr_required(struct bio *bio)
> {
> 	return bio->bi_private && !bio->bi_status;
> }

Relying on bio->bi_private to check for requirement of post processing steps
does not seem to be correct. AFAIK (please correct me if I am wrong) the
convention is that bi_private field is owned by the code that allocates the
bio and the owner can assign its own value to this field.

Ideally we should be checking for per-inode encryption/verity flags to decide
if any post-processing steps are required to be executed.

I had implemented the following in the patchset that I had posted earlier:

bool fscrypt_bio_encrypted(struct bio *bio)
{
	struct address_space *mapping;
	struct inode *inode;
	struct page *page;

	if (bio_op(bio) == REQ_OP_READ && bio->bi_vcnt) {
		page = bio->bi_io_vec->bv_page;

		if (!PageSwapCache(page)) {
			mapping = page_mapping(page);
			if (mapping) {
				inode = mapping->host;

				if (IS_ENCRYPTED(inode) &&
					S_ISREG(inode->i_mode))
					return true;
			}
		}
	}

	return false;
}

> 
> extern void __bpr_read_end_io(struct bio *bio);
> void bpr_do_processing(struct bio_post_read_ctx *ctx);
> 
> #endif /* _LINUX_BIO_POST_READ_H */
> 
> ------------- fs/bio_post_read.c
> 
> // SPDX-License-Identifier: GPL-2.0
> /*
>  * fs/bio_post_read.c
>  *
>  * Copyright (C) 2018, Google LLC
>  *
>  * Contains helper functions used by file systems which use fscrypt
>  * and/or fsverity
>  */
> 
> #include <linux/mempool.h>
> #include <linux/module.h>
> #include <linux/slab.h>
> #include <linux/bio_post_read.h>
> #include <linux/pagemap.h>
> #include <linux/fscrypt.h>
> #include <linux/fsverity.h>
> 
> static unsigned int num_prealloc_post_read_ctxs = 128;
> 
> module_param(num_prealloc_post_read_ctxs, uint, 0444);
> MODULE_PARM_DESC(num_prealloc_post_read_ctxs,
> 		"Number of bio_post_read contexts to preallocate");
> 
> static struct kmem_cache *bpr_ctx_cache;
> static mempool_t *bpr_ctx_pool;
> 
> void __bpr_read_end_io(struct bio *bio)
> {
> 	struct page *page;
> 	struct bio_vec *bv;
> 	int i;
> 
> 	bio_for_each_segment_all(bv, bio, i) {
> 		page = bv->bv_page;
> 
> 		/* PG_error was set if any post_read step failed */
> 		if (bio->bi_status || PageError(page)) {
> 			ClearPageUptodate(page);
> 			SetPageError(page);
> 		} else {
> 			SetPageUptodate(page);
> 		}
> 		unlock_page(page);
> 	}
> 	if (bio->bi_private)
> 		mempool_free(bio->bi_private, bpr_ctx_pool);
> 	bio_put(bio);
> }
> 
> static void decrypt_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fscrypt_decrypt_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> static void verity_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fsverity_verify_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> void bpr_do_processing(struct bio_post_read_ctx *ctx)
> {
> 	/*
> 	 * We use different work queues for decryption and for verity because
> 	 * verity may require reading metadata pages that need decryption, and
> 	 * we shouldn't recurse to the same workqueue.
> 	 */
> 	switch (++ctx->cur_step) {
> 	case STEP_DECRYPT:
> 		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
> 			INIT_WORK(&ctx->work, decrypt_work);
> 			fscrypt_enqueue_decrypt_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	case STEP_VERITY:
> 		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> 			INIT_WORK(&ctx->work, verity_work);
> 			fsverity_enqueue_verify_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	default:
> 		__bpr_read_end_io(ctx->bio);
> 	}
> }
> 
> int __init bpr_init(void)
> {
> 	bpr_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
> 	if (!bpr_ctx_cache)
> 		goto fail;
> 	bpr_ctx_pool = mempool_create_slab_pool(num_prealloc_post_read_ctxs,
> 						bpr_ctx_cache);
> 	if (!bpr_ctx_pool)
> 		goto fail_free_cache;
> 	return 0;
> 
> fail_free_cache:
> 	kmem_cache_destroy(bpr_ctx_cache);
> fail:
> 	return -ENOMEM;
> }
> 
> void __exit bpr_exit(void)
> {
> 	mempool_destroy(bpr_ctx_pool);
> 	kmem_cache_destroy(bpr_ctx_cache);
> }
> 
> module_init(bpr_init);
> module_exit(bpr_exit);
> MODULE_LICENSE("GPL");
> 
> 

-- 
chandan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-30 11:33                 ` Chandan Rajendra
@ 2018-05-30 16:02                   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 23+ messages in thread
From: Theodore Y. Ts'o @ 2018-05-30 16:02 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: Eric Biggers, linux-fscrypt

On Wed, May 30, 2018 at 05:03:40PM +0530, Chandan Rajendra wrote:
> On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote:
> > Note: we may want to move this thread so that it's on linux-fscrypt
> > exclusively.  I'm thinking that we should consider using linux-fscrypt
> > for fscrypt and fsverity discussions.  That way we can avoid adding
> > extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?
> 
> I agree.

OK, I've moved linux-ext4 and linux-fsdevel to bcc.

> > I now think this was a mistake, and that we should handle this the
> > same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
> > should be enabled for all file sytems which support that feature.
> 
> Do we continue to retain ext4/readpage.c? I ask because if the logic in
> ext4/readpage.c has to be moved to fs/buffer.c (to avoid code duplication) then 
> we end up losing the ability to build fscrypt as a module even if one of
> the filesystems using fs/bio_post_read.c is itself built as a module.

It's not necessarily an either-or.  For example, we could use function
pointers to allow avoid needing code in the built-in portion of the
kernel from linking directly to the module.  Just as an illustration,
we could hang a pointer off of the struct super data structure which
is an array of post-processing functions, so each file system can
register post-processing handlers for each "post processing step".

I think the bigger deal is that mpage_readpage() and mpage_readpages()
are used by a large number of file systems, with a variety of
requirements --- including some with high performance requirements.
So if we make changes to fs/mpage.c, we need to make sure we don't
make things worse for them.  In particular, if post-processing isn't
needed, we shouldn't be doing any extra memory allocations or taking
any extra locks.

The somewhat smaller concern is that because mpage_readpages() has to
be used for some very old file systems, it uses a very simple
interface for getting the mapping from a logical block # to the
physical block number.  One of the benefits when I created
fs/ext4/readpage.c was that I dind't need to restrict myself to using
the old get_blocks_t interface, but could call ext4_map_blocks()
directly.  So for a readpages() call, we don't need end up needing to
call ext4_get_block() for each block mapping.  This saves CPU and the
need to take a read spinlock multiple times.  On the other hand,
readpages() is only used for for preallocation reads (so it's not
_quite_ as performance critical) and as an indication, XFS, which
tends to be highly performance sensitive, is using mpage_readpages()
without worrying too much about this issue.

The bottom line is that there is a large set of engineering tradeoffsh
to be made here, which is why we should talk about the approach first.
It may be there are people who can emit large amounts of perfectly
designed code all at once at high speed (perhaps like Mozart, who once
compared is musical composition process to vomiting, to the point
where he was bottlenecked on the speed that he could write notes on
staff paper), but I'm not smart enough to do that.  For code this
complex, doing some design ahead of time is a good thing.  :-)

> Relying on bio->bi_private to check for requirement of post processing steps
> does not seem to be correct. AFAIK (please correct me if I am wrong) the
> convention is that bi_private field is owned by the code that allocates the
> bio and the owner can assign its own value to this field.

Well, there is precedence for common code creating a convention for
using a private field.  For example, page_buffers() uses
page->private.  If a page belongs to a user such as a file system,
it's up the owner to decide how to use the private field, that's true.
However, if it uses fs/mpage.c, fs/mpage.c does use page_buffers(), so
a file system which uses mpage_readpages() has to allow page->private
to be used according to the page_buffers convention.

If we try to use bio->bi_private for fscrypt and fsveirty, *and* we
want to fold this support into common code like fs/mpage.c --- we're
probably OK, since at the moment, the fs/mpage.c code:

   * creates the bio
   * submits the bio
   * handles the bio completion handling

There is no way callers of mpage_readpage() and mpage_readpages() can
override how the bio is created (and hence control setting bi_private)
nor can they provide a bio completion handler (which could consume the
bi_private) field.

> Ideally we should be checking for per-inode encryption/verity flags to decide
> if any post-processing steps are required to be executed

At least for fsverity, it can't be a per-inode flag, since whether or
not verity processing is enabled depends on what part of the inode you
are reading.  (For example, if you are reading the Merkle tree or the
verity footer block, that's file metadata which does need verity
processing.)

We could make that be handled in the post-processing function, which
either returns an ERR_PTR, 0 if whatever work it did was done
immediately and nothing had to be queued on a workqueue, so the
bio-post-read code should try figure out what is the next
post-processing function should be called, and then call it, or 1 if
the work was queued on a workqueue, and so bio-post-read processor
should exit, since the bio-post-read processing function will be
called once the workqueue function is done doing its thing.

I'm not saying that's how we must do things; this is all part of the
design brainstorming process.  It's a lot easier to throw out possible
ideas for discussion than to spend days or weeks coding, only to
discover that it's not the best way forward.  :-)

						- Ted

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters
  2018-05-30  5:06               ` Theodore Y. Ts'o
  2018-05-30 11:33                 ` Chandan Rajendra
@ 2018-06-04 10:09                 ` Chandan Rajendra
  1 sibling, 0 replies; 23+ messages in thread
From: Chandan Rajendra @ 2018-06-04 10:09 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Eric Biggers, linux-fscrypt, linux-ext4, linux-fsdevel

On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote:
> Note: we may want to move this thread so that it's on linux-fscrypt
> exclusively.  I'm thinking that we should consider using linux-fscrypt
> for fscrypt and fsverity discussions.  That way we can avoid adding
> extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?
> 
> On Wed, May 30, 2018 at 08:39:11AM +0530, Chandan Rajendra wrote:
> > 
> > I had misunderstood the requirement. Sorry about that. I had written the
> > patchset in its current form with the understanding that fs/buffer.c and
> > fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
> > at all. When the fscrypt module isn't selected for build in the kernel config,
> > calls to fscrypt_*() functions would end up calling the equivalent nop
> > functions in fscrypt_notsupp.h file.
> > 
> > For the generic code to be completely unaware of several stages of "post
> > processinhg" functionality, I would most likely have to add more callback
> > pointers into the newly introduced "struct post_process_read" structure. 
> 
> It's still a work in progress, but I have an initial integration of
> ext4 with fsverity.  See the fsverity branch on ext4.git, and in
> particular, the changes made to fs/ext4/readpage.c.
> 
> Let's be clear that neither Eric and I are completely happy with how
> the fsveirty post-read processing is being done.  What's there right
> now is a place-holder so we can continue to development/debug the
> other aspects of fsverity.  In particular, we're aware that there is
> code duplication between code in fs/f2fs/data.c and fs/ext4/readpage.c
> 
> One of the things which is a bit tricky right now is that fscrypt and
> fsverity can be enabled on a per-file system basis.  That is, there are
> separate CONFIG options:
> 
>    * CONFIG_EXT4_FS_ENCRYPTION
>    * CONFIG_F2FS_FS_ENCRYPTION
>    * CONFIG_EXT4_FS_VERITY
>    * CONFIG_F2FS_FS_VERITY
> 
> And in each file system, you have to do this before including the header files:
> 
> #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> #include <linux/fscrypt.h>
> 
> #define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> #include <linux/fsverity.ha>
> 
> That's because whether you get the function prototype or an inline
> function which returns EOhPNOTSUPP for functions such as
> fscrypt_decrypt_bio() depends on how __FS_HAS_ENCRYPTION is defined
> before including linux/fscrypt.h.
> 
> I now think this was a mistake, and that we should handle this the
> same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
> should be enabled for all file sytems which support that feature.
> Otherwise it becomes too hard to try to support this in a file system
> independent way --- and we end up having code which is cut-and-pasted
> for each file system (e.g., in fs/f2fs/data.c and fs/ext4/readpage.c)
> but which may compile to something quite different thanks to the C
> preprocessor magic which is going on at the moment.
> 
> > I will work on this and post the results in the next version of the patchset.
> 
> So in order to avoid your wasting time and energy, and perhaps getting
> unnecessarily frustrated, I'd recommend that before you immediately
> start diving into implementation, that we have a design discussion
> about the best way to proceed.  And then when we have a common
> agreement about how to proceed, let's get something upstream first
> which changes the infrastructure used by the file systems and by
> fscrypt first.  And let's get that working, *before* we start
> integrating the changes for supporting fscrypt for 4k blocks for
> systems with 32k pagesize, or before we start making final (e.g.,
> non-prototype) changes to integrate fsverity.
> 
> I'll include my first attempt that I played around over the weekend
> with in terms of generic infrastructure, before I realized that we
> need to decide whether the current way we configure fscrypt and
> fsverity makes sense or not.  It's an example of why we should have
> design discussions first, and not just immediately start diving into
> code.
> 
> Fortunately (perhaps because I've some experience with these sorts of
> things) I only spent about an hour or so working on the prototype, and
> none trying to integrate it and doing all of the testing and
> debugging, before recognizing that we needed to have a meeting of the
> minds about design and requirements first.  :-)
> 
> Cheers,
> 
> 						- Ted
> 
> ------------- include/linux/bio_post_read.h
> 
> #ifndef _LINUX_BIO_POST_READ_H
> #define _LINUX_BIO_POST_READ_H
> 
> #include <linux/bio.h>
> 
> enum bio_post_read_step {
> 	STEP_INITIAL = 0,
> 	STEP_DECRYPT,
> 	STEP_VERITY,
> };
> 
> struct bio_post_read_ctx {
> 	struct bio *bio;
> 	struct work_struct work;
> 	unsigned int cur_step;
> 	unsigned int enabled_steps;
> };
> 
> static inline bool bpr_required(struct bio *bio)
> {
> 	return bio->bi_private && !bio->bi_status;
> }
> 
> extern void __bpr_read_end_io(struct bio *bio);
> void bpr_do_processing(struct bio_post_read_ctx *ctx);
> 
> #endif /* _LINUX_BIO_POST_READ_H */
> 
> ------------- fs/bio_post_read.c
> 
> // SPDX-License-Identifier: GPL-2.0
> /*
>  * fs/bio_post_read.c
>  *
>  * Copyright (C) 2018, Google LLC
>  *
>  * Contains helper functions used by file systems which use fscrypt
>  * and/or fsverity
>  */
> 
> #include <linux/mempool.h>
> #include <linux/module.h>
> #include <linux/slab.h>
> #include <linux/bio_post_read.h>
> #include <linux/pagemap.h>
> #include <linux/fscrypt.h>
> #include <linux/fsverity.h>
> 
> static unsigned int num_prealloc_post_read_ctxs = 128;
> 
> module_param(num_prealloc_post_read_ctxs, uint, 0444);
> MODULE_PARM_DESC(num_prealloc_post_read_ctxs,
> 		"Number of bio_post_read contexts to preallocate");
> 
> static struct kmem_cache *bpr_ctx_cache;
> static mempool_t *bpr_ctx_pool;
> 
> void __bpr_read_end_io(struct bio *bio)
> {
> 	struct page *page;
> 	struct bio_vec *bv;
> 	int i;
> 
> 	bio_for_each_segment_all(bv, bio, i) {
> 		page = bv->bv_page;
> 
> 		/* PG_error was set if any post_read step failed */
> 		if (bio->bi_status || PageError(page)) {
> 			ClearPageUptodate(page);
> 			SetPageError(page);
> 		} else {
> 			SetPageUptodate(page);
> 		}
> 		unlock_page(page);
> 	}
> 	if (bio->bi_private)
> 		mempool_free(bio->bi_private, bpr_ctx_pool);
> 	bio_put(bio);
> }
> 
> static void decrypt_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fscrypt_decrypt_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> static void verity_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fsverity_verify_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> void bpr_do_processing(struct bio_post_read_ctx *ctx)
> {
> 	/*
> 	 * We use different work queues for decryption and for verity because
> 	 * verity may require reading metadata pages that need decryption, and
> 	 * we shouldn't recurse to the same workqueue.
> 	 */
> 	switch (++ctx->cur_step) {
> 	case STEP_DECRYPT:
> 		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
> 			INIT_WORK(&ctx->work, decrypt_work);
> 			fscrypt_enqueue_decrypt_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	case STEP_VERITY:
> 		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> 			INIT_WORK(&ctx->work, verity_work);
> 			fsverity_enqueue_verify_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	default:
> 		__bpr_read_end_io(ctx->bio);
> 	}
> }

decrypt_work() and verity_work() can be defined in fscrypt and fsverity
modules. inode->[i_crypt_info|i_verity_info] can hold the pointers to
decrypt_work() and verity_work() functions. The newly introduced function
pointers in inode->[i_crypt_info|i_verity_info] can be initialized in
fscrypt_get_encryption_info() and create_fsverity_info() respectively.

'struct bio_post_read_ctx' should have a new member to store a pointer to the
bpr_do_processing() function. Once *_work() functions complete their job, they
could invoke ctx->do_processing() to continue with the next stage of bio
processing.


> 
> int __init bpr_init(void)
> {
> 	bpr_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
> 	if (!bpr_ctx_cache)
> 		goto fail;
> 	bpr_ctx_pool = mempool_create_slab_pool(num_prealloc_post_read_ctxs,
> 						bpr_ctx_cache);
> 	if (!bpr_ctx_pool)
> 		goto fail_free_cache;
> 	return 0;
> 
> fail_free_cache:
> 	kmem_cache_destroy(bpr_ctx_cache);
> fail:
> 	return -ENOMEM;
> }
> 
> void __exit bpr_exit(void)
> {
> 	mempool_destroy(bpr_ctx_pool);
> 	kmem_cache_destroy(bpr_ctx_cache);
> }
> 
> module_init(bpr_init);
> module_exit(bpr_exit);
> MODULE_LICENSE("GPL");
> 
> 


-- 
chandan

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-06-04 10:08 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 16:00 [RFC PATCH V3 00/12] Ext4 encryption support for blocksize < pagesize Chandan Rajendra
2018-05-22 16:00 ` [RFC PATCH V3 01/12] ext4: Clear BH_Uptodate flag on decryption error Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 02/12] Rename fscrypt_do_page_crypto to fscrypt_do_block_crypto Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 03/12] fscrypt_decrypt_page: Decrypt all blocks in a page Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 04/12] __fscrypt_decrypt_bio: Fix page offset and len args to fscrypt_decrypt_page Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 05/12] ext4: Decrypt all boundary blocks when doing buffered write Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 06/12] ext4: Decrypt the block that needs to be partially zeroed Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 07/12] mpage_readpage[s]: Introduce post process callback parameters Chandan Rajendra
2018-05-25 20:01   ` Theodore Y. Ts'o
2018-05-28  5:35     ` Chandan Rajendra
2018-05-28 19:34       ` Theodore Y. Ts'o
2018-05-29  3:04         ` Chandan Rajendra
2018-05-29 17:53           ` Eric Biggers
2018-05-30  3:09             ` Chandan Rajendra
2018-05-30  5:06               ` Theodore Y. Ts'o
2018-05-30 11:33                 ` Chandan Rajendra
2018-05-30 16:02                   ` Theodore Y. Ts'o
2018-06-04 10:09                 ` Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 08/12] fscrypt_zeroout_range: Encrypt all zeroed out blocks of a page Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 09/12] fscrypt_encrypt_page: Encrypt all blocks mapped by " Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 10/12] ext4: Fix block number passed to fscrypt_encrypt_page Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 11/12] ext4: Move encryption code into its own function Chandan Rajendra
2018-05-22 16:01 ` [RFC PATCH V3 12/12] ext4: Enable encryption for blocksize less than page size Chandan Rajendra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).