* [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
@ 2015-12-16  3:18 ` Baolin Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2015-12-16  3:18 UTC (permalink / raw)
  To: axboe, agk, snitzer, dm-devel
  Cc: neilb, dan.j.williams, martin.petersen, sagig, kent.overstreet,
	keith.busch, tj, broonie, arnd, linux-block, linux-raid,
	linux-kernel, baolin.wang

From the dm-crypt performance report we found that it shows low efficiency
with a crypto engine for some modes (like ecb or xts). In dm-crypt, the I/O
data buffer is mapped with a scatterlist and the scatterlist of one bio is
sent to the encryption engine; sending more scatterlists with a bigger size
at one time helps the engine deliver its best performance, which means a
higher encryption speed.

But dm-crypt currently maps only one segment (always one sector) of one bio
with one scatterlist to the crypto engine at a time, which is more
time-consuming and inefficient for the crypto engine. Especially for modes
which don't need a different IV for each sector, we can map the whole bio
with multiple scatterlists to improve the engine performance.

But this optimization does not support ciphers and IV modes which must be
processed sector by sector and need a different IV for each sector.
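
To make this concrete, the core of the bulk path in patch 2 boils down to
the following sketch (simplified, with allocation and error handling left
out; crypt_bulk_request() is a made-up name here, the real code lives in
crypt_convert_all_blocks()):

/*
 * Sketch: encrypt a whole bio as one request with one IV, instead of
 * one request (and one IV) per 512-byte sector.
 */
static int crypt_bulk_request(struct crypt_config *cc,
			      struct ablkcipher_request *req,
			      struct scatterlist *sg_in,
			      struct scatterlist *sg_out,
			      unsigned int total_bytes,
			      u8 *iv, sector_t sector)
{
	/* one IV derived from the start sector of the whole bio */
	memset(iv, 0, cc->iv_size);
	*(__le64 *)iv = cpu_to_le64(sector);

	/* one request covering every scatterlist entry of the bio */
	ablkcipher_request_set_crypt(req, sg_in, sg_out, total_bytes, iv);
	return crypto_ablkcipher_encrypt(req);
}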

Changes since v1:
 - Introduce one different IV mode.
 - Change the conditions for bulk mode.

Baolin Wang (2):
  block: Export the __blk_bios_map_sg() to map one bio
  md: dm-crypt: Introduce the bulk IV mode for bulk crypto

 block/blk-merge.c      |    7 +-
 drivers/md/dm-crypt.c  |  333 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/blkdev.h |    3 +
 3 files changed, 334 insertions(+), 9 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] block: Export the __blk_bios_map_sg() to map one bio
  2015-12-16  3:18 ` Baolin Wang
  (?)
@ 2015-12-16  3:18 ` Baolin Wang
  -1 siblings, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2015-12-16  3:18 UTC (permalink / raw)
  To: axboe, agk, snitzer, dm-devel
  Cc: neilb, dan.j.williams, martin.petersen, sagig, kent.overstreet,
	keith.busch, tj, broonie, arnd, linux-block, linux-raid,
	linux-kernel, baolin.wang

dm-crypt needs to map one whole bio to a scatterlist to improve the
encryption efficiency. Thus this patch exports the __blk_bios_map_sg()
function so that one bio can be mapped to scatterlists.
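
A minimal sketch of the intended use by a caller (this mirrors what patch 2
does; the sg table below is assumed to have been sized and allocated with
sg_alloc_table() beforehand):

	struct scatterlist *last = NULL;
	int nents;

	nents = __blk_bios_map_sg(bdev_get_queue(bio->bi_bdev), bio,
				  sgt->sgl, &last);
	if (nents <= 0)
		return -EINVAL;
	if (last)
		sg_mark_end(last);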

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 block/blk-merge.c      |    7 ++++---
 include/linux/blkdev.h |    3 +++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..09cc7c4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -331,9 +331,9 @@ new_segment:
 	*bvprv = *bvec;
 }
 
-static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
-			     struct scatterlist *sglist,
-			     struct scatterlist **sg)
+int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
+		      struct scatterlist *sglist,
+		      struct scatterlist **sg)
 {
 	struct bio_vec bvec, bvprv = { NULL };
 	struct bvec_iter iter;
@@ -372,6 +372,7 @@ single_segment:
 
 	return nsegs;
 }
+EXPORT_SYMBOL(__blk_bios_map_sg);
 
 /*
  * map a request to scatterlist, return number of sg entries setup. Caller
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3fe27f8..dd8d10f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1004,6 +1004,9 @@ extern void blk_queue_flush_queueable(struct request_queue *q, bool queueable);
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *);
+extern int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
+			     struct scatterlist *sglist,
+			     struct scatterlist **sg);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern long nr_blockdev_pages(void);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] md: dm-crypt: Introduce the bulk IV mode for bulk crypto
  2015-12-16  3:18 ` Baolin Wang
  (?)
  (?)
@ 2015-12-16  3:18 ` Baolin Wang
  -1 siblings, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2015-12-16  3:18 UTC (permalink / raw)
  To: axboe, agk, snitzer, dm-devel
  Cc: neilb, dan.j.williams, martin.petersen, sagig, kent.overstreet,
	keith.busch, tj, broonie, arnd, linux-block, linux-raid,
	linux-kernel, baolin.wang

In the current dm-crypt code, it is inefficient to map one segment (always
one sector) of one bio with just one scatterlist at a time for a hardware
crypto engine. Some encryption modes (like ecb or xts) cooperating with the
crypto engine only need one initial IV or a null IV instead of a different
IV for each sector. In this situation we can use multiple scatterlists to
map the whole bio and send all scatterlists of one bio to the crypto engine
to encrypt or decrypt, which can improve the hardware engine's efficiency.

With this optimization, on my test setup (BeagleBone Black board) using 64KB
I/Os on an eMMC storage device I saw about 60% improvement in throughput for
encrypted writes, and about 100% improvement for encrypted reads. But this
does not fit other modes which need a different IV for each sector.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/md/dm-crypt.c |  333 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 327 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 917d47e..dc2e5e6 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -32,6 +32,7 @@
 #include <linux/device-mapper.h>
 
 #define DM_MSG_PREFIX "crypt"
+#define DM_MAX_SG_LIST	1024
 
 /*
  * context holding the current state of a multi-part conversion
@@ -68,6 +69,8 @@ struct dm_crypt_request {
 	struct convert_context *ctx;
 	struct scatterlist sg_in;
 	struct scatterlist sg_out;
+	struct sg_table sgt_in;
+	struct sg_table sgt_out;
 	sector_t iv_sector;
 };
 
@@ -140,6 +143,7 @@ struct crypt_config {
 	char *cipher;
 	char *cipher_string;
 
+	int bulk_crypto;
 	struct crypt_iv_operations *iv_gen_ops;
 	union {
 		struct iv_essiv_private essiv;
@@ -238,6 +242,9 @@ static struct crypto_ablkcipher *any_tfm(struct crypt_config *cc)
  *
  * plumb: unimplemented, see:
  * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454
+ *
+ * bulk: the initial vector is the 64-bit little-endian version of the sector
+ *	 number, which is used as just one initial IV for the whole bulk of data.
  */
 
 static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv,
@@ -755,6 +762,15 @@ static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
 	return r;
 }
 
+static int crypt_iv_bulk_gen(struct crypt_config *cc, u8 *iv,
+			     struct dm_crypt_request *dmreq)
+{
+	memset(iv, 0, cc->iv_size);
+	*(__le64 *)iv = cpu_to_le64(dmreq->iv_sector);
+
+	return 0;
+}
+
 static struct crypt_iv_operations crypt_iv_plain_ops = {
 	.generator = crypt_iv_plain_gen
 };
@@ -799,6 +815,10 @@ static struct crypt_iv_operations crypt_iv_tcw_ops = {
 	.post	   = crypt_iv_tcw_post
 };
 
+static struct crypt_iv_operations crypt_iv_bulk_ops = {
+	.generator = crypt_iv_bulk_gen
+};
+
 static void crypt_convert_init(struct crypt_config *cc,
 			       struct convert_context *ctx,
 			       struct bio *bio_out, struct bio *bio_in,
@@ -833,6 +853,11 @@ static u8 *iv_of_dmreq(struct crypt_config *cc,
 		crypto_ablkcipher_alignmask(any_tfm(cc)) + 1);
 }
 
+static int crypt_is_bulk_mode(struct crypt_config *cc)
+{
+	return cc->bulk_crypto;
+}
+
 static int crypt_convert_block(struct crypt_config *cc,
 			       struct convert_context *ctx,
 			       struct ablkcipher_request *req)
@@ -881,24 +906,40 @@ static int crypt_convert_block(struct crypt_config *cc,
 
 static void kcryptd_async_done(struct crypto_async_request *async_req,
 			       int error);
+static void kcryptd_async_all_done(struct crypto_async_request *async_req,
+				   int error);
 
 static void crypt_alloc_req(struct crypt_config *cc,
 			    struct convert_context *ctx)
 {
 	unsigned key_index = ctx->cc_sector & (cc->tfms_count - 1);
+	struct dm_crypt_request *dmreq;
 
 	if (!ctx->req)
 		ctx->req = mempool_alloc(cc->req_pool, GFP_NOIO);
 
+	dmreq = dmreq_of_req(cc, ctx->req);
+	dmreq->sgt_in.orig_nents = 0;
+	dmreq->sgt_out.orig_nents = 0;
+
 	ablkcipher_request_set_tfm(ctx->req, cc->tfms[key_index]);
 
 	/*
 	 * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 	 * requests if driver request queue is full.
 	 */
-	ablkcipher_request_set_callback(ctx->req,
-	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-	    kcryptd_async_done, dmreq_of_req(cc, ctx->req));
+	if (crypt_is_bulk_mode(cc))
+		ablkcipher_request_set_callback(ctx->req,
+						CRYPTO_TFM_REQ_MAY_BACKLOG
+						| CRYPTO_TFM_REQ_MAY_SLEEP,
+						kcryptd_async_all_done,
+						dmreq_of_req(cc, ctx->req));
+	else
+		ablkcipher_request_set_callback(ctx->req,
+						CRYPTO_TFM_REQ_MAY_BACKLOG
+						| CRYPTO_TFM_REQ_MAY_SLEEP,
+						kcryptd_async_done,
+						dmreq_of_req(cc, ctx->req));
 }
 
 static void crypt_free_req(struct crypt_config *cc,
@@ -911,6 +952,221 @@ static void crypt_free_req(struct crypt_config *cc,
 }
 
 /*
+ * Check in advance how many sg entries are needed to map one bio
+ * with a scatterlist.
+ */
+static unsigned int crypt_sg_entry(struct bio *bio_t)
+{
+	struct request_queue *q = bdev_get_queue(bio_t->bi_bdev);
+	int cluster = blk_queue_cluster(q);
+	struct bio_vec bvec, bvprv = { NULL };
+	struct bvec_iter biter;
+	unsigned long nbytes = 0, sg_length = 0;
+	unsigned int sg_cnt = 0;
+
+	if (bio_t->bi_rw & REQ_DISCARD) {
+		if (bio_t->bi_vcnt)
+			return 1;
+		return 0;
+	}
+
+	if (bio_t->bi_rw & REQ_WRITE_SAME)
+		return 1;
+
+	bio_for_each_segment(bvec, bio_t, biter) {
+		nbytes = bvec.bv_len;
+
+		if (!cluster) {
+			sg_cnt++;
+			continue;
+		}
+
+		if (sg_length + nbytes > queue_max_segment_size(q)) {
+			sg_length = nbytes;
+			sg_cnt++;
+			goto next;
+		}
+
+		if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bvec)) {
+			sg_length = nbytes;
+			sg_cnt++;
+			goto next;
+		}
+
+		if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bvec)) {
+			sg_length = nbytes;
+			sg_cnt++;
+			goto next;
+		}
+
+		sg_length += nbytes;
+next:
+		memcpy(&bvprv, &bvec, sizeof(struct bio_vec));
+	}
+
+	return sg_cnt;
+}
+
+static int crypt_convert_all_blocks(struct crypt_config *cc,
+				   struct convert_context *ctx,
+				   struct ablkcipher_request *req)
+{
+	struct dm_crypt_io *io =
+		container_of(ctx, struct dm_crypt_io, ctx);
+	struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);
+	u8 *iv = iv_of_dmreq(cc, dmreq);
+	struct bio *orig_bio = io->base_bio;
+	struct bio *bio_in = ctx->bio_in;
+	struct bio *bio_out = ctx->bio_out;
+	unsigned int total_bytes = orig_bio->bi_iter.bi_size;
+	struct scatterlist *sg_in = NULL;
+	struct scatterlist *sg_out = NULL;
+	struct scatterlist *sg = NULL;
+	unsigned int total_sg_len_in = 0;
+	unsigned int total_sg_len_out = 0;
+	unsigned int sg_in_max = 0, sg_out_max = 0;
+	int ret;
+
+	dmreq->iv_sector = ctx->cc_sector;
+	dmreq->ctx = ctx;
+
+	/*
+	 * Calculate how many sg entries need to be used
+	 * for this bio.
+	 */
+	sg_in_max = crypt_sg_entry(bio_in) + 1;
+	if (sg_in_max > DM_MAX_SG_LIST || sg_in_max <= 0) {
+		DMERR("%s sg entry too large or none %d\n",
+		      __func__, sg_in_max);
+		return -EINVAL;
+	} else if (sg_in_max == 2) {
+		sg_in = &dmreq->sg_in;
+	}
+
+	if (!sg_in) {
+		ret = sg_alloc_table(&dmreq->sgt_in, sg_in_max, GFP_KERNEL);
+		if (ret) {
+			DMERR("%s sg in allocation failed\n", __func__);
+			return -ENOMEM;
+		}
+
+		sg_in = dmreq->sgt_in.sgl;
+	}
+
+	total_sg_len_in = __blk_bios_map_sg(bdev_get_queue(bio_in->bi_bdev),
+					    bio_in, sg_in, &sg);
+	if ((total_sg_len_in <= 0)
+	    || (total_sg_len_in > sg_in_max)) {
+		DMERR("%s in sg map error %d, sg_in_max[%d]\n",
+		      __func__, total_sg_len_in, sg_in_max);
+		return -EINVAL;
+	}
+
+	if (sg)
+		sg_mark_end(sg);
+
+	ctx->iter_in.bi_size -= total_bytes;
+
+	if (bio_data_dir(orig_bio) == READ)
+		goto set_crypt;
+
+	sg_out_max = crypt_sg_entry(bio_out) + 1;
+	if (sg_out_max > DM_MAX_SG_LIST || sg_out_max <= 0) {
+		DMERR("%s sg entry too large or none %d\n",
+		      __func__, sg_out_max);
+		return -EINVAL;
+	} else if (sg_out_max == 2) {
+		sg_out = &dmreq->sg_out;
+	}
+
+	if (!sg_out) {
+		ret = sg_alloc_table(&dmreq->sgt_out, sg_out_max, GFP_KERNEL);
+		if (ret) {
+			DMERR("%s sg out allocation failed\n", __func__);
+			return -ENOMEM;
+		}
+
+		sg_out = dmreq->sgt_out.sgl;
+	}
+
+	sg = NULL;
+	total_sg_len_out = __blk_bios_map_sg(bdev_get_queue(bio_out->bi_bdev),
+					     bio_out, sg_out, &sg);
+	if ((total_sg_len_out <= 0) ||
+	    (total_sg_len_out > sg_out_max)) {
+		DMERR("%s out sg map error %d, sg_out_max[%d]\n",
+		      __func__, total_sg_len_out, sg_out_max);
+		return -EINVAL;
+	}
+
+	if (sg)
+		sg_mark_end(sg);
+
+	ctx->iter_out.bi_size -= total_bytes;
+set_crypt:
+	if (cc->iv_gen_ops) {
+		ret = cc->iv_gen_ops->generator(cc, iv, dmreq);
+		if (ret < 0) {
+			DMERR("%s generator iv error %d\n", __func__, ret);
+			return ret;
+		}
+	}
+
+	if (bio_data_dir(orig_bio) == WRITE) {
+		ablkcipher_request_set_crypt(req, sg_in,
+					     sg_out, total_bytes, iv);
+
+		ret = crypto_ablkcipher_encrypt(req);
+	} else {
+		ablkcipher_request_set_crypt(req, sg_in,
+					     sg_in, total_bytes, iv);
+
+		ret = crypto_ablkcipher_decrypt(req);
+	}
+
+	if (!ret && cc->iv_gen_ops && cc->iv_gen_ops->post)
+		ret = cc->iv_gen_ops->post(cc, iv, dmreq);
+
+	return ret;
+}
+
+/*
+ * Encrypt / decrypt data from one whole bio at one time.
+ */
+static int crypt_convert_io(struct crypt_config *cc,
+			    struct convert_context *ctx)
+{
+	int r;
+
+	atomic_set(&ctx->cc_pending, 1);
+	crypt_alloc_req(cc, ctx);
+	atomic_inc(&ctx->cc_pending);
+
+	r = crypt_convert_all_blocks(cc, ctx, ctx->req);
+	switch (r) {
+	case -EBUSY:
+		/*
+		 * Lets make this synchronous bio by waiting on
+		 * in progress as well.
+		 */
+	case -EINPROGRESS:
+		wait_for_completion(&ctx->restart);
+		ctx->req = NULL;
+		break;
+	case 0:
+		atomic_dec(&ctx->cc_pending);
+		cond_resched();
+		break;
+	/* There was an error while processing the request. */
+	default:
+		atomic_dec(&ctx->cc_pending);
+		return r;
+	}
+
+	return 0;
+}
+
+/*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
 static int crypt_convert(struct crypt_config *cc,
@@ -1070,12 +1326,18 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 	struct crypt_config *cc = io->cc;
 	struct bio *base_bio = io->base_bio;
 	int error = io->error;
+	struct dm_crypt_request *dmreq;
 
 	if (!atomic_dec_and_test(&io->io_pending))
 		return;
 
-	if (io->ctx.req)
+	if (io->ctx.req) {
+		dmreq = dmreq_of_req(cc, io->ctx.req);
+		sg_free_table(&dmreq->sgt_out);
+		sg_free_table(&dmreq->sgt_in);
+
 		crypt_free_req(cc, io->ctx.req, base_bio);
+	}
 
 	base_bio->bi_error = error;
 	bio_endio(base_bio);
@@ -1312,7 +1574,11 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	sector += bio_sectors(clone);
 
 	crypt_inc_pending(io);
-	r = crypt_convert(cc, &io->ctx);
+	if (crypt_is_bulk_mode(cc))
+		r = crypt_convert_io(cc, &io->ctx);
+	else
+		r = crypt_convert(cc, &io->ctx);
+
 	if (r)
 		io->error = -EIO;
 	crypt_finished = atomic_dec_and_test(&io->ctx.cc_pending);
@@ -1342,7 +1608,11 @@ static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 	crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
 			   io->sector);
 
-	r = crypt_convert(cc, &io->ctx);
+	if (crypt_is_bulk_mode(cc))
+		r = crypt_convert_io(cc, &io->ctx);
+	else
+		r = crypt_convert(cc, &io->ctx);
+
 	if (r < 0)
 		io->error = -EIO;
 
@@ -1387,6 +1657,40 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 		kcryptd_crypt_write_io_submit(io, 1);
 }
 
+static void kcryptd_async_all_done(struct crypto_async_request *async_req,
+			       int error)
+{
+	struct dm_crypt_request *dmreq = async_req->data;
+	struct convert_context *ctx = dmreq->ctx;
+	struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
+	struct crypt_config *cc = io->cc;
+
+	if (error == -EINPROGRESS)
+		return;
+
+	if (!error && cc->iv_gen_ops && cc->iv_gen_ops->post)
+		error = cc->iv_gen_ops->post(cc, iv_of_dmreq(cc, dmreq), dmreq);
+
+	if (error < 0)
+		io->error = error;
+
+	sg_free_table(&dmreq->sgt_out);
+	sg_free_table(&dmreq->sgt_in);
+
+	crypt_free_req(cc, req_of_dmreq(cc, dmreq), io->base_bio);
+
+	if (!atomic_dec_and_test(&ctx->cc_pending)) {
+		complete(&io->ctx.restart);
+		return;
+	}
+
+	complete(&io->ctx.restart);
+	if (bio_data_dir(io->base_bio) == READ)
+		kcryptd_crypt_read_done(io);
+	else
+		kcryptd_crypt_write_io_submit(io, 1);
+}
+
 static void kcryptd_crypt(struct work_struct *work)
 {
 	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
@@ -1633,6 +1937,21 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		goto bad_mem;
 	}
 
+	/*
+	 * Check whether this cipher and IV mode can be handled in bulk,
+	 * i.e. the encryption mode needs no IV or just one initial IV.
+	 * For bulk mode, we can expand the scatterlist entries to map the
+	 * whole bio and then send all the scatterlists to the hardware
+	 * engine at one time to improve the crypto engine efficiency.
+	 * This does not fit other IV modes, which have to do encryption
+	 * and decryption sector by sector because every sector has a
+	 * different IV.
+	 */
+	if (!ivmode || !strcmp(ivmode, "bulk") || !strcmp(ivmode, "null"))
+		cc->bulk_crypto = 1;
+	else
+		cc->bulk_crypto = 0;
+
 	/* Allocate cipher */
 	ret = crypt_alloc_tfms(cc, cipher_api);
 	if (ret < 0) {
@@ -1680,6 +1999,8 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		cc->iv_gen_ops = &crypt_iv_tcw_ops;
 		cc->key_parts += 2; /* IV + whitening */
 		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
+	} else if (strcmp(ivmode, "bulk") == 0) {
+		cc->iv_gen_ops = &crypt_iv_bulk_ops;
 	} else {
 		ret = -EINVAL;
 		ti->error = "Invalid IV mode";
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2015-12-16  3:18 ` Baolin Wang
                   ` (2 preceding siblings ...)
  (?)
@ 2015-12-16  8:08 ` Milan Broz
  2015-12-16  8:31   ` Baolin Wang
  2015-12-17  7:37   ` Baolin Wang
  -1 siblings, 2 replies; 20+ messages in thread
From: Milan Broz @ 2015-12-16  8:08 UTC (permalink / raw)
  To: Baolin Wang, axboe, agk, snitzer, dm-devel
  Cc: neilb, dan.j.williams, martin.petersen, sagig, kent.overstreet,
	keith.busch, tj, broonie, arnd, linux-block, linux-raid,
	linux-kernel

On 12/16/2015 04:18 AM, Baolin Wang wrote:
> From the dm-crypt performance report we found that it shows low efficiency
> with a crypto engine for some modes (like ecb or xts). In dm-crypt, the I/O
> data buffer is mapped with a scatterlist and the scatterlist of one bio is
> sent to the encryption engine; sending more scatterlists with a bigger size
> at one time helps the engine deliver its best performance, which means a
> higher encryption speed.
> 
> But dm-crypt currently maps only one segment (always one sector) of one bio
> with one scatterlist to the crypto engine at a time, which is more
> time-consuming and inefficient for the crypto engine. Especially for modes
> which don't need a different IV for each sector, we can map the whole bio
> with multiple scatterlists to improve the engine performance.
> 
> But this optimization does not support ciphers and IV modes which must be
> processed sector by sector and need a different IV for each sector.
> 
> Changes since v1:
>  - Introduce one different IV mode.
>  - Change the conditions for bulk mode.

I tried the patchset on a 32-bit Intel VM and the kernel immediately OOPsed (just tried aes-ecb)...

Crash log below.

Milan


[   40.989759] BUG: unable to handle kernel NULL pointer dereference at   (null)
[   40.990736] IP: [<f8710d26>] crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.990800] *pde = 00000000 
[   40.990844] Oops: 0000 [#1] PREEMPT SMP 
[   40.990961] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   40.991412] CPU: 2 PID: 6 Comm: kworker/u8:0 Not tainted 4.4.0-rc5+ #44
[   40.991460] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   40.991531] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[   40.991587] task: f4c04180 ti: f4c06000 task.ti: f4c06000
[   40.991629] EIP: 0060:[<f8710d26>] EFLAGS: 00010246 CPU: 2
[   40.991672] EIP is at crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.991725] EAX: 00001000 EBX: 00001000 ECX: f73e85c0 EDX: 00000000
[   40.991772] ESI: 00000000 EDI: 00001000 EBP: f4c07e28 ESP: f4c07de8
[   40.991819]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   40.991862] CR0: 8005003b CR2: 00000000 CR3: 018c1000 CR4: 001406d0
[   40.991949] Stack:
[   40.991976]  f49967c8 00000000 00000000 00000000 00000000 01000058 00000000 f73e85c0
[   40.992173]  00000000 f4baf170 00001000 f4ba7290 00000000 f4baf030 f4baf170 f4baf0c0
[   40.992387]  f4c07e60 f8710e8a f4baf170 f4c07e4c f4baf170 f4baf114 f4baf160 f8713fe8
[   40.992599] Call Trace:
[   40.992647]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   40.992715]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   40.992781]  [<c1056e23>] process_one_work+0x153/0x420
[   40.992842]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   40.992905]  [<c1057127>] worker_thread+0x37/0x470
[   40.992964]  [<c10570f0>] ? process_one_work+0x420/0x420
[   40.993026]  [<c105c326>] kthread+0x96/0xb0
[   40.993083]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   40.993146]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   40.993207] Code: c2 01 31 f6 85 db 75 d1 89 55 e0 85 ff 0f 85 41 ff ff ff 8b 55 d8 8d 65 f4 89 d0 5b 5e 5f 5d c3 90 8d 74 26 00 8b 55 c8 8b 4d dc <8b> 02 89 4d dc 89 45 c8 c1 e8 1a c1 e0 04 8b 80 80 a2 0b c2 83
[   40.995405] EIP: [<f8710d26>] crypt_sg_entry+0x186/0x270 [dm_crypt] SS:ESP 0068:f4c07de8
[   40.995604] CR2: 0000000000000000
[   40.995703] ---[ end trace d78b89aae913dc1f ]---
[   40.995825] ------------[ cut here ]------------
[   40.995930] WARNING: CPU: 2 PID: 6 at kernel/softirq.c:150 __local_bh_enable_ip+0x88/0xd0()
[   40.996118] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul
[   40.996352] BUG: unable to handle kernel NULL pointer dereference at   (null)
[   40.996354] IP: [<f8710d26>] crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.996357] *pde = 00000000 
[   40.996359] Oops: 0000 [#2] PREEMPT SMP 
[   40.996361] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   40.996368] CPU: 3 PID: 53 Comm: kworker/u8:1 Tainted: G      D         4.4.0-rc5+ #44
[   40.996369] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   40.996371] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[   40.996372] task: f489c0c0 ti: f489e000 task.ti: f489e000
[   40.996373] EIP: 0060:[<f8710d26>] EFLAGS: 00010246 CPU: 3
[   40.996375] EIP is at crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.996375] EAX: 00001000 EBX: 00001000 ECX: f71e3a20 EDX: 00000000
[   40.996376] ESI: 00000000 EDI: 00010000 EBP: f489fe28 ESP: f489fde8
[   40.996377]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   40.996377] CR0: 8005003b CR2: 00000000 CR3: 33c26000 CR4: 001406d0
[   40.996410] Stack:
[   40.996411]  f49967c8 00000000 00000000 00000000 00000000 01000058 00000000 f71e3a20
[   40.996413]  00000000 f4bed370 00001000 f4b2e7c0 00000000 f4bed230 f4bed370 f4bed2c0
[   40.996415]  f489fe60 f8710e8a f4bed370 f489fe4c f4bed370 f4bed314 f4bed360 f8713fe8
[   40.996418] Call Trace:
[   40.996421]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   40.996423]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   40.996426]  [<c1056e23>] process_one_work+0x153/0x420
[   40.996428]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   40.996430]  [<c1057127>] worker_thread+0x37/0x470
[   40.996432]  [<c10570f0>] ? process_one_work+0x420/0x420
[   40.996433]  [<c105c326>] kthread+0x96/0xb0
[   40.996436]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   40.996438]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   40.996439] Code: c2 01 31 f6 85 db 75 d1 89 55 e0 85 ff 0f 85 41 ff ff ff 8b 55 d8 8d 65 f4 89 d0 5b 5e 5f 5d c3 90 8d 74 26 00 8b 55 c8 8b 4d dc <8b> 02 89 4d dc 89 45 c8 c1 e8 1a c1 e0 04 8b 80 80 a2 0b c2 83
[   40.996453] EIP: [<f8710d26>] crypt_sg_entry+0x186/0x270 [dm_crypt] SS:ESP 0068:f489fde8
[   40.996455] CR2: 0000000000000000
[   40.996456] ---[ end trace d78b89aae913dc20 ]---
[   40.996459] ------------[ cut here ]------------
[   40.996461] WARNING: CPU: 3 PID: 53 at kernel/softirq.c:150 __local_bh_enable_ip+0x88/0xd0()
[   40.996461] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   40.996465] CPU: 3 PID: 53 Comm: kworker/u8:1 Tainted: G      D         4.4.0-rc5+ #44
[   40.996466] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   40.996469]  00000000 00000000 f489fc64 c12a98f2 00000000 f489fc7c c1042757 c10455a8
[   40.996471]  00000201 c10bf91e f489c0c0 f489fc8c c10427ff 00000009 00000000 f489fc9c
[   40.996473]  c10455a8 c16d07e0 c16d0660 f489fca8 c14e49ca f489c0c0 f489fcbc c10bf91e
[   40.996475] Call Trace:
[   40.996477]  [<c12a98f2>] dump_stack+0x4b/0x79
[   40.996479]  [<c1042757>] warn_slowpath_common+0x67/0xa0
[   40.996480]  [<c10455a8>] ? __local_bh_enable_ip+0x88/0xd0
[   40.996482]  [<c10bf91e>] ? cgroup_exit+0x4e/0xc0
[   40.996484]  [<c10427ff>] warn_slowpath_null+0xf/0x20
[   40.996486]  [<c10455a8>] __local_bh_enable_ip+0x88/0xd0
[   40.996488]  [<c14e49ca>] _raw_spin_unlock_bh+0x2a/0x30
[   40.996490]  [<c10bf91e>] cgroup_exit+0x4e/0xc0
[   40.996491]  [<c1043674>] do_exit+0x224/0x920
[   40.996494]  [<c1092065>] ? kmsg_dump+0x105/0x180
[   40.996496]  [<c1004f21>] oops_end+0x61/0x90
[   40.996498]  [<c1038965>] no_context+0xf5/0x210
[   40.996500]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   40.996501]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   40.996502]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   40.996504]  [<c1064022>] ? try_to_wake_up+0x182/0x340
[   40.996505]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996507]  [<c103924b>] do_page_fault+0xb/0x10
[   40.996508]  [<c14e601f>] error_code+0x5f/0x64
[   40.996509]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996510]  [<f8710d26>] ? crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.996511]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   40.996512]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   40.996514]  [<c1056e23>] process_one_work+0x153/0x420
[   40.996515]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   40.996516]  [<c1057127>] worker_thread+0x37/0x470
[   40.996517]  [<c10570f0>] ? process_one_work+0x420/0x420
[   40.996518]  [<c105c326>] kthread+0x96/0xb0
[   40.996519]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   40.996520]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   40.996521] ---[ end trace d78b89aae913dc21 ]---
[   40.996547] BUG: unable to handle kernel paging request at ffffffc8
[   40.996548] IP: [<c105c75a>] kthread_data+0xa/0x10
[   40.996551] *pde = 018c3067 *pte = 00000000 
[   40.996552] Oops: 0000 [#3] PREEMPT SMP 
[   40.996554] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   40.996560] CPU: 3 PID: 53 Comm: kworker/u8:1 Tainted: G      D W       4.4.0-rc5+ #44
[   40.996560] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   40.996564] task: f489c0c0 ti: f489e000 task.ti: f489e000
[   40.996565] EIP: 0060:[<c105c75a>] EFLAGS: 00010002 CPU: 3
[   40.996566] EIP is at kthread_data+0xa/0x10
[   40.996567] EAX: 00000000 EBX: 00000003 ECX: 18a6743d EDX: 00000003
[   40.996567] ESI: f489c0c0 EDI: c18b6e00 EBP: f489fc80 ESP: f489fc78
[   40.996568]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   40.996568] CR0: 8005003b CR2: 00000014 CR3: 33c26000 CR4: 001406d0
[   40.996594] Stack:
[   40.996595]  c1057e5b f489c478 f489fcb0 c14e0083 00000003 f5fb6e10 f5fb6e00 c129283a
[   40.996597]  00000000 f5fb6e00 f489c0c0 f48a0000 f489f9b8 f489fce0 f489fcbc c14e0612
[   40.996598]  f489c0c0 f489fcf4 c1043a2e f489c3cc f57e8040 00000001 00000002 f489c3cc
[   40.996600] Call Trace:
[   40.996602]  [<c1057e5b>] ? wq_worker_sleeping+0xb/0x90
[   40.996603]  [<c14e0083>] __schedule+0x6a3/0xad0
[   40.996605]  [<c129283a>] ? put_io_context_active+0xaa/0xd0
[   40.996607]  [<c14e0612>] schedule+0x32/0x80
[   40.996609]  [<c1043a2e>] do_exit+0x5de/0x920
[   40.996610]  [<c1004f21>] oops_end+0x61/0x90
[   40.996612]  [<c1038965>] no_context+0xf5/0x210
[   40.996614]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   40.996616]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   40.996617]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   40.996619]  [<c1064022>] ? try_to_wake_up+0x182/0x340
[   40.996621]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996622]  [<c103924b>] do_page_fault+0xb/0x10
[   40.996623]  [<c14e601f>] error_code+0x5f/0x64
[   40.996625]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996627]  [<f8710d26>] ? crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.996629]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   40.996630]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   40.996631]  [<c1056e23>] process_one_work+0x153/0x420
[   40.996632]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   40.996633]  [<c1057127>] worker_thread+0x37/0x470
[   40.996634]  [<c10570f0>] ? process_one_work+0x420/0x420
[   40.996635]  [<c105c326>] kthread+0x96/0xb0
[   40.996636]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   40.996637]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   40.996638] Code: 00 3b 5f 34 74 0f 89 f8 e8 04 83 48 00 83 c4 78 5b 5e 5f 5d c3 8b 4f 28 eb bf 8d b4 26 00 00 00 00 55 8b 80 64 03 00 00 89 e5 5d <8b> 40 c8 c3 66 90 55 b9 04 00 00 00 89 e5 83 ec 04 8b 90 64 03
[   40.996652] EIP: [<c105c75a>] kthread_data+0xa/0x10 SS:ESP 0068:f489fc78
[   40.996654] CR2: 00000000ffffffc8
[   40.996655] ---[ end trace d78b89aae913dc22 ]---
[   40.996655] Fixing recursive fault but reboot is needed!
[   40.996656] BUG: scheduling while atomic: kworker/u8:1/53/0x00000003
[   40.996657] INFO: lockdep is turned off.
[   40.996657] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   40.996660] irq event stamp: 11388
[   40.996661] hardirqs last  enabled at (11387): [<c14e4a62>] _raw_spin_unlock_irq+0x22/0x50
[   40.996662] hardirqs last disabled at (11388): [<c14e489d>] _raw_spin_lock_irq+0xd/0x60
[   40.996664] softirqs last  enabled at (11314): [<c10fcddc>] wb_wakeup_delayed+0x4c/0x60
[   40.996666] softirqs last disabled at (11310): [<c10fcdb6>] wb_wakeup_delayed+0x26/0x60
[   40.996667] Preemption disabled at:[<c1004f21>] oops_end+0x61/0x90
[   40.996668] 
[   40.996669] CPU: 3 PID: 53 Comm: kworker/u8:1 Tainted: G      D W       4.4.0-rc5+ #44
[   40.996670] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   40.996674]  00000000 00000000 f489fb00 c12a98f2 f489c0c0 f489fb0c c106059f f48a0000
[   40.996677]  f489fb3c c14e0168 00000003 00000009 f5fb6e00 f489fb40 00000000 f5fb6e00
[   40.996680]  f489c0c0 f48a0000 00000009 f489c0c0 f489fb48 c14e0612 f489c0c0 f489fb84
[   40.996683] Call Trace:
[   40.996685]  [<c12a98f2>] dump_stack+0x4b/0x79
[   40.996687]  [<c106059f>] __schedule_bug+0x5f/0xb0
[   40.996688]  [<c14e0168>] __schedule+0x788/0xad0
[   40.996690]  [<c14e0612>] schedule+0x32/0x80
[   40.996691]  [<c1043b7b>] do_exit+0x72b/0x920
[   40.996692]  [<c1092065>] ? kmsg_dump+0x105/0x180
[   40.996693]  [<c1004f21>] oops_end+0x61/0x90
[   40.996694]  [<c1038965>] no_context+0xf5/0x210
[   40.996695]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   40.996697]  [<c106a54a>] ? update_curr+0x9a/0x130
[   40.996698]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996699]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   40.996700]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   40.996701]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996702]  [<c103924b>] do_page_fault+0xb/0x10
[   40.996703]  [<c14e601f>] error_code+0x5f/0x64
[   40.996704]  [<c106007b>] ? ttwu_stat+0x5b/0x200
[   40.996705]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996706]  [<c105c75a>] ? kthread_data+0xa/0x10
[   40.996707]  [<c1057e5b>] ? wq_worker_sleeping+0xb/0x90
[   40.996708]  [<c14e0083>] __schedule+0x6a3/0xad0
[   40.996709]  [<c129283a>] ? put_io_context_active+0xaa/0xd0
[   40.996710]  [<c14e0612>] schedule+0x32/0x80
[   40.996711]  [<c1043a2e>] do_exit+0x5de/0x920
[   40.996712]  [<c1004f21>] oops_end+0x61/0x90
[   40.996713]  [<c1038965>] no_context+0xf5/0x210
[   40.996715]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   40.996716]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   40.996717]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   40.996718]  [<c1064022>] ? try_to_wake_up+0x182/0x340
[   40.996719]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996720]  [<c103924b>] do_page_fault+0xb/0x10
[   40.996721]  [<c14e601f>] error_code+0x5f/0x64
[   40.996722]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   40.996723]  [<f8710d26>] ? crypt_sg_entry+0x186/0x270 [dm_crypt]
[   40.996724]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   40.996726]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   40.996727]  [<c1056e23>] process_one_work+0x153/0x420
[   40.996728]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   40.996729]  [<c1057127>] worker_thread+0x37/0x470
[   40.996730]  [<c10570f0>] ? process_one_work+0x420/0x420
[   40.996731]  [<c105c326>] kthread+0x96/0xb0
[   40.996733]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   40.996735]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   41.023333]  crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   41.023559] CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: G      D W       4.4.0-rc5+ #44
[   41.023649] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   41.023757]  00000000 00000000 f4c07c64 c12a98f2 00000000 f4c07c7c c1042757 c10455a8
[   41.023963]  00000201 c10bf91e f4c04180 f4c07c8c c10427ff 00000009 00000000 f4c07c9c
[   41.024168]  c10455a8 c16d07e0 c16d0660 f4c07ca8 c14e49ca f4c04180 f4c07cbc c10bf91e
[   41.024373] Call Trace:
[   41.024421]  [<c12a98f2>] dump_stack+0x4b/0x79
[   41.024477]  [<c1042757>] warn_slowpath_common+0x67/0xa0
[   41.024537]  [<c10455a8>] ? __local_bh_enable_ip+0x88/0xd0
[   41.024599]  [<c10bf91e>] ? cgroup_exit+0x4e/0xc0
[   41.024655]  [<c10427ff>] warn_slowpath_null+0xf/0x20
[   41.024714]  [<c10455a8>] __local_bh_enable_ip+0x88/0xd0
[   41.024775]  [<c14e49ca>] _raw_spin_unlock_bh+0x2a/0x30
[   41.024834]  [<c10bf91e>] cgroup_exit+0x4e/0xc0
[   41.024890]  [<c1043674>] do_exit+0x224/0x920
[   41.024945]  [<c1092065>] ? kmsg_dump+0x105/0x180
[   41.025002]  [<c1004f21>] oops_end+0x61/0x90
[   41.025057]  [<c1038965>] no_context+0xf5/0x210
[   41.025113]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   41.025176]  [<c10b5f0b>] ? __module_text_address+0xb/0x60
[   41.025236]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   41.025295]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   41.025353]  [<c1004d0e>] ? print_context_stack+0x4e/0xb0
[   41.025412]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.025471]  [<c103924b>] do_page_fault+0xb/0x10
[   41.025527]  [<c14e601f>] error_code+0x5f/0x64
[   41.025582]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.025642]  [<f8710d26>] ? crypt_sg_entry+0x186/0x270 [dm_crypt]
[   41.025705]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   41.025768]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   41.025831]  [<c1056e23>] process_one_work+0x153/0x420
[   41.026756]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   41.026816]  [<c1057127>] worker_thread+0x37/0x470
[   41.026873]  [<c10570f0>] ? process_one_work+0x420/0x420
[   41.026932]  [<c105c326>] kthread+0x96/0xb0
[   41.026987]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   41.027047]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   41.027106] ---[ end trace d78b89aae913dc23 ]---
[   41.027229] BUG: unable to handle kernel paging request at ffffffc8
[   41.027349] IP: [<c105c75a>] kthread_data+0xa/0x10
[   41.027455] *pde = 018c3067 *pte = 00000000 
[   41.027568] Oops: 0000 [#4] PREEMPT SMP 
[   41.027755] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   41.029586] CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: G      D W       4.4.0-rc5+ #44
[   41.029826] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   41.030943] task: f4c04180 ti: f4c06000 task.ti: f4c06000
[   41.031137] EIP: 0060:[<c105c75a>] EFLAGS: 00010002 CPU: 2
[   41.031341] EIP is at kthread_data+0xa/0x10
[   41.031536] EAX: 00000000 EBX: 00000002 ECX: 18c2e78b EDX: 00000002
[   41.031687] ESI: f4c04180 EDI: c18b6e00 EBP: f4c07c80 ESP: f4c07c78
[   41.031802]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   41.031955] CR0: 8005003b CR2: 00000014 CR3: 018c1000 CR4: 001406d0
[   41.032207] Stack:
[   41.032410]  c1057e5b f4c04538 f4c07cb0 c14e0083 00000002 f5e65e10 f5e65e00 c129283a
[   41.032772]  00000000 f5e65e00 f4c04180 f4c08000 f4c079b8 f4c07ce0 f4c07cbc c14e0612
[   41.035807]  f4c04180 f4c07cf4 c1043a2e f4c0448c f57e8040 00000001 00000002 f4c0448c
[   41.037146] Call Trace:
[   41.037353]  [<c1057e5b>] ? wq_worker_sleeping+0xb/0x90
[   41.037556]  [<c14e0083>] __schedule+0x6a3/0xad0
[   41.037759]  [<c129283a>] ? put_io_context_active+0xaa/0xd0
[   41.038744]  [<c14e0612>] schedule+0x32/0x80
[   41.038883]  [<c1043a2e>] do_exit+0x5de/0x920
[   41.039067]  [<c1004f21>] oops_end+0x61/0x90
[   41.039263]  [<c1038965>] no_context+0xf5/0x210
[   41.039455]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   41.039655]  [<c10b5f0b>] ? __module_text_address+0xb/0x60
[   41.039850]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   41.040025]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   41.040222]  [<c1004d0e>] ? print_context_stack+0x4e/0xb0
[   41.040373]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.040543]  [<c103924b>] do_page_fault+0xb/0x10
[   41.040765]  [<c14e601f>] error_code+0x5f/0x64
[   41.040984]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.041210]  [<f8710d26>] ? crypt_sg_entry+0x186/0x270 [dm_crypt]
[   41.041435]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   41.041650]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   41.042746]  [<c1056e23>] process_one_work+0x153/0x420
[   41.042928]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   41.043123]  [<c1057127>] worker_thread+0x37/0x470
[   41.043337]  [<c10570f0>] ? process_one_work+0x420/0x420
[   41.043450]  [<c105c326>] kthread+0x96/0xb0
[   41.043529]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   41.043653]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0
[   41.043831] Code: 00 3b 5f 34 74 0f 89 f8 e8 04 83 48 00 83 c4 78 5b 5e 5f 5d c3 8b 4f 28 eb bf 8d b4 26 00 00 00 00 55 8b 80 64 03 00 00 89 e5 5d <8b> 40 c8 c3 66 90 55 b9 04 00 00 00 89 e5 83 ec 04 8b 90 64 03
[   41.055486] EIP: [<c105c75a>] kthread_data+0xa/0x10 SS:ESP 0068:f4c07c78
[   41.055692] CR2: 00000000ffffffc8
[   41.055803] ---[ end trace d78b89aae913dc24 ]---
[   41.055917] Fixing recursive fault but reboot is needed!
[   41.056038] BUG: scheduling while atomic: kworker/u8:0/6/0x00000003
[   41.056162] INFO: lockdep is turned off.
[   41.056353] Modules linked in: dm_crypt loop rpcsec_gss_krb5 dm_mod crc32_pclmul crc32c_intel ata_piix aesni_intel aes_i586 lrw ablk_helper cryptd
[   41.056783] irq event stamp: 17002
[   41.056833] hardirqs last  enabled at (17001): [<c108d686>] __raw_spin_lock_init+0x16/0x50
[   41.056944] hardirqs last disabled at (17002): [<c14e601b>] error_code+0x5b/0x64
[   41.057103] softirqs last  enabled at (16962): [<c10bf868>] cgroup_post_fork+0x68/0xd0
[   41.057335] softirqs last disabled at (16960): [<c10bf822>] cgroup_post_fork+0x22/0xd0
[   41.057522] Preemption disabled at:[<c1004f21>] oops_end+0x61/0x90
[   41.057719] 
[   41.057819] CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: G      D W       4.4.0-rc5+ #44
[   41.058887] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   41.059066]  00000000 00000000 f4c07b00 c12a98f2 f4c04180 f4c07b0c c106059f f4c08000
[   41.059774]  f4c07b3c c14e0168 00000002 00000009 f5e65e00 f4c07b40 00000000 f5e65e00
[   41.060519]  f4c04180 f4c08000 00000009 f4c04180 f4c07b48 c14e0612 f4c04180 f4c07b84
[   41.060728] Call Trace:
[   41.060776]  [<c12a98f2>] dump_stack+0x4b/0x79
[   41.060833]  [<c106059f>] __schedule_bug+0x5f/0xb0
[   41.060891]  [<c14e0168>] __schedule+0x788/0xad0
[   41.060979]  [<c14e0612>] schedule+0x32/0x80
[   41.061067]  [<c1043b7b>] do_exit+0x72b/0x920
[   41.061148]  [<c1092065>] ? kmsg_dump+0x105/0x180
[   41.061250]  [<c1004f21>] oops_end+0x61/0x90
[   41.061306]  [<c1038965>] no_context+0xf5/0x210
[   41.061362]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   41.061425]  [<c106a54a>] ? update_curr+0x9a/0x130
[   41.061482]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.061542]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   41.061602]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   41.061661]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.061720]  [<c103924b>] do_page_fault+0xb/0x10
[   41.061789]  [<c14e601f>] error_code+0x5f/0x64
[   41.062866]  [<c106007b>] ? ttwu_stat+0x5b/0x200
[   41.063053]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.063247]  [<c105c75a>] ? kthread_data+0xa/0x10
[   41.063434]  [<c1057e5b>] ? wq_worker_sleeping+0xb/0x90
[   41.063625]  [<c14e0083>] __schedule+0x6a3/0xad0
[   41.063818]  [<c129283a>] ? put_io_context_active+0xaa/0xd0
[   41.064043]  [<c14e0612>] schedule+0x32/0x80
[   41.064762]  [<c1043a2e>] do_exit+0x5de/0x920
[   41.064944]  [<c1004f21>] oops_end+0x61/0x90
[   41.065133]  [<c1038965>] no_context+0xf5/0x210
[   41.065313]  [<c1038b1c>] __bad_area_nosemaphore+0x9c/0x150
[   41.065505]  [<c10b5f0b>] ? __module_text_address+0xb/0x60
[   41.065692]  [<c1038bdd>] bad_area_nosemaphore+0xd/0x10
[   41.066731]  [<c1038e0f>] __do_page_fault+0x6f/0x4a0
[   41.066848]  [<c1004d0e>] ? print_context_stack+0x4e/0xb0
[   41.067012]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.067231]  [<c103924b>] do_page_fault+0xb/0x10
[   41.067500]  [<c14e601f>] error_code+0x5f/0x64
[   41.067689]  [<c1039240>] ? __do_page_fault+0x4a0/0x4a0
[   41.067881]  [<f8710d26>] ? crypt_sg_entry+0x186/0x270 [dm_crypt]
[   41.068069]  [<f8710e8a>] crypt_convert_io+0x7a/0x360 [dm_crypt]
[   41.068259]  [<f87131e5>] kcryptd_crypt+0x395/0x3da [dm_crypt]
[   41.068417]  [<c1056e23>] process_one_work+0x153/0x420
[   41.068534]  [<c1056dd0>] ? process_one_work+0x100/0x420
[   41.068709]  [<c1057127>] worker_thread+0x37/0x470
[   41.068917]  [<c10570f0>] ? process_one_work+0x420/0x420
[   41.069118]  [<c105c326>] kthread+0x96/0xb0
[   41.069307]  [<c14e5389>] ret_from_kernel_thread+0x21/0x38
[   41.069434]  [<c105c290>] ? kthread_worker_fn+0xf0/0xf0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2015-12-16  8:08 ` [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency Milan Broz
@ 2015-12-16  8:31   ` Baolin Wang
  2015-12-17  7:37   ` Baolin Wang
  1 sibling, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2015-12-16  8:31 UTC (permalink / raw)
  To: Milan Broz
  Cc: Jens Axboe, Alasdair G Kergon, Mike Snitzer, dm-devel, neilb,
	dan.j.williams, martin.petersen, sagig, Kent Overstreet,
	keith.busch, tj, Mark Brown, Arnd Bergmann, linux-block,
	linux-raid, LKML

On 16 December 2015 at 16:08, Milan Broz <gmazyland@gmail.com> wrote:
> On 12/16/2015 04:18 AM, Baolin Wang wrote:
>> From the dm-crypt performance report we found that it shows low efficiency
>> with a crypto engine for some modes (like ecb or xts). In dm-crypt, the I/O
>> data buffer is mapped with a scatterlist and the scatterlist of one bio is
>> sent to the encryption engine; sending more scatterlists with a bigger size
>> at one time helps the engine deliver its best performance, which means a
>> higher encryption speed.
>>
>> But dm-crypt currently maps only one segment (always one sector) of one bio
>> with one scatterlist to the crypto engine at a time, which is more
>> time-consuming and inefficient for the crypto engine. Especially for modes
>> which don't need a different IV for each sector, we can map the whole bio
>> with multiple scatterlists to improve the engine performance.
>>
>> But this optimization does not support ciphers and IV modes which must be
>> processed sector by sector and need a different IV for each sector.
>>
>> Changes since v1:
>>  - Introduce one different IV mode.
>>  - Change the conditions for bulk mode.
>
> I tried the patchset on a 32-bit Intel VM and the kernel immediately OOPsed (just tried aes-ecb)...
>

I'm sorry for that. I'll check why it crashes even though it works
well on my BeagleBone Black board. Thanks.

> Crash log below.




-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2015-12-16  8:08 ` [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency Milan Broz
  2015-12-16  8:31   ` Baolin Wang
@ 2015-12-17  7:37   ` Baolin Wang
  2016-01-02 22:46     ` Milan Broz
  1 sibling, 1 reply; 20+ messages in thread
From: Baolin Wang @ 2015-12-17  7:37 UTC (permalink / raw)
  To: Milan Broz
  Cc: Jens Axboe, Alasdair G Kergon, Mike Snitzer, dm-devel, neilb,
	dan.j.williams, martin.petersen, sagig, Kent Overstreet,
	keith.busch, tj, Mark Brown, Arnd Bergmann, linux-block,
	linux-raid, LKML

Hi Milan,

On 16 December 2015 at 16:08, Milan Broz <gmazyland@gmail.com> wrote:
> On 12/16/2015 04:18 AM, Baolin Wang wrote:
>> From the dm-crypt performance report we found that it shows low efficiency
>> with a crypto engine for some modes (like ecb or xts). In dm-crypt, the I/O
>> data buffer is mapped with a scatterlist and the scatterlist of one bio is
>> sent to the encryption engine; sending more scatterlists with a bigger size
>> at one time helps the engine deliver its best performance, which means a
>> higher encryption speed.
>>
>> But dm-crypt currently maps only one segment (always one sector) of one bio
>> with one scatterlist to the crypto engine at a time, which is more
>> time-consuming and inefficient for the crypto engine. Especially for modes
>> which don't need a different IV for each sector, we can map the whole bio
>> with multiple scatterlists to improve the engine performance.
>>
>> But this optimization does not support ciphers and IV modes which must be
>> processed sector by sector and need a different IV for each sector.
>>
>> Changes since v1:
>>  - Introduce one different IV mode.
>>  - Change the conditions for bulk mode.
>
> I tried the patchset on a 32-bit Intel VM and the kernel immediately OOPsed (just tried aes-ecb)...
>

I've checked the code and I guess some macros I used have different
definitions on different architectures. Could you please try the new
patchset, with some optimizations, on your platform? It works well on
my ARM board. Thanks.



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2015-12-17  7:37   ` Baolin Wang
@ 2016-01-02 22:46     ` Milan Broz
  2016-01-04  6:58       ` Baolin Wang
  2016-01-04 20:13       ` Mark Brown
  0 siblings, 2 replies; 20+ messages in thread
From: Milan Broz @ 2016-01-02 22:46 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Jens Axboe, Alasdair G Kergon, Mike Snitzer, dm-devel, neilb,
	dan.j.williams, martin.petersen, sagig, Kent Overstreet,
	keith.busch, tj, Mark Brown, Arnd Bergmann, linux-block,
	linux-raid, LKML

On 12/17/2015 08:37 AM, Baolin Wang wrote:
> Hi Milan,
> 
> On 16 December 2015 at 16:08, Milan Broz <gmazyland@gmail.com> wrote:
>> On 12/16/2015 04:18 AM, Baolin Wang wrote:
>>> From the dm-crypt performance report we found that it shows low efficiency
>>> with a crypto engine for some modes (like ecb or xts). In dm-crypt, the I/O
>>> data buffer is mapped with a scatterlist and the scatterlist of one bio is
>>> sent to the encryption engine; sending more scatterlists with a bigger size
>>> at one time helps the engine deliver its best performance, which means a
>>> higher encryption speed.
>>>
>>> But dm-crypt currently maps only one segment (always one sector) of one bio
>>> with one scatterlist to the crypto engine at a time, which is more
>>> time-consuming and inefficient for the crypto engine. Especially for modes
>>> which don't need a different IV for each sector, we can map the whole bio
>>> with multiple scatterlists to improve the engine performance.
>>>
>>> But this optimization does not support ciphers and IV modes which must be
>>> processed sector by sector and need a different IV for each sector.
>>>
>>> Changes since v1:
>>>  - Introduce one different IV mode.
>>>  - Change the conditions for bulk mode.
>>
>> I tried the patchset on a 32-bit Intel VM and the kernel immediately OOPsed (just tried aes-ecb)...
>>
> 
> I've checked the code and I guess some macros I used have different
> definitions on different architectures. Could you please try the new
> patchset, with some optimizations, on your platform? It works well on
> my ARM board. Thanks.

Sorry for the delay, I tried to compile it.
It doesn't crash now, but it also does not work.

Your usage of the IV in XTS mode is not correct - it cannot just work this way;
you have to initialize the IV after each block. And just one write not aligned
to your large XTS block will corrupt it.
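
To make that concrete with made-up numbers: say a 64KiB bio is written
starting at sector 0. In your bulk mode the whole bio becomes one XTS
request with IV = 0, so the 16-byte cipher block at byte offset 8192 is
block index 8192 / 16 = 512 within that data unit:

  write: 64KiB bio at sector 0   -> one request, IV = 0,  offset 8192 -> block index 512
  read:  4KiB page at sector 16  -> own request, IV = 16, offset 0    -> block index 0

The XTS tweak depends on both the IV of the request and the block index
inside it, so (IV 0, index 512) and (IV 16, index 0) give different tweaks
for the very same bytes on disk - the read returns garbage. The per-sector
code always uses a 512-byte data unit with IV = sector number, so reads and
writes agree no matter how the I/O is split.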

Did you try to _read_ the data you write to the device?

See this test :

# create  device with your patch
$ echo "test"|cryptsetup create -s 512 -c aes-xts-bulk tst /dev/sdg

# prepare random test file
$ dd if=/dev/urandom of=/src.img bs=1M count=16

# now copy the file to the plaintext device and drop caches
$ dd if=/src.img of=/dev/mapper/tst bs=1M count=16

$ echo 3 > /proc/sys/vm/drop_caches

# and verify that we are (not) reading the same data ...

$ dd if=/dev/mapper/tst of=/dst1.img bs=1M count=16

$ sha256sum /src.img /dst1.img 
5401119fa9975bbeebac58e0b2598bc87247a29e62417f9f58fe200b531602ad  /src.img
e9bf5efa95031fdb5adf618db141f48ed23f71b12c017b8a0cbe0a694f18b979  /dst1.img

(I think only the first page-sized block is correct, because without direct-io
it writes in page-sized IOs.)


... or just try to mkfs and mount it
$ mkfs -t ext4  /dev/mapper/tst 

mke2fs 1.42.13 (17-May-2015)
Creating filesystem with 262144 4k blocks and 65536 inodes
...

$ mount /dev/mapper/tst /mnt/tst
mount: wrong fs type, bad option, bad superblock on /dev/mapper/tst,
       missing codepage or helper program, or other error


Your approach simply does not work. (It will probably work for ECB mode but it is
unusable in the real world.)


Anyway, I think that you should optimize the driver, not add strange hw-dependent
crypto modes to dmcrypt. This is not the first crypto accelerator that is just not
suited for this kind of use.

(If it can process batch of chunks of data each with own IV, then it can work
with dmcrypt, but I think such optimized code should be inside crypto API,
not in dmcrypt.)
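
Purely as an illustration of what such a batched interface could look like
(nothing like this exists in the crypto API today; all names below are
invented):

	/* hypothetical: one request carrying many chunks, each with its own IV */
	struct ablkcipher_bulk_chunk {
		struct scatterlist *src;
		struct scatterlist *dst;
		unsigned int len;
		u8 iv[16];
	};

	int crypto_ablkcipher_encrypt_bulk(struct crypto_ablkcipher *tfm,
					   struct ablkcipher_bulk_chunk *chunks,
					   unsigned int nr_chunks);

A driver could then program the whole batch into its queue at once while
dm-crypt keeps its per-sector IV semantics.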

Milan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-02 22:46     ` Milan Broz
@ 2016-01-04  6:58       ` Baolin Wang
  2016-01-04 20:13       ` Mark Brown
  1 sibling, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2016-01-04  6:58 UTC (permalink / raw)
  To: Milan Broz
  Cc: Jens Axboe, Alasdair G Kergon, Mike Snitzer, dm-devel, neilb,
	dan.j.williams, martin.petersen, sagig, Kent Overstreet,
	keith.busch, tj, Mark Brown, Arnd Bergmann, linux-block,
	linux-raid, LKML

Hi Milan,

On 3 January 2016 at 06:46, Milan Broz <gmazyland@gmail.com> wrote:
>
> Sorry for the delay, I tried to compile it.
> It doesn't crash now, but it also does not work.
>
> Your usage of the IV in XTS mode is not correct - it cannot just work this way;
> you have to initialize the IV after each block. And just one write not aligned
> to your large XTS block will corrupt it.
>
> Did you try to _read_ the data you write to the device?
>
> See this test :
>
> # create  device with your patch
> $ echo "test"|cryptsetup create -s 512 -c aes-xts-bulk tst /dev/sdg
>
> # prepare random test file
> $ dd if=/dev/urandom of=/src.img bs=1M count=16
>
> # now copy the file to the plaintext device and drop caches
> $ dd if=/src.img of=/dev/mapper/tst bs=1M count=16
>
> $ echo 3 > /proc/sys/vm/drop_caches
>
> # and verify that we are (not) reading the same data ...
>
> $ dd if=/dev/mapper/tst of=/dst1.img bs=1M count=16
>
> $ sha256sum /src.img /dst1.img
> 5401119fa9975bbeebac58e0b2598bc87247a29e62417f9f58fe200b531602ad  /src.img
> e9bf5efa95031fdb5adf618db141f48ed23f71b12c017b8a0cbe0a694f18b979  /dst1.img
>
> (I think only the first page-sized block is correct, because without direct-io
> it writes in page-sized IOs.)
>
>
> ... or just try to mkfs and mount it
> $ mkfs -t ext4  /dev/mapper/tst
>
> mke2fs 1.42.13 (17-May-2015)
> Creating filesystem with 262144 4k blocks and 65536 inodes
> ...
>
> $ mount /dev/mapper/tst /mnt/tst
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/tst,
>        missing codepage or helper program, or other error
>
>
> Your approach simply does not work. (It will probably work for ECB mode but it is
> unusable in the real world.)
>
>
> Anyway, I think that you should optimize the driver, not add strange hw-dependent
> crypto modes to dmcrypt. This is not the first crypto accelerator that is just not
> suited for this kind of use.

I'm very grateful for your feedback. I'm sorry I didn't check data
correctness much and mostly focused on the encryption speed. It looks like
something goes wrong when I follow your test procedure. I will optimize
the driver and learn more about XTS mode to check why it
cannot work. Thanks.

>
> (If it can process batch of chunks of data each with own IV, then it can work
> with dmcrypt, but I think such optimized code should be inside crypto API,
> not in dmcrypt.)
>
> Milan



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-02 22:46     ` Milan Broz
  2016-01-04  6:58       ` Baolin Wang
@ 2016-01-04 20:13       ` Mark Brown
  2016-01-06  6:49         ` Baolin Wang
  2016-01-12 23:31         ` [dm-devel] " Mikulas Patocka
  1 sibling, 2 replies; 20+ messages in thread
From: Mark Brown @ 2016-01-04 20:13 UTC (permalink / raw)
  To: Milan Broz
  Cc: Baolin Wang, Jens Axboe, Alasdair G Kergon, Mike Snitzer,
	dm-devel, neilb, dan.j.williams, martin.petersen, sagig,
	Kent Overstreet, keith.busch, tj, Arnd Bergmann, linux-block,
	linux-raid, LKML

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

On Sat, Jan 02, 2016 at 11:46:08PM +0100, Milan Broz wrote:

> Anyway, I think that you should optimize driver, not add strange hw-dependent
> crypto modes to dmcrypt. This is not the first crypto accelerator that is just not
> suited for this kind of use.

> (If it can process batch of chunks of data each with own IV, then it can work
> with dmcrypt, but I think such optimized code should be inside crypto API,
> not in dmcrypt.)

The flip side of this is that there is an awful lot of hardware out there
with basically this pattern, and if we can make the difference
between people being able to encrypt their storage or not due to
performance, then that seems like a win.  Getting hardware changes isn't
going to be a fast process.  From a brief look at the crypto layer it
does look like there may be things we can do there, if only in terms of
factoring out the common patterns for driving the queue of operations
into the hardware so it's easy for drivers to do the best thing.

One thing that occurs to me for the IV programming is something like what has
been proposed for SPI by Martin Sparl (and has seen good results on Raspberry Pi):
insert transfers that program the crypto engine into the stream of
DMA operations so we can keep the hardware busy.  It won't work with
every SoC out there, but it will work with a lot of them; it's what
hardware that explicitly supports this will be doing internally.  It's
the sort of thing that would benefit from factoring out, since it's a lot of
hassle to implement per driver.

The main thing the out-of-tree req-dm-crypt code was doing is using a
larger block size, which does seem like a reasonable thing to allow
people to tune for performance tradeoffs, but I understand that's a lot
harder to achieve in a good way than one might hope.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-04 20:13       ` Mark Brown
@ 2016-01-06  6:49         ` Baolin Wang
  2016-01-12 23:31         ` [dm-devel] " Mikulas Patocka
  1 sibling, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2016-01-06  6:49 UTC (permalink / raw)
  To: Mark Brown
  Cc: Milan Broz, Jens Axboe, Alasdair G Kergon, Mike Snitzer,
	dm-devel, neilb, dan.j.williams, martin.petersen, sagig,
	Kent Overstreet, keith.busch, tj, Arnd Bergmann, linux-block,
	linux-raid, LKML

On 5 January 2016 at 04:13, Mark Brown <broonie@kernel.org> wrote:
> On Sat, Jan 02, 2016 at 11:46:08PM +0100, Milan Broz wrote:
>
>> Anyway, I think that you should optimize driver, not add strange hw-dependent
>> crypto modes to dmcrypt. This is not the first crypto accelerator that is just not
>> suited for this kind of use.
>
>> (If it can process batch of chunks of data each with own IV, then it can work
>> with dmcrypt, but I think such optimized code should be inside crypto API,
>> not in dmcrypt.)
>
> The flip side of this is there is an awful lot of hardware out there
> that has basically this pattern and if we can make the difference
> between people being able to encrypt or not encrypt their storage due to
> performance then that seems like a win.  Getting hardware changes isn't

Yeah, many vendors now supply an AES hardware engine to improve the
encryption speed. Qualcomm and Spreadtrum, for example, both support AES
engines that can handle the IV generation inside the engine. Such an
engine can handle bulk data with just one initial IV, which is what the
out-of-tree req-dm-crypt code implements. That is why Milan's test
failed: there was no hardware engine there to support this.

> going to be a fast process.  From a brief look at the crypto layer it
> does look there may be things we can do there, if only in terms of
> factoring out the common patterns for driving the queue of operations
> into the hardware so it's easy for drivers to do the best thing.
>
> One thing that occurs to me for the IV programming that has been
> proposed for SPI by Martin Sparl (and seen good results on Raspberry PI)
> is to insert transfers programming the crypto engine into the stream of
> DMA operations so we can keep the hardware busy.  It won't work with
> every SoC out there but it will work with a lot of them, it's what
> hardware that explicitly supports this will be doing internally.  It's
> the sort of thing that would benefit from factoring out, it's a lot of
> hassle to implement per driver.
>
> The main thing the out of tree req-dm-crypt code is doing was using a
> larger block size which does seem like a reasonable thing to allow
> people to tune for performance tradeofffs but I undertand that's a lot
> harder to achieve in a good way than one might hope.



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-04 20:13       ` Mark Brown
  2016-01-06  6:49         ` Baolin Wang
@ 2016-01-12 23:31         ` Mikulas Patocka
  2016-01-12 23:38           ` Arnd Bergmann
  2016-01-12 23:40           ` Mark Brown
  1 sibling, 2 replies; 20+ messages in thread
From: Mikulas Patocka @ 2016-01-12 23:31 UTC (permalink / raw)
  To: device-mapper development, Mark Brown
  Cc: Milan Broz, Jens Axboe, keith.busch, linux-raid, martin.petersen,
	Mike Snitzer, Baolin Wang, linux-block, neilb, LKML, sagig,
	Arnd Bergmann, tj, dan.j.williams, Kent Overstreet,
	Alasdair G Kergon



On Mon, 4 Jan 2016, Mark Brown wrote:

> On Sat, Jan 02, 2016 at 11:46:08PM +0100, Milan Broz wrote:
> 
> > Anyway, I think that you should optimize driver, not add strange hw-dependent
> > crypto modes to dmcrypt. This is not the first crypto accelerator that is just not
> > suited for this kind of use.
> 
> > (If it can process batch of chunks of data each with own IV, then it can work
> > with dmcrypt, but I think such optimized code should be inside crypto API,
> > not in dmcrypt.)
> 
> The flip side of this is there is an awful lot of hardware out there
> that has basically this pattern and if we can make the difference
> between people being able to encrypt or not encrypt their storage due to
> performance then that seems like a win.  Getting hardware changes isn't
> going to be a fast process.  From a brief look at the crypto layer it
> does look there may be things we can do there, if only in terms of
> factoring out the common patterns for driving the queue of operations
> into the hardware so it's easy for drivers to do the best thing.  
> 
> One thing that occurs to me for the IV programming that has been
> proposed for SPI by Martin Sparl (and seen good results on Raspberry PI)
> is to insert transfers programming the crypto engine into the stream of
> DMA operations so we can keep the hardware busy.  It won't work with
> every SoC out there but it will work with a lot of them, it's what
> hardware that explicitly supports this will be doing internally.  It's
> the sort of thing that would benefit from factoring out, it's a lot of
> hassle to implement per driver.
> 
> The main thing the out of tree req-dm-crypt code is doing was using a
> larger block size which does seem like a reasonable thing to allow
> people to tune for performance tradeofffs but I undertand that's a lot
> harder to achieve in a good way than one might hope.

But as Milan pointed out, that larger block size doesn't work if you
process requests with different sizes - the data encrypted with one
request size won't match if you decrypt it with a different request
size.


XTS with a larger block could work if it were possible to use an arbitrary
initial tweak - the function crypt() in crypto/xts.c calculates the
initial sector tweak by encrypting the IV:

tw(crypto_cipher_tfm(ctx->tweak), w->iv, w->iv);

and then calculates each cipher block's tweak by multiplying the tweak by 
a constant polynomial (alpha):

gf128mul_x_ble(s.t, s.t);	(s.t is the same as w->iv)


If we could supply the tweak directly, we could use larger sectors in 
dm-crypt.

For example, we could use 64k XTS sectors, and if the user is accessing a 1k
offset into the sector, we could calculate the initial sector tweak
	tw(crypto_cipher_tfm(ctx->tweak), w->iv, w->iv);
and then multiply it by alpha^(1024/16) = alpha^64 (because we are 1024 bytes into
the sector and the XTS block size is 16 bytes). That would make it possible to use
larger encryption requests and the data would match regardless of the request
size.
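
To make the arithmetic concrete, here is a minimal user-space sketch
(illustration only - the struct and function names below are invented for this
example and are not a kernel API; the doubling step is the same multiplication
by alpha that gf128mul_x_ble() performs):

#include <stdint.h>

/* 128-bit XTS tweak, viewed as two 64-bit little-endian halves */
struct xts_tweak {
	uint64_t lo;	/* bits 0..63   */
	uint64_t hi;	/* bits 64..127 */
};

/* multiply the tweak by alpha once, i.e. move one 16-byte cipher block forward */
static void xts_tweak_mul_alpha(struct xts_tweak *t)
{
	uint64_t carry = t->hi >> 63;		/* bit 127 about to shift out */

	t->hi = (t->hi << 1) | (t->lo >> 63);
	t->lo = (t->lo << 1) ^ (carry * 0x87);	/* reduce by x^128 + x^7 + x^2 + x + 1 */
}

/* advance the (already encrypted) initial sector tweak by nblocks blocks,
 * e.g. nblocks = 1024 / 16 = 64 for a 1k offset into a 64k sector */
static void xts_tweak_advance(struct xts_tweak *t, unsigned int nblocks)
{
	while (nblocks--)
		xts_tweak_mul_alpha(t);
}

A real implementation would jump ahead with precomputed powers of alpha instead
of looping, but the point is only that the starting offset fully determines the
tweak.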

But the Linux crypto API doesn't allow this - the code that would multiply
the tweak after the initial encryption isn't there (maybe we could get this
behavior by modifying ctx->tweak to point to a null cipher, but it is a
dirty hack to poke into private crypto structures).

Does the hardware encryption you are optimizing for allow setting 
arbitrary tweaks in XTS mode? What is the specific driver you are 
optimizing for?


Another possibility is to use a dm-crypt block size of 4k and put a filesystem
with a 4k block size on it (it will never send requests that are not aligned on a 4k
boundary, so we could reject such requests with an error).
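
(For illustration only - a rough, hypothetical helper for such a check in the
target's map path, not actual dm-crypt code; bi_sector is counted in 512-byte
units and bi_size in bytes:)

#include <linux/bio.h>

/* hypothetical: with a fixed 4k crypto block, reject any bio that is not
 * 4k-aligned in both offset and length */
static inline bool crypt_bio_is_4k_aligned(struct bio *bio)
{
	return !(bio->bi_iter.bi_sector & 7) &&		/* start aligned to 4k */
	       !(bio->bi_iter.bi_size & 4095);		/* length multiple of 4k */
}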

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-12 23:31         ` [dm-devel] " Mikulas Patocka
@ 2016-01-12 23:38           ` Arnd Bergmann
  2016-01-13  2:18             ` Mikulas Patocka
  2016-01-13  7:01             ` Milan Broz
  2016-01-12 23:40           ` Mark Brown
  1 sibling, 2 replies; 20+ messages in thread
From: Arnd Bergmann @ 2016-01-12 23:38 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Mark Brown, Milan Broz, Jens Axboe,
	keith.busch, linux-raid, martin.petersen, Mike Snitzer,
	Baolin Wang, linux-block, neilb, LKML, sagig, tj, dan.j.williams,
	Kent Overstreet, Alasdair G Kergon

On Tuesday 12 January 2016 18:31:19 Mikulas Patocka wrote:
> 
> Another possibility is to use dm-crypt block size 4k and use a filesystem 
> with 4k blocksize on it (it will never send requests not aligned on 4k 
> boundary, so we could reject such requests with an error).

Is there ever a reason to use something other than 4K block size on
dm-crypt?

	Arnd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-12 23:31         ` [dm-devel] " Mikulas Patocka
  2016-01-12 23:38           ` Arnd Bergmann
@ 2016-01-12 23:40           ` Mark Brown
  2016-01-13  2:13             ` Mikulas Patocka
  1 sibling, 1 reply; 20+ messages in thread
From: Mark Brown @ 2016-01-12 23:40 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Milan Broz, Jens Axboe, keith.busch,
	linux-raid, martin.petersen, Mike Snitzer, Baolin Wang,
	linux-block, neilb, LKML, sagig, Arnd Bergmann, tj,
	dan.j.williams, Kent Overstreet, Alasdair G Kergon

[-- Attachment #1: Type: text/plain, Size: 969 bytes --]

On Tue, Jan 12, 2016 at 06:31:19PM -0500, Mikulas Patocka wrote:
> On Mon, 4 Jan 2016, Mark Brown wrote:

> > The main thing the out of tree req-dm-crypt code is doing was using a
> > larger block size which does seem like a reasonable thing to allow
> > people to tune for performance tradeofffs but I undertand that's a lot
> > harder to achieve in a good way than one might hope.

> But as Milan pointed out, that larger block size doesn't work if you 
> process requests with different sizes - the data encrypted with one 
> request size won't match if you decrypt them with a different request 
> size.

Sure, you need to fix that block size.

> Does the hardware encryption you are optimizing for allow setting 
> arbitrary tweaks in XTS mode? What is the specific driver you are 
> optimizing for?

This isn't targeted at a specific driver or system; it's trying to make
dm-crypt better able to make use of hardware acceleration in general.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-12 23:40           ` Mark Brown
@ 2016-01-13  2:13             ` Mikulas Patocka
  2016-01-14 11:35               ` Mark Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-01-13  2:13 UTC (permalink / raw)
  To: Mark Brown
  Cc: device-mapper development, Milan Broz, Jens Axboe, keith.busch,
	linux-raid, martin.petersen, Mike Snitzer, Baolin Wang,
	linux-block, neilb, LKML, sagig, Arnd Bergmann, tj,
	dan.j.williams, Kent Overstreet, Alasdair G Kergon



On Tue, 12 Jan 2016, Mark Brown wrote:

> On Tue, Jan 12, 2016 at 06:31:19PM -0500, Mikulas Patocka wrote:
> > On Mon, 4 Jan 2016, Mark Brown wrote:
> 
> > > The main thing the out of tree req-dm-crypt code is doing was using a
> > > larger block size which does seem like a reasonable thing to allow
> > > people to tune for performance tradeofffs but I undertand that's a lot
> > > harder to achieve in a good way than one might hope.
> 
> > But as Milan pointed out, that larger block size doesn't work if you 
> > process requests with different sizes - the data encrypted with one 
> > request size won't match if you decrypt them with a different request 
> > size.
> 
> Sure, you need to fix that block size.
> 
> > Does the hardware encryption you are optimizing for allow setting 
> > arbitrary tweaks in XTS mode? What is the specific driver you are 
> > optimizing for?
> 
> This isn't targeted at a specific driver or system, it's trying to make
> dm-crypt better able to make use of hardware acceleration in general.

If the hardware acceleration doesn't allow setting an arbitrary XTS tweak,
then this "large block" optimization on XTS can't be done at all.

So we need to know which driver(s) you want to optimize for and how
those driver(s) handle tweak generation.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-12 23:38           ` Arnd Bergmann
@ 2016-01-13  2:18             ` Mikulas Patocka
  2016-01-13 10:17               ` Arnd Bergmann
  2016-01-13  7:01             ` Milan Broz
  1 sibling, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-01-13  2:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: device-mapper development, Mark Brown, Milan Broz, Jens Axboe,
	keith.busch, linux-raid, martin.petersen, Mike Snitzer,
	Baolin Wang, linux-block, neilb, LKML, sagig, tj, dan.j.williams,
	Kent Overstreet, Alasdair G Kergon



On Wed, 13 Jan 2016, Arnd Bergmann wrote:

> On Tuesday 12 January 2016 18:31:19 Mikulas Patocka wrote:
> > 
> > Another possibility is to use dm-crypt block size 4k and use a filesystem 
> > with 4k blocksize on it (it will never send requests not aligned on 4k 
> > boundary, so we could reject such requests with an error).
> 
> Is there ever a reason to use something other than 4K block size on
> dm-crypt?
> 
> 	Arnd

You can't use a 4k block with CBC (and most other encryption modes). If only a
part of the 4k block is written (and then a system crash happens), CBC would
corrupt the block completely.

For example, suppose that an EXT2 directory block is updated, the first
512-byte sector is written, and the rest of the sectors are not written
because of a crash. CBC would corrupt all sectors except the first one in
this case.

You could use a 4k block with XTS and ECB.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-12 23:38           ` Arnd Bergmann
  2016-01-13  2:18             ` Mikulas Patocka
@ 2016-01-13  7:01             ` Milan Broz
  1 sibling, 0 replies; 20+ messages in thread
From: Milan Broz @ 2016-01-13  7:01 UTC (permalink / raw)
  To: Arnd Bergmann, Mikulas Patocka
  Cc: device-mapper development, Mark Brown, Milan Broz, Jens Axboe,
	keith.busch, linux-raid, martin.petersen, Mike Snitzer,
	Baolin Wang, linux-block, neilb, LKML, sagig, tj, dan.j.williams,
	Kent Overstreet, Alasdair G Kergon

On 01/13/2016 12:38 AM, Arnd Bergmann wrote:
> On Tuesday 12 January 2016 18:31:19 Mikulas Patocka wrote:
>>
>> Another possibility is to use dm-crypt block size 4k and use a filesystem 
>> with 4k blocksize on it (it will never send requests not aligned on 4k 
>> boundary, so we could reject such requests with an error).
> 
> Is there ever a reason to use something other than 4K block size on
> dm-crypt?

Most existing sw FDE systems use 512-byte blocks. I would like to see a
configurable block size (at least up to 4k), but as Mikulas pointed out
it opens several new problems.

Anyway, I do not see a reason why crypto accelerators should not process
these small sectors better - the hw just has to be designed for it.

Milan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-13  2:18             ` Mikulas Patocka
@ 2016-01-13 10:17               ` Arnd Bergmann
  2016-01-13 15:00                 ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Arnd Bergmann @ 2016-01-13 10:17 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Mark Brown, Milan Broz, Jens Axboe,
	keith.busch, linux-raid, martin.petersen, Mike Snitzer,
	Baolin Wang, linux-block, neilb, LKML, sagig, tj, dan.j.williams,
	Kent Overstreet, Alasdair G Kergon

On Tuesday 12 January 2016 21:18:12 Mikulas Patocka wrote:
> On Wed, 13 Jan 2016, Arnd Bergmann wrote:
> 
> > On Tuesday 12 January 2016 18:31:19 Mikulas Patocka wrote:
> > > 
> > > Another possibility is to use dm-crypt block size 4k and use a filesystem 
> > > with 4k blocksize on it (it will never send requests not aligned on 4k 
> > > boundary, so we could reject such requests with an error).
> > 
> > Is there ever a reason to use something other than 4K block size on
> > dm-crypt?
> > 
> >       Arnd
> 
> You can't use 4k block on CBC (and most other encryption modes). If only a 
> part of 4k block is written (and then system crash happens), CBC would 
> corrupt the block completely.
> 
> For example, suppose that EXT2 directory block is updated, the first 
> 512-byte sector is written and the rest of the sectors is not written 
> because of a crash. CBC would corrupt all sectors except the first one in 
> this case.
> 
> You could use 4k block on XTS and ECB.

Ah, I did not know that ext2 was doing sub-block writes. This may be
something to address in the ext4 code (and other file systems), as
a lot of flash storage devices (SD cards and eMMC) get really slow
when you do writes smaller than 4K because of the internal
read-modify-write cycle. Ideally you always want to drive those
using 64K writes (for reads, it doesn't matter much).

For hard drives, there are still a couple of older models that have
native 512-byte sectors, but the majority of new drives also
prefer 4K writes. SSDs are typically optimized for 4K writes because
that is what they expect software to do, but they use larger writes
internally.

	Arnd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-13 10:17               ` Arnd Bergmann
@ 2016-01-13 15:00                 ` Mikulas Patocka
  0 siblings, 0 replies; 20+ messages in thread
From: Mikulas Patocka @ 2016-01-13 15:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: device-mapper development, Mark Brown, Milan Broz, Jens Axboe,
	keith.busch, linux-raid, martin.petersen, Mike Snitzer,
	Baolin Wang, linux-block, neilb, LKML, sagig, tj, dan.j.williams,
	Kent Overstreet, Alasdair G Kergon



On Wed, 13 Jan 2016, Arnd Bergmann wrote:

> On Tuesday 12 January 2016 21:18:12 Mikulas Patocka wrote:
> > On Wed, 13 Jan 2016, Arnd Bergmann wrote:
> > 
> > > On Tuesday 12 January 2016 18:31:19 Mikulas Patocka wrote:
> > > > 
> > > > Another possibility is to use dm-crypt block size 4k and use a filesystem 
> > > > with 4k blocksize on it (it will never send requests not aligned on 4k 
> > > > boundary, so we could reject such requests with an error).
> > > 
> > > Is there ever a reason to use something other than 4K block size on
> > > dm-crypt?
> > > 
> > >       Arnd
> > 
> > You can't use 4k block on CBC (and most other encryption modes). If only a 
> > part of 4k block is written (and then system crash happens), CBC would 
> > corrupt the block completely.
> > 
> > For example, suppose that EXT2 directory block is updated, the first 
> > 512-byte sector is written and the rest of the sectors is not written 
> > because of a crash. CBC would corrupt all sectors except the first one in 
> > this case.
> > 
> > You could use 4k block on XTS and ECB.
> 
> Ah, I did not know that ext2 was doing sub-block writes. This may be

Ext2 is not doing sub-block writes.

Generally, disks and SSDs do not guarantee 4k write atomicity (only disks
with hardware 4k sectors guarantee it).

For example, ext2 writes a full 4k block, only part of the block is
written to the disk, and then a power failure happens. On the next reboot CBC
will corrupt the unwritten part of the 4k block.

Mikulas

> something to address in the ext4 code (and other file systems), as
> a lot of flash storage devices (SD cards and eMMC) get really slow
> when you do writes smaller than 4K because of the internal
> read-modify-write cycle. Ideally you want to always drive those
> using 64K writes (for reads, it doesn't matter much).
> 
> For hard drives, there are still a couple of older models that have
> native 512 byte sectors, but the majority of new drivers also
> prefers 4K writes. SSDs are typically optimized for 4K writes because
> that is what they expect software to do, but they use larger writes
> internally.
> 
> 	Arnd
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency
  2016-01-13  2:13             ` Mikulas Patocka
@ 2016-01-14 11:35               ` Mark Brown
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Brown @ 2016-01-14 11:35 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Milan Broz, Jens Axboe, keith.busch,
	linux-raid, martin.petersen, Mike Snitzer, Baolin Wang,
	linux-block, neilb, LKML, sagig, Arnd Bergmann, tj,
	dan.j.williams, Kent Overstreet, Alasdair G Kergon

[-- Attachment #1: Type: text/plain, Size: 998 bytes --]

On Tue, Jan 12, 2016 at 09:13:19PM -0500, Mikulas Patocka wrote:
> On Tue, 12 Jan 2016, Mark Brown wrote:

> > This isn't targeted at a specific driver or system, it's trying to make
> > dm-crypt better able to make use of hardware acceleration in general.

> If the hardware acceleration doesn't allow to set arbitrary XTS tweak, 
> then this "large block" optimization on XTS can't be done at all.

> So, we need to know which driver(s) you want to optimize for and how do 
> those driver(s) handle tweak generation.

Unfortunately the reality is just as I described it - we're looking for
general improvements, not at specific devices (well, Linaro is mainly
interested in ARM-based SoCs, but the range of ARM SoCs is such that that
doesn't really narrow things down).  It's probably better to ask if
there exists any hardware which could use this usefully; software-only
implementations (or hardware that only does AES) at least give us
control over supplying the tweak.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-14 11:35 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-16  3:18 [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency Baolin Wang
2015-12-16  3:18 ` Baolin Wang
2015-12-16  3:18 ` [PATCH v2 1/2] block: Export the __blk_bios_map_sg() to map one bio Baolin Wang
2015-12-16  3:18 ` [PATCH v2 2/2] md: dm-crypt: Introduce the bulk IV mode for bulk crypto Baolin Wang
2015-12-16  8:08 ` [PATCH v2 0/2] Introduce the bulk IV mode for improving the crypto engine efficiency Milan Broz
2015-12-16  8:31   ` Baolin Wang
2015-12-17  7:37   ` Baolin Wang
2016-01-02 22:46     ` Milan Broz
2016-01-04  6:58       ` Baolin Wang
2016-01-04 20:13       ` Mark Brown
2016-01-06  6:49         ` Baolin Wang
2016-01-12 23:31         ` [dm-devel] " Mikulas Patocka
2016-01-12 23:38           ` Arnd Bergmann
2016-01-13  2:18             ` Mikulas Patocka
2016-01-13 10:17               ` Arnd Bergmann
2016-01-13 15:00                 ` Mikulas Patocka
2016-01-13  7:01             ` Milan Broz
2016-01-12 23:40           ` Mark Brown
2016-01-13  2:13             ` Mikulas Patocka
2016-01-14 11:35               ` Mark Brown
