linux-crypto.vger.kernel.org archive mirror
* [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
@ 2017-01-12 12:59 Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 1/6] crypto: skcipher - Add bulk request processing API Ondrej Mosnacek
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

Hi,

The goal of this patchset is to allow skcipher API users that need to process
batches of small messages (most notably dm-crypt) to do so efficiently.

The first patch introduces a new request type (and corresponding encrypt/decrypt
functions) to the skcipher API. The new API can be used to submit multiple
messages at once, enabling drivers to reduce per-request overhead compared to
processing each message separately.

The skcipher drivers can provide support for the new request type by setting the
corresponding fields of their skcipher_alg structure. If 'native' support is not
provided by a driver (i.e. the fields are left NULL), the crypto API
transparently provides a generic fallback implementation, which simply processes
the bulk request as a set of standard requests on the same tfm.
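
As an illustration, here is a minimal caller-side sketch of the new API added
by the first patch (tfm, src, dst, ivs, nmsgs, msgsize, my_complete and
my_data are placeholders the caller is assumed to set up; error handling is
abbreviated):

	struct skcipher_bulk_request *breq;
	int err;

	breq = skcipher_bulk_request_alloc(tfm, nmsgs, GFP_KERNEL);
	if (!breq)
		return -ENOMEM;

	skcipher_bulk_request_set_callback(breq, CRYPTO_TFM_REQ_MAY_SLEEP,
					   my_complete, my_data);
	/* all messages are msgsize bytes; ivs holds nmsgs concatenated IVs */
	skcipher_bulk_request_set_crypt(breq, src, dst, nmsgs, msgsize,
					NULL, ivs);

	err = crypto_skcipher_encrypt_bulk(breq);
	if (err == -EINPROGRESS || err == -EBUSY)
		return err; /* my_complete() will be invoked when done */

	skcipher_bulk_request_free(breq);
	return err;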

The second patch extends skcipher_walk so it can be used for processing the new
bulk requests, while preserving equivalent functionality when used with standard
requests.

The third and fourth patches add native bulk request support to the cryptd and
SIMD helper wrappers, respectively.

The fifth patch adds bulk request support to the AES-NI skcipher drivers, in
order to provide an example of both implementing the bulk request processing
and using the extended skcipher_walk in such an implementation. This patch also
provides a slight optimization, since the kernel_fpu_* functions are called
just once for the whole bulk request. Note that the standard and bulk
implementations mostly share the same code under the hood.

The last patch converts dm-crypt to use bulk requests and makes it submit
multiple sectors at once, whenever they are stored sequentially within a single
page.

With all the patches applied, I was able to measure a small speedup (~5-10%)
with AES-NI ciphers and a dm-crypt device mapped over a ramdisk.

To-be-done:
    testing the bulk API in testmgr.c
    documentation update

Ondrej Mosnacek (6):
  crypto: skcipher - Add bulk request processing API
  crypto: skcipher - Add bulk request support to walk
  crypto: cryptd - Add skcipher bulk request support
  crypto: simd - Add bulk request support
  crypto: aesni-intel - Add bulk request support
  dm-crypt: Add bulk crypto processing support

 arch/x86/crypto/aesni-intel_glue.c        | 267 +++++++++++++++++++------
 arch/x86/crypto/glue_helper.c             |  23 +--
 arch/x86/include/asm/crypto/glue_helper.h |   2 +-
 crypto/Makefile                           |   1 +
 crypto/cryptd.c                           | 111 +++++++++++
 crypto/simd.c                             |  61 ++++++
 crypto/skcipher.c                         | 207 +++++++++++++++-----
 crypto/skcipher_bulk.c                    | 312 ++++++++++++++++++++++++++++++
 drivers/md/dm-crypt.c                     | 254 +++++++++++++++---------
 include/crypto/internal/skcipher.h        |  42 +++-
 include/crypto/skcipher.h                 | 299 +++++++++++++++++++++++++++-
 11 files changed, 1369 insertions(+), 210 deletions(-)
 create mode 100644 crypto/skcipher_bulk.c

-- 
2.9.3


* [RFC PATCH 1/6] crypto: skcipher - Add bulk request processing API
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
@ 2017-01-12 12:59 ` Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 2/6] crypto: skcipher - Add bulk request support to walk Ondrej Mosnacek
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

This patch adds bulk request processing to the skcipher interface.
Specifically, it adds a new type of request ('skcipher_bulk_request'), which
allows passing multiple independent messages to the skcipher driver.

The buffers for the message data are passed via just two sg lists (one for src
buffer, one for dst buffer). The IVs are passed via a single buffer, where they
are stored sequentially. The interface allows specifying either a fixed length
for all messages or a pointer to an array of message lengths.

An skcipher implementation that wants to support bulk requests may set the
appropriate fields of its skcipher_alg struct. If these fields are not provided
(or the skcipher is created from an (a)blkcipher), the crypto API automatically
sets them to a fallback implementation, which simply splits the bulk request
into a series of regular skcipher requests on the same tfm.

This means that the new type of request can be used with all skciphers, even if
they do not support bulk requests natively.

Note that when allocating an skcipher_bulk_request, the user must specify the
maximum number of messages that will be submitted via the request. This is
necessary for the fallback implementation, which has to allocate space for the
appropriate number of subrequests so that they can be processed in parallel.
If the skcipher is synchronous, the fallback implementation only allocates
space for a single subrequest and processes the partial requests sequentially.
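
For a synchronous tfm, the request can also be placed on the stack; below is a
minimal sketch using the on-stack helper added by this patch, where the maximum
number of messages (MY_MAX_MSGS here) determines how much context space is
reserved (MY_MAX_MSGS and the surrounding variables are hypothetical):

	SKCIPHER_BULK_REQUEST_ON_STACK(breq, MY_MAX_MSGS, tfm);

	skcipher_bulk_request_set_tfm(breq, tfm);
	skcipher_bulk_request_set_callback(breq, CRYPTO_TFM_REQ_MAY_SLEEP,
					   NULL, NULL);
	/* per-message sizes are passed via the msgsizes array here,
	 * so the fixed msgsize argument is ignored */
	skcipher_bulk_request_set_crypt(breq, src, dst, nmsgs, 0,
					msgsizes, ivs);

	err = crypto_skcipher_encrypt_bulk(breq);
	skcipher_bulk_request_zero(breq);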

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
 crypto/Makefile                    |   1 +
 crypto/skcipher.c                  |  15 ++
 crypto/skcipher_bulk.c             | 312 +++++++++++++++++++++++++++++++++++++
 include/crypto/internal/skcipher.h |  32 ++++
 include/crypto/skcipher.h          | 299 ++++++++++++++++++++++++++++++++++-
 5 files changed, 658 insertions(+), 1 deletion(-)
 create mode 100644 crypto/skcipher_bulk.c

diff --git a/crypto/Makefile b/crypto/Makefile
index b8f0e3e..cd1cf57 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_CRYPTO_AEAD2) += aead.o
 crypto_blkcipher-y := ablkcipher.o
 crypto_blkcipher-y += blkcipher.o
 crypto_blkcipher-y += skcipher.o
+crypto_blkcipher-y += skcipher_bulk.o
 obj-$(CONFIG_CRYPTO_BLKCIPHER2) += crypto_blkcipher.o
 obj-$(CONFIG_CRYPTO_SEQIV) += seqiv.o
 obj-$(CONFIG_CRYPTO_ECHAINIV) += echainiv.o
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 6ee6a15..8b6d684 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -667,6 +667,8 @@ static int crypto_init_skcipher_ops_blkcipher(struct crypto_tfm *tfm)
 	skcipher->ivsize = crypto_blkcipher_ivsize(blkcipher);
 	skcipher->keysize = calg->cra_blkcipher.max_keysize;
 
+	crypto_skcipher_bulk_set_fallback(skcipher);
+
 	return 0;
 }
 
@@ -760,6 +762,8 @@ static int crypto_init_skcipher_ops_ablkcipher(struct crypto_tfm *tfm)
 			    sizeof(struct ablkcipher_request);
 	skcipher->keysize = calg->cra_ablkcipher.max_keysize;
 
+	crypto_skcipher_bulk_set_fallback(skcipher);
+
 	return 0;
 }
 
@@ -789,6 +793,14 @@ static int crypto_skcipher_init_tfm(struct crypto_tfm *tfm)
 	skcipher->ivsize = alg->ivsize;
 	skcipher->keysize = alg->max_keysize;
 
+	if (!alg->encrypt_bulk || !alg->decrypt_bulk || !alg->reqsize_bulk)
+		crypto_skcipher_bulk_set_fallback(skcipher);
+	else {
+		skcipher->encrypt_bulk = alg->encrypt_bulk;
+		skcipher->decrypt_bulk = alg->decrypt_bulk;
+		skcipher->reqsize_bulk = alg->reqsize_bulk;
+	}
+
 	if (alg->exit)
 		skcipher->base.exit = crypto_skcipher_exit_tfm;
 
@@ -822,6 +834,9 @@ static void crypto_skcipher_show(struct seq_file *m, struct crypto_alg *alg)
 	seq_printf(m, "ivsize       : %u\n", skcipher->ivsize);
 	seq_printf(m, "chunksize    : %u\n", skcipher->chunksize);
 	seq_printf(m, "walksize     : %u\n", skcipher->walksize);
+	seq_printf(m, "bulk         : %s\n",
+		   (skcipher->encrypt_bulk && skcipher->decrypt_bulk &&
+		    skcipher->reqsize_bulk) ?  "yes" : "no");
 }
 
 #ifdef CONFIG_NET
diff --git a/crypto/skcipher_bulk.c b/crypto/skcipher_bulk.c
new file mode 100644
index 0000000..9630122
--- /dev/null
+++ b/crypto/skcipher_bulk.c
@@ -0,0 +1,312 @@
+/*
+ * Bulk IV fallback for skcipher.
+ *
+ * Copyright (C) 2016-2017 Red Hat, Inc. All rights reserved.
+ * Copyright (c) 2016-2017 Ondrej Mosnacek <omosnacek@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/atomic.h>
+
+struct skcipher_bulk_subreqctx {
+	struct scatterlist sg_src[2];
+	struct scatterlist sg_dst[2];
+
+	struct skcipher_request subreq;
+};
+
+struct skcipher_bulk_reqctx {
+	int (*crypt)(struct skcipher_request *req);
+
+	unsigned int next_slot;
+	atomic_t unfinished;
+	atomic_t busy_counter;
+	atomic_t err_unset;
+
+	int first_error;
+
+	struct skcipher_bulk_subreqctx slots[];
+};
+
+static void skcipher_bulk_continue(struct crypto_async_request *areq, int err);
+
+static int skcipher_bulk_spawn(struct skcipher_bulk_request *req,
+			       struct skcipher_bulk_subreqctx *slot, u32 flags)
+{
+	struct skcipher_bulk_reqctx *rctx = skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	unsigned int ivsize = crypto_skcipher_ivsize(tfm);
+	unsigned int i, offset, size;
+	struct scatterlist *src, *dst;
+	int err;
+
+	skcipher_request_set_tfm(&slot->subreq, tfm);
+	skcipher_request_set_callback(&slot->subreq, flags,
+				      skcipher_bulk_continue, req);
+	if (req->msgsizes) {
+		offset = 0;
+		for (i = 0; i < rctx->next_slot; i++)
+			offset += req->msgsizes[i];
+		size = req->msgsizes[rctx->next_slot];
+	} else {
+		offset = rctx->next_slot * req->msgsize;
+		size = req->msgsize;
+	}
+
+	/* perform the subrequest: */
+	src = scatterwalk_ffwd(slot->sg_src, req->src, offset);
+	dst = src;
+	if (req->src != req->dst)
+		dst = scatterwalk_ffwd(slot->sg_dst, req->dst, offset);
+
+	skcipher_request_set_crypt(&slot->subreq, src, dst, size,
+				   req->ivs + rctx->next_slot * ivsize);
+	err = rctx->crypt(&slot->subreq);
+	if (err == -EINPROGRESS || err == -EBUSY)
+		return err; /* successfully submitted */
+
+	if (err && atomic_dec_and_test(&rctx->err_unset))
+		rctx->first_error = err;
+
+	return atomic_dec_and_test(&rctx->unfinished) ? 0 : -EINPROGRESS;
+}
+
+static int skcipher_bulk_spawn_unstarted(struct skcipher_bulk_request *req,
+					 u32 flags)
+{
+	struct skcipher_bulk_reqctx *rctx = skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	unsigned int slot_size =
+			sizeof(struct skcipher_bulk_subreqctx) + tfm->reqsize;
+	u8 *slot_pos;
+	struct skcipher_bulk_subreqctx *slot;
+	int ret;
+	while (rctx->next_slot < req->nmsgs) {
+		slot_pos = (u8 *)rctx->slots + rctx->next_slot * slot_size;
+		slot = (struct skcipher_bulk_subreqctx *)slot_pos;
+
+		/* try to spawn request on the slot: */
+		ret = skcipher_bulk_spawn(req, slot, flags);
+		++rctx->next_slot;
+		if (ret == 0)
+			return 0; /* all finished */
+		if (ret == -EBUSY && !atomic_inc_and_test(&rctx->busy_counter))
+			return -EBUSY; /* EBUSY, don't spawn until notified */
+	}
+	return -EINPROGRESS;
+}
+
+static void skcipher_bulk_continue(struct crypto_async_request *areq, int err)
+{
+	struct skcipher_bulk_request *req = areq->data;
+	struct skcipher_bulk_reqctx *rctx = skcipher_bulk_request_ctx(req);
+
+	if (err == -EINPROGRESS) {
+		/* -EINPROGRESS after -EBUSY returned earlier */
+
+		if (!atomic_dec_and_test(&rctx->busy_counter))
+			return; /* -EBUSY not yet registered by caller */
+
+		/* let's continue spawning: */
+		err = skcipher_bulk_spawn_unstarted(req, 0);
+		BUG_ON(err == 0); /* this request couldn't yet be finished */
+		if (err == -EINPROGRESS)
+			skcipher_bulk_request_complete(req, -EINPROGRESS);
+		else if (err != -EBUSY && atomic_dec_and_test(&rctx->err_unset))
+			rctx->first_error = err;
+	} else {
+		/* request is finished, possibly with error */
+
+		if (err && atomic_dec_and_test(&rctx->err_unset))
+			rctx->first_error = err;
+
+		if (atomic_dec_and_test(&rctx->unfinished))
+			skcipher_bulk_request_complete(req, rctx->first_error);
+	}
+}
+
+static int skcipher_bulk_do_async(struct skcipher_bulk_request *req)
+{
+	struct skcipher_bulk_reqctx *rctx = skcipher_bulk_request_ctx(req);
+	u32 flags = skcipher_bulk_request_flags(req);
+
+	/* you never know... */
+	if (req->nmsgs > (unsigned int)INT_MAX)
+		return -EINVAL;
+
+	if (req->nmsgs == 0)
+		return 0;
+
+	/* initialize context variables: */
+	rctx->first_error = 0;
+	rctx->next_slot = 0;
+	rctx->busy_counter = (atomic_t)ATOMIC_INIT(0);
+	rctx->unfinished = (atomic_t)ATOMIC_INIT((int)req->nmsgs);
+	rctx->err_unset = (atomic_t)ATOMIC_INIT(1);
+
+	return skcipher_bulk_spawn_unstarted(req, flags);
+}
+
+static int skcipher_bulk_encrypt_async_many(struct skcipher_bulk_request *req)
+{
+	struct skcipher_bulk_reqctx *rctx = skcipher_bulk_request_ctx(req);
+
+	rctx->crypt = crypto_skcipher_encrypt;
+	return skcipher_bulk_do_async(req);
+}
+
+static int skcipher_bulk_decrypt_async_many(struct skcipher_bulk_request *req)
+{
+	struct skcipher_bulk_reqctx *rctx = skcipher_bulk_request_ctx(req);
+
+	rctx->crypt = crypto_skcipher_decrypt;
+	return skcipher_bulk_do_async(req);
+}
+
+static int skcipher_bulk_encrypt_async_one(struct skcipher_bulk_request *req)
+{
+	struct skcipher_request *subreq = skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	unsigned int cryptlen = req->msgsizes ? req->msgsizes[0] : req->msgsize;
+
+	skcipher_request_set_tfm(subreq, tfm);
+	skcipher_request_set_callback(subreq, req->base.flags,
+				      req->base.complete, req->base.data);
+	skcipher_request_set_crypt(subreq, req->src, req->dst, cryptlen,
+				   req->ivs);
+	return crypto_skcipher_encrypt(subreq);
+}
+
+static int skcipher_bulk_decrypt_async_one(struct skcipher_bulk_request *req)
+{
+	struct skcipher_request *subreq = skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	unsigned int cryptlen = req->msgsizes ? req->msgsizes[0] : req->msgsize;
+
+	skcipher_request_set_tfm(subreq, tfm);
+	skcipher_request_set_callback(subreq, req->base.flags,
+				      req->base.complete, req->base.data);
+	skcipher_request_set_crypt(subreq, req->src, req->dst, cryptlen,
+				   req->ivs);
+	return crypto_skcipher_decrypt(subreq);
+}
+
+static int skcipher_bulk_encrypt_async(struct skcipher_bulk_request *req)
+{
+	if (req->nmsgs == 0)
+		return 0;
+
+	if (req->maxmsgs == 1)
+		return skcipher_bulk_encrypt_async_one(req);
+
+	return skcipher_bulk_encrypt_async_many(req);
+}
+
+static int skcipher_bulk_decrypt_async(struct skcipher_bulk_request *req)
+{
+	if (req->nmsgs == 0)
+		return 0;
+
+	if (req->maxmsgs == 1)
+		return skcipher_bulk_decrypt_async_one(req);
+
+	return skcipher_bulk_decrypt_async_many(req);
+}
+
+static unsigned int skcipher_bulk_reqsize_async(struct crypto_skcipher *tfm,
+						unsigned int maxmsgs)
+{
+	unsigned int per_message;
+
+	/* special case for no message: */
+	if (maxmsgs == 0)
+		return 0;
+
+	/* special case for just one message: */
+	if (maxmsgs == 1)
+		return sizeof(struct skcipher_request) + tfm->reqsize;
+
+	per_message = sizeof(struct skcipher_bulk_subreqctx) + tfm->reqsize;
+	return sizeof(struct skcipher_bulk_reqctx) + maxmsgs * per_message;
+}
+
+static int skcipher_bulk_do_sync(struct skcipher_bulk_request *req,
+				 int (*crypt)(struct skcipher_request *))
+{
+	struct skcipher_request *subreq = skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	u32 flags = skcipher_bulk_request_flags(req);
+	unsigned int msg_idx, offset, ivsize = crypto_skcipher_ivsize(tfm);
+	const unsigned int *msgsize = req->msgsizes ?: &req->msgsize;
+	struct scatterlist *src, *dst;
+	struct scatterlist sg_src[2];
+	struct scatterlist sg_dst[2];
+	u8 *iv;
+	int err;
+
+	skcipher_request_set_tfm(subreq, tfm);
+	skcipher_request_set_callback(subreq, flags, NULL, NULL);
+
+	iv = req->ivs;
+	offset = 0;
+
+	for (msg_idx = 0; msg_idx < req->nmsgs; msg_idx++) {
+		src = scatterwalk_ffwd(sg_src, req->src, offset);
+		dst = src;
+		if (req->src != req->dst)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, offset);
+
+		skcipher_request_set_crypt(subreq, src, dst, *msgsize, iv);
+		err = crypt(subreq);
+		if (err)
+			return err;
+
+		iv += ivsize;
+		offset += *msgsize;
+		if (req->msgsizes)
+			msgsize++;
+	}
+	return 0;
+}
+
+static int skcipher_bulk_encrypt_sync(struct skcipher_bulk_request *req)
+{
+	return skcipher_bulk_do_sync(req, crypto_skcipher_encrypt);
+}
+
+static int skcipher_bulk_decrypt_sync(struct skcipher_bulk_request *req)
+{
+	return skcipher_bulk_do_sync(req, crypto_skcipher_decrypt);
+}
+
+static unsigned int skcipher_bulk_reqsize_sync(struct crypto_skcipher *tfm,
+					       unsigned int maxmsgs)
+{
+	return sizeof(struct skcipher_request) + tfm->reqsize;
+}
+
+void crypto_skcipher_bulk_set_fallback(struct crypto_skcipher *skcipher)
+{
+	if (crypto_skcipher_get_flags(skcipher) & CRYPTO_ALG_ASYNC) {
+		skcipher->encrypt_bulk = skcipher_bulk_encrypt_async;
+		skcipher->decrypt_bulk = skcipher_bulk_decrypt_async;
+		skcipher->reqsize_bulk = skcipher_bulk_reqsize_async;
+	} else {
+		skcipher->encrypt_bulk = skcipher_bulk_encrypt_sync;
+		skcipher->decrypt_bulk = skcipher_bulk_decrypt_sync;
+		skcipher->reqsize_bulk = skcipher_bulk_reqsize_sync;
+	}
+}
+EXPORT_SYMBOL_GPL(crypto_skcipher_bulk_set_fallback);
diff --git a/include/crypto/internal/skcipher.h b/include/crypto/internal/skcipher.h
index e42f706..f536b57 100644
--- a/include/crypto/internal/skcipher.h
+++ b/include/crypto/internal/skcipher.h
@@ -95,6 +95,12 @@ static inline void skcipher_request_complete(struct skcipher_request *req, int e
 	req->base.complete(&req->base, err);
 }
 
+static inline void skcipher_bulk_request_complete(
+		struct skcipher_bulk_request *req, int err)
+{
+	req->base.complete(&req->base, err);
+}
+
 static inline void crypto_set_skcipher_spawn(
 	struct crypto_skcipher_spawn *spawn, struct crypto_instance *inst)
 {
@@ -181,6 +187,30 @@ static inline u32 skcipher_request_flags(struct skcipher_request *req)
 	return req->base.flags;
 }
 
+static inline void *skcipher_bulk_request_ctx(struct skcipher_bulk_request *req)
+{
+	return req->__ctx;
+}
+
+static inline u32 skcipher_bulk_request_flags(struct skcipher_bulk_request *req)
+{
+	return req->base.flags;
+}
+
+static inline unsigned int skcipher_bulk_request_totalsize(
+		struct skcipher_bulk_request *req)
+{
+	unsigned int totalsize, msg_idx;
+
+	if (!req->msgsizes)
+		return req->nmsgs * req->msgsize;
+
+	totalsize = 0;
+	for (msg_idx = 0; msg_idx < req->nmsgs; msg_idx++)
+		totalsize += req->msgsizes[msg_idx];
+	return totalsize;
+}
+
 static inline unsigned int crypto_skcipher_alg_min_keysize(
 	struct skcipher_alg *alg)
 {
@@ -207,5 +237,7 @@ static inline unsigned int crypto_skcipher_alg_max_keysize(
 	return alg->max_keysize;
 }
 
+void crypto_skcipher_bulk_set_fallback(struct crypto_skcipher *skcipher);
+
 #endif	/* _CRYPTO_INTERNAL_SKCIPHER_H */
 
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 562001c..e229546 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -52,11 +52,46 @@ struct skcipher_givcrypt_request {
 	struct ablkcipher_request creq;
 };
 
+/**
+ *	struct skcipher_bulk_request - Bulk symmetric key cipher request
+ *	@maxmsgs: Maximum number of messages, as specified on allocation
+ *	@nmsgs: Number of messages in plaintext/ciphertext
+ *	@msgsize: Size of plaintext/ciphertext message
+ *	@msgsizes: If not NULL, points to an array of @nmsgs unsigned
+ *	           integers specifying the size of each message (in such case
+ *	           the value of @msgsize is ignored)
+ *	@ivs: Initialisation vectors for all messages
+ *	@src: Source SG list
+ *	@dst: Destination SG list
+ *	@base: Underlying async request
+ *	@__ctx: Start of private context data
+ */
+struct skcipher_bulk_request {
+	unsigned int maxmsgs;
+
+	unsigned int nmsgs;
+	unsigned int msgsize;
+	const unsigned int *msgsizes;
+
+	u8 *ivs;
+
+	struct scatterlist *src;
+	struct scatterlist *dst;
+
+	struct crypto_async_request base;
+
+	void *__ctx[] CRYPTO_MINALIGN_ATTR;
+};
+
 struct crypto_skcipher {
 	int (*setkey)(struct crypto_skcipher *tfm, const u8 *key,
 	              unsigned int keylen);
 	int (*encrypt)(struct skcipher_request *req);
 	int (*decrypt)(struct skcipher_request *req);
+	int (*encrypt_bulk)(struct skcipher_bulk_request *req);
+	int (*decrypt_bulk)(struct skcipher_bulk_request *req);
+	unsigned int (*reqsize_bulk)(struct crypto_skcipher *tfm,
+				     unsigned int maxmsgs);
 
 	unsigned int ivsize;
 	unsigned int reqsize;
@@ -100,6 +135,19 @@ struct crypto_skcipher {
  *	     be called in parallel with the same transformation object.
  * @decrypt: Decrypt a single block. This is a reverse counterpart to @encrypt
  *	     and the conditions are exactly the same.
+ * @encrypt_bulk: Similar to @encrypt, but operates in bulk mode, where
+ *                the plaintext/ciphertext consists of several messages, each
+ *                of which is transformed using a separate IV (all IVs are
+ *                passed concatenated via the request structure). This field
+ *                may be NULL if the algorithm does not natively support bulk
+ *                requests.
+ * @decrypt_bulk: Decrypt multiple messages. This is a reverse counterpart
+ *                to @encrypt_bulk and the conditions are exactly the same.
+ *                This field may be NULL if the algorithm does not natively
+ *                support bulk requests.
+ * @reqsize_bulk: Compute the bulk request size for the given tfm and maximum
+ *                number of messages. This field may be NULL if the algorithm
+ *                does not natively support bulk requests.
  * @init: Initialize the cryptographic transformation object. This function
  *	  is used to initialize the cryptographic transformation object.
  *	  This function is called only once at the instantiation time, right
@@ -120,13 +168,18 @@ struct crypto_skcipher {
  * 	      in parallel. Should be a multiple of chunksize.
  * @base: Definition of a generic crypto algorithm.
  *
- * All fields except @ivsize are mandatory and must be filled.
+ * All fields except @ivsize, @encrypt_bulk, @decrypt_bulk and @reqsize_bulk
+ * are mandatory and must be filled.
  */
 struct skcipher_alg {
 	int (*setkey)(struct crypto_skcipher *tfm, const u8 *key,
 	              unsigned int keylen);
 	int (*encrypt)(struct skcipher_request *req);
 	int (*decrypt)(struct skcipher_request *req);
+	int (*encrypt_bulk)(struct skcipher_bulk_request *req);
+	int (*decrypt_bulk)(struct skcipher_bulk_request *req);
+	unsigned int (*reqsize_bulk)(struct crypto_skcipher *tfm,
+				     unsigned int maxmsgs);
 	int (*init)(struct crypto_skcipher *tfm);
 	void (*exit)(struct crypto_skcipher *tfm);
 
@@ -428,6 +481,21 @@ static inline struct crypto_skcipher *crypto_skcipher_reqtfm(
 }
 
 /**
+ * crypto_skcipher_bulk_reqtfm() - obtain cipher handle from bulk request
+ * @req: skcipher_bulk_request out of which the cipher handle is to be obtained
+ *
+ * Return the crypto_skcipher handle when furnishing an skcipher_bulk_request
+ * data structure.
+ *
+ * Return: crypto_skcipher handle
+ */
+static inline struct crypto_skcipher *crypto_skcipher_bulk_reqtfm(
+	struct skcipher_bulk_request *req)
+{
+	return __crypto_skcipher_cast(req->base.tfm);
+}
+
+/**
  * crypto_skcipher_encrypt() - encrypt plaintext
  * @req: reference to the skcipher_request handle that holds all information
  *	 needed to perform the cipher operation
@@ -464,6 +532,44 @@ static inline int crypto_skcipher_decrypt(struct skcipher_request *req)
 }
 
 /**
+ * crypto_skcipher_encrypt_bulk() - encrypt plaintext in bulk mode
+ * @req: reference to the skcipher_bulk_request handle that holds all
+ *	 information needed to perform the cipher operation
+ *
+ * Encrypt plaintext data using the skcipher_bulk_request handle. That data
+ * structure and how it is filled with data is discussed with the
+ * skcipher_bulk_request_* functions.
+ *
+ * Return: 0 if the cipher operation was successful; < 0 if an error occurred
+ */
+static inline int crypto_skcipher_encrypt_bulk(
+		struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+
+	return tfm->encrypt_bulk(req);
+}
+
+/**
+ * crypto_skcipher_decrypt_bulk() - decrypt ciphertext in bulk mode
+ * @req: reference to the skcipher_bulk_request handle that holds all
+ *	 information needed to perform the cipher operation
+ *
+ * Decrypt ciphertext data using the skcipher_bulk_request handle. That data
+ * structure and how it is filled with data is discussed with the
+ * skcipher_bulk_request_* functions.
+ *
+ * Return: 0 if the cipher operation was successful; < 0 if an error occurred
+ */
+static inline int crypto_skcipher_decrypt_bulk(
+		struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+
+	return tfm->decrypt_bulk(req);
+}
+
+/**
  * DOC: Symmetric Key Cipher Request Handle
  *
  * The skcipher_request data structure contains all pointers to data
@@ -608,5 +714,196 @@ static inline void skcipher_request_set_crypt(
 	req->iv = iv;
 }
 
+/**
+ * DOC: Bulk Symmetric Key Cipher Request Handle
+ *
+ * The skcipher_bulk_request data structure contains all pointers to data
+ * required for the bulk symmetric key cipher operation. This includes the
+ * cipher handle (which can be used by multiple skcipher_bulk_request
+ * instances), pointer to plaintext and ciphertext, asynchronous callback
+ * function, etc. It acts as a handle to the skcipher_bulk_request_* API
+ * calls in a similar way as skcipher handle to the crypto_skcipher_* API calls.
+ */
+
+#define SKCIPHER_BULK_REQUEST_ON_STACK(name, max_messages, tfm) \
+	char __##name##_desc[sizeof(struct skcipher_bulk_request) + \
+		crypto_skcipher_bulk_reqsize(tfm, max_messages)] \
+		CRYPTO_MINALIGN_ATTR; \
+	struct skcipher_bulk_request *name = (void *)__##name##_desc; \
+	skcipher_bulk_request_set_maxmsgs(name, max_messages);
+
+/**
+ * crypto_skcipher_bulk_reqsize() - obtain size of the request data structure
+ * @tfm: cipher handle
+ * @maxmsgs: maximum number of messages that can be submitted in bulk
+ *
+ * Return: number of bytes
+ */
+static inline unsigned int crypto_skcipher_bulk_reqsize(
+		struct crypto_skcipher *tfm, unsigned int maxmsgs)
+{
+	return tfm->reqsize_bulk(tfm, maxmsgs);
+}
+
+/**
+ * skcipher_bulk_request_set_maxmsgs() - set the maxmsgs attribute in request
+ * @req: request handle to be modified
+ * @maxmsgs: maximum number of messages to be submitted via the request
+ *
+ * This function must be called on skcipher_bulk_requests that have been
+ * allocated manually (not using @skcipher_bulk_request_alloc or
+ * SKCIPHER_BULK_REQUEST_ON_STACK). The context size of the request must be
+ * at least the value returned by the corresponding call to
+ * crypto_skcipher_bulk_reqsize (with the same value of @maxmsgs).
+ */
+static inline void skcipher_bulk_request_set_maxmsgs(
+		struct skcipher_bulk_request *req, unsigned int maxmsgs)
+{
+	req->maxmsgs = maxmsgs;
+}
+
+/**
+ * skcipher_bulk_request_set_tfm() - update cipher handle reference in request
+ * @req: request handle to be modified
+ * @tfm: cipher handle that shall be added to the request handle
+ *
+ * Allow the caller to replace the existing skcipher handle in the request
+ * data structure with a different one.
+ */
+static inline void skcipher_bulk_request_set_tfm(
+		struct skcipher_bulk_request *req,
+		struct crypto_skcipher *tfm)
+{
+	req->base.tfm = crypto_skcipher_tfm(tfm);
+}
+
+static inline struct skcipher_bulk_request *skcipher_bulk_request_cast(
+	struct crypto_async_request *req)
+{
+	return container_of(req, struct skcipher_bulk_request, base);
+}
+
+/**
+ * skcipher_bulk_request_alloc() - allocate request data structure
+ * @tfm: cipher handle to be registered with the request
+ * @maxmsgs: maximum number of messages
+ * @gfp: memory allocation flag that is handed to kmalloc by the API call.
+ *
+ * Allocate the bulk request data structure that must be used with the
+ * skcipher encrypt_bulk and decrypt_bulk API calls. During the allocation,
+ * the provided skcipher handle is registered in the request data structure.
+ *
+ * The @maxmsgs parameter should specify the maximum number of messages that
+ * will be submitted via the allocated request. It is mainly used by
+ * the fallback implementation to figure out how many subrequests it needs
+ * to allocate so that they can be executed in parallel. However, other drivers
+ * may also make use of it. The implementation may reject requests with a
+ * higher number of messages than @maxmsgs.
+ *
+ * Return: allocated request handle in case of success, or NULL if out of memory
+ */
+static inline struct skcipher_bulk_request *skcipher_bulk_request_alloc(
+	struct crypto_skcipher *tfm, unsigned int maxmsgs, gfp_t gfp)
+{
+	struct skcipher_bulk_request *req;
+
+	req = kmalloc(sizeof(struct skcipher_bulk_request) +
+		      crypto_skcipher_bulk_reqsize(tfm, maxmsgs), gfp);
+
+	if (likely(req)) {
+		skcipher_bulk_request_set_maxmsgs(req, maxmsgs);
+		skcipher_bulk_request_set_tfm(req, tfm);
+	}
+
+	return req;
+}
+
+/**
+ * skcipher_bulk_request_free() - zeroize and free request data structure
+ * @req: request data structure cipher handle to be freed
+ */
+static inline void skcipher_bulk_request_free(struct skcipher_bulk_request *req)
+{
+	kzfree(req);
+}
+
+static inline void skcipher_bulk_request_zero(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+
+	memzero_explicit(req, sizeof(*req) +
+			 crypto_skcipher_bulk_reqsize(tfm, req->maxmsgs));
+}
+
+/**
+ * skcipher_bulk_request_set_callback() - set asynchronous callback function
+ * @req: request handle
+ * @flags: specify zero or an ORing of the flags
+ *         CRYPTO_TFM_REQ_MAY_BACKLOG the request queue may back log and
+ *	   increase the wait queue beyond the initial maximum size;
+ *	   CRYPTO_TFM_REQ_MAY_SLEEP the request processing may sleep
+ * @compl: callback function pointer to be registered with the request handle
+ * @data: The data pointer refers to memory that is not used by the kernel
+ *	  crypto API, but provided to the callback function for it to use. Here,
+ *	  the caller can provide a reference to memory the callback function can
+ *	  operate on. As the callback function is invoked asynchronously to the
+ *	  related functionality, it may need to access data structures of the
+ *	  related functionality which can be referenced using this pointer. The
+ *	  callback function can access the memory via the "data" field in the
+ *	  crypto_async_request data structure provided to the callback function.
+ *
+ * This function allows setting the callback function that is triggered once the
+ * cipher operation completes.
+ *
+ * The callback function is registered with the skcipher_bulk_request handle
+ * and must comply with the following template
+ *
+ *	void callback_function(struct crypto_async_request *req, int error)
+ */
+static inline void skcipher_bulk_request_set_callback(
+		struct skcipher_bulk_request *req, u32 flags,
+		crypto_completion_t compl, void *data)
+{
+	req->base.complete = compl;
+	req->base.data = data;
+	req->base.flags = flags;
+}
+
+/**
+ * skcipher_bulk_request_set_crypt() - set data buffers
+ * @req: request handle
+ * @src: source scatter / gather list
+ * @dst: destination scatter / gather list
+ * @nmsgs: number of messages in @src and @dst
+ * @msgsize: number of bytes per message (used when @msgsizes is NULL)
+ * @msgsizes: array of per-message sizes (if not NULL, @msgsize is ignored)
+ * @ivs: IVs for the cipher operations, which must comply with the IV size
+ *       defined by crypto_skcipher_ivsize (i.e. there must be @nmsgs * IV size
+ *       bytes of data)
+ *
+ * This function allows setting of the source data and destination data
+ * scatter / gather lists.
+ *
+ * For encryption, the source is treated as the plaintext and the
+ * destination is the ciphertext. For a decryption operation, the use is
+ * reversed - the source is the ciphertext and the destination is the plaintext.
+ *
+ * The plaintext/ciphertext must consist of @nmsgs messages, each @msgsize
+ * bytes long (or sized as given in @msgsizes). Each message is
+ * encrypted/decrypted with its own IV extracted from the @ivs buffer.
+ */
+static inline void skcipher_bulk_request_set_crypt(
+	struct skcipher_bulk_request *req,
+	struct scatterlist *src, struct scatterlist *dst, unsigned int nmsgs,
+	unsigned int msgsize, const unsigned int *msgsizes, void *ivs)
+{
+	req->src = src;
+	req->dst = dst;
+	req->msgsize = msgsize;
+	req->msgsizes = msgsizes;
+	req->nmsgs = nmsgs;
+	req->ivs = ivs;
+}
+
 #endif	/* _CRYPTO_SKCIPHER_H */
 
-- 
2.9.3


* [RFC PATCH 2/6] crypto: skcipher - Add bulk request support to walk
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 1/6] crypto: skcipher - Add bulk request processing API Ondrej Mosnacek
@ 2017-01-12 12:59 ` Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 3/6] crypto: cryptd - Add skcipher bulk request support Ondrej Mosnacek
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

This patch tweaks skcipher_walk so it can be used with the new bulk requests.

The skcipher_walk can now be initialized either from a skcipher_request (in
which case its behavior is equivalent to the old code) or from a
skcipher_bulk_request. In the latter case the usage is almost identical; the
most significant exception is that skciphers which tweak the IV themselves
(e.g. XTS) must check the new nextmsg flag before processing each chunk and
re-tweak the IV if it is set. For other ciphers, skcipher_walk automatically
switches to the next IV at message boundaries.
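
To illustrate, here is a rough sketch of how a native bulk implementation might
drive the extended walk (req is the skcipher_bulk_request and ctx the driver's
key context; my_retweak_iv() and my_crypt_segment() are hypothetical driver
helpers; the AES-NI patch later in the series contains real users):

	struct skcipher_walk walk;
	unsigned int nbytes;
	int err;

	err = skcipher_walk_virt_bulk(&walk, req, false);
	if (err)
		return err;

	while ((nbytes = walk.nbytes)) {
		/* only needed for IV-tweaking modes such as XTS: */
		if (walk.nextmsg)
			my_retweak_iv(ctx, walk.iv);

		my_crypt_segment(ctx, walk.dst.virt.addr, walk.src.virt.addr,
				 nbytes, walk.iv);
		err = skcipher_walk_done(&walk, 0);
	}

	return err;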

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
 crypto/skcipher.c                  | 192 +++++++++++++++++++++++++++----------
 include/crypto/internal/skcipher.h |  10 +-
 2 files changed, 153 insertions(+), 49 deletions(-)

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 8b6d684..b810e90 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -33,6 +33,7 @@ enum {
 	SKCIPHER_WALK_COPY = 1 << 2,
 	SKCIPHER_WALK_DIFF = 1 << 3,
 	SKCIPHER_WALK_SLEEP = 1 << 4,
+	SKCIPHER_WALK_HETEROGENOUS = 1 << 5,
 };
 
 struct skcipher_walk_buffer {
@@ -94,6 +95,41 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
 	return max(start, end_page);
 }
 
+static int skcipher_copy_iv(struct skcipher_walk *walk)
+{
+	unsigned a = crypto_tfm_ctx_alignment() - 1;
+	unsigned alignmask = walk->alignmask;
+	unsigned ivsize = walk->ivsize;
+	unsigned bs = walk->stride;
+	unsigned aligned_bs;
+	unsigned size;
+	u8 *iv;
+
+	aligned_bs = ALIGN(bs, alignmask);
+
+	/* Minimum size to align buffer by alignmask. */
+	size = alignmask & ~a;
+
+	if (walk->flags & SKCIPHER_WALK_PHYS)
+		size += ivsize;
+	else {
+		size += aligned_bs + ivsize;
+
+		/* Minimum size to ensure buffer does not straddle a page. */
+		size += (bs - 1) & ~(alignmask | a);
+	}
+
+	walk->buffer = kmalloc(size, skcipher_walk_gfp(walk));
+	if (!walk->buffer)
+		return -ENOMEM;
+
+	iv = PTR_ALIGN(walk->buffer, alignmask + 1);
+	iv = skcipher_get_spot(iv, bs) + aligned_bs;
+
+	walk->iv = memcpy(iv, walk->iv, walk->ivsize);
+	return 0;
+}
+
 static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
 {
 	u8 *addr;
@@ -108,9 +144,12 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
 int skcipher_walk_done(struct skcipher_walk *walk, int err)
 {
 	unsigned int n = walk->nbytes - err;
-	unsigned int nbytes;
+	unsigned int nbytes, nbytes_msg;
+
+	walk->nextmsg = false; /* reset the nextmsg flag */
 
 	nbytes = walk->total - n;
+	nbytes_msg = walk->total_msg - n;
 
 	if (unlikely(err < 0)) {
 		nbytes = 0;
@@ -139,8 +178,31 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
 	if (err > 0)
 		err = 0;
 
+	if (nbytes && !nbytes_msg) {
+		walk->nextmsg = true;
+
+		/* write the output IV: */
+		if (walk->iv != walk->oiv)
+			memcpy(walk->oiv, walk->iv, walk->ivsize);
+
+		/* advance to the IV of next message: */
+		walk->oiv += walk->ivsize;
+		walk->iv = walk->oiv;
+
+		if (unlikely(((unsigned long)walk->iv & walk->alignmask))) {
+			err = skcipher_copy_iv(walk);
+			if (err)
+				return err;
+		}
+
+		nbytes_msg = *walk->nextmsgsize;
+		if (walk->flags & SKCIPHER_WALK_HETEROGENOUS)
+			++walk->nextmsgsize;
+	}
+
+	walk->nbytes = nbytes_msg;
+	walk->total_msg = nbytes_msg;
 	walk->total = nbytes;
-	walk->nbytes = nbytes;
 
 	scatterwalk_advance(&walk->in, n);
 	scatterwalk_advance(&walk->out, n);
@@ -343,13 +405,13 @@ static int skcipher_walk_next(struct skcipher_walk *walk)
 	walk->flags &= ~(SKCIPHER_WALK_SLOW | SKCIPHER_WALK_COPY |
 			 SKCIPHER_WALK_DIFF);
 
-	n = walk->total;
+	n = walk->total_msg;
 	bsize = min(walk->stride, max(n, walk->blocksize));
 	n = scatterwalk_clamp(&walk->in, n);
 	n = scatterwalk_clamp(&walk->out, n);
 
 	if (unlikely(n < bsize)) {
-		if (unlikely(walk->total < walk->blocksize))
+		if (unlikely(walk->total_msg < walk->blocksize))
 			return skcipher_walk_done(walk, -EINVAL);
 
 slow_path:
@@ -388,41 +450,6 @@ static int skcipher_walk_next(struct skcipher_walk *walk)
 }
 EXPORT_SYMBOL_GPL(skcipher_walk_next);
 
-static int skcipher_copy_iv(struct skcipher_walk *walk)
-{
-	unsigned a = crypto_tfm_ctx_alignment() - 1;
-	unsigned alignmask = walk->alignmask;
-	unsigned ivsize = walk->ivsize;
-	unsigned bs = walk->stride;
-	unsigned aligned_bs;
-	unsigned size;
-	u8 *iv;
-
-	aligned_bs = ALIGN(bs, alignmask);
-
-	/* Minimum size to align buffer by alignmask. */
-	size = alignmask & ~a;
-
-	if (walk->flags & SKCIPHER_WALK_PHYS)
-		size += ivsize;
-	else {
-		size += aligned_bs + ivsize;
-
-		/* Minimum size to ensure buffer does not straddle a page. */
-		size += (bs - 1) & ~(alignmask | a);
-	}
-
-	walk->buffer = kmalloc(size, skcipher_walk_gfp(walk));
-	if (!walk->buffer)
-		return -ENOMEM;
-
-	iv = PTR_ALIGN(walk->buffer, alignmask + 1);
-	iv = skcipher_get_spot(iv, bs) + aligned_bs;
-
-	walk->iv = memcpy(iv, walk->iv, walk->ivsize);
-	return 0;
-}
-
 static int skcipher_walk_first(struct skcipher_walk *walk)
 {
 	walk->nbytes = 0;
@@ -441,11 +468,28 @@ static int skcipher_walk_first(struct skcipher_walk *walk)
 	}
 
 	walk->page = NULL;
-	walk->nbytes = walk->total;
+	walk->nbytes = walk->total_msg;
 
 	return skcipher_walk_next(walk);
 }
 
+static int skcipher_walk_skcipher_common(struct skcipher_walk *walk,
+					 struct crypto_skcipher *tfm,
+					 u32 req_flags)
+{
+	walk->flags &= ~SKCIPHER_WALK_SLEEP;
+	walk->flags |= req_flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+		       SKCIPHER_WALK_SLEEP : 0;
+
+	walk->nextmsg = true;
+	walk->blocksize = crypto_skcipher_blocksize(tfm);
+	walk->stride = crypto_skcipher_walksize(tfm);
+	walk->ivsize = crypto_skcipher_ivsize(tfm);
+	walk->alignmask = crypto_skcipher_alignmask(tfm);
+
+	return skcipher_walk_first(walk);
+}
+
 static int skcipher_walk_skcipher(struct skcipher_walk *walk,
 				  struct skcipher_request *req)
 {
@@ -454,20 +498,45 @@ static int skcipher_walk_skcipher(struct skcipher_walk *walk,
 	scatterwalk_start(&walk->in, req->src);
 	scatterwalk_start(&walk->out, req->dst);
 
+	walk->nextmsgsize = NULL;
+	walk->total_msg = req->cryptlen;
 	walk->total = req->cryptlen;
 	walk->iv = req->iv;
 	walk->oiv = req->iv;
+	walk->flags &= ~SKCIPHER_WALK_HETEROGENOUS;
 
-	walk->flags &= ~SKCIPHER_WALK_SLEEP;
-	walk->flags |= req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
-		       SKCIPHER_WALK_SLEEP : 0;
+	return skcipher_walk_skcipher_common(walk, tfm, req->base.flags);
+}
 
-	walk->blocksize = crypto_skcipher_blocksize(tfm);
-	walk->stride = crypto_skcipher_walksize(tfm);
-	walk->ivsize = crypto_skcipher_ivsize(tfm);
-	walk->alignmask = crypto_skcipher_alignmask(tfm);
+static int skcipher_walk_skcipher_bulk(struct skcipher_walk *walk,
+				       struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	unsigned int total, i;
 
-	return skcipher_walk_first(walk);
+	scatterwalk_start(&walk->in, req->src);
+	scatterwalk_start(&walk->out, req->dst);
+
+	if (req->msgsizes) {
+		total = 0;
+		for (i = 0; i < req->nmsgs; i++)
+			total += req->msgsizes[i];
+
+		walk->nextmsgsize = req->msgsizes;
+		walk->total_msg = *walk->nextmsgsize++;
+		walk->total = total;
+		walk->flags |= SKCIPHER_WALK_HETEROGENOUS;
+	} else {
+		walk->nextmsgsize = &req->msgsize;
+		walk->total_msg = req->msgsize;
+		walk->total = req->nmsgs * req->msgsize;
+		walk->flags &= ~SKCIPHER_WALK_HETEROGENOUS;
+	}
+
+	walk->iv = req->ivs;
+	walk->oiv = req->ivs;
+
+	return skcipher_walk_skcipher_common(walk, tfm, req->base.flags);
 }
 
 int skcipher_walk_virt(struct skcipher_walk *walk,
@@ -485,6 +554,21 @@ int skcipher_walk_virt(struct skcipher_walk *walk,
 }
 EXPORT_SYMBOL_GPL(skcipher_walk_virt);
 
+int skcipher_walk_virt_bulk(struct skcipher_walk *walk,
+			    struct skcipher_bulk_request *req, bool atomic)
+{
+	int err;
+
+	walk->flags &= ~SKCIPHER_WALK_PHYS;
+
+	err = skcipher_walk_skcipher_bulk(walk, req);
+
+	walk->flags &= atomic ? ~SKCIPHER_WALK_SLEEP : ~0;
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(skcipher_walk_virt_bulk);
+
 void skcipher_walk_atomise(struct skcipher_walk *walk)
 {
 	walk->flags &= ~SKCIPHER_WALK_SLEEP;
@@ -502,6 +586,17 @@ int skcipher_walk_async(struct skcipher_walk *walk,
 }
 EXPORT_SYMBOL_GPL(skcipher_walk_async);
 
+int skcipher_walk_async_bulk(struct skcipher_walk *walk,
+			     struct skcipher_bulk_request *req)
+{
+	walk->flags |= SKCIPHER_WALK_PHYS;
+
+	INIT_LIST_HEAD(&walk->buffers);
+
+	return skcipher_walk_skcipher_bulk(walk, req);
+}
+EXPORT_SYMBOL_GPL(skcipher_walk_async_bulk);
+
 static int skcipher_walk_aead_common(struct skcipher_walk *walk,
 				     struct aead_request *req, bool atomic)
 {
@@ -509,6 +604,7 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk,
 	int err;
 
 	walk->flags &= ~SKCIPHER_WALK_PHYS;
+	walk->flags &= ~SKCIPHER_WALK_HETEROGENOUS;
 
 	scatterwalk_start(&walk->in, req->src);
 	scatterwalk_start(&walk->out, req->dst);
diff --git a/include/crypto/internal/skcipher.h b/include/crypto/internal/skcipher.h
index f536b57..1f789df 100644
--- a/include/crypto/internal/skcipher.h
+++ b/include/crypto/internal/skcipher.h
@@ -50,9 +50,12 @@ struct skcipher_walk {
 	} src, dst;
 
 	struct scatter_walk in;
+	struct scatter_walk out;
 	unsigned int nbytes;
 
-	struct scatter_walk out;
+	bool nextmsg;
+	const unsigned int *nextmsgsize;
+	unsigned int total_msg;
 	unsigned int total;
 
 	struct list_head buffers;
@@ -150,9 +153,14 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err);
 int skcipher_walk_virt(struct skcipher_walk *walk,
 		       struct skcipher_request *req,
 		       bool atomic);
+int skcipher_walk_virt_bulk(struct skcipher_walk *walk,
+			    struct skcipher_bulk_request *req,
+			    bool atomic);
 void skcipher_walk_atomise(struct skcipher_walk *walk);
 int skcipher_walk_async(struct skcipher_walk *walk,
 			struct skcipher_request *req);
+int skcipher_walk_async_bulk(struct skcipher_walk *walk,
+			     struct skcipher_bulk_request *req);
 int skcipher_walk_aead(struct skcipher_walk *walk, struct aead_request *req,
 		       bool atomic);
 int skcipher_walk_aead_encrypt(struct skcipher_walk *walk,
-- 
2.9.3


* [RFC PATCH 3/6] crypto: cryptd - Add skcipher bulk request support
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 1/6] crypto: skcipher - Add bulk request processing API Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 2/6] crypto: skcipher - Add bulk request support to walk Ondrej Mosnacek
@ 2017-01-12 12:59 ` Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 4/6] crypto: simd - Add " Ondrej Mosnacek
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

This patch adds proper support for the new bulk requests to cryptd.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
 crypto/cryptd.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index 0508c48..b7d6e13 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -555,6 +555,114 @@ static int cryptd_skcipher_decrypt_enqueue(struct skcipher_request *req)
 	return cryptd_skcipher_enqueue(req, cryptd_skcipher_decrypt);
 }
 
+static void cryptd_skcipher_bulk_complete(struct skcipher_bulk_request *req,
+					  int err)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct cryptd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct cryptd_skcipher_request_ctx *rctx =
+			skcipher_bulk_request_ctx(req);
+	int refcnt = atomic_read(&ctx->refcnt);
+
+	local_bh_disable();
+	rctx->complete(&req->base, err);
+	local_bh_enable();
+
+	if (err != -EINPROGRESS && refcnt && atomic_dec_and_test(&ctx->refcnt))
+		crypto_free_skcipher(tfm);
+}
+
+static void cryptd_skcipher_bulk_encrypt(struct crypto_async_request *base,
+					 int err)
+{
+	struct skcipher_bulk_request *req = skcipher_bulk_request_cast(base);
+	struct cryptd_skcipher_request_ctx *rctx =
+			skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct cryptd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct crypto_skcipher *child = ctx->child;
+	SKCIPHER_BULK_REQUEST_ON_STACK(subreq, req->maxmsgs, child);
+
+	if (unlikely(err == -EINPROGRESS))
+		goto out;
+
+	skcipher_bulk_request_set_tfm(subreq, child);
+	skcipher_bulk_request_set_callback(subreq, CRYPTO_TFM_REQ_MAY_SLEEP,
+					   NULL, NULL);
+	skcipher_bulk_request_set_crypt(subreq, req->src, req->dst, req->nmsgs,
+					req->msgsize, req->msgsizes, req->ivs);
+
+	err = crypto_skcipher_encrypt_bulk(subreq);
+	skcipher_bulk_request_zero(subreq);
+
+	req->base.complete = rctx->complete;
+
+out:
+	cryptd_skcipher_bulk_complete(req, err);
+}
+
+static void cryptd_skcipher_bulk_decrypt(struct crypto_async_request *base,
+					 int err)
+{
+	struct skcipher_bulk_request *req = skcipher_bulk_request_cast(base);
+	struct cryptd_skcipher_request_ctx *rctx =
+			skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct cryptd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct crypto_skcipher *child = ctx->child;
+	SKCIPHER_BULK_REQUEST_ON_STACK(subreq, req->maxmsgs, child);
+
+	if (unlikely(err == -EINPROGRESS))
+		goto out;
+
+	skcipher_bulk_request_set_tfm(subreq, child);
+	skcipher_bulk_request_set_callback(subreq, CRYPTO_TFM_REQ_MAY_SLEEP,
+					   NULL, NULL);
+	skcipher_bulk_request_set_crypt(subreq, req->src, req->dst, req->nmsgs,
+					req->msgsize, req->msgsizes, req->ivs);
+
+	err = crypto_skcipher_decrypt_bulk(subreq);
+	skcipher_bulk_request_zero(subreq);
+
+	req->base.complete = rctx->complete;
+
+out:
+	cryptd_skcipher_bulk_complete(req, err);
+}
+
+static int cryptd_skcipher_bulk_enqueue(struct skcipher_bulk_request *req,
+					crypto_completion_t compl)
+{
+	struct cryptd_skcipher_request_ctx *rctx =
+			skcipher_bulk_request_ctx(req);
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct cryptd_queue *queue;
+
+	queue = cryptd_get_queue(crypto_skcipher_tfm(tfm));
+	rctx->complete = req->base.complete;
+	req->base.complete = compl;
+
+	return cryptd_enqueue_request(queue, &req->base);
+}
+
+static int cryptd_skcipher_bulk_encrypt_enqueue(
+		struct skcipher_bulk_request *req)
+{
+	return cryptd_skcipher_bulk_enqueue(req, cryptd_skcipher_bulk_encrypt);
+}
+
+static int cryptd_skcipher_bulk_decrypt_enqueue(
+		struct skcipher_bulk_request *req)
+{
+	return cryptd_skcipher_bulk_enqueue(req, cryptd_skcipher_bulk_decrypt);
+}
+
+static unsigned int cryptd_skcipher_bulk_reqsize(struct crypto_skcipher *tfm,
+						 unsigned int maxmsgs)
+{
+	return sizeof(struct cryptd_skcipher_request_ctx);
+}
+
 static int cryptd_skcipher_init_tfm(struct crypto_skcipher *tfm)
 {
 	struct skcipher_instance *inst = skcipher_alg_instance(tfm);
@@ -641,6 +749,9 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
 	inst->alg.setkey = cryptd_skcipher_setkey;
 	inst->alg.encrypt = cryptd_skcipher_encrypt_enqueue;
 	inst->alg.decrypt = cryptd_skcipher_decrypt_enqueue;
+	inst->alg.encrypt_bulk = cryptd_skcipher_bulk_encrypt_enqueue;
+	inst->alg.decrypt_bulk = cryptd_skcipher_bulk_decrypt_enqueue;
+	inst->alg.reqsize_bulk = cryptd_skcipher_bulk_reqsize;
 
 	inst->free = cryptd_skcipher_free;
 
-- 
2.9.3


* [RFC PATCH 4/6] crypto: simd - Add bulk request support
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
                   ` (2 preceding siblings ...)
  2017-01-12 12:59 ` [RFC PATCH 3/6] crypto: cryptd - Add skcipher bulk request support Ondrej Mosnacek
@ 2017-01-12 12:59 ` Ondrej Mosnacek
  2017-01-12 12:59 ` [RFC PATCH 5/6] crypto: aesni-intel " Ondrej Mosnacek
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

This patch adds proper support for the new bulk requests to the SIMD helpers.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
 crypto/simd.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/crypto/simd.c b/crypto/simd.c
index 8820337..2ae5930 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
@@ -100,6 +100,64 @@ static int simd_skcipher_decrypt(struct skcipher_request *req)
 	return crypto_skcipher_decrypt(subreq);
 }
 
+static int simd_skcipher_encrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_bulk_request *subreq;
+	struct crypto_skcipher *child;
+
+	subreq = skcipher_bulk_request_ctx(req);
+	*subreq = *req;
+
+	if (!may_use_simd() ||
+	    (in_atomic() && cryptd_skcipher_queued(ctx->cryptd_tfm)))
+		child = &ctx->cryptd_tfm->base;
+	else
+		child = cryptd_skcipher_child(ctx->cryptd_tfm);
+
+	skcipher_bulk_request_set_tfm(subreq, child);
+
+	return crypto_skcipher_encrypt_bulk(subreq);
+}
+
+static int simd_skcipher_decrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_bulk_request *subreq;
+	struct crypto_skcipher *child;
+
+	subreq = skcipher_bulk_request_ctx(req);
+	*subreq = *req;
+
+	if (!may_use_simd() ||
+	    (in_atomic() && cryptd_skcipher_queued(ctx->cryptd_tfm)))
+		child = &ctx->cryptd_tfm->base;
+	else
+		child = cryptd_skcipher_child(ctx->cryptd_tfm);
+
+	skcipher_bulk_request_set_tfm(subreq, child);
+
+	return crypto_skcipher_decrypt_bulk(subreq);
+}
+
+static unsigned int simd_skcipher_reqsize_bulk(struct crypto_skcipher *tfm,
+					       unsigned int maxmsgs)
+{
+	struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct crypto_skcipher *tfm_cryptd, *tfm_child;
+	unsigned int reqsize_cryptd, reqsize_child;
+
+	tfm_cryptd = &ctx->cryptd_tfm->base;
+	tfm_child = cryptd_skcipher_child(ctx->cryptd_tfm);
+
+	reqsize_cryptd = crypto_skcipher_bulk_reqsize(tfm_cryptd, maxmsgs);
+	reqsize_child = crypto_skcipher_bulk_reqsize(tfm_child, maxmsgs);
+	return sizeof(struct skcipher_bulk_request) +
+			max(reqsize_cryptd, reqsize_child);
+}
+
 static void simd_skcipher_exit(struct crypto_skcipher *tfm)
 {
 	struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
@@ -187,6 +245,9 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname,
 	alg->setkey = simd_skcipher_setkey;
 	alg->encrypt = simd_skcipher_encrypt;
 	alg->decrypt = simd_skcipher_decrypt;
+	alg->encrypt_bulk = simd_skcipher_encrypt_bulk;
+	alg->decrypt_bulk = simd_skcipher_decrypt_bulk;
+	alg->reqsize_bulk = simd_skcipher_reqsize_bulk;
 
 	err = crypto_register_skcipher(alg);
 	if (err)
-- 
2.9.3


* [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
                   ` (3 preceding siblings ...)
  2017-01-12 12:59 ` [RFC PATCH 4/6] crypto: simd - Add " Ondrej Mosnacek
@ 2017-01-12 12:59 ` Ondrej Mosnacek
  2017-01-13  3:19   ` Eric Biggers
  2017-01-12 12:59 ` [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support Ondrej Mosnacek
  2017-01-13 10:41 ` [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Herbert Xu
  6 siblings, 1 reply; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

This patch implements bulk request handling in the AES-NI crypto drivers.
The major advantage of this is that with bulk requests, the kernel_fpu_*
functions (which are usually quite slow) are now called only once for the whole
request.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
 arch/x86/crypto/aesni-intel_glue.c        | 267 +++++++++++++++++++++++-------
 arch/x86/crypto/glue_helper.c             |  23 ++-
 arch/x86/include/asm/crypto/glue_helper.h |   2 +-
 3 files changed, 221 insertions(+), 71 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 36ca150..5f67afc 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -364,70 +364,116 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 				  crypto_skcipher_ctx(tfm), key, len);
 }
 
-static int ecb_encrypt(struct skcipher_request *req)
+typedef void (*aesni_crypt_t)(struct crypto_aes_ctx *ctx,
+			      u8 *out, const u8 *in, unsigned int len);
+
+typedef void (*aesni_ivcrypt_t)(struct crypto_aes_ctx *ctx,
+				u8 *out, const u8 *in, unsigned int len,
+				u8 *iv);
+
+static int ecb_crypt(struct crypto_aes_ctx *ctx, struct skcipher_walk *walk,
+		     aesni_crypt_t crypt)
 {
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
-	struct skcipher_walk walk;
 	unsigned int nbytes;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
-
 	kernel_fpu_begin();
-	while ((nbytes = walk.nbytes)) {
-		aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
-			      nbytes & AES_BLOCK_MASK);
+	while ((nbytes = walk->nbytes)) {
+		crypt(ctx, walk->dst.virt.addr, walk->src.virt.addr,
+		      nbytes & AES_BLOCK_MASK);
 		nbytes &= AES_BLOCK_SIZE - 1;
-		err = skcipher_walk_done(&walk, nbytes);
+		err = skcipher_walk_done(walk, nbytes);
 	}
 	kernel_fpu_end();
 
 	return err;
 }
 
-static int ecb_decrypt(struct skcipher_request *req)
+static int cbc_crypt(struct crypto_aes_ctx *ctx, struct skcipher_walk *walk,
+		     aesni_ivcrypt_t crypt)
 {
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
-	struct skcipher_walk walk;
 	unsigned int nbytes;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
-
 	kernel_fpu_begin();
-	while ((nbytes = walk.nbytes)) {
-		aesni_ecb_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
-			      nbytes & AES_BLOCK_MASK);
+	while ((nbytes = walk->nbytes)) {
+		crypt(ctx, walk->dst.virt.addr, walk->src.virt.addr,
+		      nbytes & AES_BLOCK_MASK, walk->iv);
 		nbytes &= AES_BLOCK_SIZE - 1;
-		err = skcipher_walk_done(&walk, nbytes);
+		err = skcipher_walk_done(walk, nbytes);
 	}
 	kernel_fpu_end();
 
 	return err;
 }
 
-static int cbc_encrypt(struct skcipher_request *req)
+static int ecb_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
 	struct skcipher_walk walk;
-	unsigned int nbytes;
 	int err;
 
 	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;
 
-	kernel_fpu_begin();
-	while ((nbytes = walk.nbytes)) {
-		aesni_cbc_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
-			      nbytes & AES_BLOCK_MASK, walk.iv);
-		nbytes &= AES_BLOCK_SIZE - 1;
-		err = skcipher_walk_done(&walk, nbytes);
-	}
-	kernel_fpu_end();
+	return ecb_crypt(ctx, &walk, aesni_ecb_enc);
+}
 
-	return err;
+static int ecb_decrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;
+
+	return ecb_crypt(ctx, &walk, aesni_ecb_dec);
+}
+
+static int ecb_encrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt_bulk(&walk, req, true);
+	if (err)
+		return err;
+
+	return ecb_crypt(ctx, &walk, aesni_ecb_enc);
+}
+
+static int ecb_decrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt_bulk(&walk, req, true);
+	if (err)
+		return err;
+
+	return ecb_crypt(ctx, &walk, aesni_ecb_dec);
+}
+
+static int cbc_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;
+	return cbc_crypt(ctx, &walk, aesni_cbc_enc);
 }
 
 static int cbc_decrypt(struct skcipher_request *req)
@@ -435,21 +481,44 @@ static int cbc_decrypt(struct skcipher_request *req)
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
 	struct skcipher_walk walk;
-	unsigned int nbytes;
 	int err;
 
 	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;
+	return cbc_crypt(ctx, &walk, aesni_cbc_dec);
+}
 
-	kernel_fpu_begin();
-	while ((nbytes = walk.nbytes)) {
-		aesni_cbc_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
-			      nbytes & AES_BLOCK_MASK, walk.iv);
-		nbytes &= AES_BLOCK_SIZE - 1;
-		err = skcipher_walk_done(&walk, nbytes);
-	}
-	kernel_fpu_end();
+static int cbc_encrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
 
-	return err;
+	err = skcipher_walk_virt_bulk(&walk, req, true);
+	if (err)
+		return err;
+	return cbc_crypt(ctx, &walk, aesni_cbc_enc);
+}
+
+static int cbc_decrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt_bulk(&walk, req, true);
+	if (err)
+		return err;
+	return cbc_crypt(ctx, &walk, aesni_cbc_dec);
+}
+
+static unsigned int aesni_reqsize_bulk(struct crypto_skcipher *tfm,
+				       unsigned int maxmsgs)
+{
+	return 0;
 }
 
 #ifdef CONFIG_X86_64
@@ -487,32 +556,58 @@ static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
 }
 #endif
 
-static int ctr_crypt(struct skcipher_request *req)
+static int ctr_crypt_common(struct crypto_aes_ctx *ctx,
+			    struct skcipher_walk *walk)
 {
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
-	struct skcipher_walk walk;
 	unsigned int nbytes;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, true);
-
 	kernel_fpu_begin();
-	while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
-		aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
-			              nbytes & AES_BLOCK_MASK, walk.iv);
+	while ((nbytes = walk->nbytes)) {
+		if (nbytes < AES_BLOCK_SIZE) {
+			ctr_crypt_final(ctx, walk);
+			err = skcipher_walk_done(walk, nbytes);
+			continue;
+		}
+
+		aesni_ctr_enc_tfm(ctx, walk->dst.virt.addr, walk->src.virt.addr,
+				  nbytes & AES_BLOCK_MASK, walk->iv);
 		nbytes &= AES_BLOCK_SIZE - 1;
-		err = skcipher_walk_done(&walk, nbytes);
-	}
-	if (walk.nbytes) {
-		ctr_crypt_final(ctx, &walk);
-		err = skcipher_walk_done(&walk, 0);
+		err = skcipher_walk_done(walk, nbytes);
 	}
 	kernel_fpu_end();
 
 	return err;
 }
 
+static int ctr_crypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;
+
+	return ctr_crypt_common(ctx, &walk);
+}
+
+static int ctr_crypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt_bulk(&walk, req, true);
+	if (err)
+		return err;
+
+	return ctr_crypt_common(ctx, &walk);
+}
+
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
@@ -592,8 +687,14 @@ static int xts_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
 
-	return glue_xts_req_128bit(&aesni_enc_xts, req,
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+
+	return glue_xts_req_128bit(&aesni_enc_xts, &walk,
 				   XTS_TWEAK_CAST(aesni_xts_tweak),
 				   aes_ctx(ctx->raw_tweak_ctx),
 				   aes_ctx(ctx->raw_crypt_ctx));
@@ -603,8 +704,48 @@ static int xts_decrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (err)
+		return err;
+
+	return glue_xts_req_128bit(&aesni_dec_xts, &walk,
+				   XTS_TWEAK_CAST(aesni_xts_tweak),
+				   aes_ctx(ctx->raw_tweak_ctx),
+				   aes_ctx(ctx->raw_crypt_ctx));
+}
+
+static int xts_encrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt_bulk(&walk, req, false);
+	if (err)
+		return err;
+
+	return glue_xts_req_128bit(&aesni_enc_xts, &walk,
+				   XTS_TWEAK_CAST(aesni_xts_tweak),
+				   aes_ctx(ctx->raw_tweak_ctx),
+				   aes_ctx(ctx->raw_crypt_ctx));
+}
+
+static int xts_decrypt_bulk(struct skcipher_bulk_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req);
+	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt_bulk(&walk, req, false);
+	if (err)
+		return err;
 
-	return glue_xts_req_128bit(&aesni_dec_xts, req,
+	return glue_xts_req_128bit(&aesni_dec_xts, &walk,
 				   XTS_TWEAK_CAST(aesni_xts_tweak),
 				   aes_ctx(ctx->raw_tweak_ctx),
 				   aes_ctx(ctx->raw_crypt_ctx));
@@ -962,6 +1103,9 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.setkey		= aesni_skcipher_setkey,
 		.encrypt	= ecb_encrypt,
 		.decrypt	= ecb_decrypt,
+		.encrypt_bulk	= ecb_encrypt_bulk,
+		.decrypt_bulk	= ecb_decrypt_bulk,
+		.reqsize_bulk	= aesni_reqsize_bulk,
 	}, {
 		.base = {
 			.cra_name		= "__cbc(aes)",
@@ -978,6 +1122,9 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.setkey		= aesni_skcipher_setkey,
 		.encrypt	= cbc_encrypt,
 		.decrypt	= cbc_decrypt,
+		.encrypt_bulk	= cbc_encrypt_bulk,
+		.decrypt_bulk	= cbc_decrypt_bulk,
+		.reqsize_bulk	= aesni_reqsize_bulk,
 #ifdef CONFIG_X86_64
 	}, {
 		.base = {
@@ -996,6 +1143,9 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.setkey		= aesni_skcipher_setkey,
 		.encrypt	= ctr_crypt,
 		.decrypt	= ctr_crypt,
+		.encrypt_bulk	= ctr_crypt_bulk,
+		.decrypt_bulk	= ctr_crypt_bulk,
+		.reqsize_bulk	= aesni_reqsize_bulk,
 	}, {
 		.base = {
 			.cra_name		= "__xts(aes)",
@@ -1012,6 +1162,9 @@ static struct skcipher_alg aesni_skciphers[] = {
 		.setkey		= xts_aesni_setkey,
 		.encrypt	= xts_encrypt,
 		.decrypt	= xts_decrypt,
+		.encrypt_bulk	= xts_encrypt_bulk,
+		.decrypt_bulk	= xts_decrypt_bulk,
+		.reqsize_bulk	= aesni_reqsize_bulk,
 #endif
 	}
 };
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index 260a060..7bd28bf 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -415,34 +415,31 @@ int glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
 EXPORT_SYMBOL_GPL(glue_xts_crypt_128bit);
 
 int glue_xts_req_128bit(const struct common_glue_ctx *gctx,
-			struct skcipher_request *req,
+			struct skcipher_walk *walk,
 			common_glue_func_t tweak_fn, void *tweak_ctx,
 			void *crypt_ctx)
 {
 	const unsigned int bsize = 128 / 8;
-	struct skcipher_walk walk;
 	bool fpu_enabled = false;
 	unsigned int nbytes;
 	int err;
 
-	err = skcipher_walk_virt(&walk, req, false);
-	nbytes = walk.nbytes;
-	if (!nbytes)
-		return err;
+	nbytes = walk->nbytes;
 
 	/* set minimum length to bsize, for tweak_fn */
 	fpu_enabled = glue_skwalk_fpu_begin(bsize, gctx->fpu_blocks_limit,
-					    &walk, fpu_enabled,
+					    walk, fpu_enabled,
 					    nbytes < bsize ? bsize : nbytes);
 
-	/* calculate first value of T */
-	tweak_fn(tweak_ctx, walk.iv, walk.iv);
-
 	while (nbytes) {
-		nbytes = __glue_xts_req_128bit(gctx, crypt_ctx, &walk);
+		/* calculate first value of T */
+		if (walk->nextmsg)
+			tweak_fn(tweak_ctx, walk->iv, walk->iv);
 
-		err = skcipher_walk_done(&walk, nbytes);
-		nbytes = walk.nbytes;
+		nbytes = __glue_xts_req_128bit(gctx, crypt_ctx, walk);
+
+		err = skcipher_walk_done(walk, nbytes);
+		nbytes = walk->nbytes;
 	}
 
 	glue_fpu_end(fpu_enabled);
diff --git a/arch/x86/include/asm/crypto/glue_helper.h b/arch/x86/include/asm/crypto/glue_helper.h
index 29e53ea..e9806a8 100644
--- a/arch/x86/include/asm/crypto/glue_helper.h
+++ b/arch/x86/include/asm/crypto/glue_helper.h
@@ -172,7 +172,7 @@ extern int glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
 				 void *crypt_ctx);
 
 extern int glue_xts_req_128bit(const struct common_glue_ctx *gctx,
-			       struct skcipher_request *req,
+			       struct skcipher_walk *walk,
 			       common_glue_func_t tweak_fn, void *tweak_ctx,
 			       void *crypt_ctx);
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
                   ` (4 preceding siblings ...)
  2017-01-12 12:59 ` [RFC PATCH 5/6] crypto: aesni-intel " Ondrej Mosnacek
@ 2017-01-12 12:59 ` Ondrej Mosnacek
  2017-01-16  8:37   ` Binoy Jayan
  2017-01-13 10:41 ` [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Herbert Xu
  6 siblings, 1 reply; 19+ messages in thread
From: Ondrej Mosnacek @ 2017-01-12 12:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ondrej Mosnacek, linux-crypto, dm-devel, Mike Snitzer,
	Milan Broz, Mikulas Patocka, Binoy Jayan

This patch converts dm-crypt to use bulk requests when invoking skcipher
operations, allowing the crypto drivers to process multiple sectors at once,
while reducing the overhead caused by the small sector size.

The new code detects if multiple sectors from a bio are contiguously stored
within a single page (which should almost always be the case), and in that case
processes all these sectors via a single bulk request.

Note that the bio can also consist of several (likely consecutive) pages, which
could all be bundled in a single request. However, since we need to specify an
upper bound on how many sectors we are going to send at once (and this bound
may affect the amount of memory allocated per single request), it is best to
just limit the request bundling to a single page.

Note that if the 'keycount' parameter of the cipher specification is set to a
value other than 1, dm-crypt still sends only one sector in each request, since
in that case the neighboring sectors are encrypted with different keys.

This change causes a detectable read/write speedup (about 5-10%) on a ramdisk
when AES-NI accelerated ciphers are used.
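
For orientation, the per-bundle conversion boils down to roughly the following
calling sequence (a condensed sketch of the code in the diff below, not the
actual patch: the function name is made up, error handling, IV generation and
the bio iteration are omitted, and the NULL argument of
skcipher_bulk_request_set_crypt() is carried over verbatim from the converted
code):

  static int crypt_convert_bulk_sketch(struct crypt_config *cc,
                                       struct convert_context *ctx,
                                       sector_t sectors, u8 *ivs)
  {
          struct skcipher_bulk_request *req = ctx->req;
          struct dm_crypt_request *dmreq = dmreq_of_req(cc, req);

          skcipher_bulk_request_set_maxmsgs(req, crypt_max_bulk_sectors(cc));
          skcipher_bulk_request_set_tfm(req, any_tfm(cc));
          skcipher_bulk_request_set_callback(req,
                  CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
                  kcryptd_async_done, dmreq);

          /* one scatterlist entry covers all bundled sectors of the page */
          skcipher_bulk_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
                                          sectors, 1 << SECTOR_SHIFT, NULL, ivs);

          return bio_data_dir(ctx->bio_in) == WRITE ?
                  crypto_skcipher_encrypt_bulk(req) :
                  crypto_skcipher_decrypt_bulk(req);
  }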

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
 drivers/md/dm-crypt.c | 254 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 165 insertions(+), 89 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 7c6c572..d3f69e1 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -37,6 +37,9 @@
 
 #define DM_MSG_PREFIX "crypt"
 
+/* for now, we only bundle consecutive sectors within a single page */
+#define MAX_CONSEC_SECTORS (1 << (PAGE_SHIFT - SECTOR_SHIFT))
+
 /*
  * context holding the current state of a multi-part conversion
  */
@@ -48,7 +51,7 @@ struct convert_context {
 	struct bvec_iter iter_out;
 	sector_t cc_sector;
 	atomic_t cc_pending;
-	struct skcipher_request *req;
+	struct skcipher_bulk_request *req;
 };
 
 /*
@@ -73,6 +76,7 @@ struct dm_crypt_request {
 	struct scatterlist sg_in;
 	struct scatterlist sg_out;
 	sector_t iv_sector;
+	sector_t sector_count;
 };
 
 struct crypt_config;
@@ -83,9 +87,9 @@ struct crypt_iv_operations {
 	void (*dtr)(struct crypt_config *cc);
 	int (*init)(struct crypt_config *cc);
 	int (*wipe)(struct crypt_config *cc);
-	int (*generator)(struct crypt_config *cc, u8 *iv,
+	int (*generator)(struct crypt_config *cc, u8 *iv, unsigned int sector,
 			 struct dm_crypt_request *dmreq);
-	int (*post)(struct crypt_config *cc, u8 *iv,
+	int (*post)(struct crypt_config *cc, u8 *iv, unsigned int sector,
 		    struct dm_crypt_request *dmreq);
 };
 
@@ -163,14 +167,14 @@ struct crypt_config {
 	/*
 	 * Layout of each crypto request:
 	 *
-	 *   struct skcipher_request
+	 *   struct skcipher_bulk_request
 	 *      context
 	 *      padding
 	 *   struct dm_crypt_request
 	 *      padding
-	 *   IV
+	 *   IVs
 	 *
-	 * The padding is added so that dm_crypt_request and the IV are
+	 * The padding is added so that dm_crypt_request and the IVs are
 	 * correctly aligned.
 	 */
 	unsigned int dmreq_start;
@@ -245,20 +249,24 @@ static struct crypto_skcipher *any_tfm(struct crypt_config *cc)
  * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454
  */
 
-static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			      struct dm_crypt_request *dmreq)
 {
+	u8 *iv = ivs + i * cc->iv_size;
+
 	memset(iv, 0, cc->iv_size);
-	*(__le32 *)iv = cpu_to_le32(dmreq->iv_sector & 0xffffffff);
+	*(__le32 *)iv = cpu_to_le32((dmreq->iv_sector + i) & 0xffffffff);
 
 	return 0;
 }
 
-static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *iv,
-				struct dm_crypt_request *dmreq)
+static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *ivs,
+				unsigned int i, struct dm_crypt_request *dmreq)
 {
+	u8 *iv = ivs + i * cc->iv_size;
+
 	memset(iv, 0, cc->iv_size);
-	*(__le64 *)iv = cpu_to_le64(dmreq->iv_sector);
+	*(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i);
 
 	return 0;
 }
@@ -410,13 +418,14 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 	return err;
 }
 
-static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			      struct dm_crypt_request *dmreq)
 {
 	struct crypto_cipher *essiv_tfm = cc->iv_private;
+	u8 *iv = ivs + i * cc->iv_size;
 
 	memset(iv, 0, cc->iv_size);
-	*(__le64 *)iv = cpu_to_le64(dmreq->iv_sector);
+	*(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i);
 	crypto_cipher_encrypt_one(essiv_tfm, iv, iv);
 
 	return 0;
@@ -450,22 +459,26 @@ static void crypt_iv_benbi_dtr(struct crypt_config *cc)
 {
 }
 
-static int crypt_iv_benbi_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_benbi_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			      struct dm_crypt_request *dmreq)
 {
+	u8 *iv = ivs + i * cc->iv_size;
+	u64 sector = (u64)(dmreq->iv_sector + i);
 	__be64 val;
 
 	memset(iv, 0, cc->iv_size - sizeof(u64)); /* rest is cleared below */
 
-	val = cpu_to_be64(((u64)dmreq->iv_sector << cc->iv_gen_private.benbi.shift) + 1);
+	val = cpu_to_be64((sector << cc->iv_gen_private.benbi.shift) + 1);
 	put_unaligned(val, (__be64 *)(iv + cc->iv_size - sizeof(u64)));
 
 	return 0;
 }
 
-static int crypt_iv_null_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_null_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			     struct dm_crypt_request *dmreq)
 {
+	u8 *iv = ivs + i * cc->iv_size;
+
 	memset(iv, 0, cc->iv_size);
 
 	return 0;
@@ -534,8 +547,7 @@ static int crypt_iv_lmk_wipe(struct crypt_config *cc)
 }
 
 static int crypt_iv_lmk_one(struct crypt_config *cc, u8 *iv,
-			    struct dm_crypt_request *dmreq,
-			    u8 *data)
+			    u64 sector, u8 *data)
 {
 	struct iv_lmk_private *lmk = &cc->iv_gen_private.lmk;
 	SHASH_DESC_ON_STACK(desc, lmk->hash_tfm);
@@ -562,8 +574,8 @@ static int crypt_iv_lmk_one(struct crypt_config *cc, u8 *iv,
 		return r;
 
 	/* Sector is cropped to 56 bits here */
-	buf[0] = cpu_to_le32(dmreq->iv_sector & 0xFFFFFFFF);
-	buf[1] = cpu_to_le32((((u64)dmreq->iv_sector >> 32) & 0x00FFFFFF) | 0x80000000);
+	buf[0] = cpu_to_le32(sector & 0xFFFFFFFF);
+	buf[1] = cpu_to_le32(((sector >> 32) & 0x00FFFFFF) | 0x80000000);
 	buf[2] = cpu_to_le32(4024);
 	buf[3] = 0;
 	r = crypto_shash_update(desc, (u8 *)buf, sizeof(buf));
@@ -582,39 +594,43 @@ static int crypt_iv_lmk_one(struct crypt_config *cc, u8 *iv,
 	return 0;
 }
 
-static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			    struct dm_crypt_request *dmreq)
 {
-	u8 *src;
+	u8 *iv = ivs + i * cc->iv_size;
+	u8 *mapped, *src;
 	int r = 0;
 
 	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) {
-		src = kmap_atomic(sg_page(&dmreq->sg_in));
-		r = crypt_iv_lmk_one(cc, iv, dmreq, src + dmreq->sg_in.offset);
-		kunmap_atomic(src);
+		mapped = kmap_atomic(sg_page(&dmreq->sg_in));
+		src = mapped + dmreq->sg_in.offset + i * (1 << SECTOR_SHIFT);
+		r = crypt_iv_lmk_one(cc, iv, dmreq->iv_sector + i, src);
+		kunmap_atomic(mapped);
 	} else
 		memset(iv, 0, cc->iv_size);
 
 	return r;
 }
 
-static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			     struct dm_crypt_request *dmreq)
 {
-	u8 *dst;
+	u8 *iv = ivs + i * cc->iv_size;
+	u8 *mapped, *dst;
 	int r;
 
 	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
 		return 0;
 
-	dst = kmap_atomic(sg_page(&dmreq->sg_out));
-	r = crypt_iv_lmk_one(cc, iv, dmreq, dst + dmreq->sg_out.offset);
+	mapped = kmap_atomic(sg_page(&dmreq->sg_out));
+	dst = mapped + dmreq->sg_out.offset + i * (1 << SECTOR_SHIFT);
+	r = crypt_iv_lmk_one(cc, iv, dmreq->iv_sector + i, dst);
 
 	/* Tweak the first block of plaintext sector */
 	if (!r)
-		crypto_xor(dst + dmreq->sg_out.offset, iv, cc->iv_size);
+		crypto_xor(dst, iv, cc->iv_size);
 
-	kunmap_atomic(dst);
+	kunmap_atomic(mapped);
 	return r;
 }
 
@@ -682,11 +698,10 @@ static int crypt_iv_tcw_wipe(struct crypt_config *cc)
 }
 
 static int crypt_iv_tcw_whitening(struct crypt_config *cc,
-				  struct dm_crypt_request *dmreq,
-				  u8 *data)
+				  u64 iv_sector, u8 *data)
 {
 	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
-	__le64 sector = cpu_to_le64(dmreq->iv_sector);
+	__le64 sector = cpu_to_le64(iv_sector);
 	u8 buf[TCW_WHITENING_SIZE];
 	SHASH_DESC_ON_STACK(desc, tcw->crc32_tfm);
 	int i, r;
@@ -721,19 +736,21 @@ static int crypt_iv_tcw_whitening(struct crypt_config *cc,
 	return r;
 }
 
-static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			    struct dm_crypt_request *dmreq)
 {
 	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
-	__le64 sector = cpu_to_le64(dmreq->iv_sector);
-	u8 *src;
+	__le64 sector = cpu_to_le64(dmreq->iv_sector + i);
+	u8 *iv = ivs + i * cc->iv_size;
+	u8 *mapped, *src;
 	int r = 0;
 
 	/* Remove whitening from ciphertext */
 	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE) {
-		src = kmap_atomic(sg_page(&dmreq->sg_in));
-		r = crypt_iv_tcw_whitening(cc, dmreq, src + dmreq->sg_in.offset);
-		kunmap_atomic(src);
+		mapped = kmap_atomic(sg_page(&dmreq->sg_in));
+		src = mapped + dmreq->sg_in.offset + i * (1 << SECTOR_SHIFT);
+		r = crypt_iv_tcw_whitening(cc, dmreq->iv_sector + i, src);
+		kunmap_atomic(mapped);
 	}
 
 	/* Calculate IV */
@@ -745,19 +762,20 @@ static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
 	return r;
 }
 
-static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *ivs, unsigned int i,
 			     struct dm_crypt_request *dmreq)
 {
-	u8 *dst;
+	u8 *mapped, *dst;
 	int r;
 
 	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE)
 		return 0;
 
 	/* Apply whitening on ciphertext */
-	dst = kmap_atomic(sg_page(&dmreq->sg_out));
-	r = crypt_iv_tcw_whitening(cc, dmreq, dst + dmreq->sg_out.offset);
-	kunmap_atomic(dst);
+	mapped = kmap_atomic(sg_page(&dmreq->sg_out));
+	dst = mapped + dmreq->sg_out.offset + i * (1 << SECTOR_SHIFT);
+	r = crypt_iv_tcw_whitening(cc, dmreq->iv_sector + i, dst);
+	kunmap_atomic(mapped);
 
 	return r;
 }
@@ -821,16 +839,22 @@ static void crypt_convert_init(struct crypt_config *cc,
 	init_completion(&ctx->restart);
 }
 
+static unsigned int crypt_max_bulk_sectors(struct crypt_config *cc)
+{
+	/* go by one sector only if tfms_count > 1: */
+	return cc->tfms_count == 1 ? MAX_CONSEC_SECTORS : 1;
+}
+
 static struct dm_crypt_request *dmreq_of_req(struct crypt_config *cc,
-					     struct skcipher_request *req)
+					     struct skcipher_bulk_request *req)
 {
 	return (struct dm_crypt_request *)((char *)req + cc->dmreq_start);
 }
 
-static struct skcipher_request *req_of_dmreq(struct crypt_config *cc,
-					       struct dm_crypt_request *dmreq)
+static struct skcipher_bulk_request *req_of_dmreq(
+		struct crypt_config *cc, struct dm_crypt_request *dmreq)
 {
-	return (struct skcipher_request *)((char *)dmreq - cc->dmreq_start);
+	return (struct skcipher_bulk_request *)((u8 *)dmreq - cc->dmreq_start);
 }
 
 static u8 *iv_of_dmreq(struct crypt_config *cc,
@@ -840,48 +864,53 @@ static u8 *iv_of_dmreq(struct crypt_config *cc,
 		crypto_skcipher_alignmask(any_tfm(cc)) + 1);
 }
 
-static int crypt_convert_block(struct crypt_config *cc,
-			       struct convert_context *ctx,
-			       struct skcipher_request *req)
+static int crypt_convert_sectors(struct crypt_config *cc,
+				 struct convert_context *ctx,
+				 struct page *page_in, struct page *page_out,
+				 unsigned int off_in, unsigned int off_out,
+				 sector_t sectors)
 {
-	struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
-	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+	unsigned int cryptlen = (1 << SECTOR_SHIFT) * (unsigned int)sectors;
+	struct skcipher_bulk_request *req = ctx->req;
 	struct dm_crypt_request *dmreq;
+	unsigned int i;
 	u8 *iv;
 	int r;
 
-	dmreq = dmreq_of_req(cc, req);
+	dmreq = dmreq_of_req(cc, ctx->req);
 	iv = iv_of_dmreq(cc, dmreq);
 
 	dmreq->iv_sector = ctx->cc_sector;
+	dmreq->sector_count = sectors;
 	dmreq->ctx = ctx;
+
 	sg_init_table(&dmreq->sg_in, 1);
-	sg_set_page(&dmreq->sg_in, bv_in.bv_page, 1 << SECTOR_SHIFT,
-		    bv_in.bv_offset);
+	sg_set_page(&dmreq->sg_in, page_in, cryptlen, off_in);
 
 	sg_init_table(&dmreq->sg_out, 1);
-	sg_set_page(&dmreq->sg_out, bv_out.bv_page, 1 << SECTOR_SHIFT,
-		    bv_out.bv_offset);
+	sg_set_page(&dmreq->sg_out, page_out, cryptlen, off_out);
 
-	bio_advance_iter(ctx->bio_in, &ctx->iter_in, 1 << SECTOR_SHIFT);
-	bio_advance_iter(ctx->bio_out, &ctx->iter_out, 1 << SECTOR_SHIFT);
-
-	if (cc->iv_gen_ops) {
-		r = cc->iv_gen_ops->generator(cc, iv, dmreq);
-		if (r < 0)
-			return r;
-	}
+	if (cc->iv_gen_ops)
+		for (i = 0; i < sectors; i++) {
+			r = cc->iv_gen_ops->generator(cc, iv, i, dmreq);
+			if (r < 0)
+				return r;
+		}
 
-	skcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
-				   1 << SECTOR_SHIFT, iv);
+	skcipher_bulk_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
+					sectors, 1 << SECTOR_SHIFT, NULL, iv);
 
 	if (bio_data_dir(ctx->bio_in) == WRITE)
-		r = crypto_skcipher_encrypt(req);
+		r = crypto_skcipher_encrypt_bulk(req);
 	else
-		r = crypto_skcipher_decrypt(req);
+		r = crypto_skcipher_decrypt_bulk(req);
 
 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		r = cc->iv_gen_ops->post(cc, iv, dmreq);
+		for (i = 0; i < sectors; i++) {
+			r = cc->iv_gen_ops->post(cc, iv, i, dmreq);
+			if (r < 0)
+				return r;
+		}
 
 	return r;
 }
@@ -897,23 +926,25 @@ static void crypt_alloc_req(struct crypt_config *cc,
 	if (!ctx->req)
 		ctx->req = mempool_alloc(cc->req_pool, GFP_NOIO);
 
-	skcipher_request_set_tfm(ctx->req, cc->tfms[key_index]);
+	skcipher_bulk_request_set_maxmsgs(ctx->req, crypt_max_bulk_sectors(cc));
+	skcipher_bulk_request_set_tfm(ctx->req, cc->tfms[key_index]);
 
 	/*
 	 * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 	 * requests if driver request queue is full.
 	 */
-	skcipher_request_set_callback(ctx->req,
+	skcipher_bulk_request_set_callback(ctx->req,
 	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
 	    kcryptd_async_done, dmreq_of_req(cc, ctx->req));
 }
 
 static void crypt_free_req(struct crypt_config *cc,
-			   struct skcipher_request *req, struct bio *base_bio)
+			   struct skcipher_bulk_request *req,
+			   struct bio *base_bio)
 {
 	struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
 
-	if ((struct skcipher_request *)(io + 1) != req)
+	if ((struct skcipher_bulk_request *)(io + 1) != req)
 		mempool_free(req, cc->req_pool);
 }
 
@@ -923,6 +954,11 @@ static void crypt_free_req(struct crypt_config *cc,
 static int crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx)
 {
+	struct bio_vec bv_in, bv_out;
+	struct page *page_in, *page_out;
+	unsigned int off_in, off_out;
+	unsigned int maxsectors = crypt_max_bulk_sectors(cc);
+	sector_t sectors;
 	int r;
 
 	atomic_set(&ctx->cc_pending, 1);
@@ -933,7 +969,41 @@ static int crypt_convert(struct crypt_config *cc,
 
 		atomic_inc(&ctx->cc_pending);
 
-		r = crypt_convert_block(cc, ctx, ctx->req);
+		bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
+		bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+
+		page_in = bv_in.bv_page;
+		page_out = bv_out.bv_page;
+
+		off_in = bv_in.bv_offset;
+		off_out = bv_out.bv_offset;
+
+		bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+				 1 << SECTOR_SHIFT);
+		bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+				 1 << SECTOR_SHIFT);
+		sectors = 1;
+
+		/* count consecutive sectors: */
+		while (sectors < maxsectors &&
+				ctx->iter_in.bi_size && ctx->iter_out.bi_size) {
+
+			bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
+			bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+
+			if (bv_in.bv_page != page_in ||
+			    bv_out.bv_page != page_out)
+				break;
+
+			bio_advance_iter(ctx->bio_in, &ctx->iter_in,
+					 1 << SECTOR_SHIFT);
+			bio_advance_iter(ctx->bio_out, &ctx->iter_out,
+					 1 << SECTOR_SHIFT);
+			++sectors;
+		}
+
+		r = crypt_convert_sectors(cc, ctx, page_in, page_out,
+					  off_in, off_out, sectors);
 
 		switch (r) {
 		/*
@@ -950,14 +1020,14 @@ static int crypt_convert(struct crypt_config *cc,
 		 */
 		case -EINPROGRESS:
 			ctx->req = NULL;
-			ctx->cc_sector++;
+			ctx->cc_sector += sectors;
 			continue;
 		/*
 		 * The request was already processed (synchronously).
 		 */
 		case 0:
 			atomic_dec(&ctx->cc_pending);
-			ctx->cc_sector++;
+			ctx->cc_sector += sectors;
 			cond_resched();
 			continue;
 
@@ -1360,6 +1430,7 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 	struct convert_context *ctx = dmreq->ctx;
 	struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
 	struct crypt_config *cc = io->cc;
+	unsigned int i;
 
 	/*
 	 * A request from crypto driver backlog is going to be processed now,
@@ -1372,10 +1443,12 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 	}
 
 	if (!error && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		error = cc->iv_gen_ops->post(cc, iv_of_dmreq(cc, dmreq), dmreq);
-
-	if (error < 0)
-		io->error = -EIO;
+		for (i = 0; i < dmreq->sector_count; i++) {
+			error = cc->iv_gen_ops->post(cc, iv_of_dmreq(cc, dmreq),
+						     i, dmreq);
+			if (error < 0)
+				io->error = -EIO;
+		}
 
 	crypt_free_req(cc, req_of_dmreq(cc, dmreq), io->base_bio);
 
@@ -1865,7 +1938,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct crypt_config *cc;
 	int key_size;
-	unsigned int opt_params;
+	unsigned int opt_params, iv_space;
 	unsigned long long tmpll;
 	int ret;
 	size_t iv_size_padding;
@@ -1900,8 +1973,9 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret < 0)
 		goto bad;
 
-	cc->dmreq_start = sizeof(struct skcipher_request);
-	cc->dmreq_start += crypto_skcipher_reqsize(any_tfm(cc));
+	cc->dmreq_start = sizeof(struct skcipher_bulk_request);
+	cc->dmreq_start += crypto_skcipher_bulk_reqsize(
+				any_tfm(cc), crypt_max_bulk_sectors(cc));
 	cc->dmreq_start = ALIGN(cc->dmreq_start, __alignof__(struct dm_crypt_request));
 
 	if (crypto_skcipher_alignmask(any_tfm(cc)) < CRYPTO_MINALIGN) {
@@ -1917,9 +1991,11 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		iv_size_padding = crypto_skcipher_alignmask(any_tfm(cc));
 	}
 
+	iv_space = cc->iv_size * crypt_max_bulk_sectors(cc);
+
 	ret = -ENOMEM;
 	cc->req_pool = mempool_create_kmalloc_pool(MIN_IOS, cc->dmreq_start +
-			sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size);
+			sizeof(struct dm_crypt_request) + iv_size_padding + iv_space);
 	if (!cc->req_pool) {
 		ti->error = "Cannot allocate crypt request mempool";
 		goto bad;
@@ -1927,7 +2003,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	cc->per_bio_data_size = ti->per_io_data_size =
 		ALIGN(sizeof(struct dm_crypt_io) + cc->dmreq_start +
-		      sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size,
+		      sizeof(struct dm_crypt_request) + iv_size_padding + iv_space,
 		      ARCH_KMALLOC_MINALIGN);
 
 	cc->page_pool = mempool_create_page_pool(BIO_MAX_PAGES, 0);
@@ -2067,7 +2143,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 
 	io = dm_per_bio_data(bio, cc->per_bio_data_size);
 	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
-	io->ctx.req = (struct skcipher_request *)(io + 1);
+	io->ctx.req = (struct skcipher_bulk_request *)(io + 1);
 
 	if (bio_data_dir(io->base_bio) == READ) {
 		if (kcryptd_io_read(io, GFP_NOWAIT))
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
  2017-01-12 12:59 ` [RFC PATCH 5/6] crypto: aesni-intel " Ondrej Mosnacek
@ 2017-01-13  3:19   ` Eric Biggers
  2017-01-13 11:27     ` Ondrej Mosnáček
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Biggers @ 2017-01-13  3:19 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

On Thu, Jan 12, 2017 at 01:59:57PM +0100, Ondrej Mosnacek wrote:
> This patch implements bulk request handling in the AES-NI crypto drivers.
> The major advantage of this is that with bulk requests, the kernel_fpu_*
> functions (which are usually quite slow) are now called only once for the whole
> request.
> 

Hi Ondrej,

To what extent does the performance benefit of this patchset result from just
the reduced numbers of calls to kernel_fpu_begin() and kernel_fpu_end()?

If it's most of the benefit, would it make any sense to optimize
kernel_fpu_begin() and kernel_fpu_end() instead?

And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
the bulk API would provide a significant performance boost, can you mention
them?

Interestingly, the arm64 equivalent to kernel_fpu_begin()
(kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
optimization where the SIMD registers aren't saved if they were already saved.
I wonder why something similar isn't done on x86.

Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
                   ` (5 preceding siblings ...)
  2017-01-12 12:59 ` [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support Ondrej Mosnacek
@ 2017-01-13 10:41 ` Herbert Xu
  2017-01-13 12:01   ` Ondrej Mosnáček
  6 siblings, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2017-01-13 10:41 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

On Thu, Jan 12, 2017 at 01:59:52PM +0100, Ondrej Mosnacek wrote:
> 
> the goal of this patchset is to allow those skcipher API users that need to
> process batches of small messages (especially dm-crypt) to do so efficiently.

Please explain why this can't be done with the existing framework
using IV generators similar to the ones used for IPsec.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
  2017-01-13  3:19   ` Eric Biggers
@ 2017-01-13 11:27     ` Ondrej Mosnáček
  0 siblings, 0 replies; 19+ messages in thread
From: Ondrej Mosnáček @ 2017-01-13 11:27 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

Hi Eric,

2017-01-13 4:19 GMT+01:00 Eric Biggers <ebiggers3@gmail.com>:
> To what extent does the performance benefit of this patchset result from just
> the reduced numbers of calls to kernel_fpu_begin() and kernel_fpu_end()?
>
> If it's most of the benefit, would it make any sense to optimize
> kernel_fpu_begin() and kernel_fpu_end() instead?
>
> And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
> the bulk API would provide a significant performance boost, can you mention
> them?

In the case of AES-NI ciphers, this is the only benefit. However, this
change is not intended solely (or primarily) for AES-NI ciphers, but
also for other drivers that have a high per-request overhead.

This patchset is in fact a reaction to Binoy Jayan's efforts (see
[1]). The problem with small requests to HW crypto drivers comes up
for example in Qualcomm's Android [2], where they actually hacked
together their own version of dm-crypt (called 'dm-req-crypt'), which
in turn uses a driver-specific crypto mode that does the IV generation
on its own and is thereby able to process several sectors at once. The
goal is to extend the crypto API so that vendors don't
have to roll out their own workarounds to have efficient disk
encryption.

> Interestingly, the arm64 equivalent to kernel_fpu_begin()
> (kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
> optimization where the SIMD registers aren't saved if they were already saved.
> I wonder why something similar isn't done on x86.

AFAIK, not much can be done about the kernel_fpu_* functions, see e.g. [3].

Regards,
Ondrej

[1] https://lkml.org/lkml/2016/12/20/111
[2] https://nelenkov.blogspot.com/2015/05/hardware-accelerated-disk-encryption-in.html
[3] https://lkml.org/lkml/2016/12/21/354

>
> Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-13 10:41 ` [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Herbert Xu
@ 2017-01-13 12:01   ` Ondrej Mosnáček
  2017-01-13 14:29     ` Herbert Xu
  2017-01-18 17:09     ` Binoy Jayan
  0 siblings, 2 replies; 19+ messages in thread
From: Ondrej Mosnáček @ 2017-01-13 12:01 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

2017-01-13 11:41 GMT+01:00 Herbert Xu <herbert@gondor.apana.org.au>:
> On Thu, Jan 12, 2017 at 01:59:52PM +0100, Ondrej Mosnacek wrote:
>> the goal of this patchset is to allow those skcipher API users that need to
>> process batches of small messages (especially dm-crypt) to do so efficiently.
>
> Please explain why this can't be done with the existing framework
> using IV generators similar to the ones used for IPsec.

As I already mentioned in another thread, there are basically two reasons:

1) Milan would like to add authenticated encryption support to
dm-crypt (see [1]) and as part of this change, a new random IV mode
would be introduced. This mode generates a random IV for each sector
write, includes it in the authenticated data and stores it in the
sector's metadata (in a separate part of the disk). In this case
dm-crypt will need to have control over the IV generation (or at least
be able to somehow retrieve it after the crypto operation... but
passing RNG responsibility to drivers doesn't seem to be a good idea
anyway).

2) With this API, drivers wouldn't have to provide implementations for
specific IV generation modes, and just implement bulk requests for the
common modes/algorithms (XTS, CBC, ...) while still getting
performance benefit.

Regards,
Ondrej

[1] https://www.redhat.com/archives/dm-devel/2017-January/msg00028.html

>
> Thanks,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-13 12:01   ` Ondrej Mosnáček
@ 2017-01-13 14:29     ` Herbert Xu
  2017-01-17 11:20       ` Ondrej Mosnáček
  2017-01-18 17:09     ` Binoy Jayan
  1 sibling, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2017-01-13 14:29 UTC (permalink / raw)
  To: Ondrej Mosnáček
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

On Fri, Jan 13, 2017 at 01:01:56PM +0100, Ondrej Mosnáček wrote:
>
> As I already mentioned in another thread, there are basically two reasons:
> 
> 1) Milan would like to add authenticated encryption support to
> dm-crypt (see [1]) and as part of this change, a new random IV mode
> would be introduced. This mode generates a random IV for each sector
> write, includes it in the authenticated data and stores it in the
> sector's metadata (in a separate part of the disk). In this case
> dm-crypt will need to have control over the IV generation (or at least
> be able to somehow retrieve it after the crypto operation... but
> passing RNG responsibility to drivers doesn't seem to be a good idea
> anyway).

This sounds exactly like the IV generator for IPsec modes such as
CTR or GCM.  The only difference is that you deal with sectors
instead of packets.

> 2) With this API, drivers wouldn't have to provide implementations for
> specific IV generation modes, and just implement bulk requests for the
> common modes/algorithms (XTS, CBC, ...) while still getting
> performance benefit.

What if the driver had hardware support for generating these IVs?
With your scheme this cannot be supported at all.

Getting the IVs back is not actually that hard.  We could simply
change the algorithm definition for the IV generator so that
the IVs are embedded in the plaintext and ciphertext.  For
example, you could declare it so that for n sectors the
first n*ivsize bytes would be the IVs, and the actual plaintext
or ciphertext would follow.

With such a definition you could either generate the IVs in dm-crypt
or have them generated in the IV generator.
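
With equally sized sectors the offsets would then work out to something
like the following (illustrative sketch only; the helper names and the
constants are made up):

  #include <linux/types.h>

  #define N_SECTORS	8		/* illustrative values only */
  #define IV_SIZE	16
  #define SECTOR_SIZE	512

  /* the IV of sector i sits in the n*ivsize prefix of the buffer */
  static inline u8 *batch_iv(u8 *buf, unsigned int i)
  {
  	return buf + i * IV_SIZE;
  }

  /* the plaintext/ciphertext of sector i follows the IV block */
  static inline u8 *batch_text(u8 *buf, unsigned int i)
  {
  	return buf + N_SECTORS * IV_SIZE + i * SECTOR_SIZE;
  }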

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
  2017-01-12 12:59 ` [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support Ondrej Mosnacek
@ 2017-01-16  8:37   ` Binoy Jayan
  2017-01-17 11:15     ` Ondrej Mosnáček
  0 siblings, 1 reply; 19+ messages in thread
From: Binoy Jayan @ 2017-01-16  8:37 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Mark Brown, Arnd Bergmann

Hi Ondrej,

On 12 January 2017 at 18:29, Ondrej Mosnacek <omosnacek@gmail.com> wrote:
> This patch converts dm-crypt to use bulk requests when invoking skcipher
> operations, allowing the crypto drivers to process multiple sectors at once,
> while reducing the overhead caused by the small sector size.
>
> The new code detects if multiple sectors from a bio are contiguously stored
> within a single page (which should almost always be the case), and in that case
> processes all these sectors via a single bulk request.
>
> Note that the bio can also consist of several (likely consecutive) pages, which
> could all be bundled in a single request. However, since we need to specify an
> upper bound on how many sectors we are going to send at once (and this bound
> may affect the amount of memory allocated per single request), it is best to
> just limit the request bundling to a single page.

The initial goal of our proposal was to process the encryption requests with the
maximum possible block sizes on hardware which has automated IV generation
capabilities. But when it is done in software, and the bulk requests are
processed sequentially, one block at a time, the memory footprint could be
reduced even if the bulk request exceeds a page. While your patch looks good,
there are a couple of drawbacks, one of which is that the maximum size of a bulk
request is a page. This could limit the capability of the crypto hardware. If
the whole bio is processed at once, which is what Qualcomm's version of
dm-req-crypt does, it achieves even better performance.

> Note that if the 'keycount' parameter of the cipher specification is set to a
> value other than 1, dm-crypt still sends only one sector in each request, since
> in that case the neighboring sectors are encrypted with different keys.

This could be avoided if the key management is done at the crypto layer.

Thanks,
Binoy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
  2017-01-16  8:37   ` Binoy Jayan
@ 2017-01-17 11:15     ` Ondrej Mosnáček
  0 siblings, 0 replies; 19+ messages in thread
From: Ondrej Mosnáček @ 2017-01-17 11:15 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Mark Brown, Arnd Bergmann

Hi Binoy,

2017-01-16 9:37 GMT+01:00 Binoy Jayan <binoy.jayan@linaro.org>:
> The initial goal of our proposal was to process the encryption requests with the
> maximum possible block sizes on hardware which has automated IV generation
> capabilities. But when it is done in software, and the bulk requests are
> processed sequentially, one block at a time, the memory footprint could be
> reduced even if the bulk request exceeds a page. While your patch looks good,
> there are a couple of drawbacks, one of which is that the maximum size of a bulk
> request is a page. This could limit the capability of the crypto hardware. If
> the whole bio is processed at once, which is what Qualcomm's version of
> dm-req-crypt does, it achieves even better performance.

I see... well, I added the limit only so that the async fallback
implementation can allocate multiple requests, so they can be
processed in parallel, as they would be in the current dm-crypt code.
I'm not really sure if that brings any benefit, but I guess if some HW
accelerator has multiple engines, then this allows distributing the
work among them. (I wonder how switching to the crypto API's IV
generation will affect the situation for drivers that can process
requests in parallel, but do not support the IV generators...)

I could remove the limit and switch the fallback to sequential
processing (or maybe even allocate the requests from a mempool, the
way dm-crypt does it now...), but after Herbert's feedback I'm
probably going to scrap this patchset anyway...

>> Note that if the 'keycount' parameter of the cipher specification is set to a
>> value other than 1, dm-crypt still sends only one sector in each request, since
>> in such case the neighboring sectors are encrypted with different keys.
>
> This could be avoided if the key management is done at the crypto layer.

Yes, but remember that the only reasonable use-case for using keycount
!= 1 is mounting loop-AES partitions (which is kind of a legacy
format, so there is not much point in making HW drivers for it). It is
an unfortunate consequence of Milan's decision to make keycount an
independent part of the cipher specification (instead of making it
specific to the LMK mode) that all the other IV modes are now
'polluted' with the requirement to support it.

I discussed with Milan the possibility of deprecating the keycount
parameter (i.e. allowing only value of 64 for LMK and 1 for all the
other IV modes) and then converting the IV modes to skciphers (or IV
generators, or some combination of both). This would significantly
simplify the key management and allow for better optimization
strategies. However, I don't know if such change would be accepted by
device-mapper maintainers, since it may break someone's unusual
dm-crypt configuration...

Cheers,
Ondrej

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-13 14:29     ` Herbert Xu
@ 2017-01-17 11:20       ` Ondrej Mosnáček
  2017-01-18  4:48         ` Herbert Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Ondrej Mosnáček @ 2017-01-17 11:20 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

2017-01-13 15:29 GMT+01:00 Herbert Xu <herbert@gondor.apana.org.au>:
> What if the driver had hardware support for generating these IVs?
> With your scheme this cannot be supported at all.

That's true... I'm starting to think that this isn't really a good
idea. I was mainly trying to keep the door open for the random IV
support and also to keep the multi-key stuff (which was really only
intended for loop-AES partition support) out of the crypto API, but
both of these can probably be solved in a better way...

> Getting the IVs back is not actually that hard.  We could simply
> change the algorithm definition for the IV generator so that
> the IVs are embedded in the plaintext and ciphertext.  For
> example, you could declare it so that for n sectors the
> first n*ivsize bytes would be the IVs, and the actual plaintext
> or ciphertext would follow.
>
> With such a definition you could either generate the IVs in dm-crypt
> or have them generated in the IV generator.

That seems kind of hacky to me... but if that's what you prefer, then so be it.

Cheers,
Ondrej

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-17 11:20       ` Ondrej Mosnáček
@ 2017-01-18  4:48         ` Herbert Xu
  2017-01-19 14:21           ` Ondrej Mosnáček
  0 siblings, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2017-01-18  4:48 UTC (permalink / raw)
  To: Ondrej Mosnáček
  Cc: Binoy Jayan, Mike Snitzer, dm-devel, Mikulas Patocka,
	linux-crypto, Milan Broz

On Tue, Jan 17, 2017 at 12:20:02PM +0100, Ondrej Mosnáček wrote:
> 2017-01-13 15:29 GMT+01:00 Herbert Xu <herbert@gondor.apana.org.au>:
> > What if the driver had hardware support for generating these IVs?
> > With your scheme this cannot be supported at all.
> 
> That's true... I'm starting to think that this isn't really a good
> idea. I was mainly trying to keep the door open for the random IV
> support and also to keep the multi-key stuff (which was really only
> intended for loop-AES partition support) out of the crypto API, but
> both of these can probably be solved in a better way...

As you said that the multi-key stuff is legacy-only I too would like
to see a way to keep that complexity out of the common path.

> > With such a definition you could either generate the IVs in dm-crypt
> > or have them generated in the IV generator.
> 
> That seems kind of hacky to me... but if that's what you prefer, then so be it.

I'm open to other proposals.  The basic requirement is to be able to
process multiple blocks as one entity at the driver level, potentially
generating the IVs there too.

It's essentially the equivalent to full IPsec offload.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-13 12:01   ` Ondrej Mosnáček
  2017-01-13 14:29     ` Herbert Xu
@ 2017-01-18 17:09     ` Binoy Jayan
  1 sibling, 0 replies; 19+ messages in thread
From: Binoy Jayan @ 2017-01-18 17:09 UTC (permalink / raw)
  To: Ondrej Mosnáček, Arnd Bergmann, Mark Brown
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka

Hi Milan,

On 13 January 2017 at 17:31, Ondrej Mosnáček <omosnacek@gmail.com> wrote:
> 2017-01-13 11:41 GMT+01:00 Herbert Xu <herbert@gondor.apana.org.au>:
>> On Thu, Jan 12, 2017 at 01:59:52PM +0100, Ondrej Mosnacek wrote:
>>> the goal of this patchset is to allow those skcipher API users that need to
>>> process batches of small messages (especially dm-crypt) to do so efficiently.
>>
>> Please explain why this can't be done with the existing framework
>> using IV generators similar to the ones used for IPsec.
>
> As I already mentioned in another thread, there are basically two reasons:
>
> 1) Milan would like to add authenticated encryption support to
> dm-crypt (see [1]) and as part of this change, a new random IV mode
> would be introduced. This mode generates a random IV for each sector
> write, includes it in the authenticated data and stores it in the
> sector's metadata (in a separate part of the disk). In this case
> dm-crypt will need to have control over the IV generation (or at least
> be able to somehow retrieve it after the crypto operation... but
> passing RNG responsibility to drivers doesn't seem to be a good idea
> anyway).
>
> 2) With this API, drivers wouldn't have to provide implementations for
> specific IV generation modes, and just implement bulk requests for the
> common modes/algorithms (XTS, CBC, ...) while still getting
> performance benefit.

I just sent out v3 for the dm-crypt changes I was working on. I
came across your patches for authenticated encryption support.
Although I haven't looked at them in detail yet, I was wondering how they
could be put together with the points Ondrej was mentioning. I will look
into it more. Please keep me in CC when you send out the next revision, if
that is possible.

Thanks,
Binoy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-18  4:48         ` Herbert Xu
@ 2017-01-19 14:21           ` Ondrej Mosnáček
  2017-01-23 13:04             ` Herbert Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Ondrej Mosnáček @ 2017-01-19 14:21 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

2017-01-18 5:48 GMT+01:00 Herbert Xu <herbert@gondor.apana.org.au>:
> I'm open to other proposals.  The basic requirement is to be able to
> process multiple blocks as one entity at the driver level, potentially
> generating the IVs there too.
>
> It's essentially the equivalent to full IPsec offload.

Hm, I just looked at what the IPsec IV generation is actually doing
and it seems to me that it's basically a crypto template that just
somehow transforms the IV before it is passed to the child cipher... I
thought for a while that you were implying that there already is some
facility in the crypto API that allows submitting multiple messages +
some initial sequence number that is auto-incremented and IVs are
generated from the numbers. However, I could not find anything like
that in the code, so now I think what you meant was just that I should
somehow pull the actual IV generators into the crypto layer so that
the IVs can be generated inside the hardware.

If all you had in mind is just an equivalent of the current IPsec IV
generation (as I understood it), then my bulk request scheme can in
fact support it (you'd just pass sector numbers as the IVs). Of
course, it would require additional changes over my patchset,
specifically the creation of crypto templates for the dm-crypt IV
modes, so they can be implemented by drivers. However, I wanted to
avoid this until the key management in dm-crypt is simplified...

If we also want to let the drivers process an offset+count chunk of
sectors while auto-incrementing the sector number, then something like
Binoy's approach would indeed be necessary, where the IV generators
would be just regular skciphers, taking the initial sector number as
the IV (although a disadvantage would be hard-coded sector/message
size). Note, though, that the generic implementation of such a transform
could still use bulk requests on the underlying cipher so that
encryption/decryption is performed efficiently even if there are no
optimized/HW drivers for the specific IV generator templates.
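
For a plain64-style mode, for example, the template's own work would
essentially reduce to deriving the per-sector IVs from the starting sector
number passed in as the request IV, something like this (sketch only, the
function name is made up; it mirrors what crypt_iv_plain64_gen does today):

  #include <linux/string.h>
  #include <linux/types.h>
  #include <asm/byteorder.h>

  /* fill one IV per message, starting from the sector number given as "IV" */
  static void plain64_fill_ivs(u8 *ivs, unsigned int ivsize,
  			     u64 first_sector, unsigned int nmsgs)
  {
  	unsigned int i;

  	for (i = 0; i < nmsgs; i++) {
  		u8 *iv = ivs + i * ivsize;

  		memset(iv, 0, ivsize);
  		*(__le64 *)iv = cpu_to_le64(first_sector + i);
  	}
  }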

I will now try to focus on the key management simplification and when
it is accepted/rejected we can discuss further about the best
approach.

Cheers,
Ondrej

>
> Thanks,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
  2017-01-19 14:21           ` Ondrej Mosnáček
@ 2017-01-23 13:04             ` Herbert Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2017-01-23 13:04 UTC (permalink / raw)
  To: Ondrej Mosnáček
  Cc: linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan

On Thu, Jan 19, 2017 at 03:21:37PM +0100, Ondrej Mosnáček wrote:
> 
> Hm, I just looked at what the IPsec IV generation is actually doing
> and it seems to me that it's basically a crypto template that just
> somehow transforms the IV before it is passed to the child cipher... I
> thought for a while that you were implying that there already is some
> facility in the crypto API that allows submitting multiple messages +
> some initial sequence number that is auto-incremented and IVs are
> generated from the numbers. However, I could not find anything like
> that in the code, so now I think what you meant was just that I should
> somehow pull the actual IV generators into the crypto layer so that
> the IVs can be generated inside the hardware.

IPsec currently only deals with one packet at a time, but the
point is that the IV generator handles everything transparently
and the IV is actually part of the cipher text for the AEAD op.

IOW it would be trivial to extend our current IPsec IV generators
to handle multiple packets as the IVs are embedded with the cipher
text.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-01-23 13:05 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12 12:59 [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Ondrej Mosnacek
2017-01-12 12:59 ` [RFC PATCH 1/6] crypto: skcipher - Add bulk request processing API Ondrej Mosnacek
2017-01-12 12:59 ` [RFC PATCH 2/6] crypto: skcipher - Add bulk request support to walk Ondrej Mosnacek
2017-01-12 12:59 ` [RFC PATCH 3/6] crypto: cryptd - Add skcipher bulk request support Ondrej Mosnacek
2017-01-12 12:59 ` [RFC PATCH 4/6] crypto: simd - Add " Ondrej Mosnacek
2017-01-12 12:59 ` [RFC PATCH 5/6] crypto: aesni-intel " Ondrej Mosnacek
2017-01-13  3:19   ` Eric Biggers
2017-01-13 11:27     ` Ondrej Mosnáček
2017-01-12 12:59 ` [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support Ondrej Mosnacek
2017-01-16  8:37   ` Binoy Jayan
2017-01-17 11:15     ` Ondrej Mosnáček
2017-01-13 10:41 ` [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt Herbert Xu
2017-01-13 12:01   ` Ondrej Mosnáček
2017-01-13 14:29     ` Herbert Xu
2017-01-17 11:20       ` Ondrej Mosnáček
2017-01-18  4:48         ` Herbert Xu
2017-01-19 14:21           ` Ondrej Mosnáček
2017-01-23 13:04             ` Herbert Xu
2017-01-18 17:09     ` Binoy Jayan
