* [RFC PATCH 00/20] dm-crypt: parallel processing
@ 2012-08-21  9:08 Milan Broz
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Milan Broz @ 2012-08-21  9:08 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

Hello,

I promised Mikulas that I would post the remainder of his dm-crypt parallel
processing patchset (plus some related changes) with some comments.

The problem:

The current implementation (using a per-cpu struct) always performs encryption
on the same CPU that submitted the IO.

With very fast storage (usually SSD or MD RAID0) one CPU core can
become the bottleneck, and the throughput of the encrypted disk is worse
than that of the underlying storage.

The idea here is to distribute encryption across other CPU cores/CPUs.

(A nice side effect of the patches is a cleanup of the dm-crypt code. :)
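
For illustration only, a minimal sketch of the offload pattern the patches
implement: the submission path appends encryption requests to a shared list
and per-CPU worker threads pop and process them. The structure, the function
names and the process_one_request() helper are simplified placeholders, not
the code from the patches (which batches requests and reuses the wait queue
lock):

#include <linux/kthread.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct crypt_queue {
        spinlock_t lock;
        struct list_head list;          /* pending encryption requests */
        wait_queue_head_t wait;
};

/* Placeholder for the real encrypt/decrypt step done on one request. */
static void process_one_request(struct list_head *entry);

/* Producer side (the IO submission path): queue a request, wake a worker. */
static void queue_crypt_request(struct crypt_queue *q, struct list_head *entry)
{
        spin_lock(&q->lock);
        list_add_tail(entry, &q->list);
        spin_unlock(&q->lock);
        wake_up(&q->wait);
}

/* Consumer side: one kernel thread per CPU pops requests and runs the cipher. */
static int crypt_worker(void *data)
{
        struct crypt_queue *q = data;

        while (!kthread_should_stop()) {
                struct list_head *entry = NULL;

                if (wait_event_interruptible(q->wait,
                                !list_empty(&q->list) || kthread_should_stop()))
                        continue;

                spin_lock(&q->lock);
                if (!list_empty(&q->list)) {
                        entry = q->list.next;
                        list_del(entry);
                }
                spin_unlock(&q->lock);

                if (entry)
                        process_one_request(entry);
        }
        return 0;
}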

Mikulas Patocka (20):
  dm-crypt: remove per-cpu structure
  dm-crypt: use unbound workqueue for request processing
  dm-crypt: remove completion restart
  dm-crypt: use encryption threads
  dm-crypt: Unify spinlock
  dm-crypt: Introduce an option that sets the number of threads.
  dm-crypt: don't use write queue
  dm-crypt: simplify io queue
  dm-crypt: unify io_queue and crypt_queue
  dm-crypt: don't allocate pages for a partial request.
  dm-crypt: avoid deadlock in mempools
  dm-crypt: simplify cc_pending
  dm-crypt merge convert_context and dm_crypt_io
  dm-crypt: move error handling to crypt_convert.
  dm-crypt: remove io_pending field
  dm-crypt: small changes
  dm-crypt: move temporary values to stack
  dm-crypt: offload writes to thread
  dm-crypt: retain write ordering
  dm-crypt: sort writes

 drivers/md/dm-crypt.c |  838 +++++++++++++++++++++++++++----------------------
 1 file changed, 464 insertions(+), 374 deletions(-)


My notes:

I tested this extensively (on top of 3.6.0-rc2), and while I like the
simplification and the main logic (if we have enough CPU power, why
not use other CPUs), I see several problems here.

1) The implementation is not very useful on modern CPUs with hardware-accelerated
AES (with AES-NI, even one core can saturate very fast storage).
(Some optimized crypto behaves similarly, e.g. the optimized twofish modules.)

2) The patchset mainly targets a linear access pattern and a single
process generating IOs. (With more processes/CPUs generating IOs,
you get parallel processing even with the current code.)

I can see a significant improvement (~30%) for linear reads
(when not using AES-NI) and when the underlying storage is the zero target
(the ideal situation, removing the scheduler from the stack).

For a random access pattern (and more IO producers) I cannot find a
reliable improvement except in the ideal zero-target case (where the
improvement is always >30%). With more producers it doesn't help at all.
I tested RAID0 over SATA disks and a very fast SSD on a quad-core CPU,
with dm-crypt running 1, 3 or 4 threads (and with the cfq and noop schedulers),
using an fio threaded test with 1 or 4 threads.

Notes to implementation:

1) The last two patches (19/20) provide sorting of IO requests; this
should logically be done in the IO scheduler.

I don't think this belongs in dm-crypt; if the scheduler doesn't work
properly, it should be fixed or tuned for the dm-crypt access pattern.

2) Could the kernel workqueue be used/fixed here instead? Basically, all it
needs is to prefer the submitting CPU and, if that CPU is busy, move the work
to another CPU.

3) It doesn't honour CPU hotplug. The number of CPUs is determined
in the crypt constructor. If hotplug is used for CPU power saving,
it will not work properly. (A rough sketch of what reacting to hotplug
events would require follows after these notes.)

4) With debugging options on, everything is extremely slow.
(Perhaps a bug in some debug option; I just noticed this on Fedora
rawhide with all debugging on.)
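
A minimal sketch of what honouring hotplug could look like, using the standard
CPU notifier API; this is only an assumption about one possible fix, not code
from the patchset (the callbacks just mark where per-CPU encryption threads
would be started and stopped):

#include <linux/cpu.h>
#include <linux/notifier.h>

static int crypt_cpu_callback(struct notifier_block *nfb,
                              unsigned long action, void *hcpu)
{
        /* (unsigned long)hcpu is the CPU number the event applies to */
        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_ONLINE:
                /* spawn and bind an encryption thread for this CPU */
                break;
        case CPU_DEAD:
                /* stop the encryption thread that was bound to this CPU */
                break;
        }
        return NOTIFY_OK;
}

static struct notifier_block crypt_cpu_notifier = {
        .notifier_call = crypt_cpu_callback,
};

/* In the crypt constructor: register_hotcpu_notifier(&crypt_cpu_notifier); */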


I posted it mainly because I think this should move forward,
in whatever direction...

Milan


* [PATCH 01/20] dm-crypt: remove per-cpu structure
  2012-08-21  9:08 [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
@ 2012-08-21  9:09 ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 02/20] dm-crypt: use unbound workqueue for request processing Mikulas Patocka
                     ` (18 more replies)
  2012-08-21  9:37 ` [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
  2012-08-21 13:32 ` Mike Snitzer
  2 siblings, 19 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Remove the per-cpu structure and make it per-convert_context instead.
This allows moving requests between different CPUs.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   61 ++++++++++---------------------------------------
 1 file changed, 12 insertions(+), 49 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 664743d..7a3e8eb 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -18,7 +18,6 @@
 #include <linux/crypto.h>
 #include <linux/workqueue.h>
 #include <linux/backing-dev.h>
-#include <linux/percpu.h>
 #include <linux/atomic.h>
 #include <linux/scatterlist.h>
 #include <asm/page.h>
@@ -44,6 +43,7 @@ struct convert_context {
 	unsigned int idx_out;
 	sector_t cc_sector;
 	atomic_t cc_pending;
+	struct ablkcipher_request *req;
 };
 
 /*
@@ -105,15 +105,7 @@ struct iv_lmk_private {
 enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID };
 
 /*
- * Duplicated per-CPU state for cipher.
- */
-struct crypt_cpu {
-	struct ablkcipher_request *req;
-};
-
-/*
- * The fields in here must be read only after initialization,
- * changing state should be in crypt_cpu.
+ * The fields in here must be read only after initialization.
  */
 struct crypt_config {
 	struct dm_dev *dev;
@@ -143,12 +135,6 @@ struct crypt_config {
 	sector_t iv_offset;
 	unsigned int iv_size;
 
-	/*
-	 * Duplicated per cpu state. Access through
-	 * per_cpu_ptr() only.
-	 */
-	struct crypt_cpu __percpu *cpu;
-
 	/* ESSIV: struct crypto_cipher *essiv_tfm */
 	void *iv_private;
 	struct crypto_ablkcipher **tfms;
@@ -184,11 +170,6 @@ static void clone_init(struct dm_crypt_io *, struct bio *);
 static void kcryptd_queue_crypt(struct dm_crypt_io *io);
 static u8 *iv_of_dmreq(struct crypt_config *cc, struct dm_crypt_request *dmreq);
 
-static struct crypt_cpu *this_crypt_config(struct crypt_config *cc)
-{
-	return this_cpu_ptr(cc->cpu);
-}
-
 /*
  * Use this to access cipher attributes that are the same for each CPU.
  */
@@ -738,16 +719,15 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 static void crypt_alloc_req(struct crypt_config *cc,
 			    struct convert_context *ctx)
 {
-	struct crypt_cpu *this_cc = this_crypt_config(cc);
 	unsigned key_index = ctx->cc_sector & (cc->tfms_count - 1);
 
-	if (!this_cc->req)
-		this_cc->req = mempool_alloc(cc->req_pool, GFP_NOIO);
+	if (!ctx->req)
+		ctx->req = mempool_alloc(cc->req_pool, GFP_NOIO);
 
-	ablkcipher_request_set_tfm(this_cc->req, cc->tfms[key_index]);
-	ablkcipher_request_set_callback(this_cc->req,
+	ablkcipher_request_set_tfm(ctx->req, cc->tfms[key_index]);
+	ablkcipher_request_set_callback(ctx->req,
 	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-	    kcryptd_async_done, dmreq_of_req(cc, this_cc->req));
+	    kcryptd_async_done, dmreq_of_req(cc, ctx->req));
 }
 
 /*
@@ -756,7 +736,6 @@ static void crypt_alloc_req(struct crypt_config *cc,
 static int crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx)
 {
-	struct crypt_cpu *this_cc = this_crypt_config(cc);
 	int r;
 
 	atomic_set(&ctx->cc_pending, 1);
@@ -768,7 +747,7 @@ static int crypt_convert(struct crypt_config *cc,
 
 		atomic_inc(&ctx->cc_pending);
 
-		r = crypt_convert_block(cc, ctx, this_cc->req);
+		r = crypt_convert_block(cc, ctx, ctx->req);
 
 		switch (r) {
 		/* async */
@@ -777,7 +756,7 @@ static int crypt_convert(struct crypt_config *cc,
 			INIT_COMPLETION(ctx->restart);
 			/* fall through*/
 		case -EINPROGRESS:
-			this_cc->req = NULL;
+			ctx->req = NULL;
 			ctx->cc_sector++;
 			continue;
 
@@ -885,6 +864,7 @@ static struct dm_crypt_io *crypt_io_alloc(struct crypt_config *cc,
 	io->sector = sector;
 	io->error = 0;
 	io->base_io = NULL;
+	io->ctx.req = NULL;
 	atomic_set(&io->io_pending, 0);
 
 	return io;
@@ -910,6 +890,8 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 	if (!atomic_dec_and_test(&io->io_pending))
 		return;
 
+	if (io->ctx.req)
+		mempool_free(io->ctx.req, cc->req_pool);
 	mempool_free(io, cc->io_pool);
 
 	if (likely(!base_io))
@@ -1355,8 +1337,6 @@ static int crypt_wipe_key(struct crypt_config *cc)
 static void crypt_dtr(struct dm_target *ti)
 {
 	struct crypt_config *cc = ti->private;
-	struct crypt_cpu *cpu_cc;
-	int cpu;
 
 	ti->private = NULL;
 
@@ -1368,13 +1348,6 @@ static void crypt_dtr(struct dm_target *ti)
 	if (cc->crypt_queue)
 		destroy_workqueue(cc->crypt_queue);
 
-	if (cc->cpu)
-		for_each_possible_cpu(cpu) {
-			cpu_cc = per_cpu_ptr(cc->cpu, cpu);
-			if (cpu_cc->req)
-				mempool_free(cpu_cc->req, cc->req_pool);
-		}
-
 	crypt_free_tfms(cc);
 
 	if (cc->bs)
@@ -1393,9 +1366,6 @@ static void crypt_dtr(struct dm_target *ti)
 	if (cc->dev)
 		dm_put_device(ti, cc->dev);
 
-	if (cc->cpu)
-		free_percpu(cc->cpu);
-
 	kzfree(cc->cipher);
 	kzfree(cc->cipher_string);
 
@@ -1450,13 +1420,6 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	if (tmp)
 		DMWARN("Ignoring unexpected additional cipher options");
 
-	cc->cpu = __alloc_percpu(sizeof(*(cc->cpu)),
-				 __alignof__(struct crypt_cpu));
-	if (!cc->cpu) {
-		ti->error = "Cannot allocate per cpu state";
-		goto bad_mem;
-	}
-
 	/*
 	 * For compatibility with the original dm-crypt mapping format, if
 	 * only the cipher name is supplied, use cbc-plain.
-- 
1.7.10.4


* [PATCH 02/20] dm-crypt: use unbound workqueue for request processing
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 03/20] dm-crypt: remove completion restart Mikulas Patocka
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Use an unbound workqueue so that work is automatically balanced between
available CPUs.

Note: the workqueue implementation is buggy in some way, so it still does not
yield full parallelization across multiple CPUs.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 7a3e8eb..4c8b4e0 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1649,8 +1649,9 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	cc->crypt_queue = alloc_workqueue("kcryptd",
 					  WQ_NON_REENTRANT|
 					  WQ_CPU_INTENSIVE|
-					  WQ_MEM_RECLAIM,
-					  1);
+					  WQ_MEM_RECLAIM|
+					  WQ_UNBOUND,
+					  num_online_cpus());
 	if (!cc->crypt_queue) {
 		ti->error = "Couldn't create kcryptd queue";
 		goto bad;
-- 
1.7.10.4


* [PATCH 03/20] dm-crypt: remove completion restart
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 02/20] dm-crypt: use unbound workqueue for request processing Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 04/20] dm-crypt: use encryption threads Mikulas Patocka
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

The restart completion won't be used in this context anyway, so remove it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |    5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4c8b4e0..144c337 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -34,7 +34,6 @@
  * context holding the current state of a multi-part conversion
  */
 struct convert_context {
-	struct completion restart;
 	struct bio *bio_in;
 	struct bio *bio_out;
 	unsigned int offset_in;
@@ -636,7 +635,6 @@ static void crypt_convert_init(struct crypt_config *cc,
 	ctx->idx_in = bio_in ? bio_in->bi_idx : 0;
 	ctx->idx_out = bio_out ? bio_out->bi_idx : 0;
 	ctx->cc_sector = sector + cc->iv_offset;
-	init_completion(&ctx->restart);
 }
 
 static struct dm_crypt_request *dmreq_of_req(struct crypt_config *cc,
@@ -752,8 +750,6 @@ static int crypt_convert(struct crypt_config *cc,
 		switch (r) {
 		/* async */
 		case -EBUSY:
-			wait_for_completion(&ctx->restart);
-			INIT_COMPLETION(ctx->restart);
 			/* fall through*/
 		case -EINPROGRESS:
 			ctx->req = NULL;
@@ -1167,7 +1163,6 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 	struct crypt_config *cc = io->cc;
 
 	if (error == -EINPROGRESS) {
-		complete(&ctx->restart);
 		return;
 	}
 
-- 
1.7.10.4


* [PATCH 04/20] dm-crypt: use encryption threads
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 02/20] dm-crypt: use unbound workqueue for request processing Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 03/20] dm-crypt: remove completion restart Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 05/20] dm-crypt: Unify spinlock Mikulas Patocka
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Use encryption threads, one per CPU, to improve dm-crypt parallelization.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |  228 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 176 insertions(+), 52 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 144c337..cb0e26f 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -17,6 +17,7 @@
 #include <linux/slab.h>
 #include <linux/crypto.h>
 #include <linux/workqueue.h>
+#include <linux/kthread.h>
 #include <linux/backing-dev.h>
 #include <linux/atomic.h>
 #include <linux/scatterlist.h>
@@ -30,6 +31,9 @@
 
 #define DM_MSG_PREFIX "crypt"
 
+#define DMREQ_PULL_BATCH			16
+#define DMREQ_PUSH_BATCH			16
+
 /*
  * context holding the current state of a multi-part conversion
  */
@@ -42,7 +46,6 @@ struct convert_context {
 	unsigned int idx_out;
 	sector_t cc_sector;
 	atomic_t cc_pending;
-	struct ablkcipher_request *req;
 };
 
 /*
@@ -62,10 +65,12 @@ struct dm_crypt_io {
 };
 
 struct dm_crypt_request {
+	struct list_head list;
 	struct convert_context *ctx;
 	struct scatterlist sg_in;
 	struct scatterlist sg_out;
 	sector_t iv_sector;
+	struct completion *busy_wait;
 };
 
 struct crypt_config;
@@ -121,6 +126,12 @@ struct crypt_config {
 
 	struct workqueue_struct *io_queue;
 	struct workqueue_struct *crypt_queue;
+	unsigned crypt_threads_size;
+	struct task_struct **crypt_threads;
+
+	wait_queue_head_t crypt_thread_wait;
+	spinlock_t crypt_thread_spinlock;
+	struct list_head crypt_thread_list;
 
 	char *cipher;
 	char *cipher_string;
@@ -656,9 +667,80 @@ static u8 *iv_of_dmreq(struct crypt_config *cc,
 		crypto_ablkcipher_alignmask(any_tfm(cc)) + 1);
 }
 
+static void kcryptd_async_done(struct crypto_async_request *async_req,
+			       int error);
+
+static int dmcrypt_thread(void *data)
+{
+	struct crypt_config *cc = data;
+	while (1) {
+		struct dm_crypt_request *dmreqs[DMREQ_PULL_BATCH];
+		unsigned n_dmreqs;
+		unsigned i;
+
+		DECLARE_WAITQUEUE(wait, current);
+
+		spin_lock(&cc->crypt_thread_spinlock);
+
+		if (!list_empty(&cc->crypt_thread_list))
+			goto pop_from_list;
+
+		__set_current_state(TASK_INTERRUPTIBLE);
+		add_wait_queue(&cc->crypt_thread_wait, &wait);
+
+		spin_unlock(&cc->crypt_thread_spinlock);
+
+		if (unlikely(kthread_should_stop())) {
+			set_task_state(current, TASK_RUNNING);
+			remove_wait_queue(&cc->crypt_thread_wait, &wait);
+			break;
+		}
+
+		schedule();
+
+		set_task_state(current, TASK_RUNNING);
+		remove_wait_queue(&cc->crypt_thread_wait, &wait);
+		continue;
+
+pop_from_list:
+		n_dmreqs = 0;
+		do {
+			struct dm_crypt_request *dmreq = container_of(
+						cc->crypt_thread_list.next,
+						struct dm_crypt_request, list);
+			list_del(&dmreq->list);
+			dmreqs[n_dmreqs++] = dmreq;
+		} while (n_dmreqs < DMREQ_PULL_BATCH &&
+			 !list_empty(&cc->crypt_thread_list));
+		spin_unlock(&cc->crypt_thread_spinlock);
+
+		i = 0;
+		do {
+			struct dm_crypt_request *dmreq = dmreqs[i];
+			struct ablkcipher_request *req = req_of_dmreq(cc, dmreq);
+			int r;
+			DECLARE_COMPLETION(busy_wait);
+			dmreq->busy_wait = &busy_wait;
+			if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
+				r = crypto_ablkcipher_encrypt(req);
+			else
+				r = crypto_ablkcipher_decrypt(req);
+			if (unlikely(r == -EBUSY)) {
+				wait_for_completion(&busy_wait);
+			} else if (likely(r != -EINPROGRESS)) {
+				struct crypto_async_request as_rq;
+				as_rq.data = dmreq;
+				kcryptd_async_done(&as_rq, r);
+			}
+		} while (++i < n_dmreqs);
+	}
+	return 0;
+}
+
 static int crypt_convert_block(struct crypt_config *cc,
 			       struct convert_context *ctx,
-			       struct ablkcipher_request *req)
+			       struct ablkcipher_request *req,
+			       struct list_head *batch)
 {
 	struct bio_vec *bv_in = bio_iovec_idx(ctx->bio_in, ctx->idx_in);
 	struct bio_vec *bv_out = bio_iovec_idx(ctx->bio_out, ctx->idx_out);
@@ -700,32 +782,34 @@ static int crypt_convert_block(struct crypt_config *cc,
 	ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
 				     1 << SECTOR_SHIFT, iv);
 
-	if (bio_data_dir(ctx->bio_in) == WRITE)
-		r = crypto_ablkcipher_encrypt(req);
-	else
-		r = crypto_ablkcipher_decrypt(req);
-
-	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		r = cc->iv_gen_ops->post(cc, iv, dmreq);
-
-	return r;
+	list_add_tail(&dmreq->list, batch);
+ 
+	return 0;
 }
 
-static void kcryptd_async_done(struct crypto_async_request *async_req,
-			       int error);
-
-static void crypt_alloc_req(struct crypt_config *cc,
-			    struct convert_context *ctx)
+static struct ablkcipher_request *crypt_alloc_req(struct crypt_config *cc,
+			    struct convert_context *ctx, gfp_t gfp_mask)
 {
 	unsigned key_index = ctx->cc_sector & (cc->tfms_count - 1);
+	struct ablkcipher_request *req = mempool_alloc(cc->req_pool, gfp_mask);
+	if (!req)
+		return NULL;
 
-	if (!ctx->req)
-		ctx->req = mempool_alloc(cc->req_pool, GFP_NOIO);
-
-	ablkcipher_request_set_tfm(ctx->req, cc->tfms[key_index]);
-	ablkcipher_request_set_callback(ctx->req,
+	ablkcipher_request_set_tfm(req, cc->tfms[key_index]);
+	ablkcipher_request_set_callback(req,
 	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-	    kcryptd_async_done, dmreq_of_req(cc, ctx->req));
+	    kcryptd_async_done, dmreq_of_req(cc, req));
+
+	return req;
+}
+
+static void crypt_flush_batch(struct crypt_config *cc, struct list_head *batch)
+{
+	spin_lock(&cc->crypt_thread_spinlock);
+	list_splice_tail(batch, &cc->crypt_thread_list);
+	spin_unlock(&cc->crypt_thread_spinlock);
+	wake_up_all(&cc->crypt_thread_wait);
+	INIT_LIST_HEAD(batch);
 }
 
 /*
@@ -735,42 +819,46 @@ static int crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx)
 {
 	int r;
+	LIST_HEAD(batch);
+	unsigned batch_count = 0;
 
 	atomic_set(&ctx->cc_pending, 1);
 
 	while(ctx->idx_in < ctx->bio_in->bi_vcnt &&
 	      ctx->idx_out < ctx->bio_out->bi_vcnt) {
 
-		crypt_alloc_req(cc, ctx);
+		struct ablkcipher_request *req = crypt_alloc_req(cc, ctx, GFP_NOWAIT);
+		if (!req) {
+			/*
+			 * We must flush our request queue before we attempt
+			 * non-failing GFP_NOIO allocation.
+			 */
+			batch_count = 0;
+			crypt_flush_batch(cc, &batch);
+			req = crypt_alloc_req(cc, ctx, GFP_NOIO);
+		}
 
 		atomic_inc(&ctx->cc_pending);
 
-		r = crypt_convert_block(cc, ctx, ctx->req);
-
-		switch (r) {
-		/* async */
-		case -EBUSY:
-			/* fall through*/
-		case -EINPROGRESS:
-			ctx->req = NULL;
-			ctx->cc_sector++;
-			continue;
-
-		/* sync */
-		case 0:
+		r = crypt_convert_block(cc, ctx, req, &batch);
+		if (unlikely(r < 0)) {
 			atomic_dec(&ctx->cc_pending);
-			ctx->cc_sector++;
-			cond_resched();
-			continue;
+			goto flush_ret;
+		}
 
-		/* error */
-		default:
-			atomic_dec(&ctx->cc_pending);
-			return r;
+		ctx->sector++;
+
+		if (unlikely(++batch_count >= DMREQ_PUSH_BATCH)) {
+			batch_count = 0;
+			crypt_flush_batch(cc, &batch);
 		}
 	}
+	r = 0;
 
-	return 0;
+flush_ret:
+	crypt_flush_batch(cc, &batch);
+
+	return r;
 }
 
 static void dm_crypt_bio_destructor(struct bio *bio)
@@ -860,7 +948,6 @@ static struct dm_crypt_io *crypt_io_alloc(struct crypt_config *cc,
 	io->sector = sector;
 	io->error = 0;
 	io->base_io = NULL;
-	io->ctx.req = NULL;
 	atomic_set(&io->io_pending, 0);
 
 	return io;
@@ -886,8 +973,6 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 	if (!atomic_dec_and_test(&io->io_pending))
 		return;
 
-	if (io->ctx.req)
-		mempool_free(io->ctx.req, cc->req_pool);
 	mempool_free(io, cc->io_pool);
 
 	if (likely(!base_io))
@@ -1163,6 +1248,7 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 	struct crypt_config *cc = io->cc;
 
 	if (error == -EINPROGRESS) {
+		complete(dmreq->busy_wait);
 		return;
 	}
 
@@ -1338,6 +1424,15 @@ static void crypt_dtr(struct dm_target *ti)
 	if (!cc)
 		return;
 
+	if (cc->crypt_threads) {
+		int i;
+		for (i = 0; i < cc->crypt_threads_size; i++) {
+			if (cc->crypt_threads[i])
+				kthread_stop(cc->crypt_threads[i]);
+		}
+		kfree(cc->crypt_threads);
+	}
+
 	if (cc->io_queue)
 		destroy_workqueue(cc->io_queue);
 	if (cc->crypt_queue)
@@ -1529,7 +1624,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	struct crypt_config *cc;
 	unsigned int key_size, opt_params;
 	unsigned long long tmpll;
-	int ret;
+	int i, ret;
 	struct dm_arg_set as;
 	const char *opt_string;
 	char dummy;
@@ -1643,15 +1738,44 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	cc->crypt_queue = alloc_workqueue("kcryptd",
 					  WQ_NON_REENTRANT|
-					  WQ_CPU_INTENSIVE|
-					  WQ_MEM_RECLAIM|
-					  WQ_UNBOUND,
-					  num_online_cpus());
+					  WQ_MEM_RECLAIM,
+					  1);
 	if (!cc->crypt_queue) {
 		ti->error = "Couldn't create kcryptd queue";
 		goto bad;
 	}
 
+	for (i = 0; i < NR_CPUS; i++)
+		if (cpu_online(i))
+			cc->crypt_threads_size = i + 1;
+
+	init_waitqueue_head(&cc->crypt_thread_wait);
+	spin_lock_init(&cc->crypt_thread_spinlock);
+	INIT_LIST_HEAD(&cc->crypt_thread_list);
+
+	cc->crypt_threads = kzalloc(cc->crypt_threads_size *
+				    sizeof(struct task_struct *), GFP_KERNEL);
+	if (!cc->crypt_threads) {
+		ti->error = "Couldn't allocate crypt threads";
+		goto bad;
+	}
+
+	for (i = 0; i < cc->crypt_threads_size; i++) {
+		if (cpu_online(i)) {
+			cc->crypt_threads[i] = kthread_create_on_node(
+				dmcrypt_thread, cc, cpu_to_node(i),
+				"dmcryptd/%d", i);
+			if (IS_ERR(cc->crypt_threads[i])) {
+				ret = PTR_ERR(cc->crypt_threads[i]);
+				cc->crypt_threads[i] = NULL;
+				ti->error = "Couldn't spawn thread";
+				goto bad;
+			}
+			kthread_bind(cc->crypt_threads[i], i);
+			wake_up_process(cc->crypt_threads[i]);
+		}
+	}
+
 	ti->num_flush_requests = 1;
 	ti->discard_zeroes_data_unsupported = true;
 
-- 
1.7.10.4


* [PATCH 05/20] dm-crypt: Unify spinlock
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (2 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 04/20] dm-crypt: use encryption threads Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 06/20] dm-crypt: Introduce an option that sets the number of threads Mikulas Patocka
                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Remove "crypt_thread_spinlock" and use wait queue spinlock
"crypt_thread_wait.spinlock" instead.

This saves few atomic operations in the encryption thread.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index cb0e26f..b251e15 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -130,7 +130,6 @@ struct crypt_config {
 	struct task_struct **crypt_threads;
 
 	wait_queue_head_t crypt_thread_wait;
-	spinlock_t crypt_thread_spinlock;
 	struct list_head crypt_thread_list;
 
 	char *cipher;
@@ -680,15 +679,16 @@ static int dmcrypt_thread(void *data)
 
 		DECLARE_WAITQUEUE(wait, current);
 
-		spin_lock(&cc->crypt_thread_spinlock);
+		spin_lock_irq(&cc->crypt_thread_wait.lock);
+continue_locked:
 
 		if (!list_empty(&cc->crypt_thread_list))
 			goto pop_from_list;
 
 		__set_current_state(TASK_INTERRUPTIBLE);
-		add_wait_queue(&cc->crypt_thread_wait, &wait);
+		__add_wait_queue(&cc->crypt_thread_wait, &wait);
 
-		spin_unlock(&cc->crypt_thread_spinlock);
+		spin_unlock_irq(&cc->crypt_thread_wait.lock);
 
 		if (unlikely(kthread_should_stop())) {
 			set_task_state(current, TASK_RUNNING);
@@ -699,8 +699,9 @@ static int dmcrypt_thread(void *data)
 		schedule();
 
 		set_task_state(current, TASK_RUNNING);
-		remove_wait_queue(&cc->crypt_thread_wait, &wait);
-		continue;
+		spin_lock_irq(&cc->crypt_thread_wait.lock);
+		__remove_wait_queue(&cc->crypt_thread_wait, &wait);
+		goto continue_locked;
 
 pop_from_list:
 		n_dmreqs = 0;
@@ -712,7 +713,8 @@ pop_from_list:
 			dmreqs[n_dmreqs++] = dmreq;
 		} while (n_dmreqs < DMREQ_PULL_BATCH &&
 			 !list_empty(&cc->crypt_thread_list));
-		spin_unlock(&cc->crypt_thread_spinlock);
+
+		spin_unlock_irq(&cc->crypt_thread_wait.lock);
 
 		i = 0;
 		do {
@@ -805,10 +807,10 @@ static struct ablkcipher_request *crypt_alloc_req(struct crypt_config *cc,
 
 static void crypt_flush_batch(struct crypt_config *cc, struct list_head *batch)
 {
-	spin_lock(&cc->crypt_thread_spinlock);
+	spin_lock_irq(&cc->crypt_thread_wait.lock);
 	list_splice_tail(batch, &cc->crypt_thread_list);
-	spin_unlock(&cc->crypt_thread_spinlock);
-	wake_up_all(&cc->crypt_thread_wait);
+	wake_up_locked(&cc->crypt_thread_wait);
+	spin_unlock_irq(&cc->crypt_thread_wait.lock);
 	INIT_LIST_HEAD(batch);
 }
 
@@ -1750,7 +1752,6 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 			cc->crypt_threads_size = i + 1;
 
 	init_waitqueue_head(&cc->crypt_thread_wait);
-	spin_lock_init(&cc->crypt_thread_spinlock);
 	INIT_LIST_HEAD(&cc->crypt_thread_list);
 
 	cc->crypt_threads = kzalloc(cc->crypt_threads_size *
-- 
1.7.10.4


* [PATCH 06/20] dm-crypt: Introduce an option that sets the number of threads.
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (3 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 05/20] dm-crypt: Unify spinlock Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 07/20] dm-crypt: don't use write queue Mikulas Patocka
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Introduce an option "num_cpus". It allows the user to set the number of
threads used for encryption. The value "0" means the default.

The default is the number of CPUs in the system, but at most 3.
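
For illustration, a hypothetical mapping table line using the new option
(device, size and key are placeholders, not taken from this patch;
allow_discards is included only to show how several feature arguments combine):

  0 2097152 crypt aes-cbc-essiv:sha256 <key> 0 /dev/sdb 0 3 allow_discards num_cpus 4

Here "3" is the count of optional arguments that follow, and "num_cpus"
consumes two of them (the keyword and its value).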

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   96 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 76 insertions(+), 20 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b251e15..fd2909d 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -34,6 +34,8 @@
 #define DMREQ_PULL_BATCH			16
 #define DMREQ_PUSH_BATCH			16
 
+#define DM_CRYPT_DEFAULT_CPUS			3
+
 /*
  * context holding the current state of a multi-part conversion
  */
@@ -127,6 +129,7 @@ struct crypt_config {
 	struct workqueue_struct *io_queue;
 	struct workqueue_struct *crypt_queue;
 	unsigned crypt_threads_size;
+	int num_threads_value;	/* the value entered in the arguments */
 	struct task_struct **crypt_threads;
 
 	wait_queue_head_t crypt_thread_wait;
@@ -1630,9 +1633,14 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	struct dm_arg_set as;
 	const char *opt_string;
 	char dummy;
+	int num_threads = -1;
 
 	static struct dm_arg _args[] = {
-		{0, 1, "Invalid number of feature args"},
+		{0, INT_MAX, "Invalid number of feature args"},
+	};
+
+	static struct dm_arg num_cpu_arg[] = {
+		{0, 65536, "Number of CPUs"},
 	};
 
 	if (argc < 5) {
@@ -1716,18 +1724,31 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		if (ret)
 			goto bad;
 
-		opt_string = dm_shift_arg(&as);
-
-		if (opt_params == 1 && opt_string &&
-		    !strcasecmp(opt_string, "allow_discards"))
-			ti->num_discard_requests = 1;
-		else if (opt_params) {
-			ret = -EINVAL;
-			ti->error = "Invalid feature arguments";
-			goto bad;
+		for (i = 0; i < opt_params; i++) {
+			opt_string = dm_shift_arg(&as);
+			if (!strcasecmp(opt_string, "allow_discards")) {
+				ti->num_discard_requests = 1;
+			} else if (!strcasecmp(opt_string, "num_cpus") && i + 1 < opt_params) {
+				ret = dm_read_arg(num_cpu_arg, &as, &num_threads, &ti->error);
+				if (ret)
+					goto bad;
+				i++;
+			} else {
+				ret = -EINVAL;
+				ti->error = "Invalid feature arguments";
+				goto bad;
+			}
 		}
 	}
 
+	cc->num_threads_value = num_threads;
+
+	if (num_threads <= 0) {
+		num_threads = num_online_cpus();
+		if (num_threads > DM_CRYPT_DEFAULT_CPUS)
+			num_threads = DM_CRYPT_DEFAULT_CPUS;
+	}
+
 	ret = -ENOMEM;
 	cc->io_queue = alloc_workqueue("kcryptd_io",
 				       WQ_NON_REENTRANT|
@@ -1747,9 +1768,15 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	for (i = 0; i < NR_CPUS; i++)
-		if (cpu_online(i))
-			cc->crypt_threads_size = i + 1;
+	if (num_threads == num_online_cpus()) {
+		for (i = 0; i < NR_CPUS; i++)
+			if (cpu_online(i))
+				cc->crypt_threads_size = i + 1;
+	} else {
+		if (num_threads > INT_MAX / sizeof(struct task_struct *))
+			num_threads = INT_MAX / sizeof(struct task_struct *);
+		cc->crypt_threads_size = num_threads;
+	}
 
 	init_waitqueue_head(&cc->crypt_thread_wait);
 	INIT_LIST_HEAD(&cc->crypt_thread_list);
@@ -1761,18 +1788,31 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	for (i = 0; i < cc->crypt_threads_size; i++) {
-		if (cpu_online(i)) {
-			cc->crypt_threads[i] = kthread_create_on_node(
-				dmcrypt_thread, cc, cpu_to_node(i),
-				"dmcryptd/%d", i);
+	if (num_threads == num_online_cpus())
+		for (i = 0; i < cc->crypt_threads_size; i++) {
+			if (cpu_online(i)) {
+				cc->crypt_threads[i] = kthread_create_on_node(
+					dmcrypt_thread, cc, cpu_to_node(i),
+					"dmcryptd/%d", i);
+				if (IS_ERR(cc->crypt_threads[i])) {
+					ret = PTR_ERR(cc->crypt_threads[i]);
+					cc->crypt_threads[i] = NULL;
+					ti->error = "Couldn't spawn thread";
+					goto bad;
+				}
+				kthread_bind(cc->crypt_threads[i], i);
+				wake_up_process(cc->crypt_threads[i]);
+			}
+	} else {
+		for (i = 0; i < cc->crypt_threads_size; i++) {
+			cc->crypt_threads[i] = kthread_create(
+				dmcrypt_thread, cc, "dmcryptd/%d", i);
 			if (IS_ERR(cc->crypt_threads[i])) {
 				ret = PTR_ERR(cc->crypt_threads[i]);
 				cc->crypt_threads[i] = NULL;
 				ti->error = "Couldn't spawn thread";
 				goto bad;
 			}
-			kthread_bind(cc->crypt_threads[i], i);
 			wake_up_process(cc->crypt_threads[i]);
 		}
 	}
@@ -1821,6 +1861,7 @@ static int crypt_status(struct dm_target *ti, status_type_t type,
 {
 	struct crypt_config *cc = ti->private;
 	unsigned int sz = 0;
+	unsigned int num_args;
 
 	switch (type) {
 	case STATUSTYPE_INFO:
@@ -1845,8 +1886,23 @@ static int crypt_status(struct dm_target *ti, status_type_t type,
 		DMEMIT(" %llu %s %llu", (unsigned long long)cc->iv_offset,
 				cc->dev->name, (unsigned long long)cc->start);
 
+		num_args = 0;
+		if (ti->num_discard_requests)
+			num_args++;
+
+		if (cc->num_threads_value >= 0)
+			num_args += 2;
+
+		if (!num_args)
+			break;
+
+		DMEMIT(" %u", num_args);
+
 		if (ti->num_discard_requests)
-			DMEMIT(" 1 allow_discards");
+			DMEMIT(" allow_discards");
+
+		if (cc->num_threads_value >= 0)
+			DMEMIT(" num_cpus %d", cc->num_threads_value);
 
 		break;
 	}
-- 
1.7.10.4


* [PATCH 07/20] dm-crypt: don't use write queue
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (4 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 06/20] dm-crypt: Introduce an option that sets the number of threads Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 08/20] dm-crypt: simplify io queue Mikulas Patocka
                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Don't use the write queue; allocate write data directly from the request routine.
There is no need to bounce requests through a queue.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index fd2909d..4fd0d4b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1278,10 +1278,8 @@ static void kcryptd_crypt(struct work_struct *work)
 {
 	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
 
-	if (bio_data_dir(io->base_bio) == READ)
-		kcryptd_crypt_read_convert(io);
-	else
-		kcryptd_crypt_write_convert(io);
+	BUG_ON(bio_data_dir(io->base_bio) != READ);
+	kcryptd_crypt_read_convert(io);
 }
 
 static void kcryptd_queue_crypt(struct dm_crypt_io *io)
@@ -1850,8 +1848,9 @@ static int crypt_map(struct dm_target *ti, struct bio *bio,
 	if (bio_data_dir(io->base_bio) == READ) {
 		if (kcryptd_io_read(io, GFP_NOWAIT))
 			kcryptd_queue_io(io);
-	} else
-		kcryptd_queue_crypt(io);
+	} else {
+		kcryptd_crypt_write_convert(io);
+	}
 
 	return DM_MAPIO_SUBMITTED;
 }
-- 
1.7.10.4


* [PATCH 08/20] dm-crypt: simplify io queue
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (5 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 07/20] dm-crypt: don't use write queue Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 09/20] dm-crypt: unify io_queue and crypt_queue Mikulas Patocka
                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4fd0d4b..967d218 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1084,13 +1084,7 @@ static void kcryptd_io(struct work_struct *work)
 {
 	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
 
-	if (bio_data_dir(io->base_bio) == READ) {
-		crypt_inc_pending(io);
-		if (kcryptd_io_read(io, GFP_NOIO))
-			io->error = -ENOMEM;
-		crypt_dec_pending(io);
-	} else
-		kcryptd_io_write(io);
+	kcryptd_io_write(io);
 }
 
 static void kcryptd_queue_io(struct dm_crypt_io *io)
@@ -1846,8 +1840,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio,
 	io = crypt_io_alloc(cc, bio, dm_target_offset(ti, bio->bi_sector));
 
 	if (bio_data_dir(io->base_bio) == READ) {
-		if (kcryptd_io_read(io, GFP_NOWAIT))
-			kcryptd_queue_io(io);
+		kcryptd_io_read(io, GFP_NOIO);
 	} else {
 		kcryptd_crypt_write_convert(io);
 	}
-- 
1.7.10.4


* [PATCH 09/20] dm-crypt: unify io_queue and crypt_queue
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (6 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 08/20] dm-crypt: simplify io queue Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 10/20] dm-crypt: don't allocate pages for a partial request Mikulas Patocka
                     ` (10 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Unify these two request queues. There is no need to have two queues; one
is enough. No encryption operations are done in the queue anyway.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 967d218..d6306f1 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -127,7 +127,6 @@ struct crypt_config {
 	struct bio_set *bs;
 
 	struct workqueue_struct *io_queue;
-	struct workqueue_struct *crypt_queue;
 	unsigned crypt_threads_size;
 	int num_threads_value;	/* the value entered in the arguments */
 	struct task_struct **crypt_threads;
@@ -1281,7 +1280,7 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io)
 	struct crypt_config *cc = io->cc;
 
 	INIT_WORK(&io->work, kcryptd_crypt);
-	queue_work(cc->crypt_queue, &io->work);
+	queue_work(cc->io_queue, &io->work);
 }
 
 /*
@@ -1432,8 +1431,6 @@ static void crypt_dtr(struct dm_target *ti)
 
 	if (cc->io_queue)
 		destroy_workqueue(cc->io_queue);
-	if (cc->crypt_queue)
-		destroy_workqueue(cc->crypt_queue);
 
 	crypt_free_tfms(cc);
 
@@ -1751,15 +1748,6 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	cc->crypt_queue = alloc_workqueue("kcryptd",
-					  WQ_NON_REENTRANT|
-					  WQ_MEM_RECLAIM,
-					  1);
-	if (!cc->crypt_queue) {
-		ti->error = "Couldn't create kcryptd queue";
-		goto bad;
-	}
-
 	if (num_threads == num_online_cpus()) {
 		for (i = 0; i < NR_CPUS; i++)
 			if (cpu_online(i))
-- 
1.7.10.4


* [PATCH 10/20] dm-crypt: don't allocate pages for a partial request.
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (7 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 09/20] dm-crypt: unify io_queue and crypt_queue Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 11/20] dm-crypt: avoid deadlock in mempools Mikulas Patocka
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

This patch changes crypt_alloc_buffer so that it always allocates pages for
a full request.

This change enables further simplification and the removal of one refcount
in the next patches.

Note: the next patch is needed to fix a theoretical deadlock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |  134 ++++++++++---------------------------------------
 1 file changed, 26 insertions(+), 108 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index d6306f1..7f277ef 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -63,7 +63,6 @@ struct dm_crypt_io {
 	atomic_t io_pending;
 	int error;
 	sector_t sector;
-	struct dm_crypt_io *base_io;
 };
 
 struct dm_crypt_request {
@@ -173,7 +172,6 @@ struct crypt_config {
 };
 
 #define MIN_IOS        16
-#define MIN_POOL_PAGES 32
 
 static struct kmem_cache *_crypt_io_pool;
 
@@ -873,14 +871,13 @@ static void dm_crypt_bio_destructor(struct bio *bio)
 	bio_free(bio, cc->bs);
 }
 
+static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
+
 /*
  * Generate a new unfragmented bio with the given size
  * This should never violate the device limitations
- * May return a smaller bio when running out of pages, indicated by
- * *out_of_pages set to 1.
  */
-static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size,
-				      unsigned *out_of_pages)
+static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *clone;
@@ -894,37 +891,22 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size,
 		return NULL;
 
 	clone_init(io, clone);
-	*out_of_pages = 0;
 
 	for (i = 0; i < nr_iovecs; i++) {
 		page = mempool_alloc(cc->page_pool, gfp_mask);
-		if (!page) {
-			*out_of_pages = 1;
-			break;
-		}
-
-		/*
-		 * If additional pages cannot be allocated without waiting,
-		 * return a partially-allocated bio.  The caller will then try
-		 * to allocate more bios while submitting this partial bio.
-		 */
-		gfp_mask = (gfp_mask | __GFP_NOWARN) & ~__GFP_WAIT;
 
 		len = (size > PAGE_SIZE) ? PAGE_SIZE : size;
 
 		if (!bio_add_page(clone, page, len, 0)) {
 			mempool_free(page, cc->page_pool);
-			break;
+			crypt_free_buffer_pages(cc, clone);
+			bio_put(clone);
+			return NULL;
 		}
 
 		size -= len;
 	}
 
-	if (!clone->bi_size) {
-		bio_put(clone);
-		return NULL;
-	}
-
 	return clone;
 }
 
@@ -951,7 +933,6 @@ static struct dm_crypt_io *crypt_io_alloc(struct crypt_config *cc,
 	io->base_bio = bio;
 	io->sector = sector;
 	io->error = 0;
-	io->base_io = NULL;
 	atomic_set(&io->io_pending, 0);
 
 	return io;
@@ -965,13 +946,11 @@ static void crypt_inc_pending(struct dm_crypt_io *io)
 /*
  * One of the bios was finished. Check for completion of
  * the whole request and correctly clean up the buffer.
- * If base_io is set, wait for the last fragment to complete.
  */
 static void crypt_dec_pending(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *base_bio = io->base_bio;
-	struct dm_crypt_io *base_io = io->base_io;
 	int error = io->error;
 
 	if (!atomic_dec_and_test(&io->io_pending))
@@ -979,13 +958,7 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 
 	mempool_free(io, cc->io_pool);
 
-	if (likely(!base_io))
-		bio_endio(base_bio, error);
-	else {
-		if (error && !base_io->error)
-			base_io->error = error;
-		crypt_dec_pending(base_io);
-	}
+	bio_endio(base_bio, error);
 }
 
 /*
@@ -1121,9 +1094,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *clone;
-	struct dm_crypt_io *new_io;
 	int crypt_finished;
-	unsigned out_of_pages = 0;
 	unsigned remaining = io->base_bio->bi_size;
 	sector_t sector = io->sector;
 	int r;
@@ -1134,81 +1105,28 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	crypt_inc_pending(io);
 	crypt_convert_init(cc, &io->ctx, NULL, io->base_bio, sector);
 
-	/*
-	 * The allocated buffers can be smaller than the whole bio,
-	 * so repeat the whole process until all the data can be handled.
-	 */
-	while (remaining) {
-		clone = crypt_alloc_buffer(io, remaining, &out_of_pages);
-		if (unlikely(!clone)) {
-			io->error = -ENOMEM;
-			break;
-		}
-
-		io->ctx.bio_out = clone;
-		io->ctx.idx_out = 0;
-
-		remaining -= clone->bi_size;
-		sector += bio_sectors(clone);
-
-		crypt_inc_pending(io);
-
-		r = crypt_convert(cc, &io->ctx);
-		if (r < 0)
-			io->error = -EIO;
-
-		crypt_finished = atomic_dec_and_test(&io->ctx.cc_pending);
-
-		/* Encryption was already finished, submit io now */
-		if (crypt_finished) {
-			kcryptd_crypt_write_io_submit(io, 0);
-
-			/*
-			 * If there was an error, do not try next fragments.
-			 * For async, error is processed in async handler.
-			 */
-			if (unlikely(r < 0))
-				break;
-
-			io->sector = sector;
-		}
-
-		/*
-		 * Out of memory -> run queues
-		 * But don't wait if split was due to the io size restriction
-		 */
-		if (unlikely(out_of_pages))
-			congestion_wait(BLK_RW_ASYNC, HZ/100);
+	clone = crypt_alloc_buffer(io, remaining);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		goto dec;
+	}
 
-		/*
-		 * With async crypto it is unsafe to share the crypto context
-		 * between fragments, so switch to a new dm_crypt_io structure.
-		 */
-		if (unlikely(!crypt_finished && remaining)) {
-			new_io = crypt_io_alloc(io->cc, io->base_bio,
-						sector);
-			crypt_inc_pending(new_io);
-			crypt_convert_init(cc, &new_io->ctx, NULL,
-					   io->base_bio, sector);
-			new_io->ctx.idx_in = io->ctx.idx_in;
-			new_io->ctx.offset_in = io->ctx.offset_in;
+	io->ctx.bio_out = clone;
+	io->ctx.idx_out = 0;
 
-			/*
-			 * Fragments after the first use the base_io
-			 * pending count.
-			 */
-			if (!io->base_io)
-				new_io->base_io = io;
-			else {
-				new_io->base_io = io->base_io;
-				crypt_inc_pending(io->base_io);
-				crypt_dec_pending(io);
-			}
+	remaining -= clone->bi_size;
+	sector += bio_sectors(clone);
 
-			io = new_io;
-		}
-	}
+	crypt_inc_pending(io);
+	r = crypt_convert(cc, &io->ctx);
+	if (r)
+		io->error = -EIO;
+	crypt_finished = atomic_dec_and_test(&io->ctx.cc_pending);
 
+	/* Encryption was already finished, submit io now */
+	if (crypt_finished)
+		kcryptd_crypt_write_io_submit(io, 0);
+dec:
 	crypt_dec_pending(io);
 }
 
@@ -1671,7 +1589,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	cc->page_pool = mempool_create_page_pool(MIN_POOL_PAGES, 0);
+	cc->page_pool = mempool_create_page_pool(BIO_MAX_PAGES, 0);
 	if (!cc->page_pool) {
 		ti->error = "Cannot allocate page mempool";
 		goto bad;
-- 
1.7.10.4


* [PATCH 11/20] dm-crypt: avoid deadlock in mempools
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (8 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 10/20] dm-crypt: don't allocate pages for a partial request Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 12/20] dm-crypt: simplify cc_pending Mikulas Patocka
                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

The function crypt_alloc_buffer may be called concurrently. If we allocate
from the mempool concurrently, there is a possibility of deadlock.
For example, if we have a mempool of 256 pages and two processes, each wanting
256 pages, allocate from the mempool concurrently, it may deadlock in a
situation where both processes have allocated 128 pages and the mempool is
exhausted.

In order to avoid this scenario, we allocate the pages under a mutex.

In order not to degrade performance with excessive locking, we first try
non-blocking allocations without the mutex and, if that fails, fall back
to a blocking allocation with the mutex.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 7f277ef..36bc19c 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -124,6 +124,7 @@ struct crypt_config {
 	mempool_t *req_pool;
 	mempool_t *page_pool;
 	struct bio_set *bs;
+	struct mutex bio_alloc_lock;
 
 	struct workqueue_struct *io_queue;
 	unsigned crypt_threads_size;
@@ -876,24 +877,46 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
 /*
  * Generate a new unfragmented bio with the given size
  * This should never violate the device limitations
+ *
+ * This function may be called concurrently. If we allocate from the mempool
+ * concurrently, there is a possibility of deadlock. For example, if we have
+ * mempool of 256 pages, two processes, each wanting 256, pages allocate from
+ * the mempool concurrently, it may deadlock in a situation where both processes
+ * have allocated 128 pages and the mempool is exhausted.
+ *
+ * In order to avoid this scenarios, we allocate the pages under a mutex.
+ *
+ * In order to not degrade performance with excessive locking, we try
+ * non-blocking allocations without a mutex first and if it fails, we fallback
+ * to a blocking allocation with a mutex.
  */
 static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *clone;
 	unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	gfp_t gfp_mask = GFP_NOIO | __GFP_HIGHMEM;
+	gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM;
 	unsigned i, len;
 	struct page *page;
 
+retry:
+	if (unlikely(gfp_mask & __GFP_WAIT))
+		mutex_lock(&cc->bio_alloc_lock);
+
 	clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
 	if (!clone)
-		return NULL;
+		goto return_clone;
 
 	clone_init(io, clone);
 
 	for (i = 0; i < nr_iovecs; i++) {
 		page = mempool_alloc(cc->page_pool, gfp_mask);
+		if (!page) {
+			crypt_free_buffer_pages(cc, clone);
+			bio_put(clone);
+			gfp_mask |= __GFP_WAIT;
+			goto retry;
+		}
 
 		len = (size > PAGE_SIZE) ? PAGE_SIZE : size;
 
@@ -901,12 +924,17 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 			mempool_free(page, cc->page_pool);
 			crypt_free_buffer_pages(cc, clone);
 			bio_put(clone);
-			return NULL;
+			clone = NULL;
+			goto return_clone;
 		}
 
 		size -= len;
 	}
 
+return_clone:
+	if (unlikely(gfp_mask & __GFP_WAIT))
+		mutex_unlock(&cc->bio_alloc_lock);
+
 	return clone;
 }
 
@@ -1601,6 +1629,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
+	mutex_init(&cc->bio_alloc_lock);
+
 	ret = -EINVAL;
 	if (sscanf(argv[2], "%llu%c", &tmpll, &dummy) != 1) {
 		ti->error = "Invalid iv_offset sector";
-- 
1.7.10.4


* [PATCH 12/20] dm-crypt: simplify cc_pending
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (9 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 11/20] dm-crypt: avoid deadlock in mempools Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 13/20] dm-crypt merge convert_context and dm_crypt_io Mikulas Patocka
                     ` (7 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

This patch removes one extra cc_pending reference from crypt_convert.
Now cc_pending represents the real number of pending operations.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   61 +++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 36bc19c..36c9087 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -815,6 +815,20 @@ static void crypt_flush_batch(struct crypt_config *cc, struct list_head *batch)
 	INIT_LIST_HEAD(batch);
 }
 
+static void crypt_dec_pending(struct dm_crypt_io *io);
+static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async);
+
+static void crypt_dec_cc_pending(struct dm_crypt_io *io)
+{
+	if (!atomic_dec_and_test(&io->ctx.cc_pending))
+		return;
+
+	if (bio_data_dir(io->base_bio) == READ)
+		crypt_dec_pending(io);
+	else
+		kcryptd_crypt_write_io_submit(io, 1);
+}
+
 /*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
@@ -827,9 +841,7 @@ static int crypt_convert(struct crypt_config *cc,
 
 	atomic_set(&ctx->cc_pending, 1);
 
-	while(ctx->idx_in < ctx->bio_in->bi_vcnt &&
-	      ctx->idx_out < ctx->bio_out->bi_vcnt) {
-
+	while (1) {
 		struct ablkcipher_request *req = crypt_alloc_req(cc, ctx, GFP_NOWAIT);
 		if (!req) {
 			/*
@@ -841,25 +853,30 @@ static int crypt_convert(struct crypt_config *cc,
 			req = crypt_alloc_req(cc, ctx, GFP_NOIO);
 		}
 
-		atomic_inc(&ctx->cc_pending);
-
 		r = crypt_convert_block(cc, ctx, req, &batch);
 		if (unlikely(r < 0)) {
-			atomic_dec(&ctx->cc_pending);
-			goto flush_ret;
+			crypt_flush_batch(cc, &batch);
+			crypt_dec_cc_pending(container_of(ctx, struct dm_crypt_io, ctx));
+			goto ret;
 		}
 
 		ctx->sector++;
 
-		if (unlikely(++batch_count >= DMREQ_PUSH_BATCH)) {
-			batch_count = 0;
-			crypt_flush_batch(cc, &batch);
+		if (ctx->idx_in < ctx->bio_in->bi_vcnt &&
+		    ctx->idx_out < ctx->bio_out->bi_vcnt) {
+			atomic_inc(&ctx->cc_pending);
+			if (unlikely(++batch_count >= DMREQ_PUSH_BATCH)) {
+				batch_count = 0;
+				crypt_flush_batch(cc, &batch);
+			}
+			continue;
 		}
+		break;
 	}
 	r = 0;
 
-flush_ret:
 	crypt_flush_batch(cc, &batch);
+ret:
 
 	return r;
 }
@@ -1122,7 +1139,6 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *clone;
-	int crypt_finished;
 	unsigned remaining = io->base_bio->bi_size;
 	sector_t sector = io->sector;
 	int r;
@@ -1149,20 +1165,10 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	r = crypt_convert(cc, &io->ctx);
 	if (r)
 		io->error = -EIO;
-	crypt_finished = atomic_dec_and_test(&io->ctx.cc_pending);
-
-	/* Encryption was already finished, submit io now */
-	if (crypt_finished)
-		kcryptd_crypt_write_io_submit(io, 0);
 dec:
 	crypt_dec_pending(io);
 }
 
-static void kcryptd_crypt_read_done(struct dm_crypt_io *io)
-{
-	crypt_dec_pending(io);
-}
-
 static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
@@ -1177,9 +1183,6 @@ static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 	if (r < 0)
 		io->error = -EIO;
 
-	if (atomic_dec_and_test(&io->ctx.cc_pending))
-		kcryptd_crypt_read_done(io);
-
 	crypt_dec_pending(io);
 }
 
@@ -1204,13 +1207,7 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 
 	mempool_free(req_of_dmreq(cc, dmreq), cc->req_pool);
 
-	if (!atomic_dec_and_test(&ctx->cc_pending))
-		return;
-
-	if (bio_data_dir(io->base_bio) == READ)
-		kcryptd_crypt_read_done(io);
-	else
-		kcryptd_crypt_write_io_submit(io, 1);
+	crypt_dec_cc_pending(io);
 }
 
 static void kcryptd_crypt(struct work_struct *work)
-- 
1.7.10.4


* [PATCH 13/20] dm-crypt merge convert_context and dm_crypt_io
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (10 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 12/20] dm-crypt: simplify cc_pending Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 14/20] dm-crypt: move error handling to crypt_convert Mikulas Patocka
                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

There is a one-to-one relationship between convert_context and dm_crypt_io,
so we can merge these structures into one and simplify the code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |  120 +++++++++++++++++++++++--------------------------
 1 file changed, 56 insertions(+), 64 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 36c9087..bb11a95 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -37,9 +37,13 @@
 #define DM_CRYPT_DEFAULT_CPUS			3
 
 /*
- * context holding the current state of a multi-part conversion
+ * per bio private data
  */
-struct convert_context {
+struct dm_crypt_io {
+	struct crypt_config *cc;
+	struct bio *base_bio;
+	struct work_struct work;
+
 	struct bio *bio_in;
 	struct bio *bio_out;
 	unsigned int offset_in;
@@ -48,17 +52,6 @@ struct convert_context {
 	unsigned int idx_out;
 	sector_t cc_sector;
 	atomic_t cc_pending;
-};
-
-/*
- * per bio private data
- */
-struct dm_crypt_io {
-	struct crypt_config *cc;
-	struct bio *base_bio;
-	struct work_struct work;
-
-	struct convert_context ctx;
 
 	atomic_t io_pending;
 	int error;
@@ -67,7 +60,7 @@ struct dm_crypt_io {
 
 struct dm_crypt_request {
 	struct list_head list;
-	struct convert_context *ctx;
+	struct dm_crypt_io *io;
 	struct scatterlist sg_in;
 	struct scatterlist sg_out;
 	sector_t iv_sector;
@@ -569,7 +562,7 @@ static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv,
 	u8 *src;
 	int r = 0;
 
-	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) {
+	if (bio_data_dir(dmreq->io->bio_in) == WRITE) {
 		src = kmap_atomic(sg_page(&dmreq->sg_in));
 		r = crypt_iv_lmk_one(cc, iv, dmreq, src + dmreq->sg_in.offset);
 		kunmap_atomic(src);
@@ -585,7 +578,7 @@ static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
 	u8 *dst;
 	int r;
 
-	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
+	if (bio_data_dir(dmreq->io->bio_in) == WRITE)
 		return 0;
 
 	dst = kmap_atomic(sg_page(&dmreq->sg_out));
@@ -635,17 +628,17 @@ static struct crypt_iv_operations crypt_iv_lmk_ops = {
 };
 
 static void crypt_convert_init(struct crypt_config *cc,
-			       struct convert_context *ctx,
+			       struct dm_crypt_io *io,
 			       struct bio *bio_out, struct bio *bio_in,
 			       sector_t sector)
 {
-	ctx->bio_in = bio_in;
-	ctx->bio_out = bio_out;
-	ctx->offset_in = 0;
-	ctx->offset_out = 0;
-	ctx->idx_in = bio_in ? bio_in->bi_idx : 0;
-	ctx->idx_out = bio_out ? bio_out->bi_idx : 0;
-	ctx->cc_sector = sector + cc->iv_offset;
+	io->bio_in = bio_in;
+	io->bio_out = bio_out;
+	io->offset_in = 0;
+	io->offset_out = 0;
+	io->idx_in = bio_in ? bio_in->bi_idx : 0;
+	io->idx_out = bio_out ? bio_out->bi_idx : 0;
+	io->cc_sector = sector + cc->iv_offset;
 }
 
 static struct dm_crypt_request *dmreq_of_req(struct crypt_config *cc,
@@ -724,7 +717,7 @@ pop_from_list:
 			int r;
 			DECLARE_COMPLETION(busy_wait);
 			dmreq->busy_wait = &busy_wait;
-			if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
+			if (bio_data_dir(dmreq->io->bio_in) == WRITE)
 				r = crypto_ablkcipher_encrypt(req);
 			else
 				r = crypto_ablkcipher_decrypt(req);
@@ -741,12 +734,12 @@ pop_from_list:
 }
 
 static int crypt_convert_block(struct crypt_config *cc,
-			       struct convert_context *ctx,
+			       struct dm_crypt_io *io,
 			       struct ablkcipher_request *req,
 			       struct list_head *batch)
 {
-	struct bio_vec *bv_in = bio_iovec_idx(ctx->bio_in, ctx->idx_in);
-	struct bio_vec *bv_out = bio_iovec_idx(ctx->bio_out, ctx->idx_out);
+	struct bio_vec *bv_in = bio_iovec_idx(io->bio_in, io->idx_in);
+	struct bio_vec *bv_out = bio_iovec_idx(io->bio_out, io->idx_out);
 	struct dm_crypt_request *dmreq;
 	u8 *iv;
 	int r;
@@ -754,26 +747,26 @@ static int crypt_convert_block(struct crypt_config *cc,
 	dmreq = dmreq_of_req(cc, req);
 	iv = iv_of_dmreq(cc, dmreq);
 
-	dmreq->iv_sector = ctx->cc_sector;
-	dmreq->ctx = ctx;
+	dmreq->iv_sector = io->cc_sector;
+	dmreq->io = io;
 	sg_init_table(&dmreq->sg_in, 1);
 	sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
-		    bv_in->bv_offset + ctx->offset_in);
+		    bv_in->bv_offset + io->offset_in);
 
 	sg_init_table(&dmreq->sg_out, 1);
 	sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
-		    bv_out->bv_offset + ctx->offset_out);
+		    bv_out->bv_offset + io->offset_out);
 
-	ctx->offset_in += 1 << SECTOR_SHIFT;
-	if (ctx->offset_in >= bv_in->bv_len) {
-		ctx->offset_in = 0;
-		ctx->idx_in++;
+	io->offset_in += 1 << SECTOR_SHIFT;
+	if (io->offset_in >= bv_in->bv_len) {
+		io->offset_in = 0;
+		io->idx_in++;
 	}
 
-	ctx->offset_out += 1 << SECTOR_SHIFT;
-	if (ctx->offset_out >= bv_out->bv_len) {
-		ctx->offset_out = 0;
-		ctx->idx_out++;
+	io->offset_out += 1 << SECTOR_SHIFT;
+	if (io->offset_out >= bv_out->bv_len) {
+		io->offset_out = 0;
+		io->idx_out++;
 	}
 
 	if (cc->iv_gen_ops) {
@@ -791,9 +784,9 @@ static int crypt_convert_block(struct crypt_config *cc,
 }
 
 static struct ablkcipher_request *crypt_alloc_req(struct crypt_config *cc,
-			    struct convert_context *ctx, gfp_t gfp_mask)
+			    struct dm_crypt_io *io, gfp_t gfp_mask)
 {
-	unsigned key_index = ctx->cc_sector & (cc->tfms_count - 1);
+	unsigned key_index = io->cc_sector & (cc->tfms_count - 1);
 	struct ablkcipher_request *req = mempool_alloc(cc->req_pool, gfp_mask);
 	if (!req)
 		return NULL;
@@ -820,7 +813,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async);
 
 static void crypt_dec_cc_pending(struct dm_crypt_io *io)
 {
-	if (!atomic_dec_and_test(&io->ctx.cc_pending))
+	if (!atomic_dec_and_test(&io->cc_pending))
 		return;
 
 	if (bio_data_dir(io->base_bio) == READ)
@@ -833,16 +826,16 @@ static void crypt_dec_cc_pending(struct dm_crypt_io *io)
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
 static int crypt_convert(struct crypt_config *cc,
-			 struct convert_context *ctx)
+			 struct dm_crypt_io *io)
 {
 	int r;
 	LIST_HEAD(batch);
 	unsigned batch_count = 0;
 
-	atomic_set(&ctx->cc_pending, 1);
+	atomic_set(&io->cc_pending, 1);
 
 	while (1) {
-		struct ablkcipher_request *req = crypt_alloc_req(cc, ctx, GFP_NOWAIT);
+		struct ablkcipher_request *req = crypt_alloc_req(cc, io, GFP_NOWAIT);
 		if (!req) {
 			/*
 			 * We must flush our request queue before we attempt
@@ -850,21 +843,21 @@ static int crypt_convert(struct crypt_config *cc,
 			 */
 			batch_count = 0;
 			crypt_flush_batch(cc, &batch);
-			req = crypt_alloc_req(cc, ctx, GFP_NOIO);
+			req = crypt_alloc_req(cc, io, GFP_NOIO);
 		}
 
-		r = crypt_convert_block(cc, ctx, req, &batch);
+		r = crypt_convert_block(cc, io, req, &batch);
 		if (unlikely(r < 0)) {
 			crypt_flush_batch(cc, &batch);
-			crypt_dec_cc_pending(container_of(ctx, struct dm_crypt_io, ctx));
+			crypt_dec_cc_pending(io);
 			goto ret;
 		}
 
-		ctx->sector++;
+		io->sector++;
 
-		if (ctx->idx_in < ctx->bio_in->bi_vcnt &&
-		    ctx->idx_out < ctx->bio_out->bi_vcnt) {
-			atomic_inc(&ctx->cc_pending);
+		if (io->idx_in < io->bio_in->bi_vcnt &&
+		    io->idx_out < io->bio_out->bi_vcnt) {
+			atomic_inc(&io->cc_pending);
 			if (unlikely(++batch_count >= DMREQ_PUSH_BATCH)) {
 				batch_count = 0;
 				crypt_flush_batch(cc, &batch);
@@ -1093,7 +1086,7 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 
 static void kcryptd_io_write(struct dm_crypt_io *io)
 {
-	struct bio *clone = io->ctx.bio_out;
+	struct bio *clone = io->bio_out;
 	generic_make_request(clone);
 }
 
@@ -1114,7 +1107,7 @@ static void kcryptd_queue_io(struct dm_crypt_io *io)
 
 static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 {
-	struct bio *clone = io->ctx.bio_out;
+	struct bio *clone = io->bio_out;
 	struct crypt_config *cc = io->cc;
 
 	if (unlikely(io->error < 0)) {
@@ -1125,7 +1118,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 	}
 
 	/* crypt_convert should have filled the clone bio */
-	BUG_ON(io->ctx.idx_out < clone->bi_vcnt);
+	BUG_ON(io->idx_out < clone->bi_vcnt);
 
 	clone->bi_sector = cc->start + io->sector;
 
@@ -1147,7 +1140,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	 * Prevent io from disappearing until this function completes.
 	 */
 	crypt_inc_pending(io);
-	crypt_convert_init(cc, &io->ctx, NULL, io->base_bio, sector);
+	crypt_convert_init(cc, io, NULL, io->base_bio, sector);
 
 	clone = crypt_alloc_buffer(io, remaining);
 	if (unlikely(!clone)) {
@@ -1155,14 +1148,14 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 		goto dec;
 	}
 
-	io->ctx.bio_out = clone;
-	io->ctx.idx_out = 0;
+	io->bio_out = clone;
+	io->idx_out = 0;
 
 	remaining -= clone->bi_size;
 	sector += bio_sectors(clone);
 
 	crypt_inc_pending(io);
-	r = crypt_convert(cc, &io->ctx);
+	r = crypt_convert(cc, io);
 	if (r)
 		io->error = -EIO;
 dec:
@@ -1176,10 +1169,10 @@ static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 
 	crypt_inc_pending(io);
 
-	crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
+	crypt_convert_init(cc, io, io->base_bio, io->base_bio,
 			   io->sector);
 
-	r = crypt_convert(cc, &io->ctx);
+	r = crypt_convert(cc, io);
 	if (r < 0)
 		io->error = -EIO;
 
@@ -1190,8 +1183,7 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 			       int error)
 {
 	struct dm_crypt_request *dmreq = async_req->data;
-	struct convert_context *ctx = dmreq->ctx;
-	struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
+	struct dm_crypt_io *io = dmreq->io;
 	struct crypt_config *cc = io->cc;
 
 	if (error == -EINPROGRESS) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/20] dm-crypt: move error handling to crypt_convert.
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (11 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 13/20] dm-crypt merge convert_context and dm_crypt_io Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 15/20] dm-crypt: remove io_pending field Mikulas Patocka
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

This patch moves error handling to crypt_convert and makes crypt_convert
return void instead of the error number.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index bb11a95..6f18838 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -825,16 +825,16 @@ static void crypt_dec_cc_pending(struct dm_crypt_io *io)
 /*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
-static int crypt_convert(struct crypt_config *cc,
-			 struct dm_crypt_io *io)
+static void crypt_convert(struct crypt_config *cc,
+			  struct dm_crypt_io *io)
 {
-	int r;
 	LIST_HEAD(batch);
 	unsigned batch_count = 0;
 
 	atomic_set(&io->cc_pending, 1);
 
 	while (1) {
+		int r;
 		struct ablkcipher_request *req = crypt_alloc_req(cc, io, GFP_NOWAIT);
 		if (!req) {
 			/*
@@ -849,8 +849,9 @@ static int crypt_convert(struct crypt_config *cc,
 		r = crypt_convert_block(cc, io, req, &batch);
 		if (unlikely(r < 0)) {
 			crypt_flush_batch(cc, &batch);
+			io->error = -EIO;
 			crypt_dec_cc_pending(io);
-			goto ret;
+			return;
 		}
 
 		io->sector++;
@@ -866,12 +867,8 @@ static int crypt_convert(struct crypt_config *cc,
 		}
 		break;
 	}
-	r = 0;
 
 	crypt_flush_batch(cc, &batch);
-ret:
-
-	return r;
 }
 
 static void dm_crypt_bio_destructor(struct bio *bio)
@@ -1134,7 +1131,6 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	struct bio *clone;
 	unsigned remaining = io->base_bio->bi_size;
 	sector_t sector = io->sector;
-	int r;
 
 	/*
 	 * Prevent io from disappearing until this function completes.
@@ -1155,9 +1151,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	sector += bio_sectors(clone);
 
 	crypt_inc_pending(io);
-	r = crypt_convert(cc, io);
-	if (r)
-		io->error = -EIO;
+	crypt_convert(cc, io);
 dec:
 	crypt_dec_pending(io);
 }
@@ -1165,17 +1159,13 @@ dec:
 static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
-	int r = 0;
 
 	crypt_inc_pending(io);
 
 	crypt_convert_init(cc, io, io->base_bio, io->base_bio,
 			   io->sector);
 
-	r = crypt_convert(cc, io);
-	if (r < 0)
-		io->error = -EIO;
-
+	crypt_convert(cc, io);
 	crypt_dec_pending(io);
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 15/20] dm-crypt: remove io_pending field
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (12 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 14/20] dm-crypt: move error handling to crypt_convert Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 16/20] dm-crypt: small changes Mikulas Patocka
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Remove the io_pending field. Since the code was changed to allocate all
pages for one request, the field is no longer needed: there is always just
one pending request, so the pending counter can be removed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 6f18838..9740774 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -53,7 +53,6 @@ struct dm_crypt_io {
 	sector_t cc_sector;
 	atomic_t cc_pending;
 
-	atomic_t io_pending;
 	int error;
 	sector_t sector;
 };
@@ -808,7 +807,7 @@ static void crypt_flush_batch(struct crypt_config *cc, struct list_head *batch)
 	INIT_LIST_HEAD(batch);
 }
 
-static void crypt_dec_pending(struct dm_crypt_io *io);
+static void crypt_end_io(struct dm_crypt_io *io);
 static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async);
 
 static void crypt_dec_cc_pending(struct dm_crypt_io *io)
@@ -817,7 +816,7 @@ static void crypt_dec_cc_pending(struct dm_crypt_io *io)
 		return;
 
 	if (bio_data_dir(io->base_bio) == READ)
-		crypt_dec_pending(io);
+		crypt_end_io(io);
 	else
 		kcryptd_crypt_write_io_submit(io, 1);
 }
@@ -968,29 +967,20 @@ static struct dm_crypt_io *crypt_io_alloc(struct crypt_config *cc,
 	io->base_bio = bio;
 	io->sector = sector;
 	io->error = 0;
-	atomic_set(&io->io_pending, 0);
 
 	return io;
 }
 
-static void crypt_inc_pending(struct dm_crypt_io *io)
-{
-	atomic_inc(&io->io_pending);
-}
-
 /*
  * One of the bios was finished. Check for completion of
  * the whole request and correctly clean up the buffer.
  */
-static void crypt_dec_pending(struct dm_crypt_io *io)
+static void crypt_end_io(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *base_bio = io->base_bio;
 	int error = io->error;
 
-	if (!atomic_dec_and_test(&io->io_pending))
-		return;
-
 	mempool_free(io, cc->io_pool);
 
 	bio_endio(base_bio, error);
@@ -1038,7 +1028,7 @@ static void crypt_endio(struct bio *clone, int error)
 	if (unlikely(error))
 		io->error = error;
 
-	crypt_dec_pending(io);
+	crypt_end_io(io);
 }
 
 static void clone_init(struct dm_crypt_io *io, struct bio *clone)
@@ -1067,8 +1057,6 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 	if (!clone)
 		return 1;
 
-	crypt_inc_pending(io);
-
 	clone_init(io, clone);
 	clone->bi_idx = 0;
 	clone->bi_vcnt = bio_segments(base_bio);
@@ -1110,7 +1098,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 	if (unlikely(io->error < 0)) {
 		crypt_free_buffer_pages(cc, clone);
 		bio_put(clone);
-		crypt_dec_pending(io);
+		crypt_end_io(io);
 		return;
 	}
 
@@ -1132,16 +1120,13 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	unsigned remaining = io->base_bio->bi_size;
 	sector_t sector = io->sector;
 
-	/*
-	 * Prevent io from disappearing until this function completes.
-	 */
-	crypt_inc_pending(io);
 	crypt_convert_init(cc, io, NULL, io->base_bio, sector);
 
 	clone = crypt_alloc_buffer(io, remaining);
 	if (unlikely(!clone)) {
 		io->error = -ENOMEM;
-		goto dec;
+		crypt_end_io(io);
+		return;
 	}
 
 	io->bio_out = clone;
@@ -1150,23 +1135,17 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 	remaining -= clone->bi_size;
 	sector += bio_sectors(clone);
 
-	crypt_inc_pending(io);
 	crypt_convert(cc, io);
-dec:
-	crypt_dec_pending(io);
 }
 
 static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 
-	crypt_inc_pending(io);
-
 	crypt_convert_init(cc, io, io->base_bio, io->base_bio,
 			   io->sector);
 
 	crypt_convert(cc, io);
-	crypt_dec_pending(io);
 }
 
 static void kcryptd_async_done(struct crypto_async_request *async_req,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 16/20] dm-crypt: small changes
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (13 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 15/20] dm-crypt: remove io_pending field Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 17/20] dm-crypt: move temporary values to stack Mikulas Patocka
                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Small changes:
- bio_in and base_bio are always the same, so we can remove bio_in.
- simplify arguments of crypt_convert_init
- remove the gfp parameter from kcryptd_io_read because it is always GFP_NOIO
- remove the "cc" parameter from crypt_alloc_req because the value can be obtained
  from io->cc
- the rest of the patch just moves functions around without changing any logic

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |  181 +++++++++++++++++++++----------------------------
 1 file changed, 76 insertions(+), 105 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9740774..097171b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -44,7 +44,6 @@ struct dm_crypt_io {
 	struct bio *base_bio;
 	struct work_struct work;
 
-	struct bio *bio_in;
 	struct bio *bio_out;
 	unsigned int offset_in;
 	unsigned int offset_out;
@@ -561,7 +560,7 @@ static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv,
 	u8 *src;
 	int r = 0;
 
-	if (bio_data_dir(dmreq->io->bio_in) == WRITE) {
+	if (bio_data_dir(dmreq->io->base_bio) == WRITE) {
 		src = kmap_atomic(sg_page(&dmreq->sg_in));
 		r = crypt_iv_lmk_one(cc, iv, dmreq, src + dmreq->sg_in.offset);
 		kunmap_atomic(src);
@@ -577,7 +576,7 @@ static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
 	u8 *dst;
 	int r;
 
-	if (bio_data_dir(dmreq->io->bio_in) == WRITE)
+	if (bio_data_dir(dmreq->io->base_bio) == WRITE)
 		return 0;
 
 	dst = kmap_atomic(sg_page(&dmreq->sg_out));
@@ -626,18 +625,15 @@ static struct crypt_iv_operations crypt_iv_lmk_ops = {
 	.post	   = crypt_iv_lmk_post
 };
 
-static void crypt_convert_init(struct crypt_config *cc,
-			       struct dm_crypt_io *io,
-			       struct bio *bio_out, struct bio *bio_in,
-			       sector_t sector)
+static void crypt_convert_init(struct dm_crypt_io *io, struct bio *bio_out)
 {
-	io->bio_in = bio_in;
+	struct crypt_config *cc = io->cc;
 	io->bio_out = bio_out;
 	io->offset_in = 0;
 	io->offset_out = 0;
-	io->idx_in = bio_in ? bio_in->bi_idx : 0;
-	io->idx_out = bio_out ? bio_out->bi_idx : 0;
-	io->cc_sector = sector + cc->iv_offset;
+	io->idx_in = io->base_bio->bi_idx;
+	io->idx_out = bio_out->bi_idx;
+	io->cc_sector = io->sector + cc->iv_offset;
 }
 
 static struct dm_crypt_request *dmreq_of_req(struct crypt_config *cc,
@@ -716,7 +712,7 @@ pop_from_list:
 			int r;
 			DECLARE_COMPLETION(busy_wait);
 			dmreq->busy_wait = &busy_wait;
-			if (bio_data_dir(dmreq->io->bio_in) == WRITE)
+			if (bio_data_dir(dmreq->io->base_bio) == WRITE)
 				r = crypto_ablkcipher_encrypt(req);
 			else
 				r = crypto_ablkcipher_decrypt(req);
@@ -732,12 +728,53 @@ pop_from_list:
 	return 0;
 }
 
-static int crypt_convert_block(struct crypt_config *cc,
-			       struct dm_crypt_io *io,
+static struct ablkcipher_request *crypt_alloc_req(struct dm_crypt_io *io,
+						  gfp_t gfp_mask)
+{
+	struct crypt_config *cc = io->cc;
+	unsigned key_index = io->cc_sector & (cc->tfms_count - 1);
+	struct ablkcipher_request *req = mempool_alloc(cc->req_pool, gfp_mask);
+	if (!req)
+		return NULL;
+
+	ablkcipher_request_set_tfm(req, cc->tfms[key_index]);
+	ablkcipher_request_set_callback(req,
+	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+	    kcryptd_async_done, dmreq_of_req(cc, req));
+
+	return req;
+}
+
+static void crypt_flush_batch(struct crypt_config *cc, struct list_head *batch)
+{
+	spin_lock_irq(&cc->crypt_thread_wait.lock);
+	list_splice_tail(batch, &cc->crypt_thread_list);
+	wake_up_locked(&cc->crypt_thread_wait);
+	spin_unlock_irq(&cc->crypt_thread_wait.lock);
+	INIT_LIST_HEAD(batch);
+
+}
+
+static void crypt_end_io(struct dm_crypt_io *io);
+static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io);
+
+static void crypt_dec_cc_pending(struct dm_crypt_io *io)
+{
+	if (!atomic_dec_and_test(&io->cc_pending))
+		return;
+
+	if (bio_data_dir(io->base_bio) == READ)
+		crypt_end_io(io);
+	else
+		kcryptd_crypt_write_io_submit(io);
+}
+
+static int crypt_convert_block(struct dm_crypt_io *io,
 			       struct ablkcipher_request *req,
 			       struct list_head *batch)
 {
-	struct bio_vec *bv_in = bio_iovec_idx(io->bio_in, io->idx_in);
+	struct crypt_config *cc = io->cc;
+	struct bio_vec *bv_in = bio_iovec_idx(io->base_bio, io->idx_in);
 	struct bio_vec *bv_out = bio_iovec_idx(io->bio_out, io->idx_out);
 	struct dm_crypt_request *dmreq;
 	u8 *iv;
@@ -782,51 +819,12 @@ static int crypt_convert_block(struct crypt_config *cc,
 	return 0;
 }
 
-static struct ablkcipher_request *crypt_alloc_req(struct crypt_config *cc,
-			    struct dm_crypt_io *io, gfp_t gfp_mask)
-{
-	unsigned key_index = io->cc_sector & (cc->tfms_count - 1);
-	struct ablkcipher_request *req = mempool_alloc(cc->req_pool, gfp_mask);
-	if (!req)
-		return NULL;
-
-	ablkcipher_request_set_tfm(req, cc->tfms[key_index]);
-	ablkcipher_request_set_callback(req,
-	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-	    kcryptd_async_done, dmreq_of_req(cc, req));
-
-	return req;
-}
-
-static void crypt_flush_batch(struct crypt_config *cc, struct list_head *batch)
-{
-	spin_lock_irq(&cc->crypt_thread_wait.lock);
-	list_splice_tail(batch, &cc->crypt_thread_list);
-	wake_up_locked(&cc->crypt_thread_wait);
-	spin_unlock_irq(&cc->crypt_thread_wait.lock);
-	INIT_LIST_HEAD(batch);
-}
-
-static void crypt_end_io(struct dm_crypt_io *io);
-static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async);
-
-static void crypt_dec_cc_pending(struct dm_crypt_io *io)
-{
-	if (!atomic_dec_and_test(&io->cc_pending))
-		return;
-
-	if (bio_data_dir(io->base_bio) == READ)
-		crypt_end_io(io);
-	else
-		kcryptd_crypt_write_io_submit(io, 1);
-}
-
 /*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
-static void crypt_convert(struct crypt_config *cc,
-			  struct dm_crypt_io *io)
+static void crypt_convert(struct dm_crypt_io *io)
 {
+	struct crypt_config *cc = io->cc;
 	LIST_HEAD(batch);
 	unsigned batch_count = 0;
 
@@ -834,7 +832,7 @@ static void crypt_convert(struct crypt_config *cc,
 
 	while (1) {
 		int r;
-		struct ablkcipher_request *req = crypt_alloc_req(cc, io, GFP_NOWAIT);
+		struct ablkcipher_request *req = crypt_alloc_req(io, GFP_NOWAIT);
 		if (!req) {
 			/*
 			 * We must flush our request queue before we attempt
@@ -842,10 +840,10 @@ static void crypt_convert(struct crypt_config *cc,
 			 */
 			batch_count = 0;
 			crypt_flush_batch(cc, &batch);
-			req = crypt_alloc_req(cc, io, GFP_NOIO);
+			req = crypt_alloc_req(io, GFP_NOIO);
 		}
 
-		r = crypt_convert_block(cc, io, req, &batch);
+		r = crypt_convert_block(io, req, &batch);
 		if (unlikely(r < 0)) {
 			crypt_flush_batch(cc, &batch);
 			io->error = -EIO;
@@ -855,7 +853,7 @@ static void crypt_convert(struct crypt_config *cc,
 
 		io->sector++;
 
-		if (io->idx_in < io->bio_in->bi_vcnt &&
+		if (io->idx_in < io->base_bio->bi_vcnt &&
 		    io->idx_out < io->bio_out->bi_vcnt) {
 			atomic_inc(&io->cc_pending);
 			if (unlikely(++batch_count >= DMREQ_PUSH_BATCH)) {
@@ -1042,7 +1040,7 @@ static void clone_init(struct dm_crypt_io *io, struct bio *clone)
 	clone->bi_destructor = dm_crypt_bio_destructor;
 }
 
-static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
+static int kcryptd_io_read(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *base_bio = io->base_bio;
@@ -1053,7 +1051,7 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 	 * copy the required bvecs because we need the original
 	 * one in order to decrypt the whole bio data *afterwards*.
 	 */
-	clone = bio_alloc_bioset(gfp, bio_segments(base_bio), cc->bs);
+	clone = bio_alloc_bioset(GFP_NOIO, bio_segments(base_bio), cc->bs);
 	if (!clone)
 		return 1;
 
@@ -1069,31 +1067,11 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 	return 0;
 }
 
-static void kcryptd_io_write(struct dm_crypt_io *io)
-{
-	struct bio *clone = io->bio_out;
-	generic_make_request(clone);
-}
-
-static void kcryptd_io(struct work_struct *work)
+static void kcryptd_io_write(struct work_struct *work)
 {
 	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
-
-	kcryptd_io_write(io);
-}
-
-static void kcryptd_queue_io(struct dm_crypt_io *io)
-{
 	struct crypt_config *cc = io->cc;
-
-	INIT_WORK(&io->work, kcryptd_io);
-	queue_work(cc->io_queue, &io->work);
-}
-
-static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
-{
 	struct bio *clone = io->bio_out;
-	struct crypt_config *cc = io->cc;
 
 	if (unlikely(io->error < 0)) {
 		crypt_free_buffer_pages(cc, clone);
@@ -1107,45 +1085,38 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 
 	clone->bi_sector = cc->start + io->sector;
 
-	if (async)
-		kcryptd_queue_io(io);
-	else
-		generic_make_request(clone);
+	generic_make_request(clone);
 }
 
-static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
+static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
-	struct bio *clone;
-	unsigned remaining = io->base_bio->bi_size;
-	sector_t sector = io->sector;
 
-	crypt_convert_init(cc, io, NULL, io->base_bio, sector);
+	INIT_WORK(&io->work, kcryptd_io_write);
+	queue_work(cc->io_queue, &io->work);
+}
+
+static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
+{
+	struct bio *clone;
 
-	clone = crypt_alloc_buffer(io, remaining);
+	clone = crypt_alloc_buffer(io, io->base_bio->bi_size);
 	if (unlikely(!clone)) {
 		io->error = -ENOMEM;
 		crypt_end_io(io);
 		return;
 	}
 
-	io->bio_out = clone;
-	io->idx_out = 0;
+	crypt_convert_init(io, clone);
 
-	remaining -= clone->bi_size;
-	sector += bio_sectors(clone);
-
-	crypt_convert(cc, io);
+	crypt_convert(io);
 }
 
 static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 {
-	struct crypt_config *cc = io->cc;
-
-	crypt_convert_init(cc, io, io->base_bio, io->base_bio,
-			   io->sector);
+	crypt_convert_init(io, io->base_bio);
 
-	crypt_convert(cc, io);
+	crypt_convert(io);
 }
 
 static void kcryptd_async_done(struct crypto_async_request *async_req,
@@ -1734,7 +1705,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio,
 	io = crypt_io_alloc(cc, bio, dm_target_offset(ti, bio->bi_sector));
 
 	if (bio_data_dir(io->base_bio) == READ) {
-		kcryptd_io_read(io, GFP_NOIO);
+		kcryptd_io_read(io);
 	} else {
 		kcryptd_crypt_write_convert(io);
 	}
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 17/20] dm-crypt: move temporary values to stack
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (14 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 16/20] dm-crypt: small changes Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 18/20] dm-crypt: offload writes to thread Mikulas Patocka
                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

The dm_crypt_io structure contains some values that are used only temporarily.
Move these values to a structure dm_crypt_position that is allocated on the stack.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   89 +++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 47 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 097171b..9316630 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -45,11 +45,6 @@ struct dm_crypt_io {
 	struct work_struct work;
 
 	struct bio *bio_out;
-	unsigned int offset_in;
-	unsigned int offset_out;
-	unsigned int idx_in;
-	unsigned int idx_out;
-	sector_t cc_sector;
 	atomic_t cc_pending;
 
 	int error;
@@ -625,17 +620,6 @@ static struct crypt_iv_operations crypt_iv_lmk_ops = {
 	.post	   = crypt_iv_lmk_post
 };
 
-static void crypt_convert_init(struct dm_crypt_io *io, struct bio *bio_out)
-{
-	struct crypt_config *cc = io->cc;
-	io->bio_out = bio_out;
-	io->offset_in = 0;
-	io->offset_out = 0;
-	io->idx_in = io->base_bio->bi_idx;
-	io->idx_out = bio_out->bi_idx;
-	io->cc_sector = io->sector + cc->iv_offset;
-}
-
 static struct dm_crypt_request *dmreq_of_req(struct crypt_config *cc,
 					     struct ablkcipher_request *req)
 {
@@ -728,11 +712,20 @@ pop_from_list:
 	return 0;
 }
 
+struct dm_crypt_position {
+	unsigned int offset_in;
+	unsigned int offset_out;
+	unsigned int idx_in;
+	unsigned int idx_out;
+	sector_t cc_sector;
+};
+
 static struct ablkcipher_request *crypt_alloc_req(struct dm_crypt_io *io,
+						  struct dm_crypt_position *pos,
 						  gfp_t gfp_mask)
 {
 	struct crypt_config *cc = io->cc;
-	unsigned key_index = io->cc_sector & (cc->tfms_count - 1);
+	unsigned key_index = pos->cc_sector & (cc->tfms_count - 1);
 	struct ablkcipher_request *req = mempool_alloc(cc->req_pool, gfp_mask);
 	if (!req)
 		return NULL;
@@ -771,11 +764,12 @@ static void crypt_dec_cc_pending(struct dm_crypt_io *io)
 
 static int crypt_convert_block(struct dm_crypt_io *io,
 			       struct ablkcipher_request *req,
+			       struct dm_crypt_position *pos,
 			       struct list_head *batch)
 {
 	struct crypt_config *cc = io->cc;
-	struct bio_vec *bv_in = bio_iovec_idx(io->base_bio, io->idx_in);
-	struct bio_vec *bv_out = bio_iovec_idx(io->bio_out, io->idx_out);
+	struct bio_vec *bv_in = bio_iovec_idx(io->base_bio, pos->idx_in);
+	struct bio_vec *bv_out = bio_iovec_idx(io->bio_out, pos->idx_out);
 	struct dm_crypt_request *dmreq;
 	u8 *iv;
 	int r;
@@ -783,26 +777,26 @@ static int crypt_convert_block(struct dm_crypt_io *io,
 	dmreq = dmreq_of_req(cc, req);
 	iv = iv_of_dmreq(cc, dmreq);
 
-	dmreq->iv_sector = io->cc_sector;
+	dmreq->iv_sector = pos->cc_sector;
 	dmreq->io = io;
 	sg_init_table(&dmreq->sg_in, 1);
 	sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
-		    bv_in->bv_offset + io->offset_in);
+		    bv_in->bv_offset + pos->offset_in);
 
 	sg_init_table(&dmreq->sg_out, 1);
 	sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
-		    bv_out->bv_offset + io->offset_out);
+		    bv_out->bv_offset + pos->offset_out);
 
-	io->offset_in += 1 << SECTOR_SHIFT;
-	if (io->offset_in >= bv_in->bv_len) {
-		io->offset_in = 0;
-		io->idx_in++;
+	pos->offset_in += 1 << SECTOR_SHIFT;
+	if (pos->offset_in >= bv_in->bv_len) {
+		pos->offset_in = 0;
+		pos->idx_in++;
 	}
 
-	io->offset_out += 1 << SECTOR_SHIFT;
-	if (io->offset_out >= bv_out->bv_len) {
-		io->offset_out = 0;
-		io->idx_out++;
+	pos->offset_out += 1 << SECTOR_SHIFT;
+	if (pos->offset_out >= bv_out->bv_len) {
+		pos->offset_out = 0;
+		pos->idx_out++;
 	}
 
 	if (cc->iv_gen_ops) {
@@ -822,17 +816,25 @@ static int crypt_convert_block(struct dm_crypt_io *io,
 /*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
-static void crypt_convert(struct dm_crypt_io *io)
+static void crypt_convert(struct dm_crypt_io *io, struct bio *bio_out)
 {
 	struct crypt_config *cc = io->cc;
 	LIST_HEAD(batch);
 	unsigned batch_count = 0;
+	struct dm_crypt_position pos;
+
+	io->bio_out = bio_out;
+	pos.offset_in = 0;
+	pos.offset_out = 0;
+	pos.idx_in = io->base_bio->bi_idx;
+	pos.idx_out = bio_out->bi_idx;
+	pos.cc_sector = io->sector + cc->iv_offset;
 
 	atomic_set(&io->cc_pending, 1);
 
 	while (1) {
 		int r;
-		struct ablkcipher_request *req = crypt_alloc_req(io, GFP_NOWAIT);
+		struct ablkcipher_request *req = crypt_alloc_req(io, &pos, GFP_NOWAIT);
 		if (!req) {
 			/*
 			 * We must flush our request queue before we attempt
@@ -840,10 +842,10 @@ static void crypt_convert(struct dm_crypt_io *io)
 			 */
 			batch_count = 0;
 			crypt_flush_batch(cc, &batch);
-			req = crypt_alloc_req(io, GFP_NOIO);
+			req = crypt_alloc_req(io, &pos, GFP_NOIO);
 		}
 
-		r = crypt_convert_block(io, req, &batch);
+		r = crypt_convert_block(io, req, &pos, &batch);
 		if (unlikely(r < 0)) {
 			crypt_flush_batch(cc, &batch);
 			io->error = -EIO;
@@ -851,10 +853,10 @@ static void crypt_convert(struct dm_crypt_io *io)
 			return;
 		}
 
-		io->sector++;
-
-		if (io->idx_in < io->base_bio->bi_vcnt &&
-		    io->idx_out < io->bio_out->bi_vcnt) {
+		pos.cc_sector++;
+ 
+		if (pos.idx_in < io->base_bio->bi_vcnt &&
+		    pos.idx_out < io->bio_out->bi_vcnt) {
 			atomic_inc(&io->cc_pending);
 			if (unlikely(++batch_count >= DMREQ_PUSH_BATCH)) {
 				batch_count = 0;
@@ -1080,9 +1082,6 @@ static void kcryptd_io_write(struct work_struct *work)
 		return;
 	}
 
-	/* crypt_convert should have filled the clone bio */
-	BUG_ON(io->idx_out < clone->bi_vcnt);
-
 	clone->bi_sector = cc->start + io->sector;
 
 	generic_make_request(clone);
@@ -1107,16 +1106,12 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 		return;
 	}
 
-	crypt_convert_init(io, clone);
-
-	crypt_convert(io);
+	crypt_convert(io, clone);
 }
 
 static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 {
-	crypt_convert_init(io, io->base_bio);
-
-	crypt_convert(io);
+	crypt_convert(io, io->base_bio);
 }
 
 static void kcryptd_async_done(struct crypto_async_request *async_req,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 18/20] dm-crypt: offload writes to thread
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (15 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 17/20] dm-crypt: move temporary values to stack Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 19/20] dm-crypt: retain write ordering Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 20/20] dm-crypt: sort writes Mikulas Patocka
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

Submitting write bios directly in the encryption thread caused serious
performance degradation. On a multiprocessor machine, encryption requests
finish in a different order than they were submitted, so the write requests
would also be submitted out of order, which could cause severe performance
degradation.

This patch moves the submission of write requests to a separate thread so
that the requests can be sorted before being submitted.

Sorting is implemented in the next patch.
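
For illustration, here is a rough, self-contained userspace sketch of the
offload idea (pthreads and a plain FIFO stand in for the kthread, the wait
queue and the bio list; every identifier below is made up, none of it is the
actual dm-crypt code):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct write_item {
        struct write_item *next;
        long sector;                    /* where the write would go */
};

/* FIFO of fully encrypted writes waiting to be submitted */
static struct write_item *queue_head, *queue_tail;
static int stopping;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* called by an encryption worker once a write is fully encrypted */
static void writer_enqueue(long sector)
{
        struct write_item *w = malloc(sizeof(*w));

        w->sector = sector;
        w->next = NULL;
        pthread_mutex_lock(&lock);
        if (queue_tail)
                queue_tail->next = w;
        else
                queue_head = w;
        queue_tail = w;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
}

/* the dedicated writer thread: drain the queue and "submit" each write */
static void *writer_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!stopping || queue_head) {
                struct write_item *w;

                if (!queue_head) {
                        pthread_cond_wait(&nonempty, &lock);
                        continue;
                }
                w = queue_head;
                queue_head = w->next;
                if (!queue_head)
                        queue_tail = NULL;
                pthread_mutex_unlock(&lock);

                /* the kernel code would call generic_make_request() here */
                printf("submit write at sector %ld\n", w->sector);
                free(w);

                pthread_mutex_lock(&lock);
        }
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t thr;

        pthread_create(&thr, NULL, writer_thread, NULL);
        writer_enqueue(128);
        writer_enqueue(8);
        writer_enqueue(64);

        pthread_mutex_lock(&lock);
        stopping = 1;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);

        pthread_join(thr, NULL);
        return 0;
}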

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 80 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9316630..0e31454 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -49,6 +49,8 @@ struct dm_crypt_io {
 
 	int error;
 	sector_t sector;
+
+	struct list_head list;
 };
 
 struct dm_crypt_request {
@@ -120,6 +122,10 @@ struct crypt_config {
 	wait_queue_head_t crypt_thread_wait;
 	struct list_head crypt_thread_list;
 
+	struct task_struct *write_thread;
+	wait_queue_head_t write_thread_wait;
+	struct list_head write_thread_list;
+
 	char *cipher;
 	char *cipher_string;
 
@@ -1069,9 +1075,8 @@ static int kcryptd_io_read(struct dm_crypt_io *io)
 	return 0;
 }
 
-static void kcryptd_io_write(struct work_struct *work)
+static void kcryptd_io_write(struct dm_crypt_io *io)
 {
-	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
 	struct crypt_config *cc = io->cc;
 	struct bio *clone = io->bio_out;
 
@@ -1087,12 +1092,68 @@ static void kcryptd_io_write(struct work_struct *work)
 	generic_make_request(clone);
 }
 
+static int dmcrypt_write(void *data)
+{
+	struct crypt_config *cc = data;
+	while (1) {
+		struct list_head local_list;
+		struct blk_plug plug;
+
+		DECLARE_WAITQUEUE(wait, current);
+
+		spin_lock_irq(&cc->write_thread_wait.lock);
+continue_locked:
+
+		if (!list_empty(&cc->write_thread_list))
+			goto pop_from_list;
+
+		__set_current_state(TASK_INTERRUPTIBLE);
+		__add_wait_queue(&cc->write_thread_wait, &wait);
+
+		spin_unlock_irq(&cc->write_thread_wait.lock);
+
+		if (unlikely(kthread_should_stop())) {
+			set_task_state(current, TASK_RUNNING);
+			remove_wait_queue(&cc->write_thread_wait, &wait);
+			break;
+		}
+
+		schedule();
+
+		set_task_state(current, TASK_RUNNING);
+		spin_lock_irq(&cc->write_thread_wait.lock);
+		__remove_wait_queue(&cc->write_thread_wait, &wait);
+		goto continue_locked;
+
+pop_from_list:
+		local_list = cc->write_thread_list;
+		local_list.next->prev = &local_list;
+		local_list.prev->next = &local_list;
+		INIT_LIST_HEAD(&cc->write_thread_list);
+
+		spin_unlock_irq(&cc->write_thread_wait.lock);
+
+		blk_start_plug(&plug);
+		do {
+			struct dm_crypt_io *io = container_of(local_list.next,
+						struct dm_crypt_io, list);
+			list_del(&io->list);
+			kcryptd_io_write(io);
+		} while (!list_empty(&local_list));
+		blk_finish_plug(&plug);
+	}
+	return 0;
+}
+
 static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
+	unsigned long flags;
 
-	INIT_WORK(&io->work, kcryptd_io_write);
-	queue_work(cc->io_queue, &io->work);
+	spin_lock_irqsave(&cc->write_thread_wait.lock, flags);
+	list_add_tail(&io->list, &cc->write_thread_list);
+	wake_up_locked(&cc->write_thread_wait);
+	spin_unlock_irqrestore(&cc->write_thread_wait.lock, flags);
 }
 
 static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
@@ -1299,6 +1360,9 @@ static void crypt_dtr(struct dm_target *ti)
 		kfree(cc->crypt_threads);
 	}
 
+	if (cc->write_thread)
+		kthread_stop(cc->write_thread);
+
 	if (cc->io_queue)
 		destroy_workqueue(cc->io_queue);
 
@@ -1669,6 +1733,18 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		}
 	}
 
+	init_waitqueue_head(&cc->write_thread_wait);
+	INIT_LIST_HEAD(&cc->write_thread_list);
+
+	cc->write_thread = kthread_create(dmcrypt_write, cc, "dmcrypt_write");
+	if (IS_ERR(cc->write_thread)) {
+		ret = PTR_ERR(cc->write_thread);
+		cc->write_thread = NULL;
+		ti->error = "Couldn't spawn write thread";
+		goto bad;
+	}
+	wake_up_process(cc->write_thread);
+
 	ti->num_flush_requests = 1;
 	ti->discard_zeroes_data_unsupported = true;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 19/20] dm-crypt: retain write ordering
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (16 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 18/20] dm-crypt: offload writes to thread Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21  9:09   ` [PATCH 20/20] dm-crypt: sort writes Mikulas Patocka
  18 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

This patch reorders outgoing write requests so that they are submitted
in the same order in which they were received.
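
A minimal userspace sketch of the sequencing idea, under the assumption that
each write gets a sequence number when it is created (a sorted singly-linked
list stands in for the kernel list handling; all names are illustrative and
none of this is the actual dm-crypt code):

#include <stdio.h>
#include <stdlib.h>

struct wreq {
        struct wreq *next;
        unsigned long long seq;         /* assigned when the request was created */
};

static struct wreq *pending;            /* kept sorted by seq, smallest first */
static unsigned long long next_seq;     /* sequence of the next write we may submit */

/* an encrypted write finished (possibly out of order): park it in sequence order */
static void insert_sorted(struct wreq *w)
{
        struct wreq **p = &pending;

        while (*p && (*p)->seq < w->seq)
                p = &(*p)->next;
        w->next = *p;
        *p = w;
}

/* submit only the head entries whose sequence matches the one expected next */
static void submit_in_order(void)
{
        while (pending && pending->seq == next_seq) {
                struct wreq *w = pending;

                pending = w->next;
                printf("submit write with sequence %llu\n", w->seq);
                next_seq++;
                free(w);
        }
}

int main(void)
{
        /* simulate encryptions completing out of order: 2, 0, 1 */
        unsigned long long done[] = { 2, 0, 1 };

        for (int i = 0; i < 3; i++) {
                struct wreq *w = malloc(sizeof(*w));

                w->seq = done[i];
                insert_sorted(w);
                submit_in_order();      /* prints 0, 1, 2 in order */
        }
        return 0;
}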

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   47 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 0e31454..ccd3380 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -51,6 +51,8 @@ struct dm_crypt_io {
 	sector_t sector;
 
 	struct list_head list;
+
+	u64 sequence;
 };
 
 struct dm_crypt_request {
@@ -126,6 +128,9 @@ struct crypt_config {
 	wait_queue_head_t write_thread_wait;
 	struct list_head write_thread_list;
 
+	u64 write_sequence;
+	atomic64_t alloc_sequence;
+
 	char *cipher;
 	char *cipher_string;
 
@@ -1097,6 +1102,7 @@ static int dmcrypt_write(void *data)
 	struct crypt_config *cc = data;
 	while (1) {
 		struct list_head local_list;
+		unsigned spinlock_breaker;
 		struct blk_plug plug;
 
 		DECLARE_WAITQUEUE(wait, current);
@@ -1126,20 +1132,35 @@ continue_locked:
 		goto continue_locked;
 
 pop_from_list:
-		local_list = cc->write_thread_list;
-		local_list.next->prev = &local_list;
-		local_list.prev->next = &local_list;
-		INIT_LIST_HEAD(&cc->write_thread_list);
+		INIT_LIST_HEAD(&local_list);
+		spinlock_breaker = 0;
+		do {
+			struct dm_crypt_io *io = container_of(
+						cc->write_thread_list.next,
+						struct dm_crypt_io, list);
+
+			BUG_ON(io->sequence < cc->write_sequence);
+			if (io->sequence != cc->write_sequence)
+				break;
+			cc->write_sequence++;
+
+			list_del(&io->list);
+			list_add_tail(&io->list, &local_list);
+			if (unlikely(!(++spinlock_breaker & 63))) {
+				spin_unlock_irq(&cc->write_thread_wait.lock);
+				spin_lock_irq(&cc->write_thread_wait.lock);
+			}
+		} while (!list_empty(&cc->write_thread_list));
 
 		spin_unlock_irq(&cc->write_thread_wait.lock);
 
 		blk_start_plug(&plug);
-		do {
+		while (!list_empty(&local_list)) {
 			struct dm_crypt_io *io = container_of(local_list.next,
 						struct dm_crypt_io, list);
 			list_del(&io->list);
 			kcryptd_io_write(io);
-		} while (!list_empty(&local_list));
+		}
 		blk_finish_plug(&plug);
 	}
 	return 0;
@@ -1149,10 +1170,18 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	unsigned long flags;
+	struct dm_crypt_io *io_list;
 
 	spin_lock_irqsave(&cc->write_thread_wait.lock, flags);
-	list_add_tail(&io->list, &cc->write_thread_list);
+	list_for_each_entry_reverse(io_list, &cc->write_thread_list, list) {
+		if (io_list->sequence < io->sequence) {
+			list_add(&io->list, &io_list->list);
+			goto added;
+		}
+	}
+	list_add(&io->list, &cc->write_thread_list);
 	wake_up_locked(&cc->write_thread_wait);
+added:
 	spin_unlock_irqrestore(&cc->write_thread_wait.lock, flags);
 }
 
@@ -1167,6 +1196,8 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 		return;
 	}
 
+	io->sequence = atomic64_inc_return(&io->cc->alloc_sequence) - 1;
+
 	crypt_convert(io, clone);
 }
 
@@ -1735,6 +1766,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	init_waitqueue_head(&cc->write_thread_wait);
 	INIT_LIST_HEAD(&cc->write_thread_list);
+	cc->write_sequence = 0;
+	atomic64_set(&cc->alloc_sequence, 0);
 
 	cc->write_thread = kthread_create(dmcrypt_write, cc, "dmcrypt_write");
 	if (IS_ERR(cc->write_thread)) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 20/20] dm-crypt: sort writes
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
                     ` (17 preceding siblings ...)
  2012-08-21  9:09   ` [PATCH 19/20] dm-crypt: retain write ordering Mikulas Patocka
@ 2012-08-21  9:09   ` Mikulas Patocka
  2012-08-21 10:57     ` Alasdair G Kergon
  18 siblings, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21  9:09 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

An alternative to the previous patch.

Write requests are sorted in a red-black tree structure and are submitted
in the sorted order.

In theory the sorting should be performed by the underlying disk scheduler;
in practice, however, the disk scheduler accepts and sorts only 128 requests.
In order to sort more requests, we need to implement our own sorting.
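
For illustration, a rough userspace sketch of the same idea, with a plain
(unbalanced) binary search tree standing in for the kernel rb-tree; every
name below is made up and this is not the dm-crypt code:

#include <stdio.h>
#include <stdlib.h>

struct wnode {
        struct wnode *left, *right;
        long sector;
};

static struct wnode *write_tree;        /* writes collected since the last flush */

/* a completed write arrives: insert it keyed by target sector */
static void tree_insert(long sector)
{
        struct wnode *w = malloc(sizeof(*w));
        struct wnode **p = &write_tree;

        w->left = w->right = NULL;
        w->sector = sector;
        while (*p)
                p = sector < (*p)->sector ? &(*p)->left : &(*p)->right;
        *p = w;
}

/* flush: an in-order walk submits the writes in ascending sector order */
static void tree_submit(struct wnode *n)
{
        if (!n)
                return;
        tree_submit(n->left);
        printf("submit write at sector %ld\n", n->sector);
        tree_submit(n->right);
        free(n);
}

int main(void)
{
        long sectors[] = { 72, 8, 40, 16 };

        for (int i = 0; i < 4; i++)
                tree_insert(sectors[i]);
        tree_submit(write_tree);        /* prints 8, 16, 40, 72 */
        write_tree = NULL;
        return 0;
}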

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm-crypt.c |   87 +++++++++++++++++++++----------------------------
 1 file changed, 38 insertions(+), 49 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ccd3380..a61f285 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -21,6 +21,7 @@
 #include <linux/backing-dev.h>
 #include <linux/atomic.h>
 #include <linux/scatterlist.h>
+#include <linux/rbtree.h>
 #include <asm/page.h>
 #include <asm/unaligned.h>
 #include <crypto/hash.h>
@@ -50,9 +51,7 @@ struct dm_crypt_io {
 	int error;
 	sector_t sector;
 
-	struct list_head list;
-
-	u64 sequence;
+	struct rb_node rb_node;
 };
 
 struct dm_crypt_request {
@@ -126,10 +125,7 @@ struct crypt_config {
 
 	struct task_struct *write_thread;
 	wait_queue_head_t write_thread_wait;
-	struct list_head write_thread_list;
-
-	u64 write_sequence;
-	atomic64_t alloc_sequence;
+	struct rb_root write_tree;
 
 	char *cipher;
 	char *cipher_string;
@@ -711,6 +707,7 @@ pop_from_list:
 				r = crypto_ablkcipher_encrypt(req);
 			else
 				r = crypto_ablkcipher_decrypt(req);
+			r = 0;
 			if (unlikely(r == -EBUSY)) {
 				wait_for_completion(&busy_wait);
 			} else if (likely(r != -EINPROGRESS)) {
@@ -1101,8 +1098,8 @@ static int dmcrypt_write(void *data)
 {
 	struct crypt_config *cc = data;
 	while (1) {
-		struct list_head local_list;
-		unsigned spinlock_breaker;
+		struct rb_root write_tree;
+		struct dm_crypt_io *io;
 		struct blk_plug plug;
 
 		DECLARE_WAITQUEUE(wait, current);
@@ -1110,7 +1107,7 @@ static int dmcrypt_write(void *data)
 		spin_lock_irq(&cc->write_thread_wait.lock);
 continue_locked:
 
-		if (!list_empty(&cc->write_thread_list))
+		if (!RB_EMPTY_ROOT(&cc->write_tree))
 			goto pop_from_list;
 
 		__set_current_state(TASK_INTERRUPTIBLE);
@@ -1132,35 +1129,22 @@ continue_locked:
 		goto continue_locked;
 
 pop_from_list:
-		INIT_LIST_HEAD(&local_list);
-		spinlock_breaker = 0;
-		do {
-			struct dm_crypt_io *io = container_of(
-						cc->write_thread_list.next,
-						struct dm_crypt_io, list);
-
-			BUG_ON(io->sequence < cc->write_sequence);
-			if (io->sequence != cc->write_sequence)
-				break;
-			cc->write_sequence++;
-
-			list_del(&io->list);
-			list_add_tail(&io->list, &local_list);
-			if (unlikely(!(++spinlock_breaker & 63))) {
-				spin_unlock_irq(&cc->write_thread_wait.lock);
-				spin_lock_irq(&cc->write_thread_wait.lock);
-			}
-		} while (!list_empty(&cc->write_thread_list));
-
+		write_tree = cc->write_tree;
+		cc->write_tree = RB_ROOT;
 		spin_unlock_irq(&cc->write_thread_wait.lock);
 
+		BUG_ON(rb_parent(write_tree.rb_node));
+
+		/*
+		 * Note: we cannot walk the tree here with rb_next because
+		 * the structures may be freed when kcryptd_io_write is called.
+		 */
 		blk_start_plug(&plug);
-		while (!list_empty(&local_list)) {
-			struct dm_crypt_io *io = container_of(local_list.next,
-						struct dm_crypt_io, list);
-			list_del(&io->list);
+		do {
+			io = rb_entry(rb_first(&write_tree), struct dm_crypt_io, rb_node);
+			rb_erase(&io->rb_node, &write_tree);
 			kcryptd_io_write(io);
-		}
+		} while (!RB_EMPTY_ROOT(&write_tree));
 		blk_finish_plug(&plug);
 	}
 	return 0;
@@ -1170,18 +1154,27 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
 	unsigned long flags;
-	struct dm_crypt_io *io_list;
+	sector_t sector;
+	struct rb_node **p, *parent;
 
 	spin_lock_irqsave(&cc->write_thread_wait.lock, flags);
-	list_for_each_entry_reverse(io_list, &cc->write_thread_list, list) {
-		if (io_list->sequence < io->sequence) {
-			list_add(&io->list, &io_list->list);
-			goto added;
-		}
-	}
-	list_add(&io->list, &cc->write_thread_list);
+
+	p = &cc->write_tree.rb_node;
+	parent = NULL;
+	sector = io->sector;
+	while (*p) {
+		parent = *p;
+#define io_node	rb_entry(parent, struct dm_crypt_io, rb_node)
+		if (sector < io_node->sector)
+			p = &io_node->rb_node.rb_left;
+		else
+			p = &io_node->rb_node.rb_right;
+#undef io_node
+	}
+	rb_link_node(&io->rb_node, parent, p);
+	rb_insert_color(&io->rb_node, &cc->write_tree);
+
 	wake_up_locked(&cc->write_thread_wait);
-added:
 	spin_unlock_irqrestore(&cc->write_thread_wait.lock, flags);
 }
 
@@ -1196,8 +1189,6 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 		return;
 	}
 
-	io->sequence = atomic64_inc_return(&io->cc->alloc_sequence) - 1;
-
 	crypt_convert(io, clone);
 }
 
@@ -1765,9 +1756,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 
 	init_waitqueue_head(&cc->write_thread_wait);
-	INIT_LIST_HEAD(&cc->write_thread_list);
-	cc->write_sequence = 0;
-	atomic64_set(&cc->alloc_sequence, 0);
+	cc->write_tree = RB_ROOT;
 
 	cc->write_thread = kthread_create(dmcrypt_write, cc, "dmcrypt_write");
 	if (IS_ERR(cc->write_thread)) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] dm-crypt: parallel processing
  2012-08-21  9:08 [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
@ 2012-08-21  9:37 ` Milan Broz
  2012-08-21 18:23   ` Tejun Heo
  2012-08-21 13:32 ` Mike Snitzer
  2 siblings, 1 reply; 29+ messages in thread
From: Milan Broz @ 2012-08-21  9:37 UTC (permalink / raw)
  To: device-mapper development, Tejun Heo


On 08/21/2012 11:08 AM, Milan Broz wrote:
> Hello,
> 
> I promised to Mikulas that I will post remaining of his dm-crypt parallel
> processing patchset (plus some related changes) with some comments.
> 
> The problem:
> 
> The current implementation (using per cpu struct) always use encryption
> on the same CPU which submitted IO.
> 
> With very fast storage (usually SSD or MD RAID0) one CPU core can
> be limiting and the throughput of encrypted disk is worse in
> comparison with underlying storage.
> 
> Idea here is to distribute encryption to other CPU cores/CPUs.
> 
> (Side effect of patches is nice clean up dmcrypt code. :)

Better to add a cc to Tejun here; I still think there are several things
which should perhaps be done through the kernel wq...

(I would prefer to use kernel wq as well btw.)

Milan

> 
> Mikulas Patocka (20):
>   dm-crypt: remove per-cpu structure
>   dm-crypt: use unbound workqueue for request processing
>   dm-crypt: remove completion restart
>   dm-crypt: use encryption threads
>   dm-crypt: Unify spinlock
>   dm-crypt: Introduce an option that sets the number of threads.
>   dm-crypt: don't use write queue
>   dm-crypt: simplify io queue
>   dm-crypt: unify io_queue and crypt_queue
>   dm-crypt: don't allocate pages for a partial request.
>   dm-crypt: avoid deadlock in mempools
>   dm-crypt: simplify cc_pending
>   dm-crypt merge convert_context and dm_crypt_io
>   dm-crypt: move error handling to crypt_convert.
>   dm-crypt: remove io_pending field
>   dm-crypt: small changes
>   dm-crypt: move temporary values to stack
>   dm-crypt: offload writes to thread
>   dm-crypt: retain write ordering
>   dm-crypt: sort writes
> 
>  drivers/md/dm-crypt.c |  838 +++++++++++++++++++++++++++----------------------
>  1 file changed, 464 insertions(+), 374 deletions(-)
> 
> 
> My notes:
> 
> I extensively tested this (on top of 3.6.0-rc2) and while I like
> simplification and the main logic (if we have enough power why
> not use other CPUs) I see several problems here.
> 
> 1) The implementation is not much useful on modern CPUs with hw accelerated
> AES (with AES-NI even one core can saturate very fast storage).
> (Some optimized crypto behaves similar, like twofish optimized modules.)
> 
> 2) The patchset targets linear access pattern mainly and one
> process generating IOs. (With more processes/CPUs generating IOs
> you get parallel processing even with current code.)
> 
> I can see significant improvement (~30%) for linear read
> (if not using AES-NI) and if underlying storage is zero target
> (ideal situation removing scheduler from the stack).
> 
> For random access pattern (and more IO producers) I cannot find
> reliable improvement except ideal zero target case (where improvement
> is always >30%). For more producers it doesn't help at all.
> I tested RAID0 over SATA disks and very fast SSD on quad core cpu,
> dmcrypt running with 1, 3 or 4 threads (and with cfq and noop scheduler)
> using fio threaded test with 1 or 4 threads.
> 
> Notes to implementation:
> 
> 1) Last two patches (19/20) provides sorting of IO requests, this
> logically should be done in IO scheduler.
> 
> I don't think this should be in dmcrypt, if scheduler doesn't work
> properly, it should be fixed or tuned for crypt access pattern.
> 
> 2) Could be kernel workqueue used/fixed here instead? Basically all it needs
> is to prefer submitting CPU, if it is busy just move work to another CPU.
> 
> 3) It doesn't honour CPU hotplugging. Number of CPU is determined
> in crypt constructor. If hotplugging is used for CPU power saving,
> it will not work properly.
> 
> 4) With debugging options on, everything is extremely slower
> (Perhaps bug in some debug option, I just noticed this on Fedora
> rawhide with all debug on.)
> 
> 
> I posted it mainly because I think this should move forward,
> whatever direction...
> 
> Milan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 20/20] dm-crypt: sort writes
  2012-08-21  9:09   ` [PATCH 20/20] dm-crypt: sort writes Mikulas Patocka
@ 2012-08-21 10:57     ` Alasdair G Kergon
  2012-08-21 13:39       ` Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: Alasdair G Kergon @ 2012-08-21 10:57 UTC (permalink / raw)
  To: device-mapper development; +Cc: Mikulas Patocka, Milan Broz

On Tue, Aug 21, 2012 at 11:09:31AM +0200, Mikulas Patocka wrote:
> In theory the sorting should be performed by the underlying disk scheduler;
> in practice, however, the disk scheduler accepts and sorts only 128 requests.
> In order to sort more requests, we need to implement our own sorting.
 
Why 128?  Isn't this nr_requests?
(I thought we discussed this before.)

Alasdair

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] dm-crypt: parallel processing
  2012-08-21  9:08 [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
  2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
  2012-08-21  9:37 ` [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
@ 2012-08-21 13:32 ` Mike Snitzer
  2 siblings, 0 replies; 29+ messages in thread
From: Mike Snitzer @ 2012-08-21 13:32 UTC (permalink / raw)
  To: Milan Broz; +Cc: dm-devel, chris.mason

[cc'ing Chris Mason]

On Tue, Aug 21 2012 at  5:08am -0400,
Milan Broz <mbroz@redhat.com> wrote:

> Hello,
> 
> I promised to Mikulas that I will post remaining of his dm-crypt parallel
> processing patchset (plus some related changes) with some comments.
> 
> The problem:
> 
> The current implementation (using per cpu struct) always use encryption
> on the same CPU which submitted IO.
> 
> With very fast storage (usually SSD or MD RAID0) one CPU core can
> be limiting and the throughput of encrypted disk is worse in
> comparison with underlying storage.
> 
> Idea here is to distribute encryption to other CPU cores/CPUs.
> 
> (Side effect of patches is nice clean up dmcrypt code. :)
> 
> Mikulas Patocka (20):
>   dm-crypt: remove per-cpu structure
>   dm-crypt: use unbound workqueue for request processing
>   dm-crypt: remove completion restart
>   dm-crypt: use encryption threads
>   dm-crypt: Unify spinlock
>   dm-crypt: Introduce an option that sets the number of threads.
>   dm-crypt: don't use write queue
>   dm-crypt: simplify io queue
>   dm-crypt: unify io_queue and crypt_queue
>   dm-crypt: don't allocate pages for a partial request.
>   dm-crypt: avoid deadlock in mempools
>   dm-crypt: simplify cc_pending
>   dm-crypt merge convert_context and dm_crypt_io
>   dm-crypt: move error handling to crypt_convert.
>   dm-crypt: remove io_pending field
>   dm-crypt: small changes
>   dm-crypt: move temporary values to stack
>   dm-crypt: offload writes to thread
>   dm-crypt: retain write ordering
>   dm-crypt: sort writes
> 
>  drivers/md/dm-crypt.c |  838 +++++++++++++++++++++++++++----------------------
>  1 file changed, 464 insertions(+), 374 deletions(-)
> 
> 
> My notes:
> 
> I extensively tested this (on top of 3.6.0-rc2) and while I like
> simplification and the main logic (if we have enough power why
> not use other CPUs) I see several problems here.
> 
> 1) The implementation is not much useful on modern CPUs with hw accelerated
> AES (with AES-NI even one core can saturate very fast storage).
> (Some optimized crypto behaves similar, like twofish optimized modules.)
> 
> 2) The patchset targets linear access pattern mainly and one
> process generating IOs. (With more processes/CPUs generating IOs
> you get parallel processing even with current code.)
> 
> I can see significant improvement (~30%) for linear read
> (if not using AES-NI) and if underlying storage is zero target
> (ideal situation removing scheduler from the stack).
> 
> For random access pattern (and more IO producers) I cannot find
> reliable improvement except ideal zero target case (where improvement
> is always >30%). For more producers it doesn't help at all.
> I tested RAID0 over SATA disks and very fast SSD on quad core cpu,
> dmcrypt running with 1, 3 or 4 threads (and with cfq and noop scheduler)
> using fio threaded test with 1 or 4 threads.
> 
> Notes to implementation:
> 
> 1) Last two patches (19/20) provides sorting of IO requests, this
> logically should be done in IO scheduler.
> 
> I don't think this should be in dmcrypt, if scheduler doesn't work
> properly, it should be fixed or tuned for crypt access pattern.
> 
> 2) Could be kernel workqueue used/fixed here instead? Basically all it needs
> is to prefer submitting CPU, if it is busy just move work to another CPU.
> 
> 3) It doesn't honour CPU hotplugging. Number of CPU is determined
> in crypt constructor. If hotplugging is used for CPU power saving,
> it will not work properly.
> 
> 4) With debugging options on, everything is extremely slower
> (Perhaps bug in some debug option, I just noticed this on Fedora
> rawhide with all debug on.)
> 
> 
> I posted it mainly because I think this should move forward,
> whatever direction...

I had a conversation with Chris Mason, maybe 2 years ago, where he
expressed concern about dm-crypt growing too much intelligence about
parallel cpu usage.  I seem to recall Chris' concern being related to
btrfs and the requirement that IOs not get reordered (his point was
that any dm-crypt change to improve cpu utilization shouldn't introduce
the possibility of reordering IOs).

Could be that IO reordering isn't a concern for dm-crypt (before or
after this patchset).  But I just wanted to make this point so that it
is kept in mind (it also allows Chris to raise any other new dm-crypt
issues or concerns he might have).

Mike

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 20/20] dm-crypt: sort writes
  2012-08-21 10:57     ` Alasdair G Kergon
@ 2012-08-21 13:39       ` Mikulas Patocka
  0 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-21 13:39 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: device-mapper development, Milan Broz



On Tue, 21 Aug 2012, Alasdair G Kergon wrote:

> On Tue, Aug 21, 2012 at 11:09:31AM +0200, Mikulas Patocka wrote:
> > In theory the sorting should be performed by the underlying disk scheduler,
> > however, in practice the disk scheduler accepts and sorts only 128 requests.
> > In order to sort more requests, we need to implement our own sorting.
>  
> Why 128?  Isn't this nr_requests?
> (I thought we discussed this before.)
> 
> Alasdair

Yes, that's nr_requests. The user can raise it, but the default is 128.

Mikulas
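
(For reference, nr_requests can be raised per device via
/sys/block/<dev>/queue/nr_requests.  The "own sorting" being discussed
amounts to something like the sketch below - not the actual patch code;
the dm_crypt_io layout and function names here are invented for
illustration - where queued write bios sit in an rbtree keyed by start
sector and are submitted in ascending order by the write thread.)

#include <linux/blkdev.h>
#include <linux/rbtree.h>

struct dm_crypt_io {
	struct rb_node rb_node;
	struct bio *bio;
	sector_t sector;
};

static struct rb_root write_tree = RB_ROOT;

/* Queue a finished write, keeping the tree ordered by start sector. */
static void crypt_sort_insert(struct dm_crypt_io *io)
{
	struct rb_node **p = &write_tree.rb_node, *parent = NULL;

	while (*p) {
		struct dm_crypt_io *t = rb_entry(*p, struct dm_crypt_io, rb_node);

		parent = *p;
		p = io->sector < t->sector ? &(*p)->rb_left : &(*p)->rb_right;
	}
	rb_link_node(&io->rb_node, parent, p);
	rb_insert_color(&io->rb_node, &write_tree);
}

/* Submit everything queued so far in ascending sector order. */
static void crypt_submit_sorted(void)
{
	struct rb_node *n;

	while ((n = rb_first(&write_tree))) {
		struct dm_crypt_io *io = rb_entry(n, struct dm_crypt_io, rb_node);

		rb_erase(n, &write_tree);
		generic_make_request(io->bio);	/* 3.x-era submission call */
	}
}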

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] dm-crypt: parallel processing
  2012-08-21  9:37 ` [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
@ 2012-08-21 18:23   ` Tejun Heo
  2012-08-21 19:26     ` Vivek Goyal
  2012-08-22 10:28     ` Milan Broz
  0 siblings, 2 replies; 29+ messages in thread
From: Tejun Heo @ 2012-08-21 18:23 UTC (permalink / raw)
  To: Milan Broz; +Cc: Jens Axboe, device-mapper development, Vivek Goyal

Hello,

(cc'ing Jens and Vivek, hi!)

On Tue, Aug 21, 2012 at 11:37:43AM +0200, Milan Broz wrote:
> Better adding cc to Tejun here, I still think there are several things
> which perhaps should be done through kernel wq...
> 
> (I would prefer to use kernel wq as well btw.)

What do you mean by kernel wq?  One of the system_*_wq's?  If not,
from scanning the patch names, it seems like it's converting from a
bound workqueue to an unbound one.

> > 1) Last two patches (19/20) provides sorting of IO requests, this
> > logically should be done in IO scheduler.
> > 
> > I don't think this should be in dmcrypt, if scheduler doesn't work
> > properly, it should be fixed or tuned for crypt access pattern.

I kinda agree, but preserving (not strictly, but at least most of the
time) issuing order across a stacking driver like dm probably isn't a
bad idea.  I *think* the direction the block layer should be headed in
is to reduce the amount of work it does as the speed and
characteristics of underlying devices improve.  We could afford to do a
LOT of things to better cater to devices with spindles, but the
operating margin continues to become narrower.  Jens, Vivek, what do
you guys think?

> > 2) Could be kernel workqueue used/fixed here instead? Basically all it needs
> > is to prefer submitting CPU, if it is busy just move work to another CPU.

The problem, I suppose, is that w/ wq, it's either bound or completely
unbound.  If bound, the local CPU can become the bottleneck.  If
unbound, wq doesn't discern local and remote at all and thus loses any
benefit from locality association.

It would be nice if workqueue can somehow accommodate the situation
better - maybe by migrating the worker to the issuing CPU before
setting it loose so that the scheduler needs to migrate it away
explicitly.  Maybe we can do it opportunistically - e.g. record which
CPU an unbound worker was on before entering idle and queue to local
one if it exists.  It wouldn't be trivial to implement though.  I'll
think more about it.
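
(Roughly, the two flavours contrasted above, plus the crude "prefer the
submitting CPU" policy a caller can build today.  Sketch only: the
kcryptd naming and the spill-over comment are illustrative, not taken
from the patchset.)

#include <linux/workqueue.h>
#include <linux/smp.h>
#include <linux/errno.h>

static struct workqueue_struct *bound_wq, *unbound_wq;

static int crypt_wq_init(void)
{
	/* Bound: each work item runs on the CPU that queued it. */
	bound_wq = alloc_workqueue("kcryptd",
				   WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 1);

	/* Unbound: work may run on any CPU, with no locality preference. */
	unbound_wq = alloc_workqueue("kcryptd_unbound",
				     WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

	if (!bound_wq || !unbound_wq)
		return -ENOMEM;
	return 0;
}

static void crypt_queue_local(struct work_struct *work)
{
	/*
	 * A crude "prefer the submitting CPU" policy a caller can do
	 * today: pin the work to the local CPU.  Detecting that this
	 * CPU is saturated and spilling over to the unbound queue would
	 * need extra bookkeeping - the part that arguably belongs in
	 * the workqueue code itself.
	 */
	int cpu = get_cpu();

	queue_work_on(cpu, bound_wq, work);
	put_cpu();
}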

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] dm-crypt: parallel processing
  2012-08-21 18:23   ` Tejun Heo
@ 2012-08-21 19:26     ` Vivek Goyal
  2012-08-22 10:28     ` Milan Broz
  1 sibling, 0 replies; 29+ messages in thread
From: Vivek Goyal @ 2012-08-21 19:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, device-mapper development, Milan Broz

On Tue, Aug 21, 2012 at 11:23:40AM -0700, Tejun Heo wrote:

[..]
> > > 1) Last two patches (19/20) provides sorting of IO requests, this
> > > logically should be done in IO scheduler.
> > > 
> > > I don't think this should be in dmcrypt, if scheduler doesn't work
> > > properly, it should be fixed or tuned for crypt access pattern.
> 
> I kinda agree but preserving (not strictly but at least most of the
> time) issuing order across stacking driver like dm probably isn't a
> bad idea.  I *think* the direction block layer should be headed is to
> reduce the amount of work it does as the speed and characteristics of
> underlying devices improve.  We could afford to do a LOT of things to
> better cater to devices with spindles but the operating margin
> continues to become narrower.  Jens, Vivek, what do you guys think?

I think we can make various block layer functionalities optional and
faster drivers can choose which ones they want. For example,
"queue/nomerges" disables merging. Maybe another sysfs tunable or a
queue flag (which the driver provides at the time of registration) for
"nosorting" could be used to reduce the amount of work the block layer
does.

That way slower devices can still make use of the fully featured block
layer and faster devices can opt out of those features.
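
(A sketch of that idea - QUEUE_FLAG_NOSORT is invented here purely for
illustration, only the nomerges knob exists today; an elevator honouring
it could e.g. insert with ELEVATOR_INSERT_BACK instead of
ELEVATOR_INSERT_SORT.)

#include <linux/blkdev.h>

#define QUEUE_FLAG_NOSORT	31	/* invented bit, illustration only */

static void fast_dev_init_queue(struct request_queue *q)
{
	/* Existing knob, also reachable via /sys/block/<dev>/queue/nomerges. */
	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, q);

	/* Proposed knob: ask the block layer to skip sorting as well. */
	queue_flag_set_unlocked(QUEUE_FLAG_NOSORT, q);
}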

Thanks
Vivek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] dm-crypt: parallel processing
  2012-08-21 18:23   ` Tejun Heo
  2012-08-21 19:26     ` Vivek Goyal
@ 2012-08-22 10:28     ` Milan Broz
  2012-08-23 20:15       ` Tejun Heo
  1 sibling, 1 reply; 29+ messages in thread
From: Milan Broz @ 2012-08-22 10:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, device-mapper development, Vivek Goyal

On 08/21/2012 08:23 PM, Tejun Heo wrote:
> Hello,
> 
> (cc'ing Jens and Vivek, hi!)
> 
> On Tue, Aug 21, 2012 at 11:37:43AM +0200, Milan Broz wrote:
>> Better adding cc to Tejun here, I still think there are several things
>> which perhaps should be done through kernel wq...
>>
>> (I would prefer to use kernel wq as well btw.)
> 
> What do you mean by kernel wq?  One of the system_*_wq's?  If not,
> from scanning the patch names, it seems like it's converting to
> unbound workqueue from bound one.

I meant just extending the bound workqueue, as you mention below.

...

>>> 2) Could be kernel workqueue used/fixed here instead? Basically all it needs
>>> is to prefer submitting CPU, if it is busy just move work to another CPU.
> 
> The problem, I suppose, is that w/ wq, it's either bound or completely
> unbound.  If bound, the local CPU can become the bottleneck.  If
> unbound, wq doesn't discern local and remote at all and thus loses any
> benefit from locality association.
> 
> It would be nice if workqueue can somehow accomodate the situation
> better - maybe by migrating the worker to the issuing CPU before
> setting it loose so that the scheduler needs to migrate it away
> explicitly.  Maybe we can do it opportunistically - e.g. record which
> CPU an unbound worker was on before entering idle and queue to local
> one if it exists.  It wouldn't be trivial to implement tho.  I'll
> think more about it.

Yes, something like this.

dmcrypt basically should have a parameter which says how it will use workqueue(s).

IMHO three basic cases:

1) just use one bound wq per device. (This mode is useful for specific
cases; some people have several dmcrypt devices with RAID on top - it
was a workaround for the one-thread-per-dmcrypt-device limit in older
kernels. This increased throughput, but with recent kernels it does the
exact opposite... So this mode provides a workaround for them.)

2) Use all CPUs possible, just prefer the local CPU if available
(something between a bound and an unbound wq).

3) The same as 2), just with a limited # of CPUs per crypt device.

(but Mikulas' code also uses some batching of requests - not sure how
to incorporate this)
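
(One way such a per-device policy could be selected from the dm-crypt
table line, sketched below - the option names, the enum and the
crypt_wq_opts structure are all invented for illustration; the actual
patchset only adds a thread-count option.)

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>

enum crypt_wq_mode {
	CRYPT_WQ_BOUND,		/* case 1: one bound wq per device */
	CRYPT_WQ_PREFER_LOCAL,	/* cases 2/3: spread out, prefer local CPU */
};

struct crypt_wq_opts {
	enum crypt_wq_mode mode;
	unsigned int num_threads;	/* 0 = no limit (case 2) */
};

static int crypt_parse_wq_opt(struct crypt_wq_opts *o, const char *opt)
{
	unsigned int n;

	if (!strcmp(opt, "wq_mode:bound"))
		o->mode = CRYPT_WQ_BOUND;
	else if (!strcmp(opt, "wq_mode:local"))
		o->mode = CRYPT_WQ_PREFER_LOCAL;
	else if (sscanf(opt, "threads:%u", &n) == 1)
		o->num_threads = n;		/* case 3: cap the workers */
	else
		return -EINVAL;
	return 0;
}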

Whatever, if the logic is implemented in the workqueue code, others can
use it as well. I would really prefer not to have a "too smart" dmcrypt...
(Someone mentioned btrfs on top of it with all its workqueues - how can it
behave nicely if every layer tries to implement its own smart logic?)

Anyway, thanks for the discussion, this is exactly what was missing here :)

Milan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] dm-crypt: parallel processing
  2012-08-22 10:28     ` Milan Broz
@ 2012-08-23 20:15       ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2012-08-23 20:15 UTC (permalink / raw)
  To: Milan Broz; +Cc: Jens Axboe, device-mapper development, Vivek Goyal

Hello,

On Wed, Aug 22, 2012 at 12:28:41PM +0200, Milan Broz wrote:
> Whatever, if logic is implemented in workqueue code, others can use t as well.
> I would really prefer not to have "too smart" dmcrypt...
> (Someone mentioned btrfs on top of it with all workqueues - how it can behave nicely
> if every layer will try to implement own smart logic.)
> 
> Anyway, thanks for discussion, this is exactly what was missing here :)

I've been thinking about it and am still unsure.  If there are enough
use cases, we can try to manage unbound workers such that each CPU has
some standby ones, but at this point such an approach seems a bit
overkill and I'm skeptical about how useful a purely opportunistic
approach would be.

Another thing is that this is something which really belongs in the
scheduler.  The scheduler can know better and do things like this much
better.  Unfortunately, kthreads don't have mechanisms to be
discovered in terms of their optimal association (as opposed to, say,
autonuma for userland).

So... I don't know.  dm-crypt probably is the most extreme use case in
the kernel, so maybe going for a specialized solution might not be too
crazy.  Also, how much of a problem is this?  Is it really worth
solving?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-08-23 20:15 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-21  9:08 [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
2012-08-21  9:09 ` [PATCH 01/20] dm-crypt: remove per-cpu structure Mikulas Patocka
2012-08-21  9:09   ` [PATCH 02/20] dm-crypt: use unbound workqueue for request processing Mikulas Patocka
2012-08-21  9:09   ` [PATCH 03/20] dm-crypt: remove completion restart Mikulas Patocka
2012-08-21  9:09   ` [PATCH 04/20] dm-crypt: use encryption threads Mikulas Patocka
2012-08-21  9:09   ` [PATCH 05/20] dm-crypt: Unify spinlock Mikulas Patocka
2012-08-21  9:09   ` [PATCH 06/20] dm-crypt: Introduce an option that sets the number of threads Mikulas Patocka
2012-08-21  9:09   ` [PATCH 07/20] dm-crypt: don't use write queue Mikulas Patocka
2012-08-21  9:09   ` [PATCH 08/20] dm-crypt: simplify io queue Mikulas Patocka
2012-08-21  9:09   ` [PATCH 09/20] dm-crypt: unify io_queue and crypt_queue Mikulas Patocka
2012-08-21  9:09   ` [PATCH 10/20] dm-crypt: don't allocate pages for a partial request Mikulas Patocka
2012-08-21  9:09   ` [PATCH 11/20] dm-crypt: avoid deadlock in mempools Mikulas Patocka
2012-08-21  9:09   ` [PATCH 12/20] dm-crypt: simplify cc_pending Mikulas Patocka
2012-08-21  9:09   ` [PATCH 13/20] dm-crypt merge convert_context and dm_crypt_io Mikulas Patocka
2012-08-21  9:09   ` [PATCH 14/20] dm-crypt: move error handling to crypt_convert Mikulas Patocka
2012-08-21  9:09   ` [PATCH 15/20] dm-crypt: remove io_pending field Mikulas Patocka
2012-08-21  9:09   ` [PATCH 16/20] dm-crypt: small changes Mikulas Patocka
2012-08-21  9:09   ` [PATCH 17/20] dm-crypt: move temporary values to stack Mikulas Patocka
2012-08-21  9:09   ` [PATCH 18/20] dm-crypt: offload writes to thread Mikulas Patocka
2012-08-21  9:09   ` [PATCH 19/20] dm-crypt: retain write ordering Mikulas Patocka
2012-08-21  9:09   ` [PATCH 20/20] dm-crypt: sort writes Mikulas Patocka
2012-08-21 10:57     ` Alasdair G Kergon
2012-08-21 13:39       ` Mikulas Patocka
2012-08-21  9:37 ` [RFC PATCH 00/20] dm-crypt: parallel processing Milan Broz
2012-08-21 18:23   ` Tejun Heo
2012-08-21 19:26     ` Vivek Goyal
2012-08-22 10:28     ` Milan Broz
2012-08-23 20:15       ` Tejun Heo
2012-08-21 13:32 ` Mike Snitzer
