All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-crypt] En/decrypting in multi-sector batches?
@ 2011-04-26 15:17 Will Drewry
  2011-04-26 17:46 ` Arno Wagner
  2011-04-27  7:19 ` Milan Broz
  0 siblings, 2 replies; 7+ messages in thread
From: Will Drewry @ 2011-04-26 15:17 UTC (permalink / raw)
  To: dm-crypt; +Cc: Varun Wadekar

Hi,

Recently, I've been benchmarking some different hardware crypto
accelerators and many of them appear to be tuned toward largish
requests (up to 16k) with a given key and a base IV.  I've created a
very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
the driver performance testing, but I lack the cryptographic
understanding to determine if there is significant exposure by
allowing a dm-crypt device to use a block size that exceeds the sector
size.  For instance, I was thinking about allowing a block size that
is a multiple of sectors - rather than just one sector.  (I was
picturing adding an argument to the end of the table that was the
number of sectors to use as the block size with a default of "1".)
But I'm not confident that I understand the full impact (but it sure
is nice to more fully utilize the available hardware :).

So,
1. Does anyone know if there will be significant exposure to the
plaintext if dm-crypt used larger block sizes?
2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
interest?  If so, would it make sense to be per-target or a
compile-time constant?

I've attached the basic patch without any sort of ability to configure
the value to provide a more concrete explanation of what I'm trying to
explain It changes the dm-crypt block size to PAGE_SIZE from sector
size.

Any and all thoughts about change viability or cryptographic impact
would be appreciated!  I'm more than happy to rewrite this into
something that would be acceptable, if there's interest.

thanks!
will



--- drivers.orig/md/dm-crypt.c	2011-02-17 17:26:08.246685915 -0600
+++ drivers/md/dm-crypt.c	2011-02-01 16:53:03.764387497 -0600
@@ -416,20 +416,20 @@ static int crypt_convert_block(struct cr

 	dmreq->ctx = ctx;
 	sg_init_table(&dmreq->sg_in, 1);
-	sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
+	sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << PAGE_SHIFT,
 		    bv_in->bv_offset + ctx->offset_in);

 	sg_init_table(&dmreq->sg_out, 1);
-	sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
+	sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << PAGE_SHIFT,
 		    bv_out->bv_offset + ctx->offset_out);

-	ctx->offset_in += 1 << SECTOR_SHIFT;
+	ctx->offset_in += 1 << PAGE_SHIFT;
 	if (ctx->offset_in >= bv_in->bv_len) {
 		ctx->offset_in = 0;
 		ctx->idx_in++;
 	}

-	ctx->offset_out += 1 << SECTOR_SHIFT;
+	ctx->offset_out += 1 << PAGE_SHIFT;
 	if (ctx->offset_out >= bv_out->bv_len) {
 		ctx->offset_out = 0;
 		ctx->idx_out++;
@@ -442,7 +442,7 @@ static int crypt_convert_block(struct cr
 	}

 	ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
-				     1 << SECTOR_SHIFT, iv);
+				     1 << PAGE_SHIFT, iv);

 	if (bio_data_dir(ctx->bio_in) == WRITE)
 		r = crypto_ablkcipher_encrypt(req);
@@ -1294,6 +1294,17 @@ static int crypt_map(struct dm_target *t
 	return DM_MAPIO_SUBMITTED;
 }

+#define CRYPT_BLOCK_SIZE PAGE_SIZE
+static void crypt_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	limits->logical_block_size = CRYPT_BLOCK_SIZE;
+	limits->physical_block_size = CRYPT_BLOCK_SIZE;
+	blk_limits_io_min(limits, CRYPT_BLOCK_SIZE);
+}
+
+
+
 static int crypt_status(struct dm_target *ti, status_type_t type,
 			char *result, unsigned int maxlen)
 {
@@ -1433,6 +1444,7 @@ static struct target_type crypt_target =
 	.message = crypt_message,
 	.merge  = crypt_merge,
 	.iterate_devices = crypt_iterate_devices,
+	.io_hints = crypt_io_hints,
 };

 static int __init dm_crypt_init(void)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-crypt] En/decrypting in multi-sector batches?
  2011-04-26 15:17 [dm-crypt] En/decrypting in multi-sector batches? Will Drewry
@ 2011-04-26 17:46 ` Arno Wagner
  2011-04-27  7:19 ` Milan Broz
  1 sibling, 0 replies; 7+ messages in thread
From: Arno Wagner @ 2011-04-26 17:46 UTC (permalink / raw)
  To: dm-crypt


Hi Will,

as far as I know, unless you go to really large block size, in 
escess of megabytes, there is no negative security impact.
For really large block sizes, I would have to check, but if I 
remember correctly, the problem with CBC starts only at the 
point where you start to have a real chance to re-use the IV 
by the birthday paradox. That would be somewhere above the TB
range.

As to why these small sectors are used, as far as I understand, 
they are what the device mapper uses internally as smallest 
possible blocksize, but Milan would know more.

Any benchmarkks about your results? 

Arno


On Tue, Apr 26, 2011 at 10:17:40AM -0500, Will Drewry wrote:
> Hi,
> 
> Recently, I've been benchmarking some different hardware crypto
> accelerators and many of them appear to be tuned toward largish
> requests (up to 16k) with a given key and a base IV.  I've created a
> very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
> the driver performance testing, but I lack the cryptographic
> understanding to determine if there is significant exposure by
> allowing a dm-crypt device to use a block size that exceeds the sector
> size.  For instance, I was thinking about allowing a block size that
> is a multiple of sectors - rather than just one sector.  (I was
> picturing adding an argument to the end of the table that was the
> number of sectors to use as the block size with a default of "1".)
> But I'm not confident that I understand the full impact (but it sure
> is nice to more fully utilize the available hardware :).
> 
> So,
> 1. Does anyone know if there will be significant exposure to the
> plaintext if dm-crypt used larger block sizes?
> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
> interest?  If so, would it make sense to be per-target or a
> compile-time constant?
> 
> I've attached the basic patch without any sort of ability to configure
> the value to provide a more concrete explanation of what I'm trying to
> explain It changes the dm-crypt block size to PAGE_SIZE from sector
> size.
> 
> Any and all thoughts about change viability or cryptographic impact
> would be appreciated!  I'm more than happy to rewrite this into
> something that would be acceptable, if there's interest.
> 
> thanks!
> will
> 
> 
> 
> --- drivers.orig/md/dm-crypt.c	2011-02-17 17:26:08.246685915 -0600
> +++ drivers/md/dm-crypt.c	2011-02-01 16:53:03.764387497 -0600
> @@ -416,20 +416,20 @@ static int crypt_convert_block(struct cr
> 
>  	dmreq->ctx = ctx;
>  	sg_init_table(&dmreq->sg_in, 1);
> -	sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
> +	sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << PAGE_SHIFT,
>  		    bv_in->bv_offset + ctx->offset_in);
> 
>  	sg_init_table(&dmreq->sg_out, 1);
> -	sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
> +	sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << PAGE_SHIFT,
>  		    bv_out->bv_offset + ctx->offset_out);
> 
> -	ctx->offset_in += 1 << SECTOR_SHIFT;
> +	ctx->offset_in += 1 << PAGE_SHIFT;
>  	if (ctx->offset_in >= bv_in->bv_len) {
>  		ctx->offset_in = 0;
>  		ctx->idx_in++;
>  	}
> 
> -	ctx->offset_out += 1 << SECTOR_SHIFT;
> +	ctx->offset_out += 1 << PAGE_SHIFT;
>  	if (ctx->offset_out >= bv_out->bv_len) {
>  		ctx->offset_out = 0;
>  		ctx->idx_out++;
> @@ -442,7 +442,7 @@ static int crypt_convert_block(struct cr
>  	}
> 
>  	ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
> -				     1 << SECTOR_SHIFT, iv);
> +				     1 << PAGE_SHIFT, iv);
> 
>  	if (bio_data_dir(ctx->bio_in) == WRITE)
>  		r = crypto_ablkcipher_encrypt(req);
> @@ -1294,6 +1294,17 @@ static int crypt_map(struct dm_target *t
>  	return DM_MAPIO_SUBMITTED;
>  }
> 
> +#define CRYPT_BLOCK_SIZE PAGE_SIZE
> +static void crypt_io_hints(struct dm_target *ti,
> +			    struct queue_limits *limits)
> +{
> +	limits->logical_block_size = CRYPT_BLOCK_SIZE;
> +	limits->physical_block_size = CRYPT_BLOCK_SIZE;
> +	blk_limits_io_min(limits, CRYPT_BLOCK_SIZE);
> +}
> +
> +
> +
>  static int crypt_status(struct dm_target *ti, status_type_t type,
>  			char *result, unsigned int maxlen)
>  {
> @@ -1433,6 +1444,7 @@ static struct target_type crypt_target =
>  	.message = crypt_message,
>  	.merge  = crypt_merge,
>  	.iterate_devices = crypt_iterate_devices,
> +	.io_hints = crypt_io_hints,
>  };
> 
>  static int __init dm_crypt_init(void)
> _______________________________________________
> dm-crypt mailing list
> dm-crypt@saout.de
> http://www.saout.de/mailman/listinfo/dm-crypt
> 

-- 
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno@wagner.name 
GnuPG:  ID: 1E25338F  FP: 0C30 5782 9D93 F785 E79C  0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

If it's in the news, don't worry about it.  The very definition of 
"news" is "something that hardly ever happens." -- Bruce Schneier 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-crypt] En/decrypting in multi-sector batches?
  2011-04-26 15:17 [dm-crypt] En/decrypting in multi-sector batches? Will Drewry
  2011-04-26 17:46 ` Arno Wagner
@ 2011-04-27  7:19 ` Milan Broz
  2011-04-27 15:07   ` Will Drewry
  2011-04-29 17:33   ` Mandeep Singh Baines
  1 sibling, 2 replies; 7+ messages in thread
From: Milan Broz @ 2011-04-27  7:19 UTC (permalink / raw)
  To: Will Drewry; +Cc: dm-crypt, Varun Wadekar

Hi,

On 04/26/2011 05:17 PM, Will Drewry wrote:
> Recently, I've been benchmarking some different hardware crypto
> accelerators and many of them appear to be tuned toward largish
> requests (up to 16k) with a given key and a base IV.

Please can you explicitly say which accelerators you are using and
show some benchmarks?

Does dmcrypt work in async crypto mode or you have also
"accelerators" like special instructions which run synchronously
(like AES-NI)?

Of course large block means smaller overhead but the difference should
not be significant (at least in theory).
If it is, we need to know why - it can be because of timing or
the way how the request are submitted not the time
or real encryption (initialization) itself.

In this case the crypto driver should be optimised first.

>  I've created a
> very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
> the driver performance testing, but I lack the cryptographic
> understanding to determine if there is significant exposure by
> allowing a dm-crypt device to use a block size that exceeds the sector
> size.

As Arno said, there should be no real security problem for these block
sizes. Basically we are just using CBC or XTS mode today.

For XTS-AES, definition explicitly says that data unit (= your block)
size should not exceed 2^20 128bit blocks (128bit = AES cipher block).
(And even here possible attacks are closely related to birthday
bound, IOW you need to have enough blocks encrypted with the same key.)

So I do not see real security problem here. But problems are elsewhere.

> 1. Does anyone know if there will be significant exposure to the
> plaintext if dm-crypt used larger block sizes?

Should not be.

> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
> interest?

Short answer would be no :-)

As I said, I would like to prove first that the problem is really in block
size and not in related problem.

Now the real problems:

The whole device mapper and dmcrypt works as transparent block encryption
and we are always operating on 512B sectors.

Even if device is 4k blocks, this is hidden in underlying layer and
DM just properly aligns data and propagates limits but
still operates on 512B sectors. (It can be ineffective for some
IO patterns, but it works).

Changing encryption block size causes device to be incompatible with other
systems (note stacked devices, a common thing here - LVM over dmcrypt)
and IOs. You have to generate only aligned IO of your encryption block size.

(or change dmcrypt significantly)

IO hints is not enough - maybe example is better here:

Testing device (some random data there, not important)
# dmsetup table --showkeys
x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0

Let's generate some direct IOs (to avoid page cache)

*Without* your patch:

# dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum 
eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -

# dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum 
eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -

# dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum 
eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -

As you can see, we get the same plain data with different IO sizes.

Now *with* your patch (page size is 4096):

# dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum 
dd: reading `/dev/mapper/x': Invalid argument

# dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum 
4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1  -

# dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum 
17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141  -

So block size of 512B causes operation to fail (ok - IO hints).
IO of block encryption size and multiple of encryption size returns
apparently something different now.

This is probably not what we want...

(Note that I did not even tested cross-encryption-block operations.)

Even if this is somehow solved, many other problems remains:

- we need to extend mapping table parameters so the block size
must be configurable (encrypted device image must be readable
on system with different page size, I have e.g. Sparc with 8k page size.
(This will be needed for other extensions so it is not real
problem, just it need to be done first.)

- you need to store this block size info in header,
for LUKS it means using new LUKS header version
(requiring parameter on commandline is dangerous - it must be enforced)


I would really better to not support this yet and first try to optimize
crypto layer such way that it can process 512B blocks more
efficiently (of course it will not fix bad hw but it can help batching
sector encryption, maybe suing some hints, dunno).

Milan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-crypt] En/decrypting in multi-sector batches?
  2011-04-27  7:19 ` Milan Broz
@ 2011-04-27 15:07   ` Will Drewry
  2011-04-28 18:43     ` Arno Wagner
  2011-04-29 17:33   ` Mandeep Singh Baines
  1 sibling, 1 reply; 7+ messages in thread
From: Will Drewry @ 2011-04-27 15:07 UTC (permalink / raw)
  To: Milan Broz; +Cc: dm-crypt, Varun Wadekar

On Wed, Apr 27, 2011 at 2:19 AM, Milan Broz <mbroz@redhat.com> wrote:
> Hi,
>
> On 04/26/2011 05:17 PM, Will Drewry wrote:
>> Recently, I've been benchmarking some different hardware crypto
>> accelerators and many of them appear to be tuned toward largish
>> requests (up to 16k) with a given key and a base IV.
>
> Please can you explicitly say which accelerators you are using and
> show some benchmarks?

Sure, I was looking at the in-development tegra_aes kernel module.  As
I was trying to understand the available performance ignoring
in-kernel overhead, I wanted to push as much data as possible into the
requests at a time as I tweaked how the different pieces worked.

The difference was a change from 2.2 megabytes/s -> 3.2 megabytes/s
using a simplistic synchronous dd + drop_caches scaffold.   Without
using conv=sync, the performance difference was not as noticeable (for
obvious reasons).  A more comprehensive benchmark suite, like iozone,
would bubble up better results, though.

> Does dmcrypt work in async crypto mode or you have also
> "accelerators" like special instructions which run synchronously
> (like AES-NI)?

It was using the ablkcipher interface, but the engine itself is
synchronous, so requests were living on a queue in the driver waiting
to be serviced.  For dm-crypt and above, everything appears
asynchronous.  (Is that roughly answering the question? :)

> Of course large block means smaller overhead but the difference should
> not be significant (at least in theory).
> If it is, we need to know why - it can be because of timing or
> the way how the request are submitted not the time
> or real encryption (initialization) itself.
>
> In this case the crypto driver should be optimised first.

Agreed.  I was using this patch purely for driver optimization which
is why I didn't post a prettier one with an optional table argument,
etc.

>>  I've created a
>> very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
>> the driver performance testing, but I lack the cryptographic
>> understanding to determine if there is significant exposure by
>> allowing a dm-crypt device to use a block size that exceeds the sector
>> size.
>
> As Arno said, there should be no real security problem for these block
> sizes. Basically we are just using CBC or XTS mode today.

Cool.  (Also, sorry for not replying there. I failed to subscribe in
advance of the mailing and missed the response :/ )

> For XTS-AES, definition explicitly says that data unit (= your block)
> size should not exceed 2^20 128bit blocks (128bit = AES cipher block).
> (And even here possible attacks are closely related to birthday
> bound, IOW you need to have enough blocks encrypted with the same key.)
>
> So I do not see real security problem here. But problems are elsewhere.

Thanks - nice to have that confirmed.

>> 1. Does anyone know if there will be significant exposure to the
>> plaintext if dm-crypt used larger block sizes?
>
> Should not be.
>
>> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
>> interest?
>
> Short answer would be no :-)
>
> As I said, I would like to prove first that the problem is really in block
> size and not in related problem.
>
> Now the real problems:
>
> The whole device mapper and dmcrypt works as transparent block encryption
> and we are always operating on 512B sectors.
>
> Even if device is 4k blocks, this is hidden in underlying layer and
> DM just properly aligns data and propagates limits but
> still operates on 512B sectors. (It can be ineffective for some
> IO patterns, but it works).
>
> Changing encryption block size causes device to be incompatible with other
> systems (note stacked devices, a common thing here - LVM over dmcrypt)
> and IOs. You have to generate only aligned IO of your encryption block size.
>
> (or change dmcrypt significantly)
>
> IO hints is not enough - maybe example is better here:
>
>
> Testing device (some random data there, not important)
> # dmsetup table --showkeys
> x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0
>
> Let's generate some direct IOs (to avoid page cache)

Ah! I had forgotten about direct I/O.  I was using fsync and
/proc/sys/vm/drop_caches to clear the page cache across each call.
direct is a whole other beast :/

> *Without* your patch:
>
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -
>
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -
>
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -
>
> As you can see, we get the same plain data with different IO sizes.
>
> Now *with* your patch (page size is 4096):
>
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> dd: reading `/dev/mapper/x': Invalid argument
>
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> 4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1  -
>
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> 17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141  -
>
> So block size of 512B causes operation to fail (ok - IO hints).
> IO of block encryption size and multiple of encryption size returns
> apparently something different now.
>
> This is probably not what we want...
>
> (Note that I did not even tested cross-encryption-block operations.)

Hehe - not at all!

> Even if this is somehow solved, many other problems remains:
>
> - we need to extend mapping table parameters so the block size
> must be configurable (encrypted device image must be readable
> on system with different page size, I have e.g. Sparc with 8k page size.
> (This will be needed for other extensions so it is not real
> problem, just it need to be done first.)

Certainly - I definitely wouldn't want it page-size bound in general,
and I suspect that most consumers of dm-crypt would still want a
single sector block size.

> - you need to store this block size info in header,
> for LUKS it means using new LUKS header version
> (requiring parameter on commandline is dangerous - it must be enforced)

Ouch - so much for being lazy!

> I would really better to not support this yet and first try to optimize
> crypto layer such way that it can process 512B blocks more
> efficiently (of course it will not fix bad hw but it can help batching
> sector encryption, maybe suing some hints, dunno).

That makes perfect sense to me.  I'll keep using this patch for easy
testing of known (non-direct io) test loads (or give in and just add
the drivers I'm playing with to the existing crypto test module :).


Thanks for the thorough and thoughtful response!
will

[I always seem to learn something new from mails to dm-*!]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-crypt] En/decrypting in multi-sector batches?
  2011-04-27 15:07   ` Will Drewry
@ 2011-04-28 18:43     ` Arno Wagner
  0 siblings, 0 replies; 7+ messages in thread
From: Arno Wagner @ 2011-04-28 18:43 UTC (permalink / raw)
  To: dm-crypt

On Wed, Apr 27, 2011 at 10:07:14AM -0500, Will Drewry wrote:
> On Wed, Apr 27, 2011 at 2:19 AM, Milan Broz <mbroz@redhat.com> wrote:
> > Hi,
> >
> > On 04/26/2011 05:17 PM, Will Drewry wrote:
> >> Recently, I've been benchmarking some different hardware crypto
> >> accelerators and many of them appear to be tuned toward largish
> >> requests (up to 16k) with a given key and a base IV.
> >
> > Please can you explicitly say which accelerators you are using and
> > show some benchmarks?
> 
> Sure, I was looking at the in-development tegra_aes kernel module.  As
> I was trying to understand the available performance ignoring
> in-kernel overhead, I wanted to push as much data as possible into the
> requests at a time as I tweaked how the different pieces worked.
> 
> The difference was a change from 2.2 megabytes/s -> 3.2 megabytes/s
> using a simplistic synchronous dd + drop_caches scaffold.   Without
> using conv=sync, the performance difference was not as noticeable (for
> obvious reasons).  A more comprehensive benchmark suite, like iozone,
> would bubble up better results, though.

So we are talking about a very slow CPU here. BTW, if
you use the device in /dev/mapper/<something> directly, that
should bypass all OS caches and completely encrypt/decrypt 
an each access.
 
3.2 MB/sec is pretty slow. A current desktop CPU gets something like
500MB/sec with aes-cbc-essiv:sha256.

[...]
> >> ?I've created a
> >> very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
> >> the driver performance testing, but I lack the cryptographic
> >> understanding to determine if there is significant exposure by
> >> allowing a dm-crypt device to use a block size that exceeds the sector
> >> size.
> >
> > As Arno said, there should be no real security problem for these block
> > sizes. Basically we are just using CBC or XTS mode today.
> 
> Cool.  (Also, sorry for not replying there. I failed to subscribe in
> advance of the mailing and missed the response :/ )

No problem. I appreciate getting feedback, but it does not have to
be immediate ;-)

> > For XTS-AES, definition explicitly says that data unit (= your block)
> > size should not exceed 2^20 128bit blocks (128bit = AES cipher block).
> > (And even here possible attacks are closely related to birthday
> > bound, IOW you need to have enough blocks encrypted with the same key.)

Ah, right, it was an XTS limit we discussed here some time ago.

> > So I do not see real security problem here. But problems are elsewhere.
> 
> Thanks - nice to have that confirmed.
> 
> >> 1. Does anyone know if there will be significant exposure to the
> >> plaintext if dm-crypt used larger block sizes?
> >
> > Should not be.
> >
> >> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
> >> interest?
> >
> > Short answer would be no :-)
> >
> > As I said, I would like to prove first that the problem is really in block
> > size and not in related problem.
> >
> > Now the real problems:
> >
> > The whole device mapper and dmcrypt works as transparent block encryption
> > and we are always operating on 512B sectors.
> >
> > Even if device is 4k blocks, this is hidden in underlying layer and
> > DM just properly aligns data and propagates limits but
> > still operates on 512B sectors. (It can be ineffective for some
> > IO patterns, but it works).
> >
> > Changing encryption block size causes device to be incompatible with other
> > systems (note stacked devices, a common thing here - LVM over dmcrypt)
> > and IOs. You have to generate only aligned IO of your encryption block size.
> >
> > (or change dmcrypt significantly)
> >
> > IO hints is not enough - maybe example is better here:
> >
> >
> > Testing device (some random data there, not important)
> > # dmsetup table --showkeys
> > x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0
> >
> > Let's generate some direct IOs (to avoid page cache)
> 
> Ah! I had forgotten about direct I/O.  I was using fsync and
> /proc/sys/vm/drop_caches to clear the page cache across each call.
> direct is a whole other beast :/


Well, dm-crypt and LUKS sit conceptually directly on the block device
and give you another block device. Caching and buffering, other than 
by the disk itself, are a bit higher in the stack.

I don't know, but I think you do not need the "direct" flag
when accessing /dev/mapper/<something>. At least some quick 
measurements with 

  time head -c 100M /dev/mapper/c1 > /dev/null

seem to agree.

 
> > *Without* your patch:
> >
> > # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> > eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 ?-
> >
> > # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> > eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 ?-
> >
> > # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> > eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4 ?-
> >
> > As you can see, we get the same plain data with different IO sizes.
> >
> > Now *with* your patch (page size is 4096):
> >
> > # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum
> > dd: reading `/dev/mapper/x': Invalid argument
> >
> > # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum
> > 4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1 ?-
> >
> > # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum
> > 17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141 ?-
> >
> > So block size of 512B causes operation to fail (ok - IO hints).
> > IO of block encryption size and multiple of encryption size returns
> > apparently something different now.


Could that be different sector counts messing with the IVs?


> > This is probably not what we want...
> >
> > (Note that I did not even tested cross-encryption-block operations.)
> 
> Hehe - not at all!


Nooo, I would never get finished updating the FAQ to explain this ;-)

 
> > Even if this is somehow solved, many other problems remains:
> >
> > - we need to extend mapping table parameters so the block size
> > must be configurable (encrypted device image must be readable
> > on system with different page size, I have e.g. Sparc with 8k page size.
> > (This will be needed for other extensions so it is not real
> > problem, just it need to be done first.)
> 
> Certainly - I definitely wouldn't want it page-size bound in general,
> and I suspect that most consumers of dm-crypt would still want a
> single sector block size.


Also with plain dm-crypt, any non-defaults would have to be
given on each call (no metadata) and this would confuse people
no end.

 
> > - you need to store this block size info in header,
> > for LUKS it means using new LUKS header version
> > (requiring parameter on commandline is dangerous - it must be enforced)
> 
> Ouch - so much for being lazy!
> 
> > I would really better to not support this yet and first try to optimize
> > crypto layer such way that it can process 512B blocks more
> > efficiently (of course it will not fix bad hw but it can help batching
> > sector encryption, maybe suing some hints, dunno).
> 
> That makes perfect sense to me.  I'll keep using this patch for easy
> testing of known (non-direct io) test loads (or give in and just add
> the drivers I'm playing with to the existing crypto test module :).

You should also measure set-up time (new IV, key stays the same) 
of the hardware vs encryption speed. Maybe using 512B blocks is 
actually not that much slower.

> Thanks for the thorough and thoughtful response!
> will
> 
> [I always seem to learn something new from mails to dm-*!]

Thanks, nice to hear that!

Arno
-- 
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno@wagner.name 
GnuPG:  ID: 1E25338F  FP: 0C30 5782 9D93 F785 E79C  0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

If it's in the news, don't worry about it.  The very definition of 
"news" is "something that hardly ever happens." -- Bruce Schneier 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-crypt] En/decrypting in multi-sector batches?
  2011-04-27  7:19 ` Milan Broz
  2011-04-27 15:07   ` Will Drewry
@ 2011-04-29 17:33   ` Mandeep Singh Baines
  2011-05-01  8:13     ` Milan Broz
  1 sibling, 1 reply; 7+ messages in thread
From: Mandeep Singh Baines @ 2011-04-29 17:33 UTC (permalink / raw)
  To: dm-crypt

Mandeep Singh Baines (msb@google.com) wrote:
> Hi,
> 
> On 04/26/2011 05:17 PM, Will Drewry wrote:
> > Recently, I've been benchmarking some different hardware crypto
> > accelerators and many of them appear to be tuned toward largish
> > requests (up to 16k) with a given key and a base IV.
> 
> Please can you explicitly say which accelerators you are using and
> show some benchmarks?
> 
> Does dmcrypt work in async crypto mode or you have also
> "accelerators" like special instructions which run synchronously
> (like AES-NI)?
> 
> Of course large block means smaller overhead but the difference should
> not be significant (at least in theory).
> If it is, we need to know why - it can be because of timing or
> the way how the request are submitted not the time
> or real encryption (initialization) itself.
> 
> In this case the crypto driver should be optimised first.
> 
> >  I've created a
> > very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
> > the driver performance testing, but I lack the cryptographic
> > understanding to determine if there is significant exposure by
> > allowing a dm-crypt device to use a block size that exceeds the sector
> > size.
> 
> As Arno said, there should be no real security problem for these block
> sizes. Basically we are just using CBC or XTS mode today.
> 
> For XTS-AES, definition explicitly says that data unit (= your block)
> size should not exceed 2^20 128bit blocks (128bit = AES cipher block).
> (And even here possible attacks are closely related to birthday
> bound, IOW you need to have enough blocks encrypted with the same key.)
> 
> So I do not see real security problem here. But problems are elsewhere.
> 
> > 1. Does anyone know if there will be significant exposure to the
> > plaintext if dm-crypt used larger block sizes?
> 
> Should not be.
> 
> > 2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
> > interest?
> 
> Short answer would be no :-)
> 
> As I said, I would like to prove first that the problem is really in block
> size and not in related problem.
> 
> Now the real problems:
> 
> The whole device mapper and dmcrypt works as transparent block encryption
> and we are always operating on 512B sectors.
> 
> Even if device is 4k blocks, this is hidden in underlying layer and
> DM just properly aligns data and propagates limits but
> still operates on 512B sectors. (It can be ineffective for some
> IO patterns, but it works).
> 
> Changing encryption block size causes device to be incompatible with other
> systems (note stacked devices, a common thing here - LVM over dmcrypt)
> and IOs. You have to generate only aligned IO of your encryption block size.
> 
> (or change dmcrypt significantly)
> 
> IO hints is not enough - maybe example is better here:
> 
> Testing device (some random data there, not important)
> # dmsetup table --showkeys
> x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0
> 
> Let's generate some direct IOs (to avoid page cache)
> 
> *Without* your patch:
> 
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum 
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -
> 
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum 
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -
> 
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum 
> eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -
> 
> As you can see, we get the same plain data with different IO sizes.
> 
> Now *with* your patch (page size is 4096):
> 
> # dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum 
> dd: reading `/dev/mapper/x': Invalid argument
> 
> # dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum 
> 4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1  -
> 
> # dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum 
> 17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141  -
> 
> So block size of 512B causes operation to fail (ok - IO hints).
> IO of block encryption size and multiple of encryption size returns
> apparently something different now.
> 
> This is probably not what we want...
> 

But isn't this what should happen? You're trying to do 512B direct I/O
on a block device with a 4096B logical block size. The same error would
happen if you tried to do 512B direct I/O on a disk with a 4096B
logical block size. A portable user-space application should not
assume a logical block size when using direct I/O.

> (Note that I did not even tested cross-encryption-block operations.)
> 
> Even if this is somehow solved, many other problems remains:
> 
> - we need to extend mapping table parameters so the block size
> must be configurable (encrypted device image must be readable
> on system with different page size, I have e.g. Sparc with 8k page size.
> (This will be needed for other extensions so it is not real
> problem, just it need to be done first.)
> 
> - you need to store this block size info in header,
> for LUKS it means using new LUKS header version
> (requiring parameter on commandline is dangerous - it must be enforced)
> 
> 
> I would really better to not support this yet and first try to optimize
> crypto layer such way that it can process 512B blocks more
> efficiently (of course it will not fix bad hw but it can help batching
> sector encryption, maybe suing some hints, dunno).
> 
> Milan
> _______________________________________________
> dm-crypt mailing list
> dm-crypt-4q3lyFh4P1g@public.gmane.org
> http://www.saout.de/mailman/listinfo/dm-crypt
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-crypt] En/decrypting in multi-sector batches?
  2011-04-29 17:33   ` Mandeep Singh Baines
@ 2011-05-01  8:13     ` Milan Broz
  0 siblings, 0 replies; 7+ messages in thread
From: Milan Broz @ 2011-05-01  8:13 UTC (permalink / raw)
  To: Mandeep Singh Baines; +Cc: dm-crypt

On 04/29/2011 07:33 PM, Mandeep Singh Baines wrote:
> But isn't this what should happen? You're trying to do 512B direct I/O
> on a block device with a 4096B logical block size. The same error would
> happen if you tried to do 512B direct I/O on a disk with a 4096B
> logical block size. A portable user-space application should not
> assume a logical block size when using direct I/O.

Sure. But I just tried to point out that dmcrypt is adds
additional restrictions not related to real underlying device
sector size. 
Today it works in all situations because 512B is always atomic.

The encryption unit size must be part of configuration, it cannot
be automatically derived from page size of hw sector size.
(e.g. mount image of disk through loopback will switch to 512B sector).

I am not saying it is not possible to use larger block sizes,
probably one day it will happen but it requires more changes in advance.

(and btw dmcrypt and cryptsetup are friedly to native 4k drives already,
just encryption unit is still 512B)

Milan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-05-01  8:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-26 15:17 [dm-crypt] En/decrypting in multi-sector batches? Will Drewry
2011-04-26 17:46 ` Arno Wagner
2011-04-27  7:19 ` Milan Broz
2011-04-27 15:07   ` Will Drewry
2011-04-28 18:43     ` Arno Wagner
2011-04-29 17:33   ` Mandeep Singh Baines
2011-05-01  8:13     ` Milan Broz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.