* Improving Data-At-Rest encryption in Ceph
@ 2015-12-14 13:17 Radoslaw Zarzynski
  2015-12-14 20:28 ` Gregory Farnum
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Radoslaw Zarzynski @ 2015-12-14 13:17 UTC (permalink / raw)
  To: Ceph Development; +Cc: Adam Kupczyk

Hello Folks,

I would like to publish a proposal regarding improvements to Ceph's
data-at-rest encryption mechanism. Adam Kupczyk and I have worked on
it over the last few weeks.

Initially we considered several architectural approaches and went
through several iterations of discussion with the Intel storage group.
The proposal is a condensed description of the solution we see as the
most promising one.

We are open to any comments and questions.

Regards,
Adam Kupczyk
Radoslaw Zarzynski


=======================
Summary
=======================

Data-at-rest encryption is a mechanism that protects the data center
operator from revealing the content of physical storage media.

Ceph already implements a form of at-rest encryption. It is performed
through dm-crypt as an intermediary layer between an OSD and its
physical storage. The proposed at-rest encryption mechanism will be
orthogonal and, in some ways, superior to the existing solution.

=======================
Owners
=======================

* Radoslaw Zarzynski (Mirantis)
* Adam Kupczyk (Mirantis)

=======================
Interested Parties
=======================

If you are interested in contributing to this blueprint, or want to be
a "speaker" during the Summit session, list your name here.

Name (Affiliation)
Name (Affiliation)
Name

=======================
Current Status
=======================

Data-at-rest encryption is currently achieved through dm-crypt placed
under the OSD's filestore. This solution is generic and cannot
leverage Ceph-specific characteristics. The best example is that
encryption is done multiple times - once for each replica. Another
issue is the lack of granularity - an OSD either encrypts nothing or
encrypts everything (with dm-crypt enabled).

Cryptographic keys are stored on the filesystem of the storage node
that hosts the OSDs. Changing them requires redeploying the OSDs.

The best way to address those issues seems to be to introduce
encryption into the Ceph OSD itself.

=======================
Detailed Description
=======================

In addition to the currently available solution, the Ceph OSD would
accommodate an encryption component placed in the replication
mechanism.

Data incoming from Ceph clients would be encrypted by the primary
OSD, which would replicate the ciphertext to the non-primary members
of the acting set. Data sent to a Ceph client would be decrypted by
the OSD handling the read operation. This makes it possible to:
* perform only one encryption per write,
* achieve per-pool granularity for both the key and the encryption
itself.

Unfortunately, always using the same key everywhere for a given pool
is unacceptable - it would make cluster migration and key changes an
extremely burdensome process. To address this, crypto key versioning
would be introduced. All RADOS objects inside a single placement
group stored on a given OSD would use the same crypto key version.
The same PG on another replica may use a different version of the
same, per-pool key.

In the typical case, ciphertext transferred from OSD to OSD can be
used without change - this is when both OSDs have the same crypto key
version for the given placement group. In the rare cases where the
crypto keys differ (a key change or a transition period), the
receiving OSD will re-encrypt the data with its local key version.
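
To make the intended behaviour concrete, below is a minimal sketch of
the decision a receiving replica would take. The types and names are
illustrative only, not actual Ceph interfaces:

  #include <cstdint>

  // Illustrative only: what a replica would do with an incoming
  // ciphertext chunk, given the key version it was encrypted with
  // and the version locked to the local copy of the PG.
  enum class on_receive { store_as_is, recrypt_to_local };

  on_receive handle_replicated_chunk(uint64_t sender_key_version,
                                     uint64_t local_key_version) {
    if (sender_key_version == local_key_version)
      return on_receive::store_as_is;   // typical case: no crypto work
    // key change / transition period: decrypt with the sender's key
    // version, re-encrypt with the local one before storing
    return on_receive::recrypt_to_local;
  }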

For compression to be effective it must be done before encryption.
Because of this, encryption may be applied differently for replicated
pools and EC pools. Replicated pools do not implement compression; for
those pools encryption is applied as soon as data enters the OSD. For
EC pools encryption is applied after compression. If compression is
ever implemented for replicated pools, it must be placed before
encryption.

Ceph currently has a thin abstraction layer over block ciphers
(CryptoHandler, CryptoKeyHandler). We want to extend this API to
introduce initialization vectors, chaining modes and asynchronous
operations. The implementation of this API may be based on the AF_ALG
kernel interface, which assures the ability to use the hardware
accelerations already implemented in the Linux kernel. Moreover, by
working on bigger chunks (dm-crypt operates on 512-byte sectors), the
raw encryption performance may be even higher.
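
For reference, a minimal userspace sketch of driving AF_ALG for
AES-CTR (error handling omitted; the key, IV and buffer here are
placeholders for illustration):

  #include <linux/if_alg.h>
  #include <sys/socket.h>
  #include <unistd.h>
  #include <string.h>

  int main() {
    // Bind a transform socket to the kernel's ctr(aes) skcipher.
    struct sockaddr_alg sa;
    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "skcipher");
    strcpy((char *)sa.salg_name, "ctr(aes)");

    int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    bind(tfm, (struct sockaddr *)&sa, sizeof(sa));

    char key[16] = "0123456789abcde";  // 16-byte demo key
    setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
    int op = accept(tfm, NULL, NULL);

    // Per-request control data: operation type and a 16-byte IV.
    char cbuf[CMSG_SPACE(4) + CMSG_SPACE(4 + 16)] = {};
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_ALG;
    cm->cmsg_type = ALG_SET_OP;
    cm->cmsg_len = CMSG_LEN(4);
    *(unsigned int *)CMSG_DATA(cm) = ALG_OP_ENCRYPT;

    cm = CMSG_NXTHDR(&msg, cm);
    cm->cmsg_level = SOL_ALG;
    cm->cmsg_type = ALG_SET_IV;
    cm->cmsg_len = CMSG_LEN(4 + 16);
    struct af_alg_iv *iv = (struct af_alg_iv *)CMSG_DATA(cm);
    iv->ivlen = 16;
    memset(iv->iv, 0, 16);

    // Encrypt one 4 KB chunk in a single round trip; dm-crypt would
    // have issued eight 512-byte requests for the same data.
    unsigned char buf[4096] = {};
    struct iovec io = { buf, sizeof(buf) };
    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    sendmsg(op, &msg, 0);
    read(op, buf, sizeof(buf));

    close(op);
    close(tfm);
    return 0;
  }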

The encryption process must not impede random reads and random writes
to RADOS objects. The solution is to make the encryption/decryption
process applicable to an arbitrary data range. This can be done most
easily by using a chaining mode that doesn't impose dependencies
between subsequent data chunks. Good candidates are CTR [1] and
XTS [2].
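
The random-access property is easy to see for CTR: the keystream
block for byte offset N depends only on the base IV and N/16, so the
starting counter for an arbitrary range can be computed directly. A
sketch of that derivation, assuming the usual big-endian 16-byte
counter block:

  #include <cstdint>
  #include <cstring>

  // Derive the counter block for an en/decryption that starts at byte
  // offset 'off' within an object: interpret the 16-byte IV as a
  // big-endian integer and add the number of whole AES blocks skipped.
  void ctr_iv_at_offset(const uint8_t base_iv[16], uint64_t off,
                        uint8_t out_iv[16]) {
    std::memcpy(out_iv, base_iv, 16);
    uint64_t add = off / 16;                 // blocks to skip
    for (int i = 15; i >= 0 && add; --i) {
      uint64_t sum = out_iv[i] + (add & 0xff);
      out_iv[i] = sum & 0xff;
      add = (add >> 8) + (sum >> 8);         // carry into higher bytes
    }
    // The first off % 16 bytes of the first keystream block are then
    // discarded so the keystream aligns with the requested offset.
  }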

Encryption-related metadata would be stored in extended attributes.
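
For illustration only - the attribute name and layout below are
invented for the example - the per-object metadata could be as small
as a key version plus per-object IV material:

  #include <sys/xattr.h>
  #include <cstdint>
  #include <cstring>

  // Hypothetical layout; not an actual Ceph on-disk format.
  struct crypt_xattr {
    uint64_t key_version;   // which version of the per-pool key
    uint8_t  iv_seed[16];   // per-object IV material
  } __attribute__((packed));

  int store_crypt_meta(const char *path, uint64_t ver,
                       const uint8_t seed[16]) {
    struct crypt_xattr x;
    x.key_version = ver;
    std::memcpy(x.iv_seed, seed, 16);
    // "user.ceph.crypt" is a placeholder attribute name.
    return setxattr(path, "user.ceph.crypt", &x, sizeof(x), 0);
  }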

In order to coordinate encryption across the acting set, all replicas
will share information about the crypto key versions they use. Real
cryptographic keys would never be stored permanently by a Ceph OSD;
instead, they would be fetched from the monitors. Key management
improvements will be addressed in a separate task based on a dedicated
proposal [3].


[1] https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29

[2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_.28XTS.29

[3] http://tracker.ceph.com/projects/ceph/wiki/Osd_-_simple_ceph-mon_dm-crypt_key_management

=======================
Work items
=======================

Coding tasks
* Extended Crypto API (CryptoHandler, CryptoKeyHandler).
* Encryption for replicated pools.
* Encryption for EC pools.
* Key management.

Build / release tasks
* Unit tests for extended Crypto API.
* Functional tests for encrypted replicated pools.
* Functional tests for encrypted EC pools.

Documentation tasks
* Document extended Crypto API.
* Document migration procedures.
* Document crypto key creation and versioning.


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 13:17 Improving Data-At-Rest encryption in Ceph Radoslaw Zarzynski
@ 2015-12-14 20:28 ` Gregory Farnum
  2015-12-14 22:02   ` Martin Millnert
                     ` (2 more replies)
  2015-12-14 21:52 ` Martin Millnert
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 23+ messages in thread
From: Gregory Farnum @ 2015-12-14 20:28 UTC (permalink / raw)
  To: Radoslaw Zarzynski; +Cc: Ceph Development, Adam Kupczyk

On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
<rzarzynski@mirantis.com> wrote:
> Hello Folks,
>
> I would like to publish a proposal regarding improvements to Ceph's
> data-at-rest encryption mechanism. Adam Kupczyk and I have worked on
> it over the last few weeks.
>
> Initially we considered several architectural approaches and went
> through several iterations of discussion with the Intel storage group.
> The proposal is a condensed description of the solution we see as the
> most promising one.
>
> We are open to any comments and questions.
>
> Regards,
> Adam Kupczyk
> Radoslaw Zarzynski
>
>
> =======================
> Summary
> =======================
>
> Data-at-rest encryption is a mechanism that protects the data center
> operator from revealing the content of physical storage media.
>
> Ceph already implements a form of at-rest encryption. It is performed
> through dm-crypt as an intermediary layer between an OSD and its
> physical storage. The proposed at-rest encryption mechanism will be
> orthogonal and, in some ways, superior to the existing solution.
>
> =======================
> Owners
> =======================
>
> * Radoslaw Zarzynski (Mirantis)
> * Adam Kupczyk (Mirantis)
>
> =======================
> Interested Parties
> =======================
>
> If you are interested in contributing to this blueprint, or want to be
> a "speaker" during the Summit session, list your name here.
>
> Name (Affiliation)
> Name (Affiliation)
> Name
>
> =======================
> Current Status
> =======================
>
> Data-at-rest encryption is currently achieved through dm-crypt placed
> under the OSD's filestore. This solution is generic and cannot
> leverage Ceph-specific characteristics. The best example is that
> encryption is done multiple times - once for each replica. Another
> issue is the lack of granularity - an OSD either encrypts nothing or
> encrypts everything (with dm-crypt enabled).
>
> Cryptographic keys are stored on the filesystem of the storage node
> that hosts the OSDs. Changing them requires redeploying the OSDs.
>
> The best way to address those issues seems to be to introduce
> encryption into the Ceph OSD itself.
>
> =======================
> Detailed Description
> =======================
>
> In addition to the currently available solution, the Ceph OSD would
> accommodate an encryption component placed in the replication
> mechanism.
>
> Data incoming from Ceph clients would be encrypted by the primary
> OSD, which would replicate the ciphertext to the non-primary members
> of the acting set. Data sent to a Ceph client would be decrypted by
> the OSD handling the read operation. This makes it possible to:
> * perform only one encryption per write,
> * achieve per-pool granularity for both the key and the encryption
> itself.
>
> Unfortunately, always using the same key everywhere for a given pool
> is unacceptable - it would make cluster migration and key changes an
> extremely burdensome process. To address this, crypto key versioning
> would be introduced. All RADOS objects inside a single placement
> group stored on a given OSD would use the same crypto key version.
> The same PG on another replica may use a different version of the
> same, per-pool key.
>
> In the typical case, ciphertext transferred from OSD to OSD can be
> used without change - this is when both OSDs have the same crypto key
> version for the given placement group. In the rare cases where the
> crypto keys differ (a key change or a transition period), the
> receiving OSD will re-encrypt the data with its local key version.

I don't understand this part at all. Do you plan to read and rewrite
the entire PG whenever you change the "key version"? How often do you
plan to change these keys? What is even the point of changing them,
since anybody who can control an OSD can grab the entire current key
set?

> For compression to be effective it must be done before encryption.
> Because of this, encryption may be applied differently for replicated
> pools and EC pools. Replicated pools do not implement compression;
> for those pools encryption is applied as soon as data enters the OSD.
> For EC pools encryption is applied after compression. If compression
> is ever implemented for replicated pools, it must be placed before
> encryption.

So this means you'll be encrypting the object data, but not the omap
nor xattrs, and not the file names on disk. Is that acceptable to
people? It's probably fine for a lot of rbd use cases, but not for
RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
is stored in those regions. I'd rather a solution worked on the full
data set. :/
-Greg

>
> Ceph currently has a thin abstraction layer over block ciphers
> (CryptoHandler, CryptoKeyHandler). We want to extend this API to
> introduce initialization vectors, chaining modes and asynchronous
> operations. The implementation of this API may be based on the AF_ALG
> kernel interface, which assures the ability to use the hardware
> accelerations already implemented in the Linux kernel. Moreover, by
> working on bigger chunks (dm-crypt operates on 512-byte sectors), the
> raw encryption performance may be even higher.
>
> The encryption process must not impede random reads and random writes
> to RADOS objects. The solution is to make the encryption/decryption
> process applicable to an arbitrary data range. This can be done most
> easily by using a chaining mode that doesn't impose dependencies
> between subsequent data chunks. Good candidates are CTR [1] and
> XTS [2].
>
> Encryption-related metadata would be stored in extended attributes.
>
> In order to coordinate encryption across the acting set, all replicas
> will share information about the crypto key versions they use. Real
> cryptographic keys would never be stored permanently by a Ceph OSD;
> instead, they would be fetched from the monitors. Key management
> improvements will be addressed in a separate task based on a
> dedicated proposal [3].
>
>
> [1] https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29
>
> [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_.28XTS.29
>
> [3] http://tracker.ceph.com/projects/ceph/wiki/Osd_-_simple_ceph-mon_dm-crypt_key_management
>
> =======================
> Work items
> =======================
>
> Coding tasks
> * Extended Crypto API (CryptoHandler, CryptoKeyHandler).
> * Encryption for replicated pools.
> * Encryption for EC pools.
> * Key management.
>
> Build / release tasks
> * Unit tests for extended Crypto API.
> * Functional tests for encrypted replicated pools.
> * Functional tests for encrypted EC pools.
>
> Documentation tasks
> * Document extended Crypto API.
> * Document migration procedures.
> * Document crypto key creation and versioning.


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 13:17 Improving Data-At-Rest encryption in Ceph Radoslaw Zarzynski
  2015-12-14 20:28 ` Gregory Farnum
@ 2015-12-14 21:52 ` Martin Millnert
  2015-12-15 20:40   ` Radoslaw Zarzynski
  2015-12-15 14:23 ` Lars Marowsky-Bree
  2016-01-24 13:45 ` John Hunter
  3 siblings, 1 reply; 23+ messages in thread
From: Martin Millnert @ 2015-12-14 21:52 UTC (permalink / raw)
  To: Radoslaw Zarzynski; +Cc: Ceph Development, Adam Kupczyk

On Mon, 2015-12-14 at 14:17 +0100, Radoslaw Zarzynski wrote:
> Hello Folks,
> 
> I would like to publish a proposal regarding improvements to Ceph's
> data-at-rest encryption mechanism. Adam Kupczyk and I have worked on
> it over the last few weeks.
>
> Initially we considered several architectural approaches and went
> through several iterations of discussion with the Intel storage group.
> The proposal is a condensed description of the solution we see as the
> most promising one.
> 
> We are open to any comments and questions.
> 
> Regards,
> Adam Kupczyk
> Radoslaw Zarzynski
> 
> 
> =======================
> Summary
> =======================
> 
> Data-at-rest encryption is a mechanism that protects the data center
> operator from revealing the content of physical storage media.
>
> Ceph already implements a form of at-rest encryption. It is performed
> through dm-crypt as an intermediary layer between an OSD and its
> physical storage. The proposed at-rest encryption mechanism will be
> orthogonal and, in some ways, superior to the existing solution.
> 
> =======================
> Owners
> =======================
> 
> * Radoslaw Zarzynski (Mirantis)
> * Adam Kupczyk (Mirantis)
> 
> =======================
> Interested Parties
> =======================
> 
> If you are interested in contributing to this blueprint, or want to be
> a "speaker" during the Summit session, list your name here.
> 
> Name (Affiliation)
> Name (Affiliation)
> Name
> 
> =======================
> Current Status
> =======================
> 
> Data-at-rest encryption is currently achieved through dm-crypt placed
> under the OSD's filestore. This solution is generic and cannot
> leverage Ceph-specific characteristics. The best example is that
> encryption is done multiple times - once for each replica. Another
> issue is the lack of granularity - an OSD either encrypts nothing or
> encrypts everything (with dm-crypt enabled).

All or nothing is sometimes a desired function of encryption.
"In-betweens" are tricky.

Additionally, dm-crypt is AFAICT fairly performant: there's no need to
context-switch per crypto-op, since it sits in the dm I/O path within
the kernel.

These two points are not necessarily a critique of your proposal.

> Cryptographic keys are stored on the filesystem of the storage node
> that hosts the OSDs. Changing them requires redeploying the OSDs.

I'm not very familiar with which dm-crypt deployment technique you
refer to (I don't use ceph-deploy personally), but the LUKS FDE suite
does allow separating the encryption key from the activation key (or
whatever it is called).

> The best way to address those issues seems to be to introduce
> encryption into the Ceph OSD itself.
>
> =======================
> Detailed Description
> =======================
>
> In addition to the currently available solution, the Ceph OSD would
> accommodate an encryption component placed in the replication
> mechanism.
>
> Data incoming from Ceph clients would be encrypted by the primary
> OSD, which would replicate the ciphertext to the non-primary members
> of the acting set. Data sent to a Ceph client would be decrypted by
> the OSD handling the read operation. This makes it possible to:
> * perform only one encryption per write,
> * achieve per-pool granularity for both the key and the encryption
> itself.

I.e. the primary OSD's key for the PG in question would be the one
used for all replicas of the data, per acting set - i.e. a granularity
of effectively one key per acting set, controlled by the primary OSD?

> Unfortunately, always using the same key everywhere for a given pool
> is unacceptable - it would make cluster migration and key changes an
> extremely burdensome process. To address this, crypto key versioning
> would be introduced. All RADOS objects inside a single placement
> group stored on a given OSD would use the same crypto key version.

This seems to add key versioning on the primary OSD.

> The same PG on another replica may use a different version of the
> same, per-pool key.

Attempt to rewrite to see if I parsed correctly: Within a PG's acting
set, a non-primary OSD can use another version of the per-pool key.
That seems fair, to support asynchronous key roll forward/backward.

> In the typical case, ciphertext transferred from OSD to OSD can be
> used without change - this is when both OSDs have the same crypto key
> version for the given placement group. In the rare cases where the
> crypto keys differ (a key change or a transition period), the
> receiving OSD will re-encrypt the data with its local key version.

Doesn't this presume the receiving OSD always has a more up-to-date
set of keys than the sending OSD? What if the sending OSD has a newer
key than the receiving OSD?

> For compression to be effective it must be done before encryption.
> Because of this, encryption may be applied differently for replicated
> pools and EC pools. Replicated pools do not implement compression;
> for those pools encryption is applied as soon as data enters the OSD.
> For EC pools encryption is applied after compression. If compression
> is ever implemented for replicated pools, it must be placed before
> encryption.
>
> Ceph currently has a thin abstraction layer over block ciphers
> (CryptoHandler, CryptoKeyHandler). We want to extend this API to
> introduce initialization vectors, chaining modes and asynchronous
> operations. The implementation of this API may be based on the AF_ALG
> kernel interface, which assures the ability to use the hardware
> accelerations already implemented in the Linux kernel. Moreover, by
> working on bigger chunks (dm-crypt operates on 512-byte sectors), the
> raw encryption performance may be even higher.


> The encryption process must not impede random reads and random writes
> to RADOS objects.

That's a brave statement. :-)

>  The solution is to make the encryption/decryption process
> applicable to an arbitrary data range. This can be done most easily
> by using a chaining mode that doesn't impose dependencies between
> subsequent data chunks. Good candidates are CTR [1] and XTS [2].
> 
> Encryption-related metadata would be stored in extended attributes.
> 
> In order to coordinate encryption across the acting set, all replicas
> will share information about the crypto key versions they use. Real
> cryptographic keys would never be stored permanently by a Ceph OSD;
> instead, they would be fetched from the monitors. Key management
> improvements will be addressed in a separate task based on a
> dedicated proposal [3].

Key management is indeed the Achilles heel of any cluster solution like
this, and depending on requirements sooner or later descends into some
sort of TPM or similar, I guess.  I.e. "to trust a computer someone else
may have arbitrary physical access to."

/M

> [1] https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29
> 
> [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_.28XTS.29
> 
> [3] http://tracker.ceph.com/projects/ceph/wiki/Osd_-_simple_ceph-mon_dm-crypt_key_management
> 
> =======================
> Work items
> =======================
> 
> Coding tasks
> * Extended Crypto API (CryptoHandler, CryptoKeyHandler).
> * Encryption for replicated pools.
> * Encryption for EC pools.
> * Key management.
> 
> Build / release tasks
> * Unit tests for extended Crypto API.
> * Functional tests for encrypted replicated pools.
> * Functional tests for encrypted EC pools.
> 
> Documentation tasks
> * Document extended Crypto API.
> * Document migration procedures.
> * Document crypto key creation and versioning.


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 20:28 ` Gregory Farnum
@ 2015-12-14 22:02   ` Martin Millnert
  2015-12-14 22:32     ` Gregory Farnum
  2015-12-15 10:13     ` Adam Kupczyk
  2015-12-15 10:04   ` Adam Kupczyk
       [not found]   ` <CAHMeWhGgHWq=jPZfj8s_KCB=wLhsBNCyJjZSBQQFZXc8r63M7A@mail.gmail.com>
  2 siblings, 2 replies; 23+ messages in thread
From: Martin Millnert @ 2015-12-14 22:02 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Radoslaw Zarzynski, Ceph Development, Adam Kupczyk

On Mon, 2015-12-14 at 12:28 -0800, Gregory Farnum wrote:
> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
<snip>
> > In the typical case, ciphertext transferred from OSD to OSD can be
> > used without change - this is when both OSDs have the same crypto
> > key version for the given placement group. In the rare cases where
> > the crypto keys differ (a key change or a transition period), the
> > receiving OSD will re-encrypt the data with its local key version.
> 
> I don't understand this part at all. Do you plan to read and rewrite
> the entire PG whenever you change the "key version"? How often do you
> plan to change these keys? What is even the point of changing them,
> since anybody who can control an OSD can grab the entire current key
> set?

You may have leaked keys without having leaked ciphertext.
The typical use case for FDE/SED is IMO being able to RMA drives.
Nothing more than that.

> > For compression to be effective it must be done before encryption.
> > Because of this, encryption may be applied differently for
> > replicated pools and EC pools. Replicated pools do not implement
> > compression; for those pools encryption is applied as soon as data
> > enters the OSD. For EC pools encryption is applied after
> > compression. If compression is ever implemented for replicated
> > pools, it must be placed before encryption.
> 
> So this means you'll be encrypting the object data, but not the omap
> nor xattrs, and not the file names on disk. Is that acceptable to
> people? It's probably fine for a lot of rbd use cases, but not for
> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
> is stored in those regions. I'd rather a solution worked on the full
> data set. :/
> -Greg

This is indeed the largest weakness of the proposal.

I'm a bit unclear on the motivation: what problem does this solution
solve that a dm-crypt-based solution doesn't address? When, except for
snooping, is it a desired design to not encrypt all the things?

I guess one could say: "ciphertext would be transferred on the network".
But, it's incomplete. Internal transport encryption (and better auth)
for Ceph is a different problem.

I'd probably rather see the dm-crypt key management processes refined
and improved (saying this without knowing the state of any current
implementations for Ceph) and have a solid FDE solution, than a
solution that doesn't encrypt metadata. Encrypting data but not
metadata isn't sufficient anymore...

/M



* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 22:02   ` Martin Millnert
@ 2015-12-14 22:32     ` Gregory Farnum
  2015-12-16  2:13       ` Andrew Bartlett
  2015-12-15 10:13     ` Adam Kupczyk
  1 sibling, 1 reply; 23+ messages in thread
From: Gregory Farnum @ 2015-12-14 22:32 UTC (permalink / raw)
  To: Martin Millnert; +Cc: Radoslaw Zarzynski, Ceph Development, Adam Kupczyk

On Mon, Dec 14, 2015 at 2:02 PM, Martin Millnert <martin@millnert.se> wrote:
> On Mon, 2015-12-14 at 12:28 -0800, Gregory Farnum wrote:
>> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
> <snip>
>> > In the typical case, ciphertext transferred from OSD to OSD can be
>> > used without change - this is when both OSDs have the same crypto
>> > key version for the given placement group. In the rare cases where
>> > the crypto keys differ (a key change or a transition period), the
>> > receiving OSD will re-encrypt the data with its local key version.
>>
>> I don't understand this part at all. Do you plan to read and rewrite
>> the entire PG whenever you change the "key version"? How often do you
>> plan to change these keys? What is even the point of changing them,
>> since anybody who can control an OSD can grab the entire current key
>> set?
>
> You may have leaked keys without having leaked ciphertext.
> The typical use case for FDE/SED is IMO being able to RMA drives.
> Nothing more than that.

Yeah, but you necessarily need to let people keep using the old key
*and* give them the new one on-demand if they've got access to the
system, in order to allow switching to the new key. You need to wait
for all the data to actually be rewritten with the new key before you
can consider it secure again, and that'll take a loooong time. I'm not
saying there isn't threat mitigation here, just that I'm not sure it's
useful against somebody who's already obtained access to your
encryption keys — if they've gotten those it's unlikely they won't
have gotten OSD keys as well, and if they've got network access they
can impersonate an OSD and get access to whatever data they like.

I guess that still protects against an external database hack from
somebody who gets access to your old hard drives, but...*shrug*

>
>> > For compression to be effective it must be done before encryption.
>> > Because of this, encryption may be applied differently for
>> > replicated pools and EC pools. Replicated pools do not implement
>> > compression; for those pools encryption is applied as soon as data
>> > enters the OSD. For EC pools encryption is applied after
>> > compression. If compression is ever implemented for replicated
>> > pools, it must be placed before encryption.
>>
>> So this means you'll be encrypting the object data, but not the omap
>> nor xattrs, and not the file names on disk. Is that acceptable to
>> people? It's probably fine for a lot of rbd use cases, but not for
>> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
>> is stored in those regions. I'd rather a solution worked on the full
>> data set. :/
>> -Greg
>
> This is indeed the largest weakness of the proposal.
>
> I'm a bit unclear on the motivation: what problem does this solution
> solve that a dm-crypt-based solution doesn't address? When, except for
> snooping, is it a desired design to not encrypt all the things?
>
> I guess one could say: "ciphertext would be transferred on the network".
> But, it's incomplete. Internal transport encryption (and better auth)
> for Ceph is a different problem.
>
> I'd probably rather see the dm-crypt key management processes refined
> and improved (saying this without knowing the state of any current
> implementations for Ceph) and have a solid FDE solution, than a
> solution that doesn't encrypt metadata. Encrypting data but not
> metadata isn't sufficient anymore...

Yeah, I'd rather see dm-crypt done well than in-Ceph encryption like
this. If we want to protect data I think that's a lot more secure (and
it will *stay* that way, since encryption is all that project does),
and adding TLS or similar to the messenger code would give us
on-the-wire protection from the clients to the disk.
-Greg


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 20:28 ` Gregory Farnum
  2015-12-14 22:02   ` Martin Millnert
@ 2015-12-15 10:04   ` Adam Kupczyk
       [not found]   ` <CAHMeWhGgHWq=jPZfj8s_KCB=wLhsBNCyJjZSBQQFZXc8r63M7A@mail.gmail.com>
  2 siblings, 0 replies; 23+ messages in thread
From: Adam Kupczyk @ 2015-12-15 10:04 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Radoslaw Zarzynski, Ceph Development

On Mon, Dec 14, 2015 at 9:28 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
>
> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
> <rzarzynski@mirantis.com> wrote:
> > Hello Folks,
> >
> > I would like to publish a proposal regarding improvements to Ceph's
> > data-at-rest encryption mechanism. Adam Kupczyk and I have worked
> > on it over the last few weeks.
> >
> > Initially we considered several architectural approaches and went
> > through several iterations of discussion with the Intel storage
> > group. The proposal is a condensed description of the solution we
> > see as the most promising one.
> >
> > We are open to any comments and questions.
> >
> > Regards,
> > Adam Kupczyk
> > Radoslaw Zarzynski
> >
> >
> > =======================
> > Summary
> > =======================
> >
> > Data-at-rest encryption is a mechanism that protects the data
> > center operator from revealing the content of physical storage
> > media.
> >
> > Ceph already implements a form of at-rest encryption. It is
> > performed through dm-crypt as an intermediary layer between an OSD
> > and its physical storage. The proposed at-rest encryption mechanism
> > will be orthogonal and, in some ways, superior to the existing
> > solution.
> >
> > =======================
> > Owners
> > =======================
> >
> > * Radoslaw Zarzynski (Mirantis)
> > * Adam Kupczyk (Mirantis)
> >
> > =======================
> > Interested Parties
> > =======================
> >
> > If you are interested in contributing to this blueprint, or want to be
> > a "speaker" during the Summit session, list your name here.
> >
> > Name (Affiliation)
> > Name (Affiliation)
> > Name
> >
> > =======================
> > Current Status
> > =======================
> >
> > Data-at-rest encryption is currently achieved through dm-crypt
> > placed under the OSD's filestore. This solution is generic and
> > cannot leverage Ceph-specific characteristics. The best example is
> > that encryption is done multiple times - once for each replica.
> > Another issue is the lack of granularity - an OSD either encrypts
> > nothing or encrypts everything (with dm-crypt enabled).
> >
> > Cryptographic keys are stored on the filesystem of the storage node
> > that hosts the OSDs. Changing them requires redeploying the OSDs.
> >
> > The best way to address those issues seems to be to introduce
> > encryption into the Ceph OSD itself.
> >
> > =======================
> > Detailed Description
> > =======================
> >
> > In addition to the currently available solution, the Ceph OSD would
> > accommodate an encryption component placed in the replication
> > mechanism.
> >
> > Data incoming from Ceph clients would be encrypted by the primary
> > OSD, which would replicate the ciphertext to the non-primary
> > members of the acting set. Data sent to a Ceph client would be
> > decrypted by the OSD handling the read operation. This makes it
> > possible to:
> > * perform only one encryption per write,
> > * achieve per-pool granularity for both the key and the encryption
> > itself.
> >
> > Unfortunately, always using the same key everywhere for a given
> > pool is unacceptable - it would make cluster migration and key
> > changes an extremely burdensome process. To address this, crypto
> > key versioning would be introduced. All RADOS objects inside a
> > single placement group stored on a given OSD would use the same
> > crypto key version. The same PG on another replica may use a
> > different version of the same, per-pool key.
> >
> > In the typical case, ciphertext transferred from OSD to OSD can be
> > used without change - this is when both OSDs have the same crypto
> > key version for the given placement group. In the rare cases where
> > the crypto keys differ (a key change or a transition period), the
> > receiving OSD will re-encrypt the data with its local key version.
>
> I don't understand this part at all. Do you plan to read and rewrite
> the entire PG whenever you change the "key version"? How often do you
> plan to change these keys? What is even the point of changing them,
> since anybody who can control an OSD can grab the entire current key
> set?
We envision that key changes will happen very infrequently, usually in
reaction to some possible security breach.
After the key version is incremented, nothing happens automatically.
The old key is used for as long as the PG is not empty. When the first
RADOS object is created, the current key version is locked to the PG.
There is no solution for when someone gets control over an OSD -
either by running a custom OSD binary or by extracting data while
impersonating a client. That is outside the scope of at-rest
encryption. We only address cases where storage media somehow leave
the datacenter premises. The ability to change keys is necessary,
since we need a procedure to recover data security after keys are
compromised.
>
> > For compression to be effective it must be done before encryption.
> > Because of this, encryption may be applied differently for
> > replicated pools and EC pools. Replicated pools do not implement
> > compression; for those pools encryption is applied as soon as data
> > enters the OSD. For EC pools encryption is applied after
> > compression. If compression is ever implemented for replicated
> > pools, it must be placed before encryption.
>
> So this means you'll be encrypting the object data, but not the omap
> nor xattrs, and not the file names on disk. Is that acceptable to
> people? It's probably fine for a lot of rbd use cases, but not for
> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
> is stored in those regions. I'd rather a solution worked on the full
> data set. :/

We intend to encrypt:
- object data
- omap values
- xattr values
We are considering encrypting:
- object names
- xattr names
We are unable to encrypt:
- omap names
> -Greg
>
> >
> > Ceph currently has a thin abstraction layer over block ciphers
> > (CryptoHandler, CryptoKeyHandler). We want to extend this API to
> > introduce initialization vectors, chaining modes and asynchronous
> > operations. The implementation of this API may be based on the
> > AF_ALG kernel interface, which assures the ability to use the
> > hardware accelerations already implemented in the Linux kernel.
> > Moreover, by working on bigger chunks (dm-crypt operates on
> > 512-byte sectors), the raw encryption performance may be even
> > higher.
> >
> > The encryption process must not impede random reads and random
> > writes to RADOS objects. The solution is to make the
> > encryption/decryption process applicable to an arbitrary data
> > range. This can be done most easily by using a chaining mode that
> > doesn't impose dependencies between subsequent data chunks. Good
> > candidates are CTR [1] and XTS [2].
> >
> > Encryption-related metadata would be stored in extended attributes.
> >
> > In order to coordinate encryption across the acting set, all
> > replicas will share information about the crypto key versions they
> > use. Real cryptographic keys would never be stored permanently by a
> > Ceph OSD; instead, they would be fetched from the monitors. Key
> > management improvements will be addressed in a separate task based
> > on a dedicated proposal [3].
> >
> >
> > [1] https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29
> >
> > [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_.28XTS.29
> >
> > [3] http://tracker.ceph.com/projects/ceph/wiki/Osd_-_simple_ceph-mon_dm-crypt_key_management
> >
> > =======================
> > Work items
> > =======================
> >
> > Coding tasks
> > * Extended Crypto API (CryptoHandler, CryptoKeyHandler).
> > * Encryption for replicated pools.
> > * Encryption for EC pools.
> > * Key management.
> >
> > Build / release tasks
> > * Unit tests for extended Crypto API.
> > * Functional tests for encrypted replicated pools.
> > * Functional tests for encrypted EC pools.
> >
> > Documentation tasks
> > * Document extended Crypto API.
> > * Document migration procedures.
> > * Document crypto key creation and versioning.


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 22:02   ` Martin Millnert
  2015-12-14 22:32     ` Gregory Farnum
@ 2015-12-15 10:13     ` Adam Kupczyk
  1 sibling, 0 replies; 23+ messages in thread
From: Adam Kupczyk @ 2015-12-15 10:13 UTC (permalink / raw)
  To: Martin Millnert; +Cc: Gregory Farnum, Radoslaw Zarzynski, Ceph Development

On Mon, Dec 14, 2015 at 11:02 PM, Martin Millnert <martin@millnert.se> wrote:
> On Mon, 2015-12-14 at 12:28 -0800, Gregory Farnum wrote:
>> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
> <snip>
>> > In the typical case, ciphertext transferred from OSD to OSD can be
>> > used without change - this is when both OSDs have the same crypto
>> > key version for the given placement group. In the rare cases where
>> > the crypto keys differ (a key change or a transition period), the
>> > receiving OSD will re-encrypt the data with its local key version.
>>
>> I don't understand this part at all. Do you plan to read and rewrite
>> the entire PG whenever you change the "key version"? How often do you
>> plan to change these keys? What is even the point of changing them,
>> since anybody who can control an OSD can grab the entire current key
>> set?
>
> You may have leaked keys without having leaked ciphertext.
> The typical use case for FDE/SED is IMO being able to RMA drives.
> Nothing more than that.
>
>> > For compression to be effective it must be done before encryption.
>> > Because of this, encryption may be applied differently for
>> > replicated pools and EC pools. Replicated pools do not implement
>> > compression; for those pools encryption is applied as soon as data
>> > enters the OSD. For EC pools encryption is applied after
>> > compression. If compression is ever implemented for replicated
>> > pools, it must be placed before encryption.
>>
>> So this means you'll be encrypting the object data, but not the omap
>> nor xattrs, and not the file names on disk. Is that acceptable to
>> people? It's probably fine for a lot of rbd use cases, but not for
>> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
>> is stored in those regions. I'd rather a solution worked on the full
>> data set. :/
>> -Greg
>
> This is indeed the largest weakness of the proposal.
>
> I'm a bit unclear on the motivation: what problem does this solution
> solve that a dm-crypt-based solution doesn't address? When, except for
> snooping, is it a desired design to not encrypt all the things?
1) With dm-crypt, encryption is performed separately for each replica.
With the OSD solution it is possible to encrypt only once and
distribute the ciphertext.
2) It is best to encrypt everything. There are just some things we are
unable to encrypt - in practice, that means omap names.
>
> I guess one could say: "ciphertext would be transferred on the network".
> But, it's incomplete. Internal transport encryption (and better auth)
> for Ceph is a different problem.
>
> I'd probably rather see the dm-crypt key management processes refined
> and improved (saying this without knowing the state of any current
> implementations for Ceph) and have a solid FDE solution, than a
> solution that doesn't encrypt metadata. Encrypting data but not
> metadata isn't sufficient anymore...
>
> /M
>


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 13:17 Improving Data-At-Rest encryption in Ceph Radoslaw Zarzynski
  2015-12-14 20:28 ` Gregory Farnum
  2015-12-14 21:52 ` Martin Millnert
@ 2015-12-15 14:23 ` Lars Marowsky-Bree
  2015-12-15 14:59   ` Sage Weil
                     ` (2 more replies)
  2016-01-24 13:45 ` John Hunter
  3 siblings, 3 replies; 23+ messages in thread
From: Lars Marowsky-Bree @ 2015-12-15 14:23 UTC (permalink / raw)
  To: Ceph Development

On 2015-12-14T14:17:08, Radoslaw Zarzynski <rzarzynski@mirantis.com> wrote:

Hi all,

great to see this revived.

However, I have come to see some concerns with handling the encryption
within Ceph itself.

The key part to any such approach is formulating the threat scenario.
For the use cases we have seen, data-at-rest encryption matters so
that operators can confidently throw away disks without leaking data.
It's not
meant as a defense against an online attacker. There usually is no
problem with "a few" disks being privileged, or one or two nodes that
need an admin intervention for booting (to enter some master encryption
key somehow, somewhere).

However, that requires *all* data on the OSDs to be encrypted.

Crucially, that includes not just the file system meta data (so not just
the data), but also the root and especially the swap partition. Those
potentially include swapped out data, coredumps, logs, etc.

(As an optional feature, it'd be cool if an OSD could be moved to a
different chassis and continue operating there, to speed up recovery.
Another optional feature would be to eventually be able, for those
customers that trust them ;-), to supply the key to the on-disk
encryption (OPAL et al).)

The proposal that Joshua posted a while ago essentially remained based
on dm-crypt, but put in simple hooks to retrieve the keys from some
"secured" server via sftp/ftps instead of loading them from the root fs.
Similar to deo, that ties the key to being on the network and knowing
the OSD UUID.

This would then also be somewhat easily extensible to utilize the same
key management server via initrd/dracut.

Yes, this means that each OSD disk is separately encrypted, but given
modern CPUs, this is less of a problem. It does have the benefit of
being completely transparent to Ceph, and actually covering the whole
node.

Of course, one of the key issues is always the key server.
Putting/retrieving/deleting keys is reasonably simple, but the question
of how to ensure HA for it is a bit tricky. But doable; people have been
building HA ftp/http servers for a while ;-) Also, a single key server
setup could theoretically serve multiple Ceph clusters.

It's not yet perfect, but I think this approach is superior to one
implemented natively in Ceph. If there's any encryption that should be
implemented in Ceph, I believe it'd be the on-the-wire encryption to
protect against eavesdroppers.

Other scenarios would require client-side encryption.

> Data-at-rest encryption is currently achieved through dm-crypt placed
> under the OSD's filestore. This solution is generic and cannot
> leverage Ceph-specific characteristics. The best example is that
> encryption is done multiple times - once for each replica. Another
> issue is the lack of granularity - an OSD either encrypts nothing or
> encrypts everything (with dm-crypt enabled).

True. But for the threat scenario, a holistic approach to encryption
seems actually required.

> Cryptographic keys are stored on the filesystem of the storage node
> that hosts the OSDs. Changing them requires redeploying the OSDs.

This is solvable by storing the key on an external key server.

Changing the key is only necessary if the key has been exposed. And with
dm-crypt, that's still possible - it's not the actual encryption key
that's stored, but the secret that is needed to unlock it, and that can
be re-encrypted quite fast. (In theory; it's not implemented yet for
the Ceph OSDs.)


> Data incoming from Ceph clients would be encrypted by the primary
> OSD, which would replicate the ciphertext to the non-primary members
> of the acting set.

This still exposes data in coredumps or on swap on the primary OSD, and
metadata on the secondaries.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-15 14:23 ` Lars Marowsky-Bree
@ 2015-12-15 14:59   ` Sage Weil
  2015-12-15 23:31   ` Matt Benjamin
  2015-12-16 22:29   ` Adam Kupczyk
  2 siblings, 0 replies; 23+ messages in thread
From: Sage Weil @ 2015-12-15 14:59 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development


I agree with Lars's concerns: the main problems with the current dm-crypt 
approach are that there isn't any key management integration yet and the 
root volume and swap aren't encrypted. Those are easy to solve (and I'm 
hoping we'll be able to address them in time for Jewel).

On the other hand, implementing encryption within RADOS will be complex, 
and I don't see what the benefits are over whole-disk encryption.  Can 
someone summarize what per-pool encryption keys and the ability to rotate 
keys gives us?  If the threat is an attacker who is on the storage network 
and has compromised an OSD the game is pretty much up...

At a high level, I think almost anything beyond at-rest encryption (that 
is aimed at throwing out disks or physically walking a server out of the 
data center) turns into a key management and threat mitigation design 
nightmare (with few, if any, compelling solutions) until you give up and 
have clients encrypt their data and don't trust the cluster with the keys 
at all...

sage


On Tue, 15 Dec 2015, Lars Marowsky-Bree wrote:
> On 2015-12-14T14:17:08, Radoslaw Zarzynski <rzarzynski@mirantis.com> wrote:
> 
> Hi all,
> 
> great to see this revived.
> 
> However, I have come to see some concerns with handling the encryption
> within Ceph itself.
> 
> The key part to any such approach is formulating the threat scenario.
> For the use cases we have seen, data-at-rest encryption matters so
> that operators can confidently throw away disks without leaking data.
> It's not
> meant as a defense against an online attacker. There usually is no
> problem with "a few" disks being privileged, or one or two nodes that
> need an admin intervention for booting (to enter some master encryption
> key somehow, somewhere).
> 
> However, that requires *all* data on the OSDs to be encrypted.
> 
> Crucially, that includes not just the file system meta data (so not just
> the data), but also the root and especially the swap partition. Those
> potentially include swapped out data, coredumps, logs, etc.
> 
> (As an optional feature, it'd be cool if an OSD could be moved to a
> different chassis and continue operating there, to speed up recovery.
> Another optional feature would be to eventually be able, for those
> customers that trust them ;-), to supply the key to the on-disk
> encryption (OPAL et al).)
> 
> The proposal that Joshua posted a while ago essentially remained based
> on dm-crypt, but put in simple hooks to retrieve the keys from some
> "secured" server via sftp/ftps instead of loading them from the root fs.
> Similar to deo, that ties the key to being on the network and knowing
> the OSD UUID.
> 
> This would then also be somewhat easily extensible to utilize the same
> key management server via initrd/dracut.
> 
> Yes, this means that each OSD disk is separately encrypted, but given
> modern CPUs, this is less of a problem. It does have the benefit of
> being completely transparent to Ceph, and actually covering the whole
> node.
> 
> Of course, one of the key issues is always the key server.
> Putting/retrieving/deleting keys is reasonably simple, but the question
> of how to ensure HA for it is a bit tricky. But doable; people have been
> building HA ftp/http servers for a while ;-) Also, a single key server
> setup could theoretically serve multiple Ceph clusters.
> 
> It's not yet perfect, but I think this approach is superior to one
> implemented natively in Ceph. If there's any encryption that should be
> implemented in Ceph, I believe it'd be the on-the-wire encryption to
> protect against eavesdroppers.
> 
> Other scenarios would require client-side encryption.
> 
> > Data-at-rest encryption is currently achieved through dm-crypt
> > placed under the OSD's filestore. This solution is generic and
> > cannot leverage Ceph-specific characteristics. The best example is
> > that encryption is done multiple times - once for each replica.
> > Another issue is the lack of granularity - an OSD either encrypts
> > nothing or encrypts everything (with dm-crypt enabled).
> 
> True. But for the threat scenario, a holistic approach to encryption
> seems actually required.
> 
> > Cryptographic keys are stored on the filesystem of the storage node
> > that hosts the OSDs. Changing them requires redeploying the OSDs.
> 
> This is solvable by storing the key on an external key server.
> 
> Changing the key is only necessary if the key has been exposed. And with
> dm-crypt, that's still possible - it's not the actual encryption key
> that's stored, but the secret that is needed to unlock it, and that can
> be re-encrypted quite fast. (In theory; it's not implemented yet for
> the Ceph OSDs.)
> 
> 
> > Data incoming from Ceph clients would be encrypted by the primary
> > OSD, which would replicate the ciphertext to the non-primary
> > members of the acting set.
> 
> This still exposes data in coredumps or on swap on the primary OSD, and
> metadata on the secondaries.
> 
> 
> Regards,
>     Lars
> 
> -- 
> Architect Storage/HA
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 


* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 21:52 ` Martin Millnert
@ 2015-12-15 20:40   ` Radoslaw Zarzynski
  0 siblings, 0 replies; 23+ messages in thread
From: Radoslaw Zarzynski @ 2015-12-15 20:40 UTC (permalink / raw)
  To: Martin Millnert; +Cc: Ceph Development, Adam Kupczyk

On Mon, Dec 14, 2015 at 10:52 PM, Martin Millnert <martin@millnert.se> wrote:
> On Mon, 2015-12-14 at 14:17 +0100, Radoslaw Zarzynski wrote:
>> Hello Folks,
>>
>> I would like to publish a proposal regarding improvements to Ceph's
>> data-at-rest encryption mechanism. Adam Kupczyk and I have worked on
>> it over the last few weeks.
>>
>> Initially we considered several architectural approaches and went
>> through several iterations of discussion with the Intel storage
>> group. The proposal is a condensed description of the solution we
>> see as the most promising one.
>>
>> We are open to any comments and questions.
>>
>> Regards,
>> Adam Kupczyk
>> Radoslaw Zarzynski
>>
>>
>> =======================
>> Summary
>> =======================
>>
>> Data-at-rest encryption is a mechanism that protects the data center
>> operator from revealing the content of physical storage media.
>>
>> Ceph already implements a form of at-rest encryption. It is
>> performed through dm-crypt as an intermediary layer between an OSD
>> and its physical storage. The proposed at-rest encryption mechanism
>> will be orthogonal and, in some ways, superior to the existing
>> solution.
>>
>> =======================
>> Owners
>> =======================
>>
>> * Radoslaw Zarzynski (Mirantis)
>> * Adam Kupczyk (Mirantis)
>>
>> =======================
>> Interested Parties
>> =======================
>>
>> If you are interested in contributing to this blueprint, or want to be
>> a "speaker" during the Summit session, list your name here.
>>
>> Name (Affiliation)
>> Name (Affiliation)
>> Name
>>
>> =======================
>> Current Status
>> =======================
>>
>> Data-at-rest encryption is currently achieved through dm-crypt
>> placed under the OSD's filestore. This solution is generic and
>> cannot leverage Ceph-specific characteristics. The best example is
>> that encryption is done multiple times - once for each replica.
>> Another issue is the lack of granularity - an OSD either encrypts
>> nothing or encrypts everything (with dm-crypt enabled).
>
> All or nothing is sometimes a desired function of encryption.
> "In-betweens" are tricky.
>
> Additionally, dm-crypt is AFAICT fairly performant: there's no need
> to context-switch per crypto-op, since it sits in the dm I/O path
> within the kernel.

Hello Martin,

I cannot agree about dm-crypt's performance in comparison to the OSD
solution. Each BIO handled by dm-crypt must go through at least one
kernel workqueue (kcryptd) [1]. Some of them have to pass through an
additional one (kcryptd_io) [2]. Those workqueues are served by a
dedicated set of kthreads, so context switches are present here.
Moreover, the whole BIO is split into small, 512-byte chunks before
being passed to the ablkcipher [3]. IMO that's far less than ideal.

In the case of application-layer encryption you would operate much
closer to the data. You may encrypt in much larger chunks. The costs
of context switches and of the op setup phase (important for hw
accelerators) would be negligible, providing much better performance.
Leveraging some Ceph-specific characteristics (encrypting only
selected pools; a constant cost regardless of replica count)
multiplies the gain even further.

Regards,
Radoslaw

[1] http://lxr.free-electrons.com/source/drivers/md/dm-crypt.c?v=3.19#L1350
[2] http://lxr.free-electrons.com/source/drivers/md/dm-crypt.c?v=3.19#L1355
[3] http://lxr.free-electrons.com/source/drivers/md/dm-crypt.c?v=3.19#L864

>
> These two points are not necessarily a critique of your proposal.
>
>> Cryptographic keys are stored on the filesystem of the storage node
>> that hosts the OSDs. Changing them requires redeploying the OSDs.
>
> I'm not very familiar with which dm-crypt deployment technique you
> refer to (I don't use ceph-deploy personally), but the LUKS FDE suite
> does allow separating the encryption key from the activation key (or
> whatever it is called).
>
>> The best way to address those issues seems to be to introduce
>> encryption into the Ceph OSD itself.
>>
>> =======================
>> Detailed Description
>> =======================
>>
>> In addition to the currently available solution, the Ceph OSD would
>> accommodate an encryption component placed in the replication
>> mechanism.
>>
>> Data incoming from Ceph clients would be encrypted by the primary
>> OSD, which would replicate the ciphertext to the non-primary members
>> of the acting set. Data sent to a Ceph client would be decrypted by
>> the OSD handling the read operation. This makes it possible to:
>> * perform only one encryption per write,
>> * achieve per-pool granularity for both the key and the encryption
>> itself.
>
> I.e. the primary OSD's key for the PG in question would be the one used
> for all replicas of the data, per acting set. I.e. a granularity of
> actually one key per acting set, controlled by the primary OSD?
>
>> Unfortunately, having always and everywhere the same key for a given
>> pool is unacceptable - it would make cluster migration and key change
>> extremely burdensome process. To address those issues crypto key
>> versioning would be introduced. All RADOS objects inside single
>> placement group stored on a given OSD would use the same crypto key
>> version.
>
> This seems to add key versioning on the primary OSD.
>
>> The same PG on other replica may use different version of the
>> same, per pool-granulated key.
>
> Attempt to rewrite to see if I parsed correctly: Within a PG's acting
> set, a non-primary OSD can use another version of the per-pool key.
> That seems fair, to support asynchronous key roll forward/backward.
>
>> In typical case ciphertext data transferred from OSD to OSD can be
>> used without change. This is when both OSDs have the same crypto key
>> version for given placement group. In rare cases when crypto keys are
>> different (key change or transition period) receiving OSD will recrypt
>> with local key versions.
>
> Doesn't this presume the receiving OSD always has a more up-to-date set
> of keys than the sending OSD?
> What if the sending OSD has a newer key than the receiving OSD?
>
>> For compression to be effective it must be done before encryption. Due
>> to that encryption may be applied differently for replication pools
>> and EC pools. Replicated pools do not implement compression; for those
>> pools encryption is applied right after data enters OSD. For EC pools
>> encryption is applied after compressing. When compression will be
>> implemented for replicated pools, it must be placed before encryption.
>>
>> Ceph currently has thin abstraction layer over block ciphers
>> (CryptoHandler, CryptoKeyHandler). We want to extend this API to
>> introduce initialization vectors, chaining modes and asynchronous
>> operations. Implementation of this API may be based on AF_ALG kernel
>> interface. This assures the ability to use hardware accelerations
>> already implemented in Linux kernel. Moreover, due to working on
>> bigger chunks (dm-crypt operates on 512 byte long sectors) the raw
>> encryption performance may be even higher.
>
>
>> The encryption process must not impede random reads and random writes
>> to RADOS objects.
>
> That's a brave statement. :-)
>
>>  Solution for this is to create encryption/decryption
>> process that will be applicable for arbitrary data range. This can be
>> done most easily by applying chaining mode that doesn’t impose
>> dependencies between subsequent data chunks. Good candidates are
>> CTR[1] and XTS[2].
>>
>> Encryption-related metadata would be stored in extended attributes.
>>
>> In order to coordinate encryption across acting set, all replicas will
>> share information about crypto key versions they use. Real
>> cryptographic keys never be stored permanently by Ceph OSD. Instead,
>> it would be gathered from monitors. Key management improvements will
>> be addressed in separate task based on dedicated proposal [3].
>
> Key management is indeed the Achilles heel of any cluster solution like
> this, and depending on requirements sooner or later descends into some
> sort of TPM or similar, I guess.  I.e. "to trust a computer someone else
> may have arbitrary physical access to."
>
> /M
>
>> [1] https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29
>>
>> [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_.28XTS.29
>>
>> [3] http://tracker.ceph.com/projects/ceph/wiki/Osd_-_simple_ceph-mon_dm-crypt_key_management
>>
>> =======================
>> Work items
>> =======================
>>
>> Coding tasks
>> * Extended Crypto API (CryptoHandler, CryptoKeyHandler).
>> * Encryption for replicated pools.
>> * Encryption for EC pools.
>> * Key management.
>>
>> Build / release tasks
>> * Unit tests for extended Crypto API.
>> * Functional tests for encrypted replicated pools.
>> * Functional tests for encrypted EC pools.
>>
>> Documentation tasks
>> * Document extended Crypto API.
>> * Document migration procedures.
>> * Document crypto key creation and versioning.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
       [not found]   ` <CAHMeWhGgHWq=jPZfj8s_KCB=wLhsBNCyJjZSBQQFZXc8r63M7A@mail.gmail.com>
@ 2015-12-15 21:04     ` Gregory Farnum
  2015-12-16 15:13       ` Adam Kupczyk
  2015-12-16 15:36       ` Radoslaw Zarzynski
  0 siblings, 2 replies; 23+ messages in thread
From: Gregory Farnum @ 2015-12-15 21:04 UTC (permalink / raw)
  To: Adam Kupczyk; +Cc: Radoslaw Zarzynski, Ceph Development

On Tue, Dec 15, 2015 at 1:58 AM, Adam Kupczyk <akupczyk@mirantis.com> wrote:
>
>
> On Mon, Dec 14, 2015 at 9:28 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
>>
>> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
>> <rzarzynski@mirantis.com> wrote:
>> > <snip>
>> > In typical case ciphertext data transferred from OSD to OSD can be
>> > used without change. This is when both OSDs have the same crypto key
>> > version for given placement group. In rare cases when crypto keys are
>> > different (key change or transition period) receiving OSD will recrypt
>> > with local key versions.
>>
>> I don't understand this part at all. Do you plan to read and rewrite
>> the entire PG whenever you change the "key version"? How often do you
>> plan to change these keys? What is even the point of changing them,
>> since anybody who can control an OSD can grab the entire current key
>> set?
>
> We envision that key changes will happen very infrequently, usually in
> reaction to some possible security breach.
> After the key version is incremented, nothing happens automatically. The
> old key is used for as long as the PG is not empty. When the first RADOS
> object is created, the current key version is locked to the PG.
> There is no solution when someone gets control over an OSD - either by
> running a custom OSD binary or extracting data by impersonating a
> client. That is outside the scope of at-rest encryption. We only
> addressed cases where the storage media somehow leave the datacenter
> premises. The ability to change keys is necessary, since we need a
> procedure to recover data security after keys are compromised.
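
(As a minimal sketch of the locking rule described above - the names
and structures below are assumptions for illustration, not actual Ceph
code:)

#include <cstdint>

// Hypothetical sketch of "the current key version is locked to the PG
// when the first RADOS object is created".
struct PGCryptoState {
  bool has_objects = false;   // does this PG already store objects?
  uint64_t key_version = 0;   // key version the PG is locked to
};

uint64_t key_version_for_write(PGCryptoState &pg,
                               uint64_t current_pool_version) {
  if (!pg.has_objects) {
    // An empty PG adopts the newest per-pool key version...
    pg.key_version = current_pool_version;
    pg.has_objects = true;
  }
  // ...and a non-empty PG keeps using its locked version.
  return pg.key_version;
}

Until the PG is emptied, writes keep using the locked version, which is
why nothing has to be rewritten when the pool's current version is
bumped.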
>>
>>
>> > For compression to be effective it must be done before encryption. Due
>> > to that encryption may be applied differently for replication pools
>> > and EC pools. Replicated pools do not implement compression; for those
>> > pools encryption is applied right after data enters OSD. For EC pools
>> > encryption is applied after compressing. When compression will be
>> > implemented for replicated pools, it must be placed before encryption.
>>
>> So this means you'll be encrypting the object data, but not the omap
>> nor xattrs, and not the file names on disk. Is that acceptable to
>> people? It's probably fine for a lot of rbd use cases, but not for
>> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
>> is stored in those regions. I'd rather a solution worked on the full
>> data set. :/
>
> We intend to encrypt:
> - object data
> - omap values
> - xattr values
> We are considering encrypting:
> - object names
> - xattr names
> We are unable to encrypt:
> - omap names

Are there any encryption mechanisms that can efficiently and
effectively encrypt pieces of data that small? I don't have any
expertise in crypto but I thought you needed a certain minimum size of
the output blob to get any security at all. If we're turning every
8-byte thing into a 64-byte thing that's going to go much worse for us
than just having every OSD encrypt their local disk...
-Greg

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-15 14:23 ` Lars Marowsky-Bree
  2015-12-15 14:59   ` Sage Weil
@ 2015-12-15 23:31   ` Matt Benjamin
  2015-12-16 22:29   ` Adam Kupczyk
  2 siblings, 0 replies; 23+ messages in thread
From: Matt Benjamin @ 2015-12-15 23:31 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

Hi,

Thanks for this detailed response.

----- Original Message -----
> From: "Lars Marowsky-Bree" <lmb@suse.com>
> To: "Ceph Development" <ceph-devel@vger.kernel.org>
> Sent: Tuesday, December 15, 2015 9:23:04 AM
> Subject: Re: Improving Data-At-Rest encryption in Ceph

> 
> It's not yet perfect, but I think the approach is superior to being
> implemented in Ceph natively. If there's any encryption that should be
> implemented in Ceph, I believe it'd be the on-the-wire encryption to
> protect against eavesdroppers.

++

> 
> Other scenarios would require client-side encryption.

++

> 
> > Cryptographic keys are stored on filesystem of storage node that hosts
> > OSDs. Changing them require redeploying the OSDs.
> 
> This is solvable by storing the key on an external key server.

++

> 
> Changing the key is only necessary if the key has been exposed. And with
> dm-crypt, that's still possible - it's not the actual encryption key
> that's stored, but the secret that is needed to unlock it, and that can
> be re-encrypted quite fast. (In theory; it's not implemented yet for
> the Ceph OSDs.)
> 
> 
> > Data incoming from Ceph clients would be encrypted by primary OSD. It
> > would replicate ciphertext to non-primary members of an acting set.
> 
> This still exposes data in coredumps or on swap on the primary OSD, and
> metadata on the secondaries.
> 
> 
> Regards,
>     Lars
> 
> --
> Architect Storage/HA
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB
> 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 


-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 22:32     ` Gregory Farnum
@ 2015-12-16  2:13       ` Andrew Bartlett
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Bartlett @ 2015-12-16  2:13 UTC (permalink / raw)
  To: Gregory Farnum, Martin Millnert
  Cc: Radoslaw Zarzynski, Ceph Development, Adam Kupczyk

On Mon, 2015-12-14 at 14:32 -0800, Gregory Farnum wrote:
> On Mon, Dec 14, 2015 at 2:02 PM, Martin Millnert <martin@millnert.se>
> wrote:
> > On Mon, 2015-12-14 at 12:28 -0800, Gregory Farnum wrote:
> > > On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
> > <snip>
> > > > In typical case ciphertext data transferred from OSD to OSD can
> > > > be
> > > > used without change. This is when both OSDs have the same
> > > > crypto key
> > > > version for given placement group. In rare cases when crypto
> > > > keys are
> > > > different (key change or transition period) receiving OSD will
> > > > recrypt
> > > > with local key versions.
> > > 
> > > I don't understand this part at all. Do you plan to read and
> > > rewrite
> > > the entire PG whenever you change the "key version"? How often do
> > > you
> > > plan to change these keys? What is even the point of changing
> > > them,
> > > since anybody who can control an OSD can grab the entire current
> > > key
> > > set?
> > 
> > You may have leaked keys without having leaked ciphertext.
> > The typical use case for FDE/SED is IMO being able to RMA drives.
> > Nothing more than that.
> 
> Yeah, but you necessarily need to let people keep using the old key
> *and* give them the new one on-demand if they've got access to the
> system, in order to allow switching to the new key. You need to wait
> for all the data to actually be rewritten with the new key before you
> can consider it secure again, and that'll take a loooong time. I'm
> not
> saying there isn't threat mitigation here, just that I'm not sure
> it's
> useful against somebody who's already obtained access to your
> encryption keys — if they've gotten those it's unlikely they won't
> have gotten OSD keys as well, and if they've got network access they
> can impersonate an OSD and get access to whatever data they like.
> 
> I guess that still protects against an external database hack from
> somebody who gets access to your old hard drives, but...*shrug*

An important part of why we moved to LUKS for dm-crypt keys is that LUKS
does some useful things to allow a form of key rotation. 

The master key is never changed (except at reformat), but it also is
never disclosed beyond the host's kernel.  What is stored on the disks
and/or on the key servers is a key-encryption-key.  

The process for rotating the key encryption key is pretty sensible,
given the constraints, because they go to good lengths to rewrite the
blocks where the old KEK encrypted the master key. 

> Yeah, I'd rather see dm-crypt get done well rather than in-Ceph
> encryption like this. If we want to protect data I think that's a lot
> more secure (and will *stay* that way since encryption is all that
> project does), and adding TLS or similar to the messenger code would
> give us on-the-wire protection from the clients to the disk.
> -Greg

The good reason to use dm-crypt is that novel cryptography is NOT a
good thing.  The dm-crypt stuff is well used and well understood, and
any potential attacks against it are likely to be widely reported and
properly analysed. 

Andrew Bartlett

-- 
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT   
https://catalyst.net.nz/services/samba

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-15 21:04     ` Gregory Farnum
@ 2015-12-16 15:13       ` Adam Kupczyk
  2015-12-16 15:36       ` Radoslaw Zarzynski
  1 sibling, 0 replies; 23+ messages in thread
From: Adam Kupczyk @ 2015-12-16 15:13 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Radoslaw Zarzynski, Ceph Development

On Tue, Dec 15, 2015 at 10:04 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Tue, Dec 15, 2015 at 1:58 AM, Adam Kupczyk <akupczyk@mirantis.com> wrote:
>>
>>
>> On Mon, Dec 14, 2015 at 9:28 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
>>>
>>> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
>>> <rzarzynski@mirantis.com> wrote:
>>> > <snip>
>>> > In typical case ciphertext data transferred from OSD to OSD can be
>>> > used without change. This is when both OSDs have the same crypto key
>>> > version for given placement group. In rare cases when crypto keys are
>>> > different (key change or transition period) receiving OSD will recrypt
>>> > with local key versions.
>>>
>>> I don't understand this part at all. Do you plan to read and rewrite
>>> the entire PG whenever you change the "key version"? How often do you
>>> plan to change these keys? What is even the point of changing them,
>>> since anybody who can control an OSD can grab the entire current key
>>> set?
>>
>> We envision that key changes will happen very infrequently, usually in
>> reaction to some possible security breach.
>> After the key version is incremented, nothing happens automatically. The
>> old key is used for as long as the PG is not empty. When the first RADOS
>> object is created, the current key version is locked to the PG.
>> There is no solution when someone gets control over an OSD - either by
>> running a custom OSD binary or extracting data by impersonating a
>> client. That is outside the scope of at-rest encryption. We only
>> addressed cases where the storage media somehow leave the datacenter
>> premises. The ability to change keys is necessary, since we need a
>> procedure to recover data security after keys are compromised.
>>>
>>>
>>> > For compression to be effective it must be done before encryption. Due
>>> > to that encryption may be applied differently for replication pools
>>> > and EC pools. Replicated pools do not implement compression; for those
>>> > pools encryption is applied right after data enters OSD. For EC pools
>>> > encryption is applied after compressing. When compression will be
>>> > implemented for replicated pools, it must be placed before encryption.
>>>
>>> So this means you'll be encrypting the object data, but not the omap
>>> nor xattrs, and not the file names on disk. Is that acceptable to
>>> people? It's probably fine for a lot of rbd use cases, but not for
>>> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
>>> is stored in those regions. I'd rather a solution worked on the full
>>> data set. :/
>>
>> We intend to encrypt:
>> - object data
>> - omap values
>> - xattr values
>> We are considering encrypting:
>> - object names
>> - xattr names
>> We are unable to encrypt:
>> - omap names
>
> Are there any encryption mechanisms that can efficiently and
> effectively encrypt pieces of data that small? I don't have any
> expertise in crypto but I thought you needed a certain minimum size of
> the output blob to get any security at all. If we're turning every
> 8-byte thing into a 64-byte thing that's going to go much worse for us
> than just having every OSD encrypt their local disk...
> -Greg
The idea we have is to use stream ciphers. Actually, a generator of
stream ciphers: to encrypt omap entry Z in RADOS object Y in pool X,
the stream cipher would be configured (both key and initialization
vector) with a function like AES(X)*AES(Y)*AES(Z).
This is why the crypto key we need is actually a multitude of atomic
block cipher keys.
Effectively, no data expansion will occur.
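
One possible reading of that construction, sketched with OpenSSL (the
name-hashing IV derivation below is my assumption of the intent, not
the authors' design): a short omap value keeps its exact length.

#include <openssl/evp.h>
#include <openssl/sha.h>
#include <cstring>
#include <string>
#include <vector>

// Encrypt a short omap value with AES-256-CTR, deriving the IV from
// the (pool, object, omap key) names. Illustrative construction only.
std::vector<unsigned char> encrypt_omap_value(
    const unsigned char key[32],  // one version of the per-pool key
    const std::string &pool, const std::string &object,
    const std::string &omap_key,
    const std::vector<unsigned char> &value) {
  // Hash the names into a per-entry IV (assumed derivation).
  std::string label = pool + '\0' + object + '\0' + omap_key;
  unsigned char digest[SHA256_DIGEST_LENGTH];
  SHA256(reinterpret_cast<const unsigned char *>(label.data()),
         label.size(), digest);
  unsigned char iv[16];
  std::memcpy(iv, digest, sizeof(iv));

  std::vector<unsigned char> out(value.size());
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
  EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), nullptr, key, iv);
  int outl = 0;
  EVP_EncryptUpdate(ctx, out.data(), &outl, value.data(),
                    (int)value.size());
  EVP_CIPHER_CTX_free(ctx);
  return out;  // same length as the input: no data expansion
}

The usual CTR caveat applies: a (key, IV) pair must never be reused, so
a real scheme would also have to fold in something that changes whenever
the value is rewritten.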

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-15 21:04     ` Gregory Farnum
  2015-12-16 15:13       ` Adam Kupczyk
@ 2015-12-16 15:36       ` Radoslaw Zarzynski
  1 sibling, 0 replies; 23+ messages in thread
From: Radoslaw Zarzynski @ 2015-12-16 15:36 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Adam Kupczyk, Ceph Development

On Tue, Dec 15, 2015 at 10:04 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Tue, Dec 15, 2015 at 1:58 AM, Adam Kupczyk <akupczyk@mirantis.com> wrote:
>>
>>
>> On Mon, Dec 14, 2015 at 9:28 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
>>>
>>> On Mon, Dec 14, 2015 at 5:17 AM, Radoslaw Zarzynski
>>> <rzarzynski@mirantis.com> wrote:
>>> > <snip>
>>> > In typical case ciphertext data transferred from OSD to OSD can be
>>> > used without change. This is when both OSDs have the same crypto key
>>> > version for given placement group. In rare cases when crypto keys are
>>> > different (key change or transition period) receiving OSD will recrypt
>>> > with local key versions.
>>>
>>> I don't understand this part at all. Do you plan to read and rewrite
>>> the entire PG whenever you change the "key version"? How often do you
>>> plan to change these keys? What is even the point of changing them,
>>> since anybody who can control an OSD can grab the entire current key
>>> set?
>>
>> We envision that key changes will happen very infrequently, usually in
>> reaction to some possible security breach.
>> After the key version is incremented, nothing happens automatically. The
>> old key is used for as long as the PG is not empty. When the first RADOS
>> object is created, the current key version is locked to the PG.
>> There is no solution when someone gets control over an OSD - either by
>> running a custom OSD binary or extracting data by impersonating a
>> client. That is outside the scope of at-rest encryption. We only
>> addressed cases where the storage media somehow leave the datacenter
>> premises. The ability to change keys is necessary, since we need a
>> procedure to recover data security after keys are compromised.
>>>
>>>
>>> > For compression to be effective it must be done before encryption. Due
>>> > to that encryption may be applied differently for replication pools
>>> > and EC pools. Replicated pools do not implement compression; for those
>>> > pools encryption is applied right after data enters OSD. For EC pools
>>> > encryption is applied after compressing. When compression will be
>>> > implemented for replicated pools, it must be placed before encryption.
>>>
>>> So this means you'll be encrypting the object data, but not the omap
>>> nor xattrs, and not the file names on disk. Is that acceptable to
>>> people? It's probably fine for a lot of rbd use cases, but not for
>>> RGW, CephFS, nor raw RADOS where meaningful metadata (and even *data*)
>>> is stored in those regions. I'd rather a solution worked on the full
>>> data set. :/
>>
>> We intend to encrypt:
>> - object data
>> - omap values
>> - xattr values
>> We are considering encrypting:
>> - object names
>> - xattr names
>> We are unable to encrypt:
>> - omap names
>
> Are there any encryption mechanisms that can efficiently and
> effectively encrypt pieces of data that small? I don't have any
> expertise in crypto but I thought you needed a certain minimum size of
> the output blob to get any security at all. If we're turning every
> 8-byte thing into a 64-byte thing that's going to go much worse for us
> than just having every OSD encrypt their local disk...
> -Greg

Hello Greg,

In addition to what Adam said:
 * stream ciphers place no requirements on data length,
 * a block cipher like AES may be turned into a stream cipher using a
   specific mode of operation (e.g. CTR) - see the sketch below.
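
As a minimal sketch of that random-access property (using OpenSSL
AES-256-CTR as an assumed stand-in for the extended Crypto API - a
hypothetical helper, not Ceph code), a read or write at an arbitrary
object offset only needs the counter advanced to that offset:

#include <openssl/evp.h>
#include <cstdint>
#include <cstring>

// En/decrypt 'len' bytes of an AES-256-CTR stream in place, starting
// at an arbitrary byte 'offset' into the object. In CTR mode,
// encryption and decryption are the same operation.
void ctr_crypt_at(const unsigned char key[32],
                  const unsigned char iv0[16],
                  uint64_t offset, unsigned char *buf, size_t len) {
  // Advance the 128-bit big-endian counter by offset/16 blocks.
  unsigned char iv[16];
  std::memcpy(iv, iv0, 16);
  uint64_t blocks = offset / 16;
  for (int i = 15; i >= 0 && blocks; --i) {
    uint64_t sum = iv[i] + (blocks & 0xff);
    iv[i] = (unsigned char)sum;
    blocks = (blocks >> 8) + (sum >> 8);
  }
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
  EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), nullptr, key, iv);
  // Burn offset%16 keystream bytes so we land mid-block correctly.
  int outl = 0;
  unsigned char pad[16] = {0};
  if (offset % 16)
    EVP_EncryptUpdate(ctx, pad, &outl, pad, (int)(offset % 16));
  EVP_EncryptUpdate(ctx, buf, &outl, buf, (int)len);
  EVP_CIPHER_CTX_free(ctx);
}

No data before the requested range is ever read or re-encrypted, which
is what makes random reads and writes cheap.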

Regards,
Radoslaw

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-15 14:23 ` Lars Marowsky-Bree
  2015-12-15 14:59   ` Sage Weil
  2015-12-15 23:31   ` Matt Benjamin
@ 2015-12-16 22:29   ` Adam Kupczyk
  2015-12-16 22:33     ` Sage Weil
  2 siblings, 1 reply; 23+ messages in thread
From: Adam Kupczyk @ 2015-12-16 22:29 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Tue, Dec 15, 2015 at 3:23 PM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2015-12-14T14:17:08, Radoslaw Zarzynski <rzarzynski@mirantis.com> wrote:
>
> Hi all,
>
> great to see this revived.
>
> However, I have come to see some concerns with handling the encryption
> within Ceph itself.
>
> The key part to any such approach is formulating the threat scenario.
> For the use cases we have seen, the data-at-rest encryption matters so
> they can confidently throw away disks without leaking data. It's not
> meant as a defense against an online attacker. There usually is no
> problem with "a few" disks being privileged, or one or two nodes that
> need an admin intervention for booting (to enter some master encryption
> key somehow, somewhere).
>
> However, that requires *all* data on the OSDs to be encrypted.
>
> Crucially, that includes not just the file system meta data (so not just
> the data), but also the root and especially the swap partition. Those
> potentially include swapped out data, coredumps, logs, etc.
>
> (As an optional feature, it'd be cool if an OSD could be moved to a
> different chassis and continue operating there, to speed up recovery.
> Another optional feature would be to eventually be able, for those
> customers that trust them ;-), supply the key to the on-disk encryption
> (OPAL et al).)
>
> The proposal that Joshua posted a while ago essentially remained based
> on dm-crypt, but put in simple hooks to retrieve the keys from some
> "secured" server via sftp/ftps instead of loading them from the root fs.
> Similar to deo, that ties the key to being on the network and knowing
> the OSD UUID.
>
> This would then also be somewhat easily extensible to utilize the same
> key management server via initrd/dracut.
>
> Yes, this means that each OSD disk is separately encrypted, but given
> modern CPUs, this is less of a problem. It does have the benefit of
> being completely transparent to Ceph, and actually covering the whole
> node.
Agreed - if encryption were infinitely fast, dm-crypt would be the best
solution. Below is a short analysis of the encryption burden for
dm-crypt and OSD encryption when using replicated pools.

Summary:
OSD encryption requires 2.6 times fewer crypto operations than dm-crypt.
Crypto ops are the bottleneck.
Possible solutions:
- perform fewer crypto ops (OSD-based encryption can help)
- take crypto ops off the CPU (H/W accelerators; not all are integrated
with kcrypto)

Calculations and explanations:
A) DM-CRYPT
When we use dm-crypt, all data and metadata are encrypted. In a typical
deployment the journal is located on a different disc, but it is also
encrypted.
On write, the data path is:
1) enc when writing to journal
2) dec when reading journal
3) enc when writing to storage
So for each byte, 2-3 crypto operations are performed (step 2 can be
skipped if the kernel page allocated in step 1 has not been evicted).
Let's assume 2.5.
On read, the data path is:
4) dec when reading from storage

The balance between reads and writes depends on the deployment. Assume
75% are reads and 25% are writes, with replication factor 3.
This gives us 1*0.75 + 2.5*0.25*3 = 2.625 bytes of crypto operations
per byte of i/o.

B) CRYPTO INSIDE OSD
When we do encryption in the OSD, fewer bytes are encrypted (dm-crypt
has to encrypt entire disc sectors); anyway, we round it up to 1.
Write requires 1 byte of crypto ops per byte (when data comes from the
client).
Read requires 1 byte of crypto ops per byte (when data goes to the
client).
This gives us 1*0.75 + 1*0.25 = 1 byte of crypto ops per byte of i/o.

C) OSD I/O performance calculation
Let's assume an encryption speed of 600MB/s per CPU core (using AES-NI
on Haswell [1]).
This gives us 600/2.625 = 229MB/s for dm-crypt and 600MB/s for
OSD-located crypto.
Usually there are a few discs per CPU core in storage nodes. Let's say 6:
6xHDD = ~600MB/s
6xSSD = ~6000MB/s

It is clear that crypto is the limiting factor.

[1] https://software.intel.com/en-us/articles/intel-aes-ni-performance-enhancements-hytrust-datacontrol-case-study
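
The arithmetic above, restated as a tiny checkable function (a sketch
of the model only - the constants are the assumptions stated above):

#include <cstdio>

// Crypto bytes per byte of client i/o: reads cost one pass; writes
// cost 'journal_passes' per replica (dm-crypt) or one pass (in-OSD).
double dmcrypt_cost(double read_frac, double journal_passes,
                    int replicas) {
  return read_frac * 1.0 + (1.0 - read_frac) * journal_passes * replicas;
}

double osd_cost(double read_frac) {
  return read_frac * 1.0 + (1.0 - read_frac) * 1.0;
}

int main() {
  std::printf("dm-crypt: %.3f\n", dmcrypt_cost(0.75, 2.5, 3));  // 2.625
  std::printf("in-OSD:   %.3f\n", osd_cost(0.75));              // 1.000
  return 0;
}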
>
> Of course, one of the key issues is always the key server.
> Putting/retrieving/deleting keys is reasonably simple, but the question
> of how to ensure HA for it is a bit tricky. But doable; people have been
> building HA ftp/http servers for a while ;-) Also, a single key server
> setup could theoretically serve multiple Ceph clusters.
>
> It's not yet perfect, but I think the approach is superior to being
> implemented in Ceph natively. If there's any encryption that should be
> implemented in Ceph, I believe it'd be the on-the-wire encryption to
> protect against eavesdroppers.
>
> Other scenarios would require client-side encryption.
>
>> Current data at rest encryption is achieved through dm-crypt placed
>> under OSD’s filestore. This solution is a generic one and cannot
>> leverage Ceph-specific characteristics. The best example is that
>> encryption is done multiple times - one time for each replica. Another
>> issue is lack of granularity - either OSD encrypts nothing, or OSD
>> encrypts everything (with dm-crypt on).
>
> True. But for the threat scenario, a holistic approach to encryption
> seems actually required.
>
>> Cryptographic keys are stored on filesystem of storage node that hosts
>> OSDs. Changing them require redeploying the OSDs.
>
> This is solvable by storing the key on an external key server.
>
> Changing the key is only necessary if the key has been exposed. And with
> dm-crypt, that's still possible - it's not the actual encryption key
> that's stored, but the secret that is needed to unlock it, and that can
> be re-encrypted quite fast. (In theory; it's not implemented yet for
> the Ceph OSDs.)
>
>
>> Data incoming from Ceph clients would be encrypted by primary OSD. It
>> would replicate ciphertext to non-primary members of an acting set.
>
> This still exposes data in coredumps or on swap on the primary OSD, and
> metadata on the secondaries.
>
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-16 22:29   ` Adam Kupczyk
@ 2015-12-16 22:33     ` Sage Weil
  2015-12-21  8:21       ` Adam Kupczyk
  0 siblings, 1 reply; 23+ messages in thread
From: Sage Weil @ 2015-12-16 22:33 UTC (permalink / raw)
  To: Adam Kupczyk; +Cc: Lars Marowsky-Bree, Ceph Development

On Wed, 16 Dec 2015, Adam Kupczyk wrote:
> On Tue, Dec 15, 2015 at 3:23 PM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> > On 2015-12-14T14:17:08, Radoslaw Zarzynski <rzarzynski@mirantis.com> wrote:
> >
> > Hi all,
> >
> > great to see this revived.
> >
> > However, I have come to see some concerns with handling the encryption
> > within Ceph itself.
> >
> > The key part to any such approach is formulating the threat scenario.
> > For the use cases we have seen, the data-at-rest encryption matters so
> > they can confidently throw away disks without leaking data. It's not
> > meant as a defense against an online attacker. There usually is no
> > problem with "a few" disks being privileged, or one or two nodes that
> > need an admin intervention for booting (to enter some master encryption
> > key somehow, somewhere).
> >
> > However, that requires *all* data on the OSDs to be encrypted.
> >
> > Crucially, that includes not just the file system meta data (so not just
> > the data), but also the root and especially the swap partition. Those
> > potentially include swapped out data, coredumps, logs, etc.
> >
> > (As an optional feature, it'd be cool if an OSD could be moved to a
> > different chassis and continue operating there, to speed up recovery.
> > Another optional feature would be to eventually be able, for those
> > customers that trust them ;-), supply the key to the on-disk encryption
> > (OPAL et al).)
> >
> > The proposal that Joshua posted a while ago essentially remained based
> > on dm-crypt, but put in simple hooks to retrieve the keys from some
> > "secured" server via sftp/ftps instead of loading them from the root fs.
> > Similar to deo, that ties the key to being on the network and knowing
> > the OSD UUID.
> >
> > This would then also be somewhat easily extensible to utilize the same
> > key management server via initrd/dracut.
> >
> > Yes, this means that each OSD disk is separately encrypted, but given
> > modern CPUs, this is less of a problem. It does have the benefit of
> > being completely transparent to Ceph, and actually covering the whole
> > node.
> Agreed - if encryption were infinitely fast, dm-crypt would be the best
> solution. Below is a short analysis of the encryption burden for
> dm-crypt and OSD encryption when using replicated pools.
>
> Summary:
> OSD encryption requires 2.6 times fewer crypto operations than dm-crypt.

Yeah, I believe that, but

> Crypto ops are the bottleneck.

is this really true?  I don't think we've tried to measure performance 
with dm-crypt, but I also have never heard anyone complain about the 
additional CPU utilization or performance impact.  Have you observed this?

sage

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-16 22:33     ` Sage Weil
@ 2015-12-21  8:21       ` Adam Kupczyk
  2016-01-18  8:05         ` Adam Kupczyk
  0 siblings, 1 reply; 23+ messages in thread
From: Adam Kupczyk @ 2015-12-21  8:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: Lars Marowsky-Bree, Ceph Development

On Wed, Dec 16, 2015 at 11:33 PM, Sage Weil <sage@newdream.net> wrote:
> On Wed, 16 Dec 2015, Adam Kupczyk wrote:
>> On Tue, Dec 15, 2015 at 3:23 PM, Lars Marowsky-Bree <lmb@suse.com> wrote:
>> > On 2015-12-14T14:17:08, Radoslaw Zarzynski <rzarzynski@mirantis.com> wrote:
>> >
>> > <snip>
>> Agreed - if encryption were infinitely fast, dm-crypt would be the best
>> solution. Below is a short analysis of the encryption burden for
>> dm-crypt and OSD encryption when using replicated pools.
>>
>> Summary:
>> OSD encryption requires 2.6 times fewer crypto operations than dm-crypt.
>
> Yeah, I believe that, but
>
>> Crypto ops are the bottleneck.
>
> is this really true?  I don't think we've tried to measure performance
> with dm-crypt, but I also have never heard anyone complain about the
> additional CPU utilization or performance impact.  Have you observed this?
I ran tests, mostly on my i7-4910MQ 2.9GHz (4 cores) with an SSD.
The results for write were appallingly low, I guess due to kernel
problems with multi-CPU kcrypto [1]. I will not mention them, as these
results would obfuscate the discussion; newer kernels (>4.0.2) do fix
the issue.

The results for read were 350MB/s, but CPU utilization was 44% in the
kcrypto kernel worker (single core). This effectively means 11% of the
total crypto capacity, because the Intel-optimized AES-NI instructions
are used almost every cycle, making hyperthreading useless.

[1] http://unix.stackexchange.com/questions/203677/abysmal-general-dm-crypt-luks-write-performance
>
> sage

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-21  8:21       ` Adam Kupczyk
@ 2016-01-18  8:05         ` Adam Kupczyk
  2016-01-18 12:21           ` Lars Marowsky-Bree
  0 siblings, 1 reply; 23+ messages in thread
From: Adam Kupczyk @ 2016-01-18  8:05 UTC (permalink / raw)
  To: Sage Weil; +Cc: Lars Marowsky-Bree, Ceph Development

Hi All,

Previously I was making calculations based on the assumption that 25%
of operations are writes and 75% are reads.

Lately I have been working on a customer's issues with a Ceph
deployment. I extracted the actual proportions of reads and writes:
pool read     write
X    11.6 G   35.8 G
Y    30.1 G   48.1 G
Z    21.7 G   37.5 G
V    10.6 G    7.0 G
--------------------
sum  74.0 G  130.4 G
     36%      64%

Plugging this into the calculations I was using previously gives us:
1) dm-crypt:
1*0.36 + 2.5*0.64*3 = 5.16 bytes of crypto operations per byte of io data.
2) potential in-OSD encryption:
1*0.36 + 1*0.64 = 1 byte of crypto operations per byte of io data.

This further deepens my concern that crypto transformations may be the
limiting factor for performance.

Best regards,
Adam Kupczyk

On Mon, Dec 21, 2015 at 9:21 AM, Adam Kupczyk <akupczyk@mirantis.com> wrote:
> <snip>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2016-01-18  8:05         ` Adam Kupczyk
@ 2016-01-18 12:21           ` Lars Marowsky-Bree
  2016-01-25 19:05             ` Radoslaw Zarzynski
  0 siblings, 1 reply; 23+ messages in thread
From: Lars Marowsky-Bree @ 2016-01-18 12:21 UTC (permalink / raw)
  To: Ceph Development

On 2016-01-18T09:05:58, Adam Kupczyk <akupczyk@mirantis.com> wrote:

Hi Adam,

> Plugging this into calculations I was using previously, gives us:
> 1) Dmcrypt:
> 1*0.36+2.5*0.64*3 = 5.16 bytes of crypto operations per byte of io data.
> 2) potential inside OSD encryption
> 1*0.36+1*0.64 = 1 byte of crypto operations per byte of io data.
> 
> This further deepens my concern that crypto transformations may be
> limit for performance.

I see your concern, but my primary concern is not about performance,
but rather about security.

By not encrypting the entire OSD device, one becomes susceptible to
metadata analysis (on the file store), data exposure, etc. (Plus,
obviously, the system devices need to be encrypted to avoid data
leaks via logs, swap, coredumps etc.)

It doesn't help my use case that your implementation is theoretically
faster if it doesn't fit the threat scenario.

I'd obviously be delighted to see this all sped up (and consume less
power), but as long as the system is fast enough to encrypt at
near-device speeds, this seems preferable.

I'm not opposed to your implementation - I just couldn't sell it to my
customers for data-at-rest encryption.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Improving Data-At-Rest encryption in Ceph
  2015-12-14 13:17 Improving Data-At-Rest encryption in Ceph Radoslaw Zarzynski
                   ` (2 preceding siblings ...)
  2015-12-15 14:23 ` Lars Marowsky-Bree
@ 2016-01-24 13:45 ` John Hunter
  3 siblings, 0 replies; 23+ messages in thread
From: John Hunter @ 2016-01-24 13:45 UTC (permalink / raw)
  To: Radoslaw Zarzynski; +Cc: Ceph Development, Adam Kupczyk

Hi Radoslaw,

I am really interested in your proposal - is this WIP?
Can I get to know some details?

On Mon, Dec 14, 2015 at 9:17 PM, Radoslaw Zarzynski
<rzarzynski@mirantis.com> wrote:
> Hello Folks,
>
> I would like to publish a proposal regarding improvements to Ceph
> data-at-rest encryption mechanism. Adam Kupczyk and I worked
> on that in last weeks.
>
> Initially we considered several architectural approaches and made
> several iterations of discussions with Intel storage group. The proposal
> is condensed description of the solution we see as the most promising
> one.
>
> We are open to any comments and questions.
>
> Regards,
> Adam Kupczyk
> Radoslaw Zarzynski
>
>
> =======================
> Summary
> =======================
>
> Data at-rest encryption is mechanism for protecting data center
> operator from revealing content of physical carriers.
>
> Ceph already implements a form of at rest encryption. It is performed
> through dm-crypt as intermediary layer between OSD and its physical
> storage. The proposed at rest encryption mechanism will be orthogonal
> and, in some ways, superior to already existing solution.
>
> =======================
> Owners
> =======================
>
> * Radoslaw Zarzynski (Mirantis)
> * Adam Kupczyk (Mirantis)
>
> =======================
> Interested Parties
> =======================
>
> If you are interested in contributing to this blueprint, or want to be
> a "speaker" during the Summit session, list your name here.
>
> Name (Affiliation)
> Name (Affiliation)
> Name
>
> =======================
> Current Status
> =======================
>
> Current data at rest encryption is achieved through dm-crypt placed
> under OSD’s filestore. This solution is a generic one and cannot
> leverage Ceph-specific characteristics. The best example is that
> encryption is done multiple times - one time for each replica. Another
> issue is lack of granularity - either OSD encrypts nothing, or OSD
> encrypts everything (with dm-crypt on).
>
> Cryptographic keys are stored on filesystem of storage node that hosts
> OSDs. Changing them require redeploying the OSDs.
>
> The best way to address those issues seems to be introducing
> encryption into Ceph OSD.
>
> =======================
> Detailed Description
> =======================
>
> In addition to the currently available solution, Ceph OSD would
> accommodate encryption component placed in the replication mechanisms.
>
> Data incoming from Ceph clients would be encrypted by primary OSD. It
> would replicate ciphertext to non-primary members of an acting set.
> Data sent to Ceph client would be decrypted by OSD handling read
> operation. This allows to:
> * perform only one encryption per write,
> * achieve per-pool key granulation for both key and encryption itself.
>
> Unfortunately, having always and everywhere the same key for a given
> pool is unacceptable - it would make cluster migration and key change
> extremely burdensome process. To address those issues crypto key
> versioning would be introduced. All RADOS objects inside single
> placement group stored on a given OSD would use the same crypto key
> version. The same PG on other replica may use different version of the
> same, per pool-granulated key.
>
> In typical case ciphertext data transferred from OSD to OSD can be
> used without change. This is when both OSDs have the same crypto key
> version for given placement group. In rare cases when crypto keys are
> different (key change or transition period) receiving OSD will recrypt
> with local key versions.
>
> For compression to be effective, it must be done before encryption
> (ciphertext is effectively incompressible). Because of that,
> encryption may be applied differently for replicated pools and EC
> pools. Replicated pools do not implement compression; for those pools
> encryption is applied right after data enters the OSD. For EC pools
> encryption is applied after compression. If compression is ever
> implemented for replicated pools, it must likewise be placed before
> encryption.
>
> Ceph currently has a thin abstraction layer over block ciphers
> (CryptoHandler, CryptoKeyHandler). We want to extend this API to
> introduce initialization vectors, chaining modes and asynchronous
> operations. The implementation of this API may be based on the AF_ALG
> kernel interface. This ensures the ability to use the hardware
> acceleration already implemented in the Linux kernel. Moreover,
> because it can work on bigger chunks (dm-crypt operates on 512-byte
> sectors), the raw encryption performance may be even higher.
>
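As a concrete illustration, a minimal AF_ALG skcipher round trip might
look like the sketch below. The "ctr(aes)" algorithm name, the 128-bit
key and the omission of error handling are assumptions of this sketch,
not details fixed by the proposal.

  #include <linux/if_alg.h>
  #include <sys/socket.h>
  #include <unistd.h>
  #include <cstring>
  #include <cstdint>

  // Bind a kernel skcipher transform and set the key; returns an
  // operation fd that can be reused for many encrypt/decrypt requests.
  int alg_setup(const uint8_t *key, size_t keylen) {
    int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    sockaddr_alg sa = {};
    sa.salg_family = AF_ALG;
    std::strcpy(reinterpret_cast<char *>(sa.salg_type), "skcipher");
    std::strcpy(reinterpret_cast<char *>(sa.salg_name), "ctr(aes)");
    bind(tfm, reinterpret_cast<sockaddr *>(&sa), sizeof(sa));
    setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, keylen);
    return accept(tfm, nullptr, nullptr);
  }

  // One request: pass the operation and IV as ancillary data, write
  // the input, then read the transformed output back.
  ssize_t alg_encrypt(int opfd, const uint8_t iv[16],
                      const uint8_t *in, uint8_t *out, size_t len) {
    char cbuf[CMSG_SPACE(sizeof(uint32_t)) +
              CMSG_SPACE(sizeof(af_alg_iv) + 16)] = {};
    iovec iov = { const_cast<uint8_t *>(in), len };
    msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_ALG;
    c->cmsg_type = ALG_SET_OP;
    c->cmsg_len = CMSG_LEN(sizeof(uint32_t));
    *reinterpret_cast<uint32_t *>(CMSG_DATA(c)) = ALG_OP_ENCRYPT;

    c = CMSG_NXTHDR(&msg, c);
    c->cmsg_level = SOL_ALG;
    c->cmsg_type = ALG_SET_IV;
    c->cmsg_len = CMSG_LEN(sizeof(af_alg_iv) + 16);
    af_alg_iv *aiv = reinterpret_cast<af_alg_iv *>(CMSG_DATA(c));
    aiv->ivlen = 16;
    std::memcpy(aiv->iv, iv, 16);

    if (sendmsg(opfd, &msg, 0) < 0)
      return -1;
    return read(opfd, out, len);
  }

Since the operation fd is reusable and the IV travels with every
request, per-request IVs (and thus arbitrary offsets) come cheap.
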
> The encryption process must not impede random reads and random writes
> to RADOS objects. The solution is an encryption/decryption process
> that can be applied to an arbitrary data range. This is most easily
> achieved with a chaining mode that doesn't impose dependencies
> between subsequent data chunks. Good candidates are CTR[1] and
> XTS[2].
>
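To make the random-access property concrete: in CTR mode the counter
block for any position is derived directly from the byte offset, so
any block-aligned range can be encrypted or decrypted independently.
Below is a minimal sketch using OpenSSL, with an invented IV layout
(8-byte per-object nonce followed by an 8-byte big-endian block
counter); nothing about this layout is fixed by the proposal.

  #include <openssl/evp.h>
  #include <cstdint>
  #include <cstring>

  // Build the initial counter block for a byte offset into the object:
  // 8-byte per-object nonce || 8-byte big-endian AES block counter.
  static void make_iv(const uint8_t nonce[8], uint64_t off, uint8_t iv[16]) {
    std::memcpy(iv, nonce, 8);
    uint64_t block = off / 16;            // index of the 16-byte AES block
    for (int i = 15; i >= 8; --i) {
      iv[i] = block & 0xff;
      block >>= 8;
    }
  }

  // Encrypt (or decrypt; CTR is symmetric) len bytes starting at a
  // 16-byte-aligned offset, with no dependency on neighbouring data.
  bool ctr_crypt_range(const uint8_t key[16], const uint8_t nonce[8],
                       uint64_t off, const uint8_t *in, uint8_t *out,
                       int len) {
    uint8_t iv[16];
    make_iv(nonce, off, iv);
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int outl = 0;
    bool ok = EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), nullptr, key, iv) == 1
           && EVP_EncryptUpdate(ctx, out, &outl, in, len) == 1;
    EVP_CIPHER_CTX_free(ctx);
    return ok;
  }

XTS offers the same kind of independence at the granularity of its
data units, with a per-unit tweak taking the place of the counter.
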
> Encryption-related metadata would be stored in extended attributes.
>
> In order to coordinate encryption across an acting set, all replicas
> will share information about the crypto key versions they use. Real
> cryptographic keys would never be stored permanently by a Ceph OSD.
> Instead, they would be fetched from the monitors. Key management
> improvements will be addressed in a separate task based on a
> dedicated proposal [3].
>
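A hypothetical shape for that per-object metadata; every name below is
invented for illustration and nothing here is fixed by the proposal:

  #include <cstdint>

  // Hypothetical per-object encryption metadata kept in an extended
  // attribute; field names are invented, not taken from the proposal.
  struct enc_meta_t {
    uint32_t key_version;  // version of the per-pool key in use
    uint8_t  nonce[8];     // per-object nonce mixed into the cipher IV
  };

On replication, a receiving OSD could compare key_version against its
local version for the PG and recrypt only on mismatch, matching the
transition behaviour described earlier.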
>
> [1] https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29
>
> [2] https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_.28XTS.29
>
> [3] http://tracker.ceph.com/projects/ceph/wiki/Osd_-_simple_ceph-mon_dm-crypt_key_management
>
> =======================
> Work items
> =======================
>
> Coding tasks
> * Extended Crypto API (CryptoHandler, CryptoKeyHandler).
> * Encryption for replicated pools.
> * Encryption for EC pools.
> * Key management.
>
> Build / release tasks
> * Unit tests for extended Crypto API.
> * Functional tests for encrypted replicated pools.
> * Functional tests for encrypted EC pools.
>
> Documentation tasks
> * Document extended Crypto API.
> * Document migration procedures.
> * Document crypto key creation and versioning.


* Re: Improving Data-At-Rest encryption in Ceph
  2016-01-18 12:21           ` Lars Marowsky-Bree
@ 2016-01-25 19:05             ` Radoslaw Zarzynski
  2016-01-25 22:20               ` Kyle Bader
  0 siblings, 1 reply; 23+ messages in thread
From: Radoslaw Zarzynski @ 2016-01-25 19:05 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

Hello Lars,

> By not encrypting the entire OSD device, one becomes susceptible to
> metadata analysis (on the file store), data exposure, etc.

Agreed, filesystem-related metadata like mtime, ctime and size will be
available.

All data related to a RADOS object will be encrypted. This includes:
 * object name,
 * object data,
 * user attributes names,
 * user attributes values,
 * OMAP header content,
 * OMAP attributes names (keys),
 * OMAP values.
Metadata of a RADOS object that will remain visible: data size, OMAP
header size and the existence of user/OMAP attributes. For many use
cases (e.g. RBD) this isn't an issue.

> I'd obviously be delighted to see this all sped up (and consume less
> power), but as long as the system is fast enough to encrypt at
> near-device speeds, this seems preferable.

I can agree that in the case of HDDs full disk encryption is an
affordable solution. This is a consequence of the relatively low
throughput of a magnetic carrier compared to modern crypto performance:
a single HDD streams on the order of 100-200 MB/s, well within what a
single AES-NI core can encrypt.
However, the status quo is challenged by the proliferation of fast
SSDs, which drives the demand for crypto performance much higher. HW
acceleration would become a must-have.

There are at least three approaches that can be combined to form a holistic
solution:
 1) minimizing the amount of plaintext through per-pool granulation,
 2) skipping repetitive encryption,
 3) utilizing the full performance of an accelerator.

dm-crypt is not optimized for advanced HW accelerators. Using the same
accelerator in a different way yields a bigger performance gain.

In summary: the expected performance benefits may justify introducing a
mechanism targeting those use cases where covering the full set of
filesystem metadata (ctime, mtime, size) isn't required.

Best regards,
Adam Kupczyk,
Radoslaw Zarzynski


* Re: Improving Data-At-Rest encryption in Ceph
  2016-01-25 19:05             ` Radoslaw Zarzynski
@ 2016-01-25 22:20               ` Kyle Bader
  0 siblings, 0 replies; 23+ messages in thread
From: Kyle Bader @ 2016-01-25 22:20 UTC (permalink / raw)
  To: Radoslaw Zarzynski; +Cc: Lars Marowsky-Bree, Ceph Development

> I can agree that in the case of HDDs full disk encryption is an
> affordable solution. This is a consequence of the relatively low
> throughput of a magnetic carrier compared to modern crypto performance:
> a single HDD streams on the order of 100-200 MB/s, well within what a
> single AES-NI core can encrypt.
> However, the status quo is challenged by the proliferation of fast
> SSDs, which drives the demand for crypto performance much higher. HW
> acceleration would become a must-have.

Right, most workloads are likely bottlenecked by seeks - not math.

>> The implementation of this API may be based on the AF_ALG kernel
>> interface. This ensures the ability to use the hardware acceleration
>> already implemented in the Linux kernel. Moreover, because it can
>> work on bigger chunks (dm-crypt operates on 512-byte sectors), the
>> raw encryption performance may be even higher.
>
> There are at least three approaches that can be combined to form a
> holistic solution:
>  1) minimizing the amount of plaintext through per-pool granulation,
>  2) skipping repetitive encryption,
>  3) utilizing the full performance of an accelerator.
>
> dm-crypt is not optimized for advanced HW accelerators. Using the
> same accelerator in a different way yields a bigger performance gain.

What hardware accelerators are we talking about here?

Intel's AES-NI instructions are already helping accelerate dm-crypt,
and you don't need AF_ALG in userland to use them either. Is there a
significant difference in cost between crypto PCIe card(s) and simply
buying SED SSDs?

It seems to me that if you are willing to trust a black box (a crypto
card), you might as well trust the drive manufacturer to do your
encryption. The benefit of SEDs would be that crypto throughput
increases linearly with the storage.

-- 

Kyle Bader


Thread overview: 23+ messages
2015-12-14 13:17 Improving Data-At-Rest encryption in Ceph Radoslaw Zarzynski
2015-12-14 20:28 ` Gregory Farnum
2015-12-14 22:02   ` Martin Millnert
2015-12-14 22:32     ` Gregory Farnum
2015-12-16  2:13       ` Andrew Bartlett
2015-12-15 10:13     ` Adam Kupczyk
2015-12-15 10:04   ` Adam Kupczyk
     [not found]   ` <CAHMeWhGgHWq=jPZfj8s_KCB=wLhsBNCyJjZSBQQFZXc8r63M7A@mail.gmail.com>
2015-12-15 21:04     ` Gregory Farnum
2015-12-16 15:13       ` Adam Kupczyk
2015-12-16 15:36       ` Radoslaw Zarzynski
2015-12-14 21:52 ` Martin Millnert
2015-12-15 20:40   ` Radoslaw Zarzynski
2015-12-15 14:23 ` Lars Marowsky-Bree
2015-12-15 14:59   ` Sage Weil
2015-12-15 23:31   ` Matt Benjamin
2015-12-16 22:29   ` Adam Kupczyk
2015-12-16 22:33     ` Sage Weil
2015-12-21  8:21       ` Adam Kupczyk
2016-01-18  8:05         ` Adam Kupczyk
2016-01-18 12:21           ` Lars Marowsky-Bree
2016-01-25 19:05             ` Radoslaw Zarzynski
2016-01-25 22:20               ` Kyle Bader
2016-01-24 13:45 ` John Hunter
