* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
@ 2017-02-10 10:03 ` Dr. Greg Wettstein
  0 siblings, 0 replies; 37+ messages in thread
From: Dr. Greg Wettstein @ 2017-02-10 10:03 UTC (permalink / raw)
  To: James Bottomley, greg, Jarkko Sakkinen
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Feb 9, 11:24am, James Bottomley wrote:
} Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global sessi

Good morning to everyone.

> On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> > Referring back to Ken's comments about having 20+ clients waiting to
> > get access to the hardware.  Even with the focus in TPM2 on having it
> > be more of a cryptographic accelerator are we convinced that the
> > hardware is ever going to be fast enough for a model of having it
> > directly service large numbers of transactions in something like a
> > 'cloud' model?

> It's already in use as such today:
> 
> https://tectonic.com/assets/pdf/TectonicTrustedComputing.pdf

We are familiar with this work.  I'm not sure, however, that this work
is representative of the notion of using TPM hardware to support a
transactional environment, particularly at the cloud/container level.

There is not a great deal of technical detail on the CoreOS integrity
architecture but it appears they are using TPM hardware to validate
container integrity.  I'm not sure this type of environment reflects
the ability of TPM hardware to support transactional throughputs in an
environment such as financial transaction processing.

Intel's Clear Container work cites the need to achieve container
startup times of 150 milliseconds and they are currently claiming 45
milliseconds as their optimal time.  This work was designed to
demonstrate the feasibility of providing virtual machine isolation
guarantees to containers and as such one of the mandates was to
achieve container start times comparable to standard namespaces.

I ran some very rough timing metrics on one of our Skylake development
systems with hardware TPM2 support.  Here are the elapsed times for
two common verification operations which I assume would be at the
heart of generating any type of reasonable integrity guarantee:

quote: 810 milliseconds
verify signature: 635 milliseconds

This is with the verifying key loaded into the chip.  The elapsed time
to load and validate a key into the chip averages 1200 milliseconds.
Since we are discussing a resource manager which would be shuttling
context into and out of the limited resource slots on the chip I
believe it is valid to consider this overhead as well.

This suggests that a signature verification alone on the integrity of
a container takes 4.2 times longer than a well-accepted start-time
metric for container technology.
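
As a sanity check, the arithmetic behind that factor (and the added
cost of a key load) can be worked out directly from the numbers above;
a sketch only, using the rough measurements quoted in this message:

```python
# Back-of-envelope check of the figures above.  All millisecond
# values are the rough measurements quoted in this message.
verify_ms = 635        # signature verification, key already loaded
target_start_ms = 150  # Clear Containers' cited startup target
key_load_ms = 1200     # average time to load and validate a key

# Verification alone versus the container start-time target:
ratio = verify_ms / target_start_ms
print(f"verify/start ratio: {ratio:.1f}x")   # ~4.2x

# If a resource manager must first swap the key into a free slot,
# the effective per-operation latency is worse still:
effective_ms = key_load_ms + verify_ms
print(f"with key load: {effective_ms} ms "
      f"({effective_ms / target_start_ms:.1f}x the start target)")
```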

Based on that I'm assuming that if TPM based integrity guarantees are
being implemented they are only on ingress of the container into the
cloud environment.  I'm assuming an alternate methodology must be in
place to protect against time of measurement/time of use issues.

Maybe people have better TPM2 hardware than what we have.  I was going
to run this on a Kaby Lake reference system but it appears that TXT is
causing some type of context depletion problem which we need to run
down.

> We're also planning something like this in the IBM Cloud.

I assume that if there is an expectation of true transactional times
you will either have better hardware than current-generation TPM2
technology or will be using userspace simulators anchored with a
hardware TPM trust root.

Ken's scenario of having 21-22 competing transactions would appear to
raise problematic latency issues given our measurements.

I influence engineering for a company which builds deterministically
modeled Linux platforms.  We've spent a lot of time considering TPM2
hardware bottlenecks since they constrain the rate at which we can
validate platform behavioral measurements.

We have a variation of this work which allows SGX OCALLs to validate
platform behavior in order to provide a broader TCB resource spectrum
to the enclave, and hardware TPM performance is problematic there as
well.

> James

Have a good weekend.

Greg

}-- End of excerpt from James Bottomley

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"After being a technician for 2 years, I've discovered if people took
 care of their health with the same reckless abandon as their computers,
 half would be at the kitchen table on the phone with the hospital, trying
 to remove their appendix with a butter knife."
                                -- Brian Jones

-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
@ 2017-02-10 16:46   ` James Bottomley
  0 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-02-10 16:46 UTC (permalink / raw)
  To: greg, Jarkko Sakkinen
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Fri, 2017-02-10 at 04:03 -0600, Dr. Greg Wettstein wrote:
> On Feb 9, 11:24am, James Bottomley wrote:
> } Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for
> global sessi
> 
> Good morning to everyone.

Is there any way you could fix your email client?  It's setting
In-Reply-To: headers like this:

In-reply-to: James Bottomley <James.Bottomley@HansenPartnership.com> "Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion" (Feb  9, 11:24am)

Not using the message id breaks threading for everyone.

> > On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> > > Referring back to Ken's comments about having 20+ clients waiting
> > > to
> > > get access to the hardware.  Even with the focus in TPM2 on
> > > having it
> > > be more of a cryptographic accelerator are we convinced that the
> > > hardware is ever going to be fast enough for a model of having it
> > > directly service large numbers of transactions in something like
> > > a
> > > 'cloud' model?
> 
> > It's already in use as such today:
> > 
> > https://tectonic.com/assets/pdf/TectonicTrustedComputing.pdf
> 
> We are familiar with this work.  I'm not sure, however, that this 
> work is representative of the notion of using TPM hardware to support 
> a transactional environment, particularly at the cloud/container
> level.

It allows cloud clients to request attestations.  The next step is to
allow containers to provision key material and PCR-locked blobs
securely to the TPM for use by correctly attested containers; all of
those are cloud-scale use cases.

> There is not a great deal of technical detail on the CoreOS integrity
> architecture but it appears they are using TPM hardware to validate
> container integrity.  I'm not sure this type of environment reflects
> the ability of TPM hardware to support transactional throughputs in 
> an environment such as financial transaction processing.

OK, so in the cloud neither key provisioning nor attestation has a huge
latency requirement.  This appears to be your concern?  All I'd say is
that the fact that there are use cases that can work at cloud scale
doesn't mean that every use case can.

> Intel's Clear Container work cites the need to achieve container
> startup times of 150 milliseconds and they are currently claiming 45
> milliseconds as their optimal time.  This work was designed to
> demonstrate the feasibility of providing virtual machine isolation
> guarantees to containers and as such one of the mandates was to
> achieve container start times comparable to standard namespaces.

There are ephemeral container use cases where the lifetimes are of this
order, but they're not every use case (In fact, even in the devops
environment, they're still a minority).

> I ran some very rough timing metrics on one of our Skylake
> development systems with hardware TPM2 support.  Here are the elapsed
> times for two common verification operations which I assume would be
> at the heart of generating any type of reasonable integrity
> guarantee:
> 
> quote: 810 milliseconds
> verify signature: 635 milliseconds

That's interesting; my Skylake system has these figures down around
100ms or so ... however, I agree that 100ms is the order of magnitude
here, which is still significant compared to container start times.

> This is with the verifying key loaded into the chip.  The elapsed
> time to load and validate a key into the chip averages 1200
> milliseconds. Since we are discussing a resource manager which would
> be shuttling context into and out of the limited resource slots on
> the chip I believe it is valid to consider this overhead as well.
> 
> This suggests that just a signature verification on the integrity of 
> a container is a factor of 4.2 times greater than a well accepted 
> start time metric for container technology.

Part of the way of reducing the latency is not to use the TPM for
things that don't require secrecy: container signature verification is
one such because the container is signed with a private key to which
you know the public component ... you can verify it on the host without
needing to trouble the TPM.  We only use the TPM for state quotes,
unsealing and signature generation.
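
To make that concrete, here is a minimal sketch of why verification
can stay on the host: checking a signature uses only the public half
of the key.  The scheme and numbers (textbook RSA with toy primes) are
purely illustrative, not real crypto:

```python
# Toy illustration: RSA verification needs only the *public* key
# half, so the host CPU can check a container signature without
# touching the TPM.  Demo-sized textbook RSA -- NOT real crypto
# (no padding, tiny primes); real code would use a vetted library.
p, q = 61, 53
n = p * q                            # 3233, public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (signer only)

digest = 1234                        # stand-in for a container hash
signature = pow(digest, d, n)        # signing needs the secret d

# Verifier side: only (e, n) required -- nothing secret, no TPM.
assert pow(signature, e, n) == digest
print("signature verified with the public key alone")
```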

> Based on that I'm assuming that if TPM based integrity guarantees are
> being implemented they are only on ingress of the container into the
> cloud environment.  I'm assuming an alternate methodology must be in
> place to protect against time of measurement/time of use issues.
> 
> Maybe people have better TPM2 hardware than what we have.  I was 
> going to run this on a Kaby Lake reference system but it appears that 
> TXT is causing some type of context depletion problem which we 
> need to run down.
> 
> > We're also planning something like this in the IBM Cloud.
> 
> I assume that if there is an expectation of true transactional times
> you will either have better hardware than current-generation TPM2
> technology or will be using userspace simulators anchored with a
> hardware TPM trust root.

vTPM is a possibility, yes, so is making the TPM faster.

> Ken's reflection of having 21-22 competing transactions would appear
> to have problematic latency issues given our measurements.

Consider the canonical use case to be VPNaaS with a secure connection
back to the enterprise and the client key being the privacy-guarded
material.  The signature generation is once per channel re-key and you
have up to half the re-key interval to generate the re-key over the
control channel.  In this use case, latency isn't a problem (most
re-key intervals are around 3000s) but volume is.  VPNs are long
running, not short running, so start-up time isn't hugely relevant
either.
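
The volume-versus-latency point can be sketched with rough numbers
(the ~100ms signature time and ~3000s re-key interval are the figures
mentioned in this thread, used here purely as assumptions):

```python
# Rough capacity model for the VPNaaS case above.  Assumed inputs:
# the ~100ms per-signature TPM latency and ~3000s re-key interval
# mentioned in this thread.
sig_ms = 100
rekey_interval_s = 3000

max_sigs_per_s = 1000 / sig_ms                   # TPM saturates here
max_channels = max_sigs_per_s * rekey_interval_s
print(f"~{int(max_channels)} concurrent VPN channels per TPM")
```

So even a slow TPM can, in principle, serve tens of thousands of
long-running channels; the constraint is aggregate volume, not
per-operation latency.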

Anyway, precisely what we're doing and how is getting off point.  The
point is that there are existing cloud use cases for the TPM which can
cause high concurrency.

James

> I influence engineering for a company which builds deterministically
> modeled Linux platforms.  We've spent a lot of time considering TPM2
> hardware bottlenecks since they constrain the rate at which we can
> validate platform behavioral measurements.
> 
> We have a variation of this work which allows SGX OCALL's to validate
> platform behavior in order to provide a broader TCB resource spectrum
> to the enclave and hardware TPM performance is problematic there as
> well.
> 
> > James
> 
> Have a good weekend.
> 
> Greg
> 
> }-- End of excerpt from James Bottomley
> 
> As always,
> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
> 4206 N. 19th Ave.           Specializing in information infra
> -structure
> Fargo, ND  58102            development.
> PH: 701-281-1686
> FAX: 701-281-3949           EMAIL: greg@enjellic.com
> ---------------------------------------------------------------------
> ---------
> "After being a technician for 2 years, I've discovered if people took
>  care of their health with the same reckless abandon as their
> computers,
>  half would be at the kitchen table on the phone with the hospital,
> trying
>  to remove their appendix with a butter knife."
>                                 -- Brian Jones
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] tpm2-space: add handling for global session exhaustion
       [not found]   ` <1486745163.2502.26.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2017-02-10 21:13     ` Kenneth Goldman
  2017-02-14 14:38       ` [tpmdd-devel] " Dr. Greg Wettstein
  2017-02-10 21:18     ` Kenneth Goldman
  1 sibling, 1 reply; 37+ messages in thread
From: Kenneth Goldman @ 2017-02-10 21:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	greg-R92VP3DqSWVWk0Htik3J/w, linux-kernel-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 645 bytes --]

James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> wrote on 
02/10/2017 11:46:03 AM:

> > quote: 810 milliseconds
> > verify signature: 635 milliseconds
> 
> Part of the way of reducing the latency is not to use the TPM for
> things that don't require secrecy: 

Agreed.  There are a few times one would verify a signature inside
the TPM, but they're far from mainstream:

1 - Early in the boot cycle, when there's no crypto library.

2 - When the crypto library doesn't support the required algorithm.

3 - When a ticket is needed to prove to the TPM later that it verified
the signature.


[-- Attachment #1.2: Type: text/html, Size: 914 bytes --]

[-- Attachment #3: Type: text/plain, Size: 192 bytes --]

_______________________________________________
tpmdd-devel mailing list
tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/tpmdd-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-10 16:46   ` James Bottomley
@ 2017-02-12 20:29     ` Ken Goldman
  -1 siblings, 0 replies; 37+ messages in thread
From: Ken Goldman @ 2017-02-12 20:29 UTC (permalink / raw)
  Cc: tpmdd-devel, linux-security-module, linux-kernel

On 2/10/2017 11:46 AM, James Bottomley wrote:
> On Fri, 2017-02-10 at 04:03 -0600, Dr. Greg Wettstein wrote:
>> On Feb 9, 11:24am, James Bottomley wrote:

>> quote: 810 milliseconds
>> verify signature: 635 milliseconds
> ...
>
> Part of the way of reducing the latency is not to use the TPM for
> things that don't require secrecy: container signature verification is
> one such because the container is signed with a private key to which
> ...

Agreed.  There are a few times one would verify a signature inside the 
TPM, but they're far from mainstream:

1 - Early in the boot cycle, when there's no crypto library.

2 - When the crypto library doesn't support the required algorithm.

3 - When a ticket is needed to prove to the TPM later that it verified
the signature.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-10 21:13     ` Kenneth Goldman
@ 2017-02-14 14:38       ` Dr. Greg Wettstein
  2017-02-14 16:47           ` James Bottomley
       [not found]         ` <71dc0e80-6678-a124-9184-1f93c8532d09@linux.vnet.ibm.com>
  0 siblings, 2 replies; 37+ messages in thread
From: Dr. Greg Wettstein @ 2017-02-14 14:38 UTC (permalink / raw)
  To: Kenneth Goldman
  Cc: James Bottomley, greg, Jarkko Sakkinen, linux-kernel,
	linux-security-module, tpmdd-devel

On Fri, Feb 10, 2017 at 04:13:05PM -0500, Kenneth Goldman wrote:

Good morning to everyone.

> James Bottomley <James.Bottomley@HansenPartnership.com> wrote on 
> 02/10/2017 11:46:03 AM:
> 
> > > quote: 810 milliseconds
> > > verify signature: 635 milliseconds

For those who may be interested in this sort of thing I grabbed a few
minutes and ran these basic verification primitives against a Kaby
Lake system.

Average time for a quote is 600 milliseconds with a signature
verification clocking in at 100 milliseconds.  The latter is
consistent with what James found on his Skylake machine.

Latencies are still significant with things like container start
times.

> > Part of the way of reducing the latency is not to use the TPM for
> > things that don't require secrecy: 

> Agreed.  There are a few times one would verify a signature inside the 
> TPM,
> but they're far from mainstream:
> 
> 1 - Early in the boot cycle, when there's no crypto library.
> 
> 2 - When the crypto library doesn't support the required algorithm.
> 
> 3 - When a ticket is needed to prove to the TPM later that it verified
> the signature.

I don't think there is any doubt that running cryptographic primitives
in userspace is going to be faster than going to hardware.  Obviously
that also means there is no need for a TPM resource manager, which has
been the subject of much discussion here.

The CoreOS paper makes significant reference to the increased security
guarantees inherent in the use of a TPM.  Obviously, whatever those
uses are, they will be subject to the noted latency constraints.

We have extended our behavior measurement verifications to the
container level, so we offer an explicit guarantee that a container
has not operated in a manner inconsistent with the intent of its
designer.  Getting the security guarantee we need requires linkage to
a hardware root of trust, hence our concerns about hardware latency.

Have a good day.

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"UNIX is simple and coherent, but it takes a genius (or at any rate,
 a programmer) to understand and appreciate its simplicity."
                                -- Dennis Ritchie
                                   USENIX '87

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
@ 2017-02-14 16:47           ` James Bottomley
  0 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-02-14 16:47 UTC (permalink / raw)
  To: Dr. Greg Wettstein, Kenneth Goldman
  Cc: greg, Jarkko Sakkinen, linux-kernel, linux-security-module, tpmdd-devel

On Tue, 2017-02-14 at 08:38 -0600, Dr. Greg Wettstein wrote:
> On Fri, Feb 10, 2017 at 04:13:05PM -0500, Kenneth Goldman wrote:
> 
> Good morning to everyone.
> 
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote on 
> > 02/10/2017 11:46:03 AM:
> > 
> > > > quote: 810 milliseconds
> > > > verify signature: 635 milliseconds
> 
> For those who may be interested in this sort of thing I grabbed a few
> minutes and ran these basic verification primitives against a Kaby
> Lake system.
> 
> Average time for a quote is 600 milliseconds with a signature
> verification clocking in at 100 milliseconds.  The latter is
> consistent with what James found on his Skylake machine.
> 
> Latencies are still significant with things like container start
> times.
> 
> > > Part of the way of reducing the latency is not to use the TPM for
> > > things that don't require secrecy: 
> 
> > Agreed.  There are a few times one would verify a signature inside 
> > the TPM, but they're far from mainstream:
> > 
> > 1 - Early in the boot cycle, when there's no crypto library.
> > 
> > 2 - When the crypto library doesn't support the required algorithm.
> > 
> > 3 - When a ticket is needed to prove to the TPM later that it
> > verified
> > the signature.
> 
> I don't think there is any doubt that running cryptographic 
> primitives in userspace is going to be faster than going to hardware.
>   Obviously that also means there is no need for a TPM resource 
> manager which has been the subject of much discussion here.

That's a bit of a non-sequitur.  Ken's and my point was that although
you could run every crypto operation through the TPM, you don't (as you
say, because it's too slow), so you carefully select the ones that
preserve the confidentiality you're looking for.  To take the VPNaaS
use case again: the key material you're protecting is the client
identity key, so the only crypto operation you run through the TPM is
creation of the TLS client certificate verification signature. 
 Everything else, including the server certificate signature 
 verification, the symmetric key agreement and all the symmetric
encryption operations, you keep in userspace.  That means that instead
of requiring thousands of crypto operations per second from the TPM,
you basically require about one per hour per VPNaaS instance.
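
The arithmetic behind that claim can be sketched as follows (a
hypothetical back-of-envelope calculation with illustrative numbers,
not figures taken from this thread):

```python
# Back-of-envelope load comparison for the VPNaaS pattern described
# above: only the client-identity signature goes through the TPM,
# roughly once per rekey interval; all symmetric crypto stays in
# userspace.

def tpm_ops_per_second(instances, rekey_interval_s=3600):
    """TPM operations per second when each instance needs one TPM
    signature per rekey interval (default: hourly)."""
    return instances / rekey_interval_s

# 1000 hypothetical VPNaaS instances rekeying hourly need well under
# one TPM operation per second in aggregate.
selective = tpm_ops_per_second(1000)
print(f"selective use: {selective:.2f} TPM ops/s")

# Contrast: a TPM taking ~100 ms per operation (the signature
# verification time measured earlier in the thread) caps out at
# about 10 operations per second total.
tpm_ceiling = 1 / 0.1
print(f"hardware ceiling: {tpm_ceiling:.0f} TPM ops/s")
```

The point being that selective use keeps demand orders of magnitude
below what the hardware can sustain.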

We need a RM because without one, given the constraints of TPM2, as few
as two VPNaaS instances can cause a resource exhaustion failure.

James

> The CoreOS paper makes significant reference to increased security
> guarantees inherent in the use of a TPM.  Obviously whatever uses
> those are will have the noted latency constraints.
> 
> We have extended our behavior measurement verifications to the
> container level so we offer an explicit guarantee that a container 
> has not operated in a manner which is inconsistent with the intent of 
> its designer.  Getting the security guarantee we need requires 
> linkage to a hardware root of trust, hence our concerns about 
> hardware latency.
> 
> Have a good day.
> 
> As always,
> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
> 4206 N. 19th Ave.           Specializing in information infra
> -structure
> Fargo, ND  58102            development.
> PH: 701-281-1686
> FAX: 701-281-3949           EMAIL: greg@enjellic.com
> ---------------------------------------------------------------------
> ---------
> "UNIX is simple and coherent, but it takes a genius (or at any rate,
>  a programmer) to understand and appreciate its simplicity."
>                                 -- Dennis Ritchie
>                                    USENIX '87
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
       [not found]         ` <71dc0e80-6678-a124-9184-1f93c8532d09@linux.vnet.ibm.com>
@ 2017-02-16 20:06           ` Dr. Greg Wettstein
  2017-02-16 20:33             ` Jarkko Sakkinen
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. Greg Wettstein @ 2017-02-16 20:06 UTC (permalink / raw)
  To: Ken Goldman; +Cc: jarkko.sakkinen, linux-kernel, tpmdd-devel

On Thu, Feb 16, 2017 at 09:04:47AM -0500, Ken Goldman wrote:

Good morning to everyone, leveraging some time between planes.

> On 2/14/2017 9:38 AM, Dr. Greg Wettstein wrote:
> >
> >I don't think there is any doubt that running cryptographic primitives
>in userspace is going to be faster than going to hardware.  Obviously
> >that also means there is no need for a TPM resource manager which has
> >been the subject of much discussion here.

> I don't understand that comment.
>
> The resource manager schedules user space access to the TPM.  It also
> handles swapping of objects in and out of the limited number of
> TPM slots.
> 
> Without a RM, either you'd have to permit only a single TPM connection,
> blocking all other connections, or you'd have different connections
> interfering with each other.

Yes, if multiple contexts of execution require access to the TPM a
resource manager is needed to arbitrate that access.

I think, however, that we are talking past one another a bit.

We design and build systems which implement autonomous
self-regulation.  As such we need a hardware based confirmation that
the machine is in a given behavioral state.  This requires that we
reference a hardware root of trust, ie. the TPM.

Depending on the assurance granularity requirements, that may mean a
high rate of TPM verifications.  When I noticed you and James talking
about 'cloud based' levels of transactions I was assuming you were
operating at transaction rates we build for, ie. 10-100's/second.
That didn't seem feasible given our hardware measurements on Skylake
and Kabylake based systems.

James had cited the CoreOS/Tectonic white paper as an example of TPMs
working at cloud scale.  Our conversation to date seems to indicate
that the accepted security modality appears to be userspace
verification of container signatures.  Given the extensive dialogue in
the paper about using TPMs for security, we had mistakenly believed
that container verifications were being pinned to current platform
status, which didn't correlate with the expected container start time
latencies.

Our behavioral assessment code is namespaced so a supervisory system
can make statements about the behavior of a container.  We have
concluded the only way that is possible is to use userspace TPM
implementations which can meet the necessary latency requirements.

Our point in all this is that it doesn't seem to make sense to
implement anything in the kernel beyond basic resource management.
If other 'virtualization' is needed, such as session state management
and the like, the community would seem to be better served by having a
solid userspace simulation environment with appropriate hardware
security guarantees.  That would serve needs like re-keying support
for VPNaaS applications as well as high transaction rate environments,
ie. why load the kernel with code to virtualize a resource when a
'user' can just be given its own TPM2 instance?

Just as an aside, has anyone given any thought about TPM2 resource
management in things like TXT/tboot environments?  The current tboot
code makes a rather naive assumption that it can take a handle slot to
protect its platform verification secret.  Doing resource management
correctly will require addressing extra-OS environments such as this
which may have TPM2 state requirement issues.

Our take away from all this is that it doesn't seem that we need to
worry about the fact that someone may have invented TPM2 hardware
which is faster than what we are developing on.... :-)

Have a good weekend.

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"If you ever teach a yodeling class, probably the hardest thing is to
 keep the students from just trying to yodel right off. You see, we build
 to that."
                                -- Jack Handey
                                   Deep Thoughts

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-16 20:06           ` [tpmdd-devel] " Dr. Greg Wettstein
@ 2017-02-16 20:33             ` Jarkko Sakkinen
  2017-02-17  9:56               ` Dr. Greg Wettstein
  0 siblings, 1 reply; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-02-16 20:33 UTC (permalink / raw)
  To: Dr. Greg Wettstein; +Cc: Ken Goldman, linux-kernel, tpmdd-devel

On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> Just as an aside, has anyone given any thought about TPM2 resource
> management in things like TXT/tboot environments?  The current tboot
> code makes a rather naive assumption that it can take a handle slot to
> protect its platform verification secret.  Doing resource management
> correctly will require addressing extra-OS environments such as this
> which may have TPM2 state requirement issues.

The current implementation handles stuff created from regular /dev/tpm0
so I do not think this would be an issue. You can only access objects
from a TPM space that are created within that space.

/Jarkko

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-16 20:33             ` Jarkko Sakkinen
@ 2017-02-17  9:56               ` Dr. Greg Wettstein
  2017-02-17 12:37                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. Greg Wettstein @ 2017-02-17  9:56 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dr. Greg Wettstein, Ken Goldman, linux-kernel, tpmdd-devel

On Thu, Feb 16, 2017 at 10:33:04PM +0200, Jarkko Sakkinen wrote:

Good morning to everyone.

> On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> > Just as an aside, has anyone given any thought about TPM2 resource
> > management in things like TXT/tboot environments?  The current tboot
> > code makes a rather naive assumption that it can take a handle slot to
> > protect its platform verification secret.  Doing resource management
> > correctly will require addressing extra-OS environments such as this
> > which may have TPM2 state requirement issues.

> The current implementation handles stuff created from regular
> /dev/tpm0 so I do not think this would be an issue. You can only
> access objects from a TPM space that are created within that space.

Unless I misunderstand, the number of transient objects which can be
managed is a characteristic of the hardware and is a limited resource,
hence our discussion on the notion of a resource manager to shuttle
contexts in and out of these limited slots.

On a Kabylake system, running the following command:

getcapability -cap 6 | grep trans

After booting into a TXT mediated measured launch environment (MLE) yields
the following:

TPM_PT 0000010e value 00000003 TPM_PT_HR_TRANSIENT_MIN - the minimum number of transient objects that can be held in TPM RAM

TPM_PT 00000207 value 00000002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the number of additional transient objects that could be loaded into TPM RAM

Booting without TXT results in the getcapability call indicating that
three slots are available.  Based on that and reading the tboot code,
we are assuming the occupied slot is the ephemeral primary key
generated by tboot which seals the verification secret.

In an MLE it is possible to create and then flush a new ephemeral
primary key which results in the following getcapability output:

TPM_PT 00000207 value 00000003 TPM_PT_HR_TRANSIENT_AVAIL - estimate of
the number of additional transient objects that could be loaded into TPM RAM

Which is probably going to be pretty surprising to tboot in the event
that it tries to re-verify the system state after a suspend event.
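
For anyone repeating this experiment, the relevant property can be
pulled out of the getcapability output shown above with a small helper
(a hypothetical script of ours, not part of any TPM toolkit):

```python
import re

# Extract TPM_PT_HR_TRANSIENT_AVAIL from getcapability output so a
# script can watch the available-slot count change across a
# create/flush cycle like the one described above.

def transient_avail(getcap_output):
    """Return TPM_PT_HR_TRANSIENT_AVAIL as an int, or None if absent."""
    m = re.search(
        r"TPM_PT\s+\S+\s+value\s+([0-9a-fA-F]{8})\s+"
        r"TPM_PT_HR_TRANSIENT_AVAIL",
        getcap_output)
    return int(m.group(1), 16) if m else None

sample = ("TPM_PT 00000207 value 00000002 TPM_PT_HR_TRANSIENT_AVAIL - "
          "estimate of the number of additional transient objects that "
          "could be loaded into TPM RAM")
print(transient_avail(sample))  # 2
```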

So based on that it would seem there would need to be some semblance
of cooperation between the resource manager and an extra-OS
utilization of TPM2 resources such as tboot.

Thoughts?

> /Jarkko

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"For a successful technology, reality must take precedence over public
 relations, for nature cannot be fooled."
                                -- Richard Feynmann

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-17  9:56               ` Dr. Greg Wettstein
@ 2017-02-17 12:37                 ` Jarkko Sakkinen
  2017-02-17 22:37                   ` Dr. Greg Wettstein
  0 siblings, 1 reply; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-02-17 12:37 UTC (permalink / raw)
  To: Dr. Greg Wettstein; +Cc: Ken Goldman, linux-kernel, tpmdd-devel

On Fri, Feb 17, 2017 at 03:56:26AM -0600, Dr. Greg Wettstein wrote:
> On Thu, Feb 16, 2017 at 10:33:04PM +0200, Jarkko Sakkinen wrote:
> 
> Good morning to everyone.
> 
> > On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> > > Just as an aside, has anyone given any thought about TPM2 resource
> > > management in things like TXT/tboot environments?  The current tboot
> > > code makes a rather naive assumption that it can take a handle slot to
> > > protect its platform verification secret.  Doing resource management
> > > correctly will require addressing extra-OS environments such as this
> > > which may have TPM2 state requirement issues.
> 
> > The current implementation handles stuff created from regular
> > /dev/tpm0 so I do not think this would be an issue. You can only
> > access objects from a TPM space that are created within that space.
> 
> Unless I misunderstand the number of transient objects which can be
> managed is a characteristic of the hardware and is a limited resource,
> hence our discussion on the notion of a resource manager to shuttle
> context in and out of these limited slots.
> 
> On a Kabylake system, running the following command:
> 
> getcapability -cap 6 | grep trans
> 
> After booting into a TXT mediated measured launch environment (MLE) yields
> the following:
> 
> TPM_PT 0000010e value 00000003 TPM_PT_HR_TRANSIENT_MIN - the minimum number of transient objects that can be held in TPM RAM
> 
> TPM_PT 00000207 value 00000002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the number of additional transient objects that could be loaded into TPM RAM
> 
> Booting without TXT results in the getcapability call indicating that
> three slots are available.  Based on that and reading the tboot code,
> we are assuming the occupied slot is the ephemeral primary key
> generated by tboot which seals the verification secret.
> 
> In an MLE it is possible to create and then flush a new ephemeral
> primary key which results in the following getcapability output:
> 
> TPM_PT 00000207 value 00000003 TPM_PT_HR_TRANSIENT_AVAIL - estimate of
> the number of additional transient objects that could be loaded into TPM RAM
> 
> Which is probably going to be pretty surprising to tboot in the event
> that it tries to re-verify the system state after a suspend event.
> 
> So based on that it would seem there would need to be some semblance
> of cooperation between the resource manager and an extra-OS
> utilization of TPM2 resources such as tboot.
> 
> Thoughts?

The driver swaps in and out all the objects for one send-receive cycle.
So unless the driver is sending a command to a TPM, the resource manager
occupies zero slots. I do not see a reason in the foreseeable future to
change this pattern.

I discussed some "lazier" schemes for swapping with James and Ken
in the early fall but came to the conclusion that it would make the RM
really complicated. There would have to be some show-stopper workload
to even start considering it.

With the capacity of current TPMs and the amount of traffic and
workloads, it is really not worth the trouble.

I guess the way we do swapping kind of indirectly sorts out the issue
you described, doesn't it?
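
The swap-per-cycle pattern can be illustrated with a toy model (purely
illustrative pseudocode, not the kernel implementation; all class and
handle names are invented):

```python
# Toy model of the swapping pattern described above: around every
# send-receive cycle the driver loads a space's transient objects,
# runs the command, then context-saves everything again, so the
# resource manager holds zero TPM slots between commands.

class ToyTpm:
    def __init__(self, slots=3):
        self.slots = slots
        self.loaded = set()

    def load(self, handle):
        assert len(self.loaded) < self.slots, "out of transient slots"
        self.loaded.add(handle)

    def context_save(self, handle):
        self.loaded.discard(handle)

class ToySpace:
    def __init__(self, tpm):
        self.tpm = tpm
        self.objects = []          # this space's transient objects

    def send_command(self, command):
        for h in self.objects:     # swap the space's objects in
            self.tpm.load(h)
        result = command()         # one send-receive cycle
        for h in self.objects:     # swap everything back out
            self.tpm.context_save(h)
        return result

tpm = ToyTpm()
space = ToySpace(tpm)
space.objects = ["key_a", "key_b"]
space.send_command(lambda: "ok")
print(len(tpm.loaded))  # 0 slots occupied between commands
```

Under this model an extra-OS occupant such as tboot only collides with
the RM during the brief window a command is in flight.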

/Jarkko

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-17 12:37                 ` Jarkko Sakkinen
@ 2017-02-17 22:37                   ` Dr. Greg Wettstein
  0 siblings, 0 replies; 37+ messages in thread
From: Dr. Greg Wettstein @ 2017-02-17 22:37 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dr. Greg Wettstein, Ken Goldman, linux-kernel, tpmdd-devel

On Fri, Feb 17, 2017 at 02:37:12PM +0200, Jarkko Sakkinen wrote:

Hi, I hope the week is ending well for everyone.

> On Fri, Feb 17, 2017 at 03:56:26AM -0600, Dr. Greg Wettstein wrote:
> > On Thu, Feb 16, 2017 at 10:33:04PM +0200, Jarkko Sakkinen wrote:
> > 
> > Good morning to everyone.
> > 
> > > On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> > > > Just as an aside, has anyone given any thought about TPM2 resource
> > > > management in things like TXT/tboot environments?  The current tboot
> > > > code makes a rather naive assumption that it can take a handle slot to
> > > > protect its platform verification secret.  Doing resource management
> > > > correctly will require addressing extra-OS environments such as this
> > > > which may have TPM2 state requirement issues.
> > 
> > > The current implementation handles stuff created from regular
> > > /dev/tpm0 so I do not think this would be an issue. You can only
> > > access objects from a TPM space that are created within that space.
> > 
> > Unless I misunderstand the number of transient objects which can be
> > managed is a characteristic of the hardware and is a limited resource,
> > hence our discussion on the notion of a resource manager to shuttle
> > context in and out of these limited slots.
> > 
> > On a Kabylake system, running the following command:
> > 
> > getcapability -cap 6 | grep trans
> > 
> > After booting into a TXT mediated measured launch environment (MLE) yields
> > the following:
> > 
> > TPM_PT 0000010e value 00000003 TPM_PT_HR_TRANSIENT_MIN - the minimum number of transient objects that can be held in TPM RAM
> > 
> > TPM_PT 00000207 value 00000002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the number of additional transient objects that could be loaded into TPM RAM
> > 
> > Booting without TXT results in the getcapability call indicating that
> > three slots are available.  Based on that and reading the tboot code,
> > we are assuming the occupied slot is the ephemeral primary key
> > generated by tboot which seals the verification secret.
> > 
> > In an MLE it is possible to create and then flush a new ephemeral
> > primary key which results in the following getcapability output:
> > 
> > TPM_PT 00000207 value 00000003 TPM_PT_HR_TRANSIENT_AVAIL - estimate of
> > the number of additional transient objects that could be loaded into TPM RAM
> > 
> > Which is probably going to be pretty surprising to tboot in the event
> > that it tries to re-verify the system state after a suspend event.
> > 
> > So based on that it would seem there would need to be some semblance
> > of cooperation between the resource manager and an extra-OS
> > utilization of TPM2 resources such as tboot.
> > 
> > Thoughts?

> The driver swaps in and out all the objects for one send-receive
> cycle.  So unless the driver is sending a command to a TPM, the
> resource manager occupies zero slots. I do not see a reason in the
> foreseeable future to change this pattern.
>
> I discussed some "lazier" schemes for swapping with James and
> Ken in the early fall but came to the conclusion that it would make
> the RM really complicated. There would have to be some show-stopper
> workload to even start considering it.
>
> With the capacity of current TPMs and the amount of traffic and
> workloads, it is really not worth the trouble.
>
> I guess the way we do swapping kind of indirectly sorts out the
> issue you described, doesn't it?

I'm not sure, we've pulled down your resource manager branch so we can
figure out the exact mechanics of how it works.  Based on a cursory
read of the code it appears as if it loops through all three transient
handle slots and attempts to context save each transient object it
finds.  So if it does that for each send/receive cycle it should
theoretically inter-operate with TXT/tboot.

As noted previously, with the current kernel driver, we can see that
tboot has allocated a slot for the ephemeral key which is used to seal
the memory verification secrets.  This key gets allocated to handle
80000000 as one would anticipate.  However when we attempt to issue a
context save against that handle we get an error.

Interestingly, when we attempt to flush that handle manually we
receive an error as well, but the number of available transient
handles increases by one which suggests the context flush cleared the
slot.

It seems that we should be able to manually replicate what the
resource manager is doing with the standard kernel driver, or is this
an incorrect assumption?

We will have to spin up a kernel with your patches and see how it
reacts to the presence of the extra-OS handle allocation.

> /Jarkko

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"We know that communication is a problem, but the company is not going
 to discuss it with the employees."
                                -- Switching supervisor
                                   AT&T Long Lines Division

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09 19:04   ` Jason Gunthorpe
  2017-02-09 19:29     ` James Bottomley
@ 2017-02-10  8:48     ` Jarkko Sakkinen
  1 sibling, 0 replies; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-02-10  8:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: greg, James Bottomley, tpmdd-devel, linux-security-module,
	Ken Goldman, linux-kernel

On Thu, Feb 09, 2017 at 12:04:26PM -0700, Jason Gunthorpe wrote:
> On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > > userspace instance with subsequent relinquishment of privilege.  At
> > > that point one has the freedom to implement all sorts of policy.
> > 
> > If you look at the patch set that I sent yesterday it exactly has a
> > feature that makes it more lean for a privileged process to implement
> > a resource manager.
> 
> I continue to think, based on comments like this, that you should not
> implement tmps0 in the first revision either. That is also something
> we have to live with forever, and it can never become the 'policy
> limited' or 'unpriv safe' access point to the kernel.  ie go back to
> something based on tmp0 with ioctl.

With /dev/tpms0 I'm fairly certain that it is the right way to go, as
it makes sense to have it be as close to a drop-in replacement for
/dev/tpm0 as possible. There's far more certainty that the API is
something most people will want to have.

> This series should focus on allowing a user space RM to co-exist with
> the in-kernel services - lets try and tackle the idea of a
> policy-restricted or unpriv-safe cdev when someone comes up with a
> comprehensive proposal..

Sure. I do agree with this.

> > The current patch set does not define policy. The simple policy
> > addition that could be added soon is the limit of connections
> > because it is easy to implement in non-intrusive way.
> 
> It is also trivial for a userspace RM to limit the number of sessions
> or connections or otherwise to manage this limitation. It is hard to
> see why we'd need kernel support for this.
> 
> The main issue from the kernel perspective is how to allow sessions
> to be used in-kernel and continue to make progress when they start to
> run out.
> 
> Jason

This is an issue but in the current patch set there's nothing that would
make it harder to sort out.

/Jarkko

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09 19:29     ` James Bottomley
@ 2017-02-09 21:54       ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2017-02-09 21:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jarkko Sakkinen, Ken Goldman, greg, linux-kernel,
	linux-security-module, tpmdd-devel

On Thu, Feb 09, 2017 at 11:29:51AM -0800, James Bottomley wrote:
> On Thu, 2017-02-09 at 12:04 -0700, Jason Gunthorpe wrote:
> > On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > > The current patch set does not define policy. The simple policy
> > > addition that could be added soon is the limit of connections
> > > because it is easy to implement in non-intrusive way.
> > 
> > It is also trivial for a userspace RM to limit the number of sessions
> > or connections or otherwise to manage this limitation. It is hard to
> > see why we'd need kernel support for this.
> 
> Because the kernel is a primary TPM user.

When I said 'this' I meant a kernel policy to limit the number of
user connections.

Jason

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09  9:06 Dr. Greg Wettstein
  2017-02-09 15:19 ` Jarkko Sakkinen
  2017-02-09 19:24 ` James Bottomley
@ 2017-02-09 20:05 ` James Bottomley
  2 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-02-09 20:05 UTC (permalink / raw)
  To: greg, Jarkko Sakkinen
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> On Jan 30, 11:58pm, Jarkko Sakkinen wrote:
> } Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for
> global sessi
> 
> Good morning, I hope the day is going well for everyone.
> 
> > I'm kind dilating to an opinion that we would leave this commit out
> > from the first kernel release that will contain the resource 
> > manager with similar rationale as Jason gave me for whitelisting: 
> > get the basic stuff in and once it is used with some workloads 
> > whitelisting and exhaustion will take eventually the right form.
> > 
> > How would you feel about this?
> 
> I wasn't able to locate the exact context to include but we noted 
> with interest Ken's comments about his need to support a model where 
> a client needs a TPM session for transaction purposes which can last 
> a highly variable amount of time.  That and concerns about command
> white-listing, hardware denial of service and related issues tend to
> underscore our concerns about how much TPM resource management should
> go into the kernel.
> 
> Once an API is in the kernel we live with it forever.

This actually is far too strong a statement:  Once you make API
guarantees, you have to live with them forever, but there's a
considerable difference between an API guarantee and the API itself. 
 For instance the kernel overlay filesystem has gone through several
iterations of file whiteouts (showing a file as deleted above a read
only copy): we began with an inode flag, moved to an extended attribute
and finally ended up with a device.  Each of those three changes was
fairly radical to the VFS API, but didn't fundamentally alter the API
guarantee (that users wouldn't see a file after it was deleted on an
overlay).

The API guarantee /dev/tpms0 is adding is that you won't see TPM
out-of-memory errors based on what other people are doing. That is a
simple isolation guarantee we can live with long term, and I think a
solidly defensible one.

However, right at the moment the guarantee isn't that you won't be
affected by *anything* another user does, so it's a weak guarantee: you
will see uncorrectable regapping errors based on what others are doing
and you will see global session exhaustion.

I think we begin with the defensible weak guarantee and discuss how to
strengthen it.

James

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09 19:04   ` Jason Gunthorpe
@ 2017-02-09 19:29     ` James Bottomley
  2017-02-09 21:54       ` Jason Gunthorpe
  2017-02-10  8:48     ` Jarkko Sakkinen
  1 sibling, 1 reply; 37+ messages in thread
From: James Bottomley @ 2017-02-09 19:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Jarkko Sakkinen
  Cc: Ken Goldman, greg, linux-kernel, linux-security-module, tpmdd-devel

On Thu, 2017-02-09 at 12:04 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > The current patch set does not define policy. The simple policy
> > addition that could be added soon is the limit of connections
> > because it is easy to implement in non-intrusive way.
> 
> It is also trivial for a userspace RM to limit the number of sessions
> or connections or otherwise to manage this limitation. It is hard to
> see why we'd need kernel support for this.

Because the kernel is a primary TPM user.  We can't have the kernel
call on the in-userspace resource manager without causing a deadlock,
so we need as much of the RM as is needed to support the kernel in the
kernel itself.

James

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09  9:06 Dr. Greg Wettstein
  2017-02-09 15:19 ` Jarkko Sakkinen
@ 2017-02-09 19:24 ` James Bottomley
  2017-02-09 20:05 ` James Bottomley
  2 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-02-09 19:24 UTC (permalink / raw)
  To: greg, Jarkko Sakkinen
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> Referring back to Ken's comments about having 20+ clients waiting to
> get access to the hardware.  Even with the focus in TPM2 on having it
> be more of a cryptographic accelerator are we convinced that the
> hardware is ever going to be fast enough for a model of having it
> directly service large numbers of transactions in something like a
> 'cloud' model?

It's already in use as such today:

https://tectonic.com/assets/pdf/TectonicTrustedComputing.pdf

We're also planning something like this in the IBM Cloud.

James

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09 15:19 ` Jarkko Sakkinen
@ 2017-02-09 19:04   ` Jason Gunthorpe
  2017-02-09 19:29     ` James Bottomley
  2017-02-10  8:48     ` Jarkko Sakkinen
  0 siblings, 2 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2017-02-09 19:04 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: greg, James Bottomley, tpmdd-devel, linux-security-module,
	Ken Goldman, linux-kernel

On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > userspace instance with subsequent relinquishment of privilege.  At
> > that point one has the freedom to implement all sorts of policy.
> 
> If you look at the patch set that I sent yesterday it exactly has a
> feature that makes it more lean for a privileged process to implement
> a resource manager.

I continue to think, based on comments like this, that you should not
implement tpms0 in the first revision either. That is also something
we have to live with forever, and it can never become the 'policy
limited' or 'unpriv safe' access point to the kernel.  I.e., go back
to something based on tpm0 with an ioctl.

This series should focus on allowing a user space RM to co-exist with
the in-kernel services - let's try and tackle the idea of a
policy-restricted or unpriv-safe cdev when someone comes up with a
comprehensive proposal.

> The current patch set does not define policy. The simple policy
> addition that could be added soon is the limit of connections
> because it is easy to implement in non-intrusive way.

It is also trivial for a userspace RM to limit the number of sessions
or connections or otherwise to manage this limitation. It is hard to
see why we'd need kernel support for this.

The main issue from the kernel perspective is how to allow sessions
to be used in-kernel and continue to make progress when they start to
run out.

Jason

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-02-09  9:06 Dr. Greg Wettstein
@ 2017-02-09 15:19 ` Jarkko Sakkinen
  2017-02-09 19:04   ` Jason Gunthorpe
  2017-02-09 19:24 ` James Bottomley
  2017-02-09 20:05 ` James Bottomley
  2 siblings, 1 reply; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-02-09 15:19 UTC (permalink / raw)
  To: greg
  Cc: James Bottomley, Ken Goldman, tpmdd-devel, linux-security-module,
	linux-kernel

On Thu, Feb 09, 2017 at 03:06:38AM -0600, Dr. Greg Wettstein wrote:
> On Jan 30, 11:58pm, Jarkko Sakkinen wrote:
> } Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global sessi
> 
> Good morning, I hope the day is going well for everyone.
> 
> > I'm kind dilating to an opinion that we would leave this commit out
> > from the first kernel release that will contain the resource manager
> > with similar rationale as Jason gave me for whitelisting: get the
> > basic stuff in and once it is used with some workloads whitelisting
> > and exhaustion will take eventually the right form.
> >
> > How would you feel about this?
> 
> I wasn't able to locate the exact context to include but we noted with
> interest Ken's comments about his need to support a model where a
> client needs a TPM session for transaction purposes which can last a
> highly variable amount of time.  That and concerns about command
> white-listing, hardware denial of service and related issues tend to
> underscore our concerns about how much TPM resource management should
> go into the kernel.
> 
> Once an API is in the kernel we live with it forever.  Particularly
> with respect to TPM2, our field experiences suggest it is way too
> early to bake long term functionality into the kernel.
> 
> Referring back to Ken's comments about having 20+ clients waiting to
> get access to the hardware.  Even with the focus in TPM2 on having it
> be more of a cryptographic accelerator are we convinced that the
> hardware is ever going to be fast enough for a model of having it
> directly service large numbers of transactions in something like a
> 'cloud' model?

I doubt it. Personally I would rather just limit the number of
connections to /dev/tpms0 than have a complex lease model (like the one
implemented in this commit). The limit could have a '0' setting, which
would disable it so that it doesn't cause harm to those who do not need
it.

> The industry has very solid userspace implementations of TPM2.  It
> seems that with respect to resource management about all we would want
> in the kernel is enough management to allow multiple privileged
> userspace process to establish a root of trust for a TPM2 based
> userspace instance with subsequent relinquishment of privilege.  At
> that point one has the freedom to implement all sorts of policy.

If you look at the patch set that I sent yesterday, it has exactly the
feature that makes it leaner for a privileged process to implement a
resource manager.

> Given the potential lifespan of these security technologies I think a
> kernel design needs to factor in the availability of trusted execution
> environment's such as SGX as well.  Politics aside, such environments
> do have the ability to significantly modify the guarantees which can
> be afforded to architectural models which focus on using the hardware
> TPM as a root of trust for userspace implementations of 'TPM'
> functionality and policy.

Agreed.

> We can always add functionality to the kernel but we can never
> subtract.  It is way too early to lock security architecture decisions
> into the kernel.

The current patch set does not define policy. The simple policy
addition that could be added soon is a limit on the number of
connections, because it is easy to implement in a non-intrusive way.

> 
> > /Jarkko
> 
> Have a good weekend.
> 
> Greg

Likewise!

/Jarkko

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
@ 2017-02-09  9:06 Dr. Greg Wettstein
  2017-02-09 15:19 ` Jarkko Sakkinen
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Dr. Greg Wettstein @ 2017-02-09  9:06 UTC (permalink / raw)
  To: Jarkko Sakkinen, James Bottomley
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Jan 30, 11:58pm, Jarkko Sakkinen wrote:
} Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global sessi

Good morning, I hope the day is going well for everyone.

> I'm kind dilating to an opinion that we would leave this commit out
> from the first kernel release that will contain the resource manager
> with similar rationale as Jason gave me for whitelisting: get the
> basic stuff in and once it is used with some workloads whitelisting
> and exhaustion will take eventually the right form.
>
> How would you feel about this?

I wasn't able to locate the exact context to include but we noted with
interest Ken's comments about his need to support a model where a
client needs a TPM session for transaction purposes which can last a
highly variable amount of time.  That and concerns about command
white-listing, hardware denial of service and related issues tend to
underscore our concerns about how much TPM resource management should
go into the kernel.

Once an API is in the kernel we live with it forever.  Particularly
with respect to TPM2, our field experiences suggest it is way too
early to bake long term functionality into the kernel.

Referring back to Ken's comments about having 20+ clients waiting to
get access to the hardware.  Even with the focus in TPM2 on having it
be more of a cryptographic accelerator are we convinced that the
hardware is ever going to be fast enough for a model of having it
directly service large numbers of transactions in something like a
'cloud' model?

The industry has very solid userspace implementations of TPM2.  It
seems that with respect to resource management about all we would want
in the kernel is enough management to allow multiple privileged
userspace processes to establish a root of trust for a TPM2-based
userspace instance with subsequent relinquishment of privilege.  At
that point one has the freedom to implement all sorts of policy.

Given the potential lifespan of these security technologies I think a
kernel design needs to factor in the availability of trusted execution
environments such as SGX as well.  Politics aside, such environments
do have the ability to significantly modify the guarantees which can
be afforded to architectural models which focus on using the hardware
TPM as a root of trust for userspace implementations of 'TPM'
functionality and policy.

We can always add functionality to the kernel but we can never
subtract.  It is way too early to lock security architecture decisions
into the kernel.

> /Jarkko

Have a good weekend.

Greg

}-- End of excerpt from Jarkko Sakkinen

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"If I'd listened to customers, I'd have given them a faster horse."
                                -- Henry Ford

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-31 19:28         ` Ken Goldman
@ 2017-01-31 19:55           ` James Bottomley
  0 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-01-31 19:55 UTC (permalink / raw)
  To: Ken Goldman, tpmdd-devel; +Cc: linux-security-module, linux-kernel

On Tue, 2017-01-31 at 14:28 -0500, Ken Goldman wrote:
> On 1/30/2017 11:04 AM, James Bottomley wrote:
> > 
> > This depends what your threat model is.  For ssh keys, you worry
> > that someone might be watching, so you use HMAC authority even for 
> > a local TPM.
> 
> If someone can "watch" my local process, they can capture my password
> anyway.  Does using a password that the attacker knows to HMAC the 
> command help?

It's about attack surface.  If you want my password and I use TPM_RS_PW
then you either prise it out of my app or snoop the command path.  If I
always use HMAC, I know you can only prise it out of my app (reduction
in attack surface) and I can plan defences accordingly (not saying I'll
be successful, just saying I have a better idea where the attack is
coming from).

> > In the cloud, you don't quite know where the TPM is, so again you'd
> > use HMAC sessions ... however, in both use cases the sessions 
> > should be very short lived.
> 
> If your entire application is in the cloud, then I think the same 
> question as above applies.
> 
> If you have your application on one platform (that you trust) and the
> TPM is on another (that you don't trust), then I absolutely agree 
> that HMAC (and parameter encryption) are necessary.

It's attack surface again ... although lengthening the transmission
pathway, which happens in the cloud, correspondingly increases that
surface.

Look at it this way: if your TPM were network remote, would you still
think TPM_RS_PW to be appropriate?  I suspect not because the network
is seen as a very insecure pathway.  We can argue about the relative
security or insecurity of other pathways to the TPM, but it's
unarguable that using HMAC and parameter encryption means we don't have
to (and so is best practice).

James

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-30 22:13           ` James Bottomley
@ 2017-01-31 13:31             ` Jarkko Sakkinen
  0 siblings, 0 replies; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-01-31 13:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: tpmdd-devel, linux-security-module, Ken Goldman, linux-kernel

On Mon, Jan 30, 2017 at 02:13:08PM -0800, James Bottomley wrote:
> On Mon, 2017-01-30 at 23:58 +0200, Jarkko Sakkinen wrote:
> > On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> > > On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > > > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > > > 
> > > > > > Beware the nasty corner case:
> > > > > > 
> > > > > > - Application asks for a session and gets 02000000
> > > > > > 
> > > > > > - Time elapses and 02000000 gets forcibly flushed
> > > > > > 
> > > > > > - Later, app comes back, asks for a second session and again
> > > > > > gets
> > > > > > 02000000.
> > > > > > 
> > > > > > - App gets very confused.
> > > > > > 
> > > > > > May it be better to close the connection completely, which
> > > > > > the
> > > > > > application can detect, than flush a session and give this
> > > > > > corner
> > > > > > case?
> > > > > 
> > > > > if I look at the code I've written, I don't know what the
> > > > > session
> > > > > number is, I just save sessionHandle in a variable for later
> > > > > use 
> > > > > (lets say to v1).  If I got the same session number returned at
> > > > > a 
> > > > > later time and placed it in v2, all I'd notice is that an 
> > > > > authorization using v1 would fail.  I'm not averse to killing
> > > > > the 
> > > > > entire connection but, assuming you have fallback, it might be 
> > > > > kinder simply to ensure that the operations with the reclaimed 
> > > > > session fail (which is what the code currently does).
> > > > 
> > > > My worry is that this session failure cannot be detected by the 
> > > > application.  An HMAC failure could cause the app to tell a user
> > > > that
> > > > they entered the wrong password.  Misleading.  On the TPM, it
> > > > could 
> > > > trigger the dictionary attack lockout.  For a PIN index, it could
> > > > consume a failure count.  Killing a policy session that has e.g.,
> > > > a 
> > > > policy signed term could cause the application to go back to some
> > > > external entity for another authorization signature.
> > > > 
> > > > Let's go up to the stack.  What's the attack?
> > > > 
> > > > If we're worried about many simultaneous applications (wouldn't
> > > > that 
> > > > be wonderful), why not just let startauthsession fail?  The 
> > > > application can just retry periodically.
> > > 
> > > How in that scenario do we ensure that a session becomes available?
> > >  Once that's established, there's no real difference between
> > > retrying
> > > the startauthsession in the kernel when we know the session is
> > > available and forcing userspace to do the retry except that the
> > > former
> > > has a far greater chance of success (and it's only about 6 lines of
> > > code).
> > > 
> > > >   Just allocate them in triples so there's no deadlock.
> > > 
> > > Is this the application or the kernel?  If it's the kernel, that
> > > adds a
> > > lot of complexity.
> > > 
> > > > If we're worried about a DoS attack, killing a session just helps
> > > > the
> > > > attacker.  The attacker can create a few connections and spin on 
> > > > startauthsession, locking everyone out anyway.
> > > 
> > > There are two considerations here: firstly we'd need to introduce a
> > > mechanism to "kill" the connection.  Probably we'd simply error
> > > every
> > > command on the space until it was closed.  The second is which
> > > scenario
> > > is more reasonable: Say the application simply forgot to flush the
> > > session and will never use it again.  Simply reclaiming the session
> > > would produce no effect at all on the application in this scenario.
> > >  However, I have no data to say what's likely.
> > > 
> > > > ~~
> > > > 
> > > > Also, let's remember that this is a rare application.  Sessions
> > > > are 
> > > > only needed for remote access (requiring encryption, HMAC or
> > > > salt), 
> > > > or policy sessions.
> > > 
> > > This depends what your threat model is.  For ssh keys, you worry
> > > that
> > > someone might be watching, so you use HMAC authority even for a
> > > local
> > > TPM.  In the cloud, you don't quite know where the TPM is, so again
> > > you'd use HMAC sessions ... however, in both use cases the sessions
> > > should be very short lived.
> > > 
> > > > ~~
> > > > 
> > > > Should the code also reserve a session for the kernel?  Mark it
> > > > not 
> > > > kill'able?
> > > 
> > > At the moment, the kernel doesn't use sessions, so let's worry
> > > about
> > > that problem at the point it arises (if it ever arises).
> > > 
> > > James
> > 
> > It does. My trusted keys implementation actually uses sessions.
> 
> But as I read the code, I can't find where the kernel creates a
> session.  It looks like the session and hmac are passed in as option
> arguments, aren't they?

Yes. Sorry, I mixed up things.

> > I'm kind dilating to an opinion that we would leave this commit out 
> > from the first kernel release that will contain the resource manager 
> > with similar rationale as Jason gave me for whitelisting: get the 
> > basic stuff in and once it is used with some workloads whitelisting 
> > and exhaustion will take eventually the right form.
> > 
> > How would you feel about this?
> 
> As long as we get patch 1/2 then applications using sessions will
> actually work with spaces, so taking more time with 2/2 is fine by me.
> 
> James

1/2 contains code that, with a few more iterations, will be in a form
that I'm able to merge.

With 2/2 I'm not saying it is the wrong approach, but I cannot yet say
that I'm confident it would be the best approach.

I think that the transient object and infrastructure work already in
the patch set, together with the session handling in 1/2, is the subset
of commits where we can be fairly confident that we are doing the right
thing.

I'll start preparing a patch set with this content without RFC tag.

/Jarkko

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-30 21:58         ` Jarkko Sakkinen
@ 2017-01-30 22:13           ` James Bottomley
  2017-01-31 13:31             ` Jarkko Sakkinen
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2017-01-30 22:13 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: tpmdd-devel, linux-security-module, Ken Goldman, linux-kernel

On Mon, 2017-01-30 at 23:58 +0200, Jarkko Sakkinen wrote:
> On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> > On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > > 
> > > > > Beware the nasty corner case:
> > > > > 
> > > > > - Application asks for a session and gets 02000000
> > > > > 
> > > > > - Time elapses and 02000000 gets forcibly flushed
> > > > > 
> > > > > - Later, app comes back, asks for a second session and again
> > > > > gets
> > > > > 02000000.
> > > > > 
> > > > > - App gets very confused.
> > > > > 
> > > > > May it be better to close the connection completely, which
> > > > > the
> > > > > application can detect, than flush a session and give this
> > > > > corner
> > > > > case?
> > > > 
> > > > if I look at the code I've written, I don't know what the
> > > > session
> > > > number is, I just save sessionHandle in a variable for later
> > > > use 
> > > > (lets say to v1).  If I got the same session number returned at
> > > > a 
> > > > later time and placed it in v2, all I'd notice is that an 
> > > > authorization using v1 would fail.  I'm not averse to killing
> > > > the 
> > > > entire connection but, assuming you have fallback, it might be 
> > > > kinder simply to ensure that the operations with the reclaimed 
> > > > session fail (which is what the code currently does).
> > > 
> > > My worry is that this session failure cannot be detected by the 
> > > application.  An HMAC failure could cause the app to tell a user
> > > that
> > > they entered the wrong password.  Misleading.  On the TPM, it
> > > could 
> > > trigger the dictionary attack lockout.  For a PIN index, it could
> > > consume a failure count.  Killing a policy session that has e.g.,
> > > a 
> > > policy signed term could cause the application to go back to some
> > > external entity for another authorization signature.
> > > 
> > > Let's go up to the stack.  What's the attack?
> > > 
> > > If we're worried about many simultaneous applications (wouldn't
> > > that 
> > > be wonderful), why not just let startauthsession fail?  The 
> > > application can just retry periodically.
> > 
> > How in that scenario do we ensure that a session becomes available?
> >  Once that's established, there's no real difference between
> > retrying
> > the startauthsession in the kernel when we know the session is
> > available and forcing userspace to do the retry except that the
> > former
> > has a far greater chance of success (and it's only about 6 lines of
> > code).
> > 
> > >   Just allocate them in triples so there's no deadlock.
> > 
> > Is this the application or the kernel?  If it's the kernel, that
> > adds a
> > lot of complexity.
> > 
> > > If we're worried about a DoS attack, killing a session just helps
> > > the
> > > attacker.  The attacker can create a few connections and spin on 
> > > startauthsession, locking everyone out anyway.
> > 
> > There are two considerations here: firstly we'd need to introduce a
> > mechanism to "kill" the connection.  Probably we'd simply error
> > every
> > command on the space until it was closed.  The second is which
> > scenario
> > is more reasonable: Say the application simply forgot to flush the
> > session and will never use it again.  Simply reclaiming the session
> > would produce no effect at all on the application in this scenario.
> >  However, I have no data to say what's likely.
> > 
> > > ~~
> > > 
> > > Also, let's remember that this is a rare application.  Sessions
> > > are 
> > > only needed for remote access (requiring encryption, HMAC or
> > > salt), 
> > > or policy sessions.
> > 
> > This depends what your threat model is.  For ssh keys, you worry
> > that
> > someone might be watching, so you use HMAC authority even for a
> > local
> > TPM.  In the cloud, you don't quite know where the TPM is, so again
> > you'd use HMAC sessions ... however, in both use cases the sessions
> > should be very short lived.
> > 
> > > ~~
> > > 
> > > Should the code also reserve a session for the kernel?  Mark it
> > > not 
> > > kill'able?
> > 
> > At the moment, the kernel doesn't use sessions, so let's worry
> > about
> > that problem at the point it arises (if it ever arises).
> > 
> > James
> 
> It does. My trusted keys implementation actually uses sessions.

But as I read the code, I can't find where the kernel creates a
session.  It looks like the session and hmac are passed in as option
arguments, aren't they?

> I'm kind dilating to an opinion that we would leave this commit out 
> from the first kernel release that will contain the resource manager 
> with similar rationale as Jason gave me for whitelisting: get the 
> basic stuff in and once it is used with some workloads whitelisting 
> and exhaustion will take eventually the right form.
> 
> How would you feel about this?

As long as we get patch 1/2 then applications using sessions will
actually work with spaces, so taking more time with 2/2 is fine by me.

James

* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-30 16:04       ` [tpmdd-devel] " James Bottomley
@ 2017-01-30 21:58         ` Jarkko Sakkinen
  2017-01-30 22:13           ` James Bottomley
  2017-01-31 19:28         ` Ken Goldman
  1 sibling, 1 reply; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-01-30 21:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > 
> > > > Beware the nasty corner case:
> > > > 
> > > > - Application asks for a session and gets 02000000
> > > > 
> > > > - Time elapses and 02000000 gets forcibly flushed
> > > > 
> > > > - Later, app comes back, asks for a second session and again gets
> > > > 02000000.
> > > > 
> > > > - App gets very confused.
> > > > 
> > > > May it be better to close the connection completely, which the
> > > > application can detect, than flush a session and give this corner
> > > > case?
> > > 
> > > if I look at the code I've written, I don't know what the session
> > > number is, I just save sessionHandle in a variable for later use 
> > > (lets say to v1).  If I got the same session number returned at a 
> > > later time and placed it in v2, all I'd notice is that an 
> > > authorization using v1 would fail.  I'm not averse to killing the 
> > > entire connection but, assuming you have fallback, it might be 
> > > kinder simply to ensure that the operations with the reclaimed 
> > > session fail (which is what the code currently does).
> > 
> > My worry is that this session failure cannot be detected by the 
> > application.  An HMAC failure could cause the app to tell a user that
> > they entered the wrong password.  Misleading.  On the TPM, it could 
> > trigger the dictionary attack lockout.  For a PIN index, it could 
> > consume a failure count.  Killing a policy session that has e.g., a 
> > policy signed term could cause the application to go back to some 
> > external entity for another authorization signature.
> > 
> > Let's go up to the stack.  What's the attack?
> > 
> > If we're worried about many simultaneous applications (wouldn't that 
> > be wonderful), why not just let startauthsession fail?  The 
> > application can just retry periodically.
> 
> How in that scenario do we ensure that a session becomes available? 
>  Once that's established, there's no real difference between retrying
> the startauthsession in the kernel when we know the session is
> available and forcing userspace to do the retry except that the former
> has a far greater chance of success (and it's only about 6 lines of
> code).
> 
> >   Just allocate them in triples so there's no deadlock.
> 
> Is this the application or the kernel?  If it's the kernel, that adds a
> lot of complexity.
> 
> > If we're worried about a DoS attack, killing a session just helps the
> > attacker.  The attacker can create a few connections and spin on 
> > startauthsession, locking everyone out anyway.
> 
> There are two considerations here: firstly we'd need to introduce a
> mechanism to "kill" the connection.  Probably we'd simply error every
> command on the space until it was closed.  The second is which scenario
> is more reasonable: Say the application simply forgot to flush the
> session and will never use it again.  Simply reclaiming the session
> would produce no effect at all on the application in this scenario. 
>  However, I have no data to say what's likely.
> 
> > ~~
> > 
> > Also, let's remember that this is a rare application.  Sessions are 
> > only needed for remote access (requiring encryption, HMAC or salt), 
> > or policy sessions.
> 
> This depends what your threat model is.  For ssh keys, you worry that
> someone might be watching, so you use HMAC authority even for a local
> TPM.  In the cloud, you don't quite know where the TPM is, so again
> you'd use HMAC sessions ... however, in both use cases the sessions
> should be very short lived.
> 
> > ~~
> > 
> > Should the code also reserve a session for the kernel?  Mark it not 
> > kill'able?
> 
> At the moment, the kernel doesn't use sessions, so let's worry about
> that problem at the point it arises (if it ever arises).
> 
> James

It does. My trusted keys implementation actually uses sessions.

I'm gravitating toward the opinion that we should leave this commit out
of the first kernel release that contains the resource manager, with a
rationale similar to the one Jason gave me for whitelisting: get the
basic stuff in, and once it is used with real workloads, whitelisting
and exhaustion handling will eventually take the right form.

How would you feel about this?

/Jarkko


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-30  0:52     ` Ken Goldman
@ 2017-01-30 16:04       ` James Bottomley
  2017-01-30 21:58         ` Jarkko Sakkinen
  2017-01-31 19:28         ` Ken Goldman
  0 siblings, 2 replies; 37+ messages in thread
From: James Bottomley @ 2017-01-30 16:04 UTC (permalink / raw)
  To: Ken Goldman, tpmdd-devel; +Cc: linux-security-module, linux-kernel

On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> On 1/27/2017 5:04 PM, James Bottomley wrote:
> 
> > > Beware the nasty corner case:
> > > 
> > > - Application asks for a session and gets 02000000
> > > 
> > > - Time elapses and 02000000 gets forcibly flushed
> > > 
> > > - Later, app comes back, asks for a second session and again gets
> > > 02000000.
> > > 
> > > - App gets very confused.
> > > 
> > > May it be better to close the connection completely, which the
> > > application can detect, than flush a session and give this corner
> > > case?
> > 
> > if I look at the code I've written, I don't know what the session
> > number is, I just save sessionHandle in a variable for later use 
> > (lets say to v1).  If I got the same session number returned at a 
> > later time and placed it in v2, all I'd notice is that an 
> > authorization using v1 would fail.  I'm not averse to killing the 
> > entire connection but, assuming you have fallback, it might be 
> > kinder simply to ensure that the operations with the reclaimed 
> > session fail (which is what the code currently does).
> 
> My worry is that this session failure cannot be detected by the 
> application.  An HMAC failure could cause the app to tell a user that
> they entered the wrong password.  Misleading.  On the TPM, it could 
> trigger the dictionary attack lockout.  For a PIN index, it could 
> consume a failure count.  Killing a policy session that has e.g., a 
> policy signed term could cause the application to go back to some 
> external entity for another authorization signature.
> 
> Let's go up to the stack.  What's the attack?
> 
> If we're worried about many simultaneous applications (wouldn't that 
> be wonderful), why not just let startauthsession fail?  The 
> application can just retry periodically.

How in that scenario do we ensure that a session becomes available? 
 Once that's established, there's no real difference between retrying
the startauthsession in the kernel when we know the session is
available and forcing userspace to do the retry except that the former
has a far greater chance of success (and it's only about 6 lines of
code).

>   Just allocate them in triples so there's no deadlock.

Is this the application or the kernel?  If it's the kernel, that adds a
lot of complexity.

> If we're worried about a DoS attack, killing a session just helps the
> attacker.  The attacker can create a few connections and spin on 
> startauthsession, locking everyone out anyway.

There are two considerations here: firstly we'd need to introduce a
mechanism to "kill" the connection.  Probably we'd simply error every
command on the space until it was closed.  The second is which scenario
is more reasonable: Say the application simply forgot to flush the
session and will never use it again.  Simply reclaiming the session
would produce no effect at all on the application in this scenario. 
 However, I have no data to say what's likely.

> ~~
> 
> Also, let's remember that this is a rare application.  Sessions are 
> only needed for remote access (requiring encryption, HMAC or salt), 
> or policy sessions.

This depends on what your threat model is.  For ssh keys, you worry that
someone might be watching, so you use HMAC authority even for a local
TPM.  In the cloud, you don't quite know where the TPM is, so again
you'd use HMAC sessions ... however, in both use cases the sessions
should be very short lived.

> ~~
> 
> Should the code also reserve a session for the kernel?  Mark it not 
> kill'able?

At the moment, the kernel doesn't use sessions, so let's worry about
that problem at the point it arises (if it ever arises).

James


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-27 23:35     ` Jason Gunthorpe
@ 2017-01-27 23:48       ` James Bottomley
  0 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-01-27 23:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Fri, 2017-01-27 at 16:35 -0700, Jason Gunthorpe wrote:
> On Fri, Jan 27, 2017 at 02:04:59PM -0800, James Bottomley wrote:
> 
> > if I look at the code I've written, I don't know what the session
> > number is, I just save sessionHandle in a variable for later use 
> > (lets say to v1).  If I got the same session number returned at a 
> > later time and placed it in v2, all I'd notice is that an 
> > authorization using v1 would fail.
> 
> Is there any way that could be used to cause an op thinking it is
> using v1 to authorize something it shouldn't?

Not really: in the parameter or HMAC case, you have to compute based on
the initial nonce given by the TPM when the session was created.
 Assuming the initial nonce belonged to the evicted session, the HMAC
will now fail because the nonce of the v2 session is different.  There
is a corner case where you track the nonce in a table indexed by
handle, so when v2 is created, its nonce replaces the old v1 nonce in
the table.  Now you can use v1 and v2 without error (because each use
picks up the correct nonce) and effectively they're interchangeable as
the same session.  Even in this case, you're not authorising something
you shouldn't; you're just using one session for the authorisations
where you thought you had two.

James


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-27 22:04   ` [tpmdd-devel] " James Bottomley
@ 2017-01-27 23:35     ` Jason Gunthorpe
  2017-01-27 23:48       ` James Bottomley
  2017-01-30  0:52     ` Ken Goldman
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2017-01-27 23:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ken Goldman, tpmdd-devel, linux-security-module, linux-kernel

On Fri, Jan 27, 2017 at 02:04:59PM -0800, James Bottomley wrote:

> if I look at the code I've written, I don't know what the session
> number is, I just save sessionHandle in a variable for later use (lets
> say to v1).  If I got the same session number returned at a later time
> and placed it in v2, all I'd notice is that an authorization using v1
> would fail.

Is there any way that could be used to cause an op thinking it is
using v1 to authorize something it shouldn't?

Jason


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-27 21:42 ` Ken Goldman
@ 2017-01-27 22:04   ` James Bottomley
  2017-01-27 23:35     ` Jason Gunthorpe
  2017-01-30  0:52     ` Ken Goldman
  0 siblings, 2 replies; 37+ messages in thread
From: James Bottomley @ 2017-01-27 22:04 UTC (permalink / raw)
  To: Ken Goldman, tpmdd-devel; +Cc: linux-security-module, linux-kernel

On Fri, 2017-01-27 at 16:42 -0500, Ken Goldman wrote:
> On 1/18/2017 3:48 PM, James Bottomley wrote:
> > In a TPM2, sessions can be globally exhausted once there are
> > TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context
> > saved).
> > The Strategy for handling this is to keep a global count of all the
> > sessions along with their creation time.  Then if we see the TPM
> > run
> > out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for
> > one
> > to become free, but if it doesn't, we forcibly evict an existing
> > one.
> > The eviction strategy waits until the current command is repeated
> > to
> > evict the session which should guarantee there is an available
> > slot.
> 
> Beware the nasty corner case:
> 
> - Application asks for a session and gets 02000000
> 
> - Time elapses and 02000000 gets forcibly flushed
> 
> - Later, app comes back, asks for a second session and again gets
> 02000000.
> 
> - App gets very confused.
> 
> May it be better to close the connection completely, which the 
> application can detect, than flush a session and give this corner
> case?

If I look at the code I've written, I don't know what the session
number is; I just save sessionHandle in a variable for later use (let's
say v1).  If I got the same session number returned at a later time
and placed it in v2, all I'd notice is that an authorization using v1
would fail.  I'm not averse to killing the entire connection but,
assuming you have fallback, it might be kinder simply to ensure that
the operations with the reclaimed session fail (which is what the code
currently does).

> ~~~~
> 
> Part of me says to defer this.  That is:
> 
> 64 sessions / 3 = 21 simultaneous applications.  If we have 21 
> simultaneous TCG applications, we'll all celebrate.  For the DoS,
> chmod and chgrp /dev/tpm and let only well behaved applications in 
> the group.
> 
> Agreed, it's not a long term solution.

My use case is secret protection in the cloud.  I can certainly see >
21 applications wanting to do this at roughly the same time. However,
the periods over which they actually all need sessions should be very
short, hence the leasing proposal which would stagger them.

James


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-27 21:20     ` Ken Goldman
@ 2017-01-27 21:59       ` James Bottomley
  0 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2017-01-27 21:59 UTC (permalink / raw)
  To: Ken Goldman, tpmdd-devel; +Cc: linux-security-module, linux-kernel

On Fri, 2017-01-27 at 16:20 -0500, Ken Goldman wrote:
> On 1/19/2017 7:41 AM, Jarkko Sakkinen wrote:
> > 
> > I actually think that the very best solution would be such that
> > sessions would be *always* lease based. So when you create a
> > session you would always loose within a time limit.
> > 
> > There would not be any special victim selection mechanism. You
> > would just loose your session within a time limit.
> 
> I worry about the time limit.
> 
> I have a proposed use case (policy signed) where the user sends the 
> session nonce along with a "payment" to a vendor and receives back a 
> signature authorization over the nonce.
> 
> The time could be minutes or even hours.

So the problem is that sessions are a limited resource and we need a
way to allocate them when under resource pressure.  Leasing is the
fairest way I can think of but I'm open to other mechanisms if you
propose them.

Note that the lease mechanism doesn't mean every session expires after
the limit, it just means that every session becomes eligible for
reclaim after the limit.  If there's no-one else waiting, you can keep
your session for hours.

James


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-19 12:59   ` James Bottomley
@ 2017-01-20 13:40     ` Jarkko Sakkinen
  0 siblings, 0 replies; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-01-20 13:40 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-security-module, tpmdd-devel, open list

On Thu, Jan 19, 2017 at 07:59:04AM -0500, James Bottomley wrote:
> On Thu, 2017-01-19 at 14:25 +0200, Jarkko Sakkinen wrote:
> > On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> > > In a TPM2, sessions can be globally exhausted once there are
> > > TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context
> > > saved).
> > > The Strategy for handling this is to keep a global count of all the
> > > sessions along with their creation time.  Then if we see the TPM
> > > run
> > > out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for
> > > one
> > > to become free, but if it doesn't, we forcibly evict an existing
> > > one.
> > > The eviction strategy waits until the current command is repeated
> > > to
> > > evict the session which should guarantee there is an available
> > > slot.
> > > 
> > > On the force eviction case, we make sure that the victim session is
> > > at
> > > least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue
> > > for
> > > session slots is a FIFO one, ensuring that once we run out of
> > > sessions, everyone will get a session in a bounded time and once
> > > they
> > > get one, they'll have SESSION_TIMEOUT to use it before it may be
> > > subject to eviction.
> > > 
> > > Signed-off-by: James Bottomley <
> > > James.Bottomley@HansenPartnership.com>
> > 
> > I didn't yet read the code properly. I'll do a more proper review
> > once I have v4 of my patch set together. This comment is solely
> > based on your commit message.
> > 
> > I'm just thinking that do we need this complicated timeout stuff
> > or could you just kick a session out in LRU fashion as we run
> > out of them?
> > 
> > Or one variation of what you are doing: couldn't the session that
> > needs a session handle to do something sleep for 2 seconds and then
> > take the oldest session? It would have essentially the same effect
> > but no waitqueue needed.
> > 
> > Yeah, as I said, this is just commentary based on the description.
> 
> If you don't have a wait queue you lose fairness in resource allocation
> on starvation.  What happens is that you get RC_SESSION_HANDLES and
> sleep for 2s and retry.  Meanwhile someone frees a session, then next
> user grabs it while you were sleeping and when you wake you still get
> RC_SESSION_HANDLES.  I can basically DoS your process if I understand
> this. The only way to make the resource fairly allocated: i.e. the
> first person to sleep waiting for a session is the one who gets it when
> they wake is to make sure that you wake one waiter as soon as a free
> session comes in so probabalistically, they get the session.  If you
> look, there are two mechanisms for ensuring fairness: one is the FIFO
> wait queue (probabalistic) and the other is the reserved session which
> really ensures it belongs to you when you wake (deterministic but
> expensive, so this is only activated on the penultimate go around).
> 
> James

Right, I see your point.

/Jarkko


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-19 12:25 ` [tpmdd-devel] " Jarkko Sakkinen
  2017-01-19 12:41   ` Jarkko Sakkinen
@ 2017-01-19 12:59   ` James Bottomley
  2017-01-20 13:40     ` Jarkko Sakkinen
  1 sibling, 1 reply; 37+ messages in thread
From: James Bottomley @ 2017-01-19 12:59 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-security-module, tpmdd-devel, open list

On Thu, 2017-01-19 at 14:25 +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> > In a TPM2, sessions can be globally exhausted once there are
> > TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context
> > saved).
> > The Strategy for handling this is to keep a global count of all the
> > sessions along with their creation time.  Then if we see the TPM
> > run
> > out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for
> > one
> > to become free, but if it doesn't, we forcibly evict an existing
> > one.
> > The eviction strategy waits until the current command is repeated
> > to
> > evict the session which should guarantee there is an available
> > slot.
> > 
> > On the force eviction case, we make sure that the victim session is
> > at
> > least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue
> > for
> > session slots is a FIFO one, ensuring that once we run out of
> > sessions, everyone will get a session in a bounded time and once
> > they
> > get one, they'll have SESSION_TIMEOUT to use it before it may be
> > subject to eviction.
> > 
> > Signed-off-by: James Bottomley <
> > James.Bottomley@HansenPartnership.com>
> 
> I didn't yet read the code properly. I'll do a more proper review
> once I have v4 of my patch set together. This comment is solely
> based on your commit message.
> 
> I'm just thinking that do we need this complicated timeout stuff
> or could you just kick a session out in LRU fashion as we run
> out of them?
> 
> Or one variation of what you are doing: couldn't the session that
> needs a session handle to do something sleep for 2 seconds and then
> take the oldest session? It would have essentially the same effect
> but no waitqueue needed.
> 
> Yeah, as I said, this is just commentary based on the description.

If you don't have a wait queue you lose fairness in resource allocation
under starvation.  What happens is that you get RC_SESSION_HANDLES,
sleep for 2s and retry.  Meanwhile someone frees a session, the next
user grabs it while you were sleeping, and when you wake you still get
RC_SESSION_HANDLES.  I can basically DoS your process if I understand
this.  The only way to allocate the resource fairly (i.e. the first
person to sleep waiting for a session is the one who gets it when they
wake) is to make sure that you wake one waiter as soon as a free
session comes in, so that probabilistically they get the session.  If
you look, there are two mechanisms for ensuring fairness: one is the
FIFO wait queue (probabilistic) and the other is the reserved session,
which really ensures it belongs to you when you wake (deterministic but
expensive, so it is only activated on the penultimate go-around).

James


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-19 12:25 ` [tpmdd-devel] " Jarkko Sakkinen
@ 2017-01-19 12:41   ` Jarkko Sakkinen
  2017-01-27 21:20     ` Ken Goldman
  2017-01-19 12:59   ` James Bottomley
  1 sibling, 1 reply; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-01-19 12:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: tpmdd-devel, linux-security-module, open list

On Thu, Jan 19, 2017 at 02:25:33PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> > In a TPM2, sessions can be globally exhausted once there are
> > TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context saved).
> > The Strategy for handling this is to keep a global count of all the
> > sessions along with their creation time.  Then if we see the TPM run
> > out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for one
> > to become free, but if it doesn't, we forcibly evict an existing one.
> > The eviction strategy waits until the current command is repeated to
> > evict the session which should guarantee there is an available slot.
> > 
> > On the force eviction case, we make sure that the victim session is at
> > least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue for
> > session slots is a FIFO one, ensuring that once we run out of
> > sessions, everyone will get a session in a bounded time and once they
> > get one, they'll have SESSION_TIMEOUT to use it before it may be
> > subject to eviction.
> > 
> > Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> 
> I didn't yet read the code properly. I'll do a more proper review
> once I have v4 of my patch set together. This comment is solely
> based on your commit message.
> 
> I'm just thinking that do we need this complicated timeout stuff
> or could you just kick a session out in LRU fashion as we run
> out of them?
> 
> Or one variation of what you are doing: couldn't the session that
> needs a session handle to do something sleep for 2 seconds and then
> take the oldest session? It would have essentially the same effect
> but no waitqueue needed.
> 
> Yeah, as I said, this is just commentary based on the description.

I actually think that the very best solution would be to make sessions
*always* lease based.  So when you create a session, you would always
lose it within a time limit.

There would not be any special victim selection mechanism.  You would
just lose your session within the time limit.

This could already be part of the session isolation and would actually
make the isolation usable on its own.

We have not locked the API yet, so why not make an API that models the
nature of the resource?  Given that the number of sessions is always
fixed, leases make sense.

You then just need a wait queue for those waiting for leases.  They
don't need to do any victim selection or whatever.  Everything that
runs past its lease gets flushed.

I strongly feel that this would be the best long-term solution.

/Jarkko


* Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion
  2017-01-18 20:48 James Bottomley
@ 2017-01-19 12:25 ` Jarkko Sakkinen
  2017-01-19 12:41   ` Jarkko Sakkinen
  2017-01-19 12:59   ` James Bottomley
  2017-01-27 21:42 ` Ken Goldman
  1 sibling, 2 replies; 37+ messages in thread
From: Jarkko Sakkinen @ 2017-01-19 12:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: tpmdd-devel, linux-security-module, open list

On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> In a TPM2, sessions can be globally exhausted once there are
> TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context saved).
> The Strategy for handling this is to keep a global count of all the
> sessions along with their creation time.  Then if we see the TPM run
> out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for one
> to become free, but if it doesn't, we forcibly evict an existing one.
> The eviction strategy waits until the current command is repeated to
> evict the session which should guarantee there is an available slot.
> 
> On the force eviction case, we make sure that the victim session is at
> least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue for
> session slots is a FIFO one, ensuring that once we run out of
> sessions, everyone will get a session in a bounded time and once they
> get one, they'll have SESSION_TIMEOUT to use it before it may be
> subject to eviction.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

I didn't yet read the code properly. I'll do a more proper review
once I have v4 of my patch set together. This comment is solely
based on your commit message.

I'm just wondering whether we really need this complicated timeout
stuff, or whether you could just kick a session out in LRU fashion as
we run out of them.

Or one variation of what you are doing: couldn't the caller that needs
a session handle sleep for 2 seconds and then take the oldest session?
It would have essentially the same effect but with no wait queue
needed.

Yeah, as I said, this is just commentary based on the description.

/Jarkko


Thread overview: 37+ messages
2017-02-10 10:03 [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion Dr. Greg Wettstein
2017-02-10 10:03 ` Dr. Greg Wettstein
2017-02-10 16:46 ` [tpmdd-devel] " James Bottomley
2017-02-10 16:46   ` James Bottomley
     [not found]   ` <1486745163.2502.26.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2017-02-10 21:13     ` Kenneth Goldman
2017-02-14 14:38       ` [tpmdd-devel] " Dr. Greg Wettstein
2017-02-14 16:47         ` James Bottomley
2017-02-14 16:47           ` James Bottomley
     [not found]         ` <71dc0e80-6678-a124-9184-1f93c8532d09@linux.vnet.ibm.com>
2017-02-16 20:06           ` [tpmdd-devel] " Dr. Greg Wettstein
2017-02-16 20:33             ` Jarkko Sakkinen
2017-02-17  9:56               ` Dr. Greg Wettstein
2017-02-17 12:37                 ` Jarkko Sakkinen
2017-02-17 22:37                   ` Dr. Greg Wettstein
2017-02-10 21:18     ` Kenneth Goldman
2017-02-12 20:29   ` [tpmdd-devel] " Ken Goldman
2017-02-12 20:29     ` Ken Goldman
  -- strict thread matches above, loose matches on Subject: below --
2017-02-09  9:06 Dr. Greg Wettstein
2017-02-09 15:19 ` Jarkko Sakkinen
2017-02-09 19:04   ` Jason Gunthorpe
2017-02-09 19:29     ` James Bottomley
2017-02-09 21:54       ` Jason Gunthorpe
2017-02-10  8:48     ` Jarkko Sakkinen
2017-02-09 19:24 ` James Bottomley
2017-02-09 20:05 ` James Bottomley
2017-01-18 20:48 James Bottomley
2017-01-19 12:25 ` [tpmdd-devel] " Jarkko Sakkinen
2017-01-19 12:41   ` Jarkko Sakkinen
2017-01-27 21:20     ` Ken Goldman
2017-01-27 21:59       ` [tpmdd-devel] " James Bottomley
2017-01-19 12:59   ` James Bottomley
2017-01-20 13:40     ` Jarkko Sakkinen
2017-01-27 21:42 ` Ken Goldman
2017-01-27 22:04   ` [tpmdd-devel] " James Bottomley
2017-01-27 23:35     ` Jason Gunthorpe
2017-01-27 23:48       ` James Bottomley
2017-01-30  0:52     ` Ken Goldman
2017-01-30 16:04       ` [tpmdd-devel] " James Bottomley
2017-01-30 21:58         ` Jarkko Sakkinen
2017-01-30 22:13           ` James Bottomley
2017-01-31 13:31             ` Jarkko Sakkinen
2017-01-31 19:28         ` Ken Goldman
2017-01-31 19:55           ` [tpmdd-devel] " James Bottomley
