linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* RNDR/SS vs. SMCCC
@ 2021-05-26 23:54 Benjamin Herrenschmidt
  2021-05-27 12:50 ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2021-05-26 23:54 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Mark Brown, Will Deacon, Saidi, Ali, benh, linux-arm-kernel

Hi folks !

I'm trying to understand the rationale for having SMCCC take precedence
over the architected instruction in
arch/arm64/include/asm/archrandom.h 

Especially I don't quite get the comment about SMCCC being more in the
"spirit" than let's say RNDRSS. It's definitely MUCH slower.

Also, why not implement arch_get_random_* using RNDR rather than
forcing these through the kernel CNRG ?

Finally, RNDR/RNDRSS can fail occasionally, we should probably
implement some kind of delay+retry here. I'm not super familiar with
the context in which arch_get_random_* can be called (are they allowed
to sleep ?) but from the implementation I'm aware of at least, the idea
is that they might under some HW recovery scenarii so we should sleep a
couple of ms and try again a few times.

Thoughts ?

Cheers,
Ben.



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-26 23:54 RNDR/SS vs. SMCCC Benjamin Herrenschmidt
@ 2021-05-27 12:50 ` Mark Brown
  2021-05-27 23:12   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-27 12:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andre Przywara, Will Deacon, Saidi, Ali, benh, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2957 bytes --]

On Thu, May 27, 2021 at 09:54:03AM +1000, Benjamin Herrenschmidt wrote:

> I'm trying to understand the rationale for having SMCCC take precedence
> over the architected instruction in
> arch/arm64/include/asm/archrandom.h 

> Especially I don't quite get the comment about SMCCC being more in the
> "spirit" than let's say RNDRSS. It's definitely MUCH slower.

Like the comments say the arm64 RNDR register is not intended to
be directly providing entropy but rather returning the output of
a DRBG which is seeded from an actual entropy source whereas the
SMCCC call is intended to return something from an actual entropy
source directly.  The thinking was that the latter is more in
line with what the callers are expecting for the _seed variants,
it's kind of unclear what the performance/quality tradeoff is
supposed to be though.

See the thread at

   https://lore.kernel.org/r/CAKv+Gu8y4zwpesytU7vffSCuq8YAjWcHwFHwa_LhTW_cLiSfQw@mail.gmail.com

for discussion of the use of RNDR rather than RNDRRS - the main
concern was the potential for issues with excessive use of
entropy especially on bigger systems, possibly amplified if
there's lots of VMs, though in the meantime the random.c code got
updated to read seeds a lot less often (390596c9959c2a4f5b4
"random: avoid arch_get_random_seed_long() when collecting IRQ
randomness") so the considerations have changed a bit since the
original discussion and we could probably already revisit that
one already.

> Also, why not implement arch_get_random_* using RNDR rather than
> forcing these through the kernel CNRG ?

That was the same consideration with potential issues due to
excessive usage, it was dropped at the same time as switching to
RNDR for the _seed variants.  

This was all before I started looking at the series but my
understanding is that the general idea was to go for a
conservative implementation initially and then reevaluate once
we've got actual systems and can measure how things perform in
practice.

In practice most of the non-seed arch_get_random usage is either
a fallback if there's no seed variant or mixed in with the CRNG
output anyway.

> Finally, RNDR/RNDRSS can fail occasionally, we should probably
> implement some kind of delay+retry here. I'm not super familiar with
> the context in which arch_get_random_* can be called (are they allowed
> to sleep ?) but from the implementation I'm aware of at least, the idea
> is that they might under some HW recovery scenarii so we should sleep a
> couple of ms and try again a few times.

The arch_get_random_ interfaces already provide a return code
which the callers can handle as needed, they can do something
appropriate for the scenario rather than the architecture code
having to pick.  Sometimes that's to retry later (random.c does
this when seeding for example), sometimes that's to just carry on
with whatever other entropy they've already got.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-27 12:50 ` Mark Brown
@ 2021-05-27 23:12   ` Benjamin Herrenschmidt
  2021-05-28 12:56     ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2021-05-27 23:12 UTC (permalink / raw)
  To: Mark Brown
  Cc: Andre Przywara, Will Deacon, Saidi, Ali, linux-arm-kernel, benh

On Thu, 2021-05-27 at 13:50 +0100, Mark Brown wrote:
> On Thu, May 27, 2021 at 09:54:03AM +1000, Benjamin Herrenschmidt
> wrote:

Hi Mark !
> 
> Like the comments say the arm64 RNDR register is not intended to
> be directly providing entropy but rather returning the output of
> a DRBG which is seeded from an actual entropy source whereas the
> SMCCC call is intended to return something from an actual entropy
> source directly.

Right, I was thinking about using RNDRSS instead.

>   The thinking was that the latter is more in
> line with what the callers are expecting for the _seed variants,
> it's kind of unclear what the performance/quality tradeoff is
> supposed to be though.
> 
> See the thread at
>
>   
https://lore.kernel.org/r/CAKv+Gu8y4zwpesytU7vffSCuq8YAjWcHwFHwa_LhTW_cLiSfQw@mail.gmail.com
> 
> for discussion of the use of RNDR rather than RNDRRS - the main
> concern was the potential for issues with excessive use of
> entropy especially on bigger systems, possibly amplified if
> there's lots of VMs, though in the meantime the random.c code got
> updated to read seeds a lot less often (390596c9959c2a4f5b4
> "random: avoid arch_get_random_seed_long() when collecting IRQ
> randomness") so the considerations have changed a bit since the
> original discussion and we could probably already revisit that
> one already.

Right. It's also a lot of assumptions without data on what the actual
implementations can do but I suppose there was no much choice here :-)

> > Also, why not implement arch_get_random_* using RNDR rather than
> > forcing these through the kernel CNRG ?
> 
> That was the same consideration with potential issues due to
> excessive usage, it was dropped at the same time as switching to
> RNDR for the _seed variants.  
> 
> This was all before I started looking at the series but my
> understanding is that the general idea was to go for a
> conservative implementation initially and then reevaluate once
> we've got actual systems and can measure how things perform in
> practice.
> 
> In practice most of the non-seed arch_get_random usage is either
> a fallback if there's no seed variant or mixed in with the CRNG
> output anyway.

Right but I still don't believe the end result makes a whole lot of
sense. In absence of SMCCC we end up using RNDR as a seed which hits
the "not a great match" comment. Not a huge deal I suppose, for our
(EC2) case, we could just not implement the SMCCC call, and let it use
RNDR, it's still going to be better than no HW random source, I just
don't like those firmware calls much.

> The arch_get_random_ interfaces already provide a return code
> which the callers can handle as needed, they can do something
> appropriate for the scenario rather than the architecture code
> having to pick.  Sometimes that's to retry later (random.c does
> this when seeding for example), sometimes that's to just carry on
> with whatever other entropy they've already got.

Ok. It's unfortunate that the ISA is so vague on the circumstances
where the instructions are allowed to fail... it says "a reasonable
amount of time", it may as well have said a "random amount of time" for
the usefulness of it ;-)

The implementation I'm aware of will fail extremely rarely when the HW
detects an issues that requires corrective action, but I could imagine
some implemetations just failing when there's no entropy at hand (esp.
with RNDRSS).

As long as the callers don't give up permanently, that's fine. I was
just a bit concerned by cnrg_init_try_arch{_early}. It would be
preferable for these to "try harder".

Cheers,
Ben.



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-27 23:12   ` Benjamin Herrenschmidt
@ 2021-05-28 12:56     ` Mark Brown
  2021-05-29  2:36       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-28 12:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andre Przywara, Will Deacon, Saidi, Ali, linux-arm-kernel, benh


[-- Attachment #1.1: Type: text/plain, Size: 4052 bytes --]

On Fri, May 28, 2021 at 09:12:38AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2021-05-27 at 13:50 +0100, Mark Brown wrote:

> > Like the comments say the arm64 RNDR register is not intended to
> > be directly providing entropy but rather returning the output of
> > a DRBG which is seeded from an actual entropy source whereas the
> > SMCCC call is intended to return something from an actual entropy
> > source directly.

> Right, I was thinking about using RNDRSS instead.

Yeah, I *suspect* that's where we'll end up longer term now that
the random.c code is less enthusiastic about calling the
function - it was where the patches where initially until the
concerns about overloading were raised and it does seem to map
more naturally onto the API.  I do think we should hold off until
we've got some concrete information on how real systems perform
simply to avoid churn but I wouldn't be surprised to see us
making changes once we have data.  Sounds like you'll be able to
help out here!

Note that they do both get washed through the PRNG, not that I
think it makes a huge difference to the argument here.

> > In practice most of the non-seed arch_get_random usage is either
> > a fallback if there's no seed variant or mixed in with the CRNG
> > output anyway.

> Right but I still don't believe the end result makes a whole lot of
> sense. In absence of SMCCC we end up using RNDR as a seed which hits
> the "not a great match" comment. Not a huge deal I suppose, for our
> (EC2) case, we could just not implement the SMCCC call, and let it use
> RNDR, it's still going to be better than no HW random source, I just
> don't like those firmware calls much.

I do see them as useful for the seeding case, it shouldn't be in
quite such sensitive fast paths as the regular versions and 
it means that if a system has a better entropy source than the
one backing RNDR (especially if the one backing RNDR has some
actual problem) then we can override in software.  As you say if
the SMCCC isn't offering anything over the system registers then
platforms don't need to implement it.

> > The arch_get_random_ interfaces already provide a return code
> > which the callers can handle as needed, they can do something
> > appropriate for the scenario rather than the architecture code
> > having to pick.  Sometimes that's to retry later (random.c does
> > this when seeding for example), sometimes that's to just carry on
> > with whatever other entropy they've already got.
> 
> Ok. It's unfortunate that the ISA is so vague on the circumstances
> where the instructions are allowed to fail... it says "a reasonable
> amount of time", it may as well have said a "random amount of time" for
> the usefulness of it ;-)

> The implementation I'm aware of will fail extremely rarely when the HW
> detects an issues that requires corrective action, but I could imagine
> some implemetations just failing when there's no entropy at hand (esp.
> with RNDRSS).

Yes, I think there being inadequate entropy at hand to reseed is
the big concern here - some of that's going to be a quality
tradeoff and it's very hard to actually enforce any constraints
even if you define them so ultimately it all comes down to
quality of implementation issues.

> As long as the callers don't give up permanently, that's fine. I was
> just a bit concerned by cnrg_init_try_arch{_early}. It would be
> preferable for these to "try harder".

Yeah, there is some scope for retries there - unfortunately the
arch_get_random_ interface can't distinguish between temporary
and permanent failures, and people won't want to to slow down the
boot path at all by actually blocking.  Looking again there's
some scope for improving this process, we will continue to pull
seed values in crng_reseed() but not in quite the same way.  I'm
now wondering if it'd make sense to hook retries of the
architecture init into or alongside try_to_generate_entropy() in
wait_for_random_bytes() when trust_cpu is set.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-28 12:56     ` Mark Brown
@ 2021-05-29  2:36       ` Benjamin Herrenschmidt
  2021-05-31  1:02         ` Andre Przywara
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2021-05-29  2:36 UTC (permalink / raw)
  To: Mark Brown; +Cc: Andre Przywara, Will Deacon, Saidi, Ali, linux-arm-kernel

On Fri, 2021-05-28 at 13:56 +0100, Mark Brown wrote:
> On Fri, May 28, 2021 at 09:12:38AM +1000, Benjamin Herrenschmidt
> wrote:
> > 
> > Right, I was thinking about using RNDRSS instead.
> 
> Yeah, I *suspect* that's where we'll end up longer term now that
> the random.c code is less enthusiastic about calling the
> function - it was where the patches where initially until the
> concerns about overloading were raised and it does seem to map
> more naturally onto the API.  I do think we should hold off until
> we've got some concrete information on how real systems perform
> simply to avoid churn but I wouldn't be surprised to see us
> making changes once we have data.  Sounds like you'll be able to
> help out here!

Hehe yup.

> Note that they do both get washed through the PRNG, not that I
> think it makes a huge difference to the argument here.

Right. At this point, I think we can wait until HW is there and we have
enough data to decide what to do with the policy. I wish the ISA was
clearer in defining timing characteristics (and fail behaviour) of
those instructions... as-is, we'll probably have to mess around based
on whatever HW comes out, worse, possibly with quirks.

> > > In practice most of the non-seed arch_get_random usage is either
> > > a fallback if there's no seed variant or mixed in with the CRNG
> > > output anyway.
> > Right but I still don't believe the end result makes a whole lot of
> > sense. In absence of SMCCC we end up using RNDR as a seed which
> > hits
> > the "not a great match" comment. Not a huge deal I suppose, for our
> > (EC2) case, we could just not implement the SMCCC call, and let it
> > use
> > RNDR, it's still going to be better than no HW random source, I
> > just
> > don't like those firmware calls much.
> 
> I do see them as useful for the seeding case, it shouldn't be in
> quite such sensitive fast paths as the regular versions and 
> it means that if a system has a better entropy source than the
> one backing RNDR (especially if the one backing RNDR has some
> actual problem) then we can override in software.  As you say if
> the SMCCC isn't offering anything over the system registers then
> platforms don't need to implement it.

As far as I can tell, the ISA has some pretty strict requirements for
the entropy source backing RNDR, so ideally, if implementations are
compliants, it *should* be a non-issue. Famous last words... :)

> > > The arch_get_random_ interfaces already provide a return code
> > > which the callers can handle as needed, they can do something
> > > appropriate for the scenario rather than the architecture code
> > > having to pick.  Sometimes that's to retry later (random.c does
> > > this when seeding for example), sometimes that's to just carry on
> > > with whatever other entropy they've already got.
> > 
> > Ok. It's unfortunate that the ISA is so vague on the circumstances
> > where the instructions are allowed to fail... it says "a reasonable
> > amount of time", it may as well have said a "random amount of time"
> > for
> > the usefulness of it ;-)
> > The implementation I'm aware of will fail extremely rarely when the
> > HW
> > detects an issues that requires corrective action, but I could
> > imagine
> > some implemetations just failing when there's no entropy at hand
> > (esp.
> > with RNDRSS).
> 
> Yes, I think there being inadequate entropy at hand to reseed is
> the big concern here - some of that's going to be a quality
> tradeoff and it's very hard to actually enforce any constraints
> even if you define them so ultimately it all comes down to
> quality of implementation issues.

Yup. I hate this but I foresee a future where we'll have implementation
quirks. I hope not but ...

> > As long as the callers don't give up permanently, that's fine. I
> > was
> > just a bit concerned by cnrg_init_try_arch{_early}. It would be
> > preferable for these to "try harder".
> 
> Yeah, there is some scope for retries there - unfortunately the
> arch_get_random_ interface can't distinguish between temporary
> and permanent failures, and people won't want to to slow down the
> boot path at all by actually blocking.  Looking again there's
> some scope for improving this process, we will continue to pull
> seed values in crng_reseed() but not in quite the same way.  I'm
> now wondering if it'd make sense to hook retries of the
> architecture init into or alongside try_to_generate_entropy() in
> wait_for_random_bytes() when trust_cpu is set.

Could be. I'm away from the code right now, but I would like to avoid
having the system behaviour change overall based on whether it happened
to hit a failure case at boot or not. For deployments at scale, that
sort of "randomness" isn't the sort we want :)

Cheers,
Ben.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-29  2:36       ` Benjamin Herrenschmidt
@ 2021-05-31  1:02         ` Andre Przywara
  2021-05-31  5:24           ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2021-05-31  1:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mark Brown, Will Deacon, Saidi, Ali, linux-arm-kernel, Ard Biesheuvel

On Sat, 29 May 2021 12:36:06 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

Hi,

(adding Ard, as he had some say in the decisions back then).

> On Fri, 2021-05-28 at 13:56 +0100, Mark Brown wrote:
> > On Fri, May 28, 2021 at 09:12:38AM +1000, Benjamin Herrenschmidt
> > wrote:  
> > > 
> > > Right, I was thinking about using RNDRSS instead.  
> > 
> > Yeah, I *suspect* that's where we'll end up longer term now that
> > the random.c code is less enthusiastic about calling the
> > function - it was where the patches where initially until the
> > concerns about overloading were raised and it does seem to map
> > more naturally onto the API.  I do think we should hold off until
> > we've got some concrete information on how real systems perform
> > simply to avoid churn but I wouldn't be surprised to see us
> > making changes once we have data.  Sounds like you'll be able to
> > help out here!  
> 
> Hehe yup.

In general quality of implementation was one concern against using the
instructions. The actual entropy source would be provided by the *SoC
vendor*, so we can't easily match this against, say the MIDR.
The firmware interface provides means to disqualify certain
implementations (using its GUID), also the firmware might get
workarounds that fix problems (or even use another entropy source).

> > Note that they do both get washed through the PRNG, not that I
> > think it makes a huge difference to the argument here.  
> 
> Right. At this point, I think we can wait until HW is there and we have
> enough data to decide what to do with the policy. I wish the ISA was
> clearer in defining timing characteristics (and fail behaviour) of
> those instructions... as-is, we'll probably have to mess around based
> on whatever HW comes out, worse, possibly with quirks.

I was playing around with the TRNG on our Juno board, and figured the
raw performance of the TRNG (through MMIO!) to be around 60MB/s. This
seems to be inline with most x86 entropy sources, I guess they also use
clock jitter or thermal noise, which generally should provide
"enough" (TM) entropy, at least for this seeding use case.
So I wouldn't expect many problems with well behaved users. The
firmware implementation allows to address this (ab)use case by gently
stalling "leechers".
But I agree that this is all quite theoretic at this point, and actual
implementations would be nice to see.

> > > > In practice most of the non-seed arch_get_random usage is either
> > > > a fallback if there's no seed variant or mixed in with the CRNG
> > > > output anyway.  
> > > Right but I still don't believe the end result makes a whole lot of
> > > sense. In absence of SMCCC we end up using RNDR as a seed which
> > > hits
> > > the "not a great match" comment. Not a huge deal I suppose, for our
> > > (EC2) case, we could just not implement the SMCCC call, and let it
> > > use
> > > RNDR, it's still going to be better than no HW random source, I
> > > just
> > > don't like those firmware calls much.  
> > 
> > I do see them as useful for the seeding case, it shouldn't be in
> > quite such sensitive fast paths as the regular versions and 
> > it means that if a system has a better entropy source than the
> > one backing RNDR (especially if the one backing RNDR has some
> > actual problem) then we can override in software.  As you say if
> > the SMCCC isn't offering anything over the system registers then
> > platforms don't need to implement it.  
> 
> As far as I can tell, the ISA has some pretty strict requirements for
> the entropy source backing RNDR, so ideally, if implementations are
> compliants, it *should* be a non-issue. Famous last words... :)

Indeed, reality can be a annoying. But given the current frequency of
the calls to arch_get_random() - just to reseed the CRNG - now in the
once-per-5-minute range, I think the performance concerns of the SMCCC
are somewhat secondary - even when including the multiple VMs case.

> > > > The arch_get_random_ interfaces already provide a return code
> > > > which the callers can handle as needed, they can do something
> > > > appropriate for the scenario rather than the architecture code
> > > > having to pick.  Sometimes that's to retry later (random.c does
> > > > this when seeding for example), sometimes that's to just carry on
> > > > with whatever other entropy they've already got.  
> > > 
> > > Ok. It's unfortunate that the ISA is so vague on the circumstances
> > > where the instructions are allowed to fail... it says "a reasonable
> > > amount of time", it may as well have said a "random amount of time"
> > > for
> > > the usefulness of it ;-)
> > > The implementation I'm aware of will fail extremely rarely when the
> > > HW
> > > detects an issues that requires corrective action, but I could
> > > imagine
> > > some implemetations just failing when there's no entropy at hand
> > > (esp.
> > > with RNDRSS).  
> > 
> > Yes, I think there being inadequate entropy at hand to reseed is
> > the big concern here - some of that's going to be a quality
> > tradeoff and it's very hard to actually enforce any constraints
> > even if you define them so ultimately it all comes down to
> > quality of implementation issues.  
> 
> Yup. I hate this but I foresee a future where we'll have implementation
> quirks. I hope not but ...
> 
> > > As long as the callers don't give up permanently, that's fine. I
> > > was
> > > just a bit concerned by cnrg_init_try_arch{_early}. It would be
> > > preferable for these to "try harder".  
> > 
> > Yeah, there is some scope for retries there - unfortunately the
> > arch_get_random_ interface can't distinguish between temporary
> > and permanent failures, and people won't want to to slow down the
> > boot path at all by actually blocking.  Looking again there's
> > some scope for improving this process, we will continue to pull
> > seed values in crng_reseed() but not in quite the same way.  I'm
> > now wondering if it'd make sense to hook retries of the
> > architecture init into or alongside try_to_generate_entropy() in
> > wait_for_random_bytes() when trust_cpu is set.  
> 
> Could be. I'm away from the code right now, but I would like to avoid
> having the system behaviour change overall based on whether it happened
> to hit a failure case at boot or not. For deployments at scale, that
> sort of "randomness" isn't the sort we want :)

So are your concerns primarily about the system blocking or the
instructions returning empty-handed while waiting for entropy?
Especially during the initial seeding and address space randmisation
at boot time? In my experience the demand for entropy at this
point (a few hundred bytes, IIRC) are easily handled by any hardware
implementation (given the above mentioned 60MB/s). And even 100 VMs
starting at the same time should be serviceable by the hardware.
So this stalling case is more if one user is actually hammering the
instructions, which the firmware implementation helps to protect
against.

Cheers,
Andre

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-31  1:02         ` Andre Przywara
@ 2021-05-31  5:24           ` Ard Biesheuvel
  2021-06-02 22:04             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-05-31  5:24 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Benjamin Herrenschmidt, Mark Brown, Will Deacon, Saidi, Ali, Linux ARM

Hello all,

On Mon, 31 May 2021 at 03:03, Andre Przywara <andre.przywara@arm.com> wrote:
>
> On Sat, 29 May 2021 12:36:06 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> Hi,
>
> (adding Ard, as he had some say in the decisions back then).
>
> > On Fri, 2021-05-28 at 13:56 +0100, Mark Brown wrote:
> > > On Fri, May 28, 2021 at 09:12:38AM +1000, Benjamin Herrenschmidt
> > > wrote:
> > > >
> > > > Right, I was thinking about using RNDRSS instead.
> > >
> > > Yeah, I *suspect* that's where we'll end up longer term now that
> > > the random.c code is less enthusiastic about calling the
> > > function - it was where the patches where initially until the
> > > concerns about overloading were raised and it does seem to map
> > > more naturally onto the API.  I do think we should hold off until
> > > we've got some concrete information on how real systems perform
> > > simply to avoid churn but I wouldn't be surprised to see us
> > > making changes once we have data.  Sounds like you'll be able to
> > > help out here!
> >
> > Hehe yup.
>
> In general quality of implementation was one concern against using the
> instructions. The actual entropy source would be provided by the *SoC
> vendor*, so we can't easily match this against, say the MIDR.
> The firmware interface provides means to disqualify certain
> implementations (using its GUID), also the firmware might get
> workarounds that fix problems (or even use another entropy source).
>

Note that you cannot trivially seed a DRBG from another DRBG and still
comply with the pertinent NIST specs. In particular, you cannot seed a
DRBG of security strength > n from a DRBG of security strength <= n
without jumping through some hoops (if it is even possible to begin
with - I don't remember from the top of my head). Other national or
industry specs may have similar constraints.

Another thing to keep in mind about RNDRRS is that reseeding a typical
DRBG takes 400+ bits of entropy, and so it might be wasteful to use
400 bits of entropy to produce a single 64 bit random number. Per
NIST, reseeding is only required every 2^48 invocations (not
bits/bytes), except when prediction resistance is needed (i.e., when
there is a risk that the internal state of the DRBG may have leaked).

So it would have been *much* better to have one instruction for random
numbers and one for true entropy, but for some reason, that did not
make the cut at the time.


> > > Note that they do both get washed through the PRNG, not that I
> > > think it makes a huge difference to the argument here.
> >
> > Right. At this point, I think we can wait until HW is there and we have
> > enough data to decide what to do with the policy. I wish the ISA was
> > clearer in defining timing characteristics (and fail behaviour) of
> > those instructions... as-is, we'll probably have to mess around based
> > on whatever HW comes out, worse, possibly with quirks.
>
> I was playing around with the TRNG on our Juno board, and figured the
> raw performance of the TRNG (through MMIO!) to be around 60MB/s. This
> seems to be inline with most x86 entropy sources, I guess they also use
> clock jitter or thermal noise, which generally should provide
> "enough" (TM) entropy, at least for this seeding use case.
> So I wouldn't expect many problems with well behaved users. The
> firmware implementation allows to address this (ab)use case by gently
> stalling "leechers".
> But I agree that this is all quite theoretic at this point, and actual
> implementations would be nice to see.
>

Indeed. This also argues for one instruction for random numbers, which
will go on forever without stalling (unless you hit the highly
theoretical limit of 2^48 invocations), and another one for true
entropy, which may fail if no entropy is available at that point. The
fact that RNDR can fail as well is a bit silly, although I suppose
there needs to be some RAS handling of h/w failure of the entropy
source in that case.

> > > > > In practice most of the non-seed arch_get_random usage is either
> > > > > a fallback if there's no seed variant or mixed in with the CRNG
> > > > > output anyway.
> > > > Right but I still don't believe the end result makes a whole lot of
> > > > sense. In absence of SMCCC we end up using RNDR as a seed which
> > > > hits
> > > > the "not a great match" comment. Not a huge deal I suppose, for our
> > > > (EC2) case, we could just not implement the SMCCC call, and let it
> > > > use
> > > > RNDR, it's still going to be better than no HW random source, I
> > > > just
> > > > don't like those firmware calls much.
> > >
> > > I do see them as useful for the seeding case, it shouldn't be in
> > > quite such sensitive fast paths as the regular versions and
> > > it means that if a system has a better entropy source than the
> > > one backing RNDR (especially if the one backing RNDR has some
> > > actual problem) then we can override in software.  As you say if
> > > the SMCCC isn't offering anything over the system registers then
> > > platforms don't need to implement it.
> >
> > As far as I can tell, the ISA has some pretty strict requirements for
> > the entropy source backing RNDR, so ideally, if implementations are
> > compliants, it *should* be a non-issue. Famous last words... :)
>
> Indeed, reality can be a annoying. But given the current frequency of
> the calls to arch_get_random() - just to reseed the CRNG - now in the
> once-per-5-minute range, I think the performance concerns of the SMCCC
> are somewhat secondary - even when including the multiple VMs case.
>

The SMCCC interface was created not as an alternative/complement to
RNDR/RNDRRS, but to codify what several partners (including AWS) were
doing at the time:
- SOC generation n-1 had a TRNG IP block mapped in the non-secure
world, and a driver in upstream Linux with DT bindings,
clock/regulator/PM controls, etc.
- SOC generation n has the same IP block mapped in the secure world,
and the vendor proposes a new DT binding -'v2', and a driver with the
clock/regulator/PM controls removed, and the only thing remaining a
SMC call in the SIP space to get some entropy from the secure side.

This approach was deemed unfortunate, as it caused needless
fragmentation, while still forcing the TRNG access to go through the
driver stack, which does not make it available until relatively late
during the boot. So the idea was that, given the lack of any
functionality in the driver other than the SMC call, to define an
architected SMC that can be invoked from arch code, so much earlier,
with a uniform ABI that anyone can implement.


> > > > > The arch_get_random_ interfaces already provide a return code
> > > > > which the callers can handle as needed, they can do something
> > > > > appropriate for the scenario rather than the architecture code
> > > > > having to pick.  Sometimes that's to retry later (random.c does
> > > > > this when seeding for example), sometimes that's to just carry on
> > > > > with whatever other entropy they've already got.
> > > >
> > > > Ok. It's unfortunate that the ISA is so vague on the circumstances
> > > > where the instructions are allowed to fail... it says "a reasonable
> > > > amount of time", it may as well have said a "random amount of time"
> > > > for
> > > > the usefulness of it ;-)
> > > > The implementation I'm aware of will fail extremely rarely when the
> > > > HW
> > > > detects an issues that requires corrective action, but I could
> > > > imagine
> > > > some implemetations just failing when there's no entropy at hand
> > > > (esp.
> > > > with RNDRSS).
> > >
> > > Yes, I think there being inadequate entropy at hand to reseed is
> > > the big concern here - some of that's going to be a quality
> > > tradeoff and it's very hard to actually enforce any constraints
> > > even if you define them so ultimately it all comes down to
> > > quality of implementation issues.
> >
> > Yup. I hate this but I foresee a future where we'll have implementation
> > quirks. I hope not but ...
> >
> > > > As long as the callers don't give up permanently, that's fine. I
> > > > was
> > > > just a bit concerned by cnrg_init_try_arch{_early}. It would be
> > > > preferable for these to "try harder".
> > >
> > > Yeah, there is some scope for retries there - unfortunately the
> > > arch_get_random_ interface can't distinguish between temporary
> > > and permanent failures, and people won't want to to slow down the
> > > boot path at all by actually blocking.  Looking again there's
> > > some scope for improving this process, we will continue to pull
> > > seed values in crng_reseed() but not in quite the same way.  I'm
> > > now wondering if it'd make sense to hook retries of the
> > > architecture init into or alongside try_to_generate_entropy() in
> > > wait_for_random_bytes() when trust_cpu is set.
> >
> > Could be. I'm away from the code right now, but I would like to avoid
> > having the system behaviour change overall based on whether it happened
> > to hit a failure case at boot or not. For deployments at scale, that
> > sort of "randomness" isn't the sort we want :)
>
> So are your concerns primarily about the system blocking or the
> instructions returning empty-handed while waiting for entropy?
> Especially during the initial seeding and address space randmisation
> at boot time? In my experience the demand for entropy at this
> point (a few hundred bytes, IIRC) are easily handled by any hardware
> implementation (given the above mentioned 60MB/s). And even 100 VMs
> starting at the same time should be serviceable by the hardware.
> So this stalling case is more if one user is actually hammering the
> instructions, which the firmware implementation helps to protect
> against.
>
> Cheers,
> Andre

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-05-31  5:24           ` Ard Biesheuvel
@ 2021-06-02 22:04             ` Benjamin Herrenschmidt
  2021-06-03  0:19               ` Andre Przywara
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2021-06-02 22:04 UTC (permalink / raw)
  To: Ard Biesheuvel, Andre Przywara
  Cc: Mark Brown, Will Deacon, Saidi, Ali, Linux ARM

On Mon, 2021-05-31 at 07:24 +0200, Ard Biesheuvel wrote:
> 
> The SMCCC interface was created not as an alternative/complement to
> RNDR/RNDRRS, but to codify what several partners (including AWS) were
> doing at the time:

Talking of which, any particular reasons not to have (happy to
contribute ours) a /dev/hwrng wrapper around SMCCC ? For various "gov
compliance" kind of thing it's useful in addition to seeding the kernel
CRNG.

Cheers,
Ben.



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-06-02 22:04             ` Benjamin Herrenschmidt
@ 2021-06-03  0:19               ` Andre Przywara
  2021-06-03  1:41                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2021-06-03  0:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Mark Brown, Will Deacon, Saidi, Ali, Linux ARM

On Thu, 03 Jun 2021 08:04:43 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2021-05-31 at 07:24 +0200, Ard Biesheuvel wrote:
> > 
> > The SMCCC interface was created not as an alternative/complement to
> > RNDR/RNDRRS, but to codify what several partners (including AWS) were
> > doing at the time:  
> 
> Talking of which, any particular reasons not to have (happy to
> contribute ours) a /dev/hwrng wrapper around SMCCC ? For various "gov
> compliance" kind of thing it's useful in addition to seeding the kernel
> CRNG.

You mean like this?
https://gitlab.arm.com/linux-arm/linux-ap/-/commit/87e3722f437f9c3f09397e0e9812e6509c94786a
This is not reviewed nor widely tested, but I used it for assessing the
quality of the SMCCC provided numbers on the Juno board using rngtest.
I think one problem was that this opens the SMCCC to userland, so the
entropy could be depleted from there (again under the assumption that
this is really a problem in practice).

I would be interested to hear opinions on this.

Cheers,
Andre

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-06-03  0:19               ` Andre Przywara
@ 2021-06-03  1:41                 ` Benjamin Herrenschmidt
  2021-06-03  7:10                   ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2021-06-03  1:41 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Ard Biesheuvel, Mark Brown, Will Deacon, Saidi, Ali, Linux ARM

On Thu, 2021-06-03 at 01:19 +0100, Andre Przywara wrote:
> 
> You mean like this?
> https://gitlab.arm.com/linux-arm/linux-ap/-/commit/87e3722f437f9c3f09397e0e9812e6509c94786a

Yes. We have a similar one in Amazon Linux which I think Ali submitted
a while back but never went upstream.

> This is not reviewed nor widely tested, but I used it for assessing the
> quality of the SMCCC provided numbers on the Juno board using rngtest.
> I think one problem was that this opens the SMCCC to userland, so the
> entropy could be depleted from there (again under the assumption that
> this is really a problem in practice).

IMHO, userland can always adjust permission to /dev/hwrng if it wishes
to do so...

> I would be interested to hear opinions on this.

The issue is with things like FIPS certification (and other such
horrors) where I believe /dev/random is much harder to deal with since
it mixes multiple entropy sources.

Cheers,
Ben.



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-06-03  1:41                 ` Benjamin Herrenschmidt
@ 2021-06-03  7:10                   ` Ard Biesheuvel
  2021-06-03 21:30                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-06-03  7:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andre Przywara, Mark Brown, Will Deacon, Saidi, Ali, Linux ARM

On Thu, 3 Jun 2021 at 03:41, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Thu, 2021-06-03 at 01:19 +0100, Andre Przywara wrote:
> >
> > You mean like this?
> > https://gitlab.arm.com/linux-arm/linux-ap/-/commit/87e3722f437f9c3f09397e0e9812e6509c94786a
>
> Yes. We have a similar one in Amazon Linux which I think Ali submitted
> a while back but never went upstream.
>

I think it is fine to have something like this upstream. At the time,
I asked Andre not to include it, in order to keep the discussion
focused on the SMCCC and arch hook bits. This is all sorted now, so I
think it makes sense to upstream this.

> > This is not reviewed nor widely tested, but I used it for assessing the
> > quality of the SMCCC provided numbers on the Juno board using rngtest.
> > I think one problem was that this opens the SMCCC to userland, so the
> > entropy could be depleted from there (again under the assumption that
> > this is really a problem in practice).
>
> IMHO, userland can always adjust permission to /dev/hwrng if it wishes
> to do so...
>

True. However, the way things are currently set up, the hwrng is used
both either internally (if the entropy estimate is high enough) or via
rngd in user space to read from /dev/hwrng and write it back to
/dev/random. This is kind of pointless in this case, although not
harmful per se

> > I would be interested to hear opinions on this.
>
> The issue is with things like FIPS certification (and other such
> horrors) where I believe /dev/random is much harder to deal with since
> it mixes multiple entropy sources.
>

/dev/random is not an entropy source but a random number generator. I
agree with your characterization of FIPS in the general case, but the
/dev/random kludge we have is not pretty either :-)

Note that NIST SP800-90A/B compliance has similar requirements, i.e.,
if user space wants to seed its own DRBG in user space and comply with
these specs, it needs a compliant entropy source as well. However,
health tests on the entropy source are also mandated, and it is not
clear to me how that would fit into the SMCCC + /dev/hwrng
arrangement.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RNDR/SS vs. SMCCC
  2021-06-03  7:10                   ` Ard Biesheuvel
@ 2021-06-03 21:30                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2021-06-03 21:30 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andre Przywara, Mark Brown, Will Deacon, Saidi, Ali, Linux ARM

On Thu, 2021-06-03 at 09:10 +0200, Ard Biesheuvel wrote:
> 
> True. However, the way things are currently set up, the hwrng is used
> both either internally (if the entropy estimate is high enough) or
> via rngd in user space to read from /dev/hwrng and write it back to
> /dev/random. This is kind of pointless in this case, although not
> harmful per se

Right. For hwrngd, we could add a flag per "source" to indicate not to
bother if we cared enough. For rngd in userspace, not much to do other
than deprecate that thing with newer kernels :-)

> > > I would be interested to hear opinions on this.
> > The issue is with things like FIPS certification (and other such
> > horrors) where I believe /dev/random is much harder to deal with
> > since
> > it mixes multiple entropy sources.
> 
> 
> /dev/random is not an entropy source but a random number generator. I
> agree with your characterization of FIPS in the general case, but the
> /dev/random kludge we have is not pretty either :-)

True :)
> 
> Note that NIST SP800-90A/B compliance has similar requirements, i.e.,
> if user space wants to seed its own DRBG in user space and comply
> with these specs, it needs a compliant entropy source as well.
> However, health tests on the entropy source are also mandated, and it
> is not clear to me how that would fit into the SMCCC + /dev/hwrng
> 
> arrangement.

Yes I'm not sure either. But having /dev/hwrng I think won't hurt
either way.

Cheers,
Ben.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-06-03 21:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-26 23:54 RNDR/SS vs. SMCCC Benjamin Herrenschmidt
2021-05-27 12:50 ` Mark Brown
2021-05-27 23:12   ` Benjamin Herrenschmidt
2021-05-28 12:56     ` Mark Brown
2021-05-29  2:36       ` Benjamin Herrenschmidt
2021-05-31  1:02         ` Andre Przywara
2021-05-31  5:24           ` Ard Biesheuvel
2021-06-02 22:04             ` Benjamin Herrenschmidt
2021-06-03  0:19               ` Andre Przywara
2021-06-03  1:41                 ` Benjamin Herrenschmidt
2021-06-03  7:10                   ` Ard Biesheuvel
2021-06-03 21:30                     ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).