All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG/PATCH] kernel RNG and its secrets
@ 2015-03-18  9:53 mancha
  2015-03-18 10:30 ` Daniel Borkmann
  2015-03-18 10:50 ` Hannes Frederic Sowa
  0 siblings, 2 replies; 36+ messages in thread
From: mancha @ 2015-03-18  9:53 UTC (permalink / raw)
  To: tytso, linux-kernel; +Cc: linux-crypto, herbert, dborkman

[-- Attachment #1: Type: text/plain, Size: 1341 bytes --]

Hi.

The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect
memory cleansing against things like dead store optimization:

   void memzero_explicit(void *s, size_t count)
   {
           memset(s, 0, count);
           OPTIMIZER_HIDE_VAR(s);
   }

OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq
against timing analysis, is defined when using gcc as:

   #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from
optimizing out memset (i.e. secrets remain in memory).

Two things that do work:

   __asm__ __volatile__ ("" : "=r" (var) : "0" (var))

   and

   __asm__ __volatile__("": : :"memory")

The first is OPTIMIZER_HIDE_VAR plus a volatile qualifier and the second
is barrier() [as defined when using gcc].

I propose memzero_explicit use barrier().

--- a/lib/string.c
+++ b/lib/string.c
@@ -616,7 +616,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
        memset(s, 0, count);
-       OPTIMIZER_HIDE_VAR(s);
+       barrier();
 }
 EXPORT_SYMBOL(memzero_explicit);
 
For any attribution deemed necessary, please use "mancha security".
Please CC me on replies.

--mancha

PS CC'ing Herbert Xu in case this impacts crypto_memneq.

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18  9:53 [BUG/PATCH] kernel RNG and its secrets mancha
@ 2015-03-18 10:30 ` Daniel Borkmann
  2015-03-18 10:50 ` Hannes Frederic Sowa
  1 sibling, 0 replies; 36+ messages in thread
From: Daniel Borkmann @ 2015-03-18 10:30 UTC (permalink / raw)
  To: mancha, tytso, linux-kernel
  Cc: linux-crypto, herbert, Cesar Eduardo Barros, Hannes Frederic Sowa

[ Cc'ing Cesar ]

On 03/18/2015 10:53 AM, mancha wrote:
> Hi.
>
> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect
> memory cleansing against things like dead store optimization:
>
>     void memzero_explicit(void *s, size_t count)
>     {
>             memset(s, 0, count);
>             OPTIMIZER_HIDE_VAR(s);
>     }
>
> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq
> against timing analysis, is defined when using gcc as:
>
>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
>
> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from
> optimizing out memset (i.e. secrets remain in memory).

Could you elaborate on your test case?

memzero_explicit() is actually an EXPORT_SYMBOL(), are you saying
that gcc removes the call to memzero_explicit() entirely, inlines
it, and then optimizes the memset() eventually away?

Last time I looked, it emitted a call to memzero_explicit(), and
inside memzero_explicit() it did the memset() as it cannot make
any assumption from there. I'm using gcc (GCC) 4.8.3 20140911
(Red Hat 4.8.3-7).

> Two things that do work:
>
>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>
>     and
>
>     __asm__ __volatile__("": : :"memory")
>
> The first is OPTIMIZER_HIDE_VAR plus a volatile qualifier and the second
> is barrier() [as defined when using gcc].
>
> I propose memzero_explicit use barrier().
>
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -616,7 +616,7 @@ EXPORT_SYMBOL(memset);
>   void memzero_explicit(void *s, size_t count)
>   {
>          memset(s, 0, count);
> -       OPTIMIZER_HIDE_VAR(s);
> +       barrier();
>   }
>   EXPORT_SYMBOL(memzero_explicit);
>
> For any attribution deemed necessary, please use "mancha security".
> Please CC me on replies.
>
> --mancha
>
> PS CC'ing Herbert Xu in case this impacts crypto_memneq.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18  9:53 [BUG/PATCH] kernel RNG and its secrets mancha
  2015-03-18 10:30 ` Daniel Borkmann
@ 2015-03-18 10:50 ` Hannes Frederic Sowa
  2015-03-18 10:56   ` Daniel Borkmann
  1 sibling, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-03-18 10:50 UTC (permalink / raw)
  To: mancha, tytso, linux-kernel; +Cc: linux-crypto, herbert, dborkman



On Wed, Mar 18, 2015, at 10:53, mancha wrote:
> Hi.
> 
> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect
> memory cleansing against things like dead store optimization:
> 
>    void memzero_explicit(void *s, size_t count)
>    {
>            memset(s, 0, count);
>            OPTIMIZER_HIDE_VAR(s);
>    }
> 
> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq
> against timing analysis, is defined when using gcc as:
> 
>    #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
> 
> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from
> optimizing out memset (i.e. secrets remain in memory).
> 
> Two things that do work:
> 
>    __asm__ __volatile__ ("" : "=r" (var) : "0" (var))

You are correct, volatile signature should be added to
OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
allowed to check if it is needed and may remove the asm statement.
Another option would be to just use var as an input variable - asm
blocks without output variables are always considered being volatile by
gcc.

Can you send a patch?

I don't think it is security critical, as Daniel pointed out, the call
will happen because the function is an external call to the crypto
functions, thus the compiler has to flush memory on return.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 10:50 ` Hannes Frederic Sowa
@ 2015-03-18 10:56   ` Daniel Borkmann
  2015-03-18 11:09     ` Stephan Mueller
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2015-03-18 10:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa, mancha, tytso, linux-kernel
  Cc: linux-crypto, herbert, dborkman

On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>
>
> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>> Hi.
>>
>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect
>> memory cleansing against things like dead store optimization:
>>
>>     void memzero_explicit(void *s, size_t count)
>>     {
>>             memset(s, 0, count);
>>             OPTIMIZER_HIDE_VAR(s);
>>     }
>>
>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq
>> against timing analysis, is defined when using gcc as:
>>
>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
>>
>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from
>> optimizing out memset (i.e. secrets remain in memory).
>>
>> Two things that do work:
>>
>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>
> You are correct, volatile signature should be added to
> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
> allowed to check if it is needed and may remove the asm statement.
> Another option would be to just use var as an input variable - asm
> blocks without output variables are always considered being volatile by
> gcc.
>
> Can you send a patch?
>
> I don't think it is security critical, as Daniel pointed out, the call
> will happen because the function is an external call to the crypto
> functions, thus the compiler has to flush memory on return.

Just had a look.

$ gdb vmlinux
(gdb) disassemble memzero_explicit
Dump of assembler code for function memzero_explicit:
    0xffffffff813a18b0 <+0>:	push   %rbp
    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 <memset>
    0xffffffff813a18be <+14>:	pop    %rbp
    0xffffffff813a18bf <+15>:	retq
End of assembler dump.

(gdb) disassemble extract_entropy
[...]
    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80 <extract_entropy+176>
    0xffffffff814a5009 <+313>:	mov    %r12,%rdi
    0xffffffff814a500c <+316>:	mov    $0xa,%esi
    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0 <memzero_explicit>
    0xffffffff814a5016 <+326>:	mov    -0x48(%rbp),%rax
[...]

I would be fine with __volatile__.

Thanks a lot mancha, could you send a patch?

Best,
Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 10:56   ` Daniel Borkmann
@ 2015-03-18 11:09     ` Stephan Mueller
  2015-03-18 12:02       ` Hannes Frederic Sowa
  2015-04-10 13:25       ` Stephan Mueller
  0 siblings, 2 replies; 36+ messages in thread
From: Stephan Mueller @ 2015-03-18 11:09 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:

Hi Daniel,

>On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>>> Hi.
>>> 
>>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
>>> protect
>>> 
>>> memory cleansing against things like dead store optimization:
>>>     void memzero_explicit(void *s, size_t count)
>>>     {
>>>     
>>>             memset(s, 0, count);
>>>             OPTIMIZER_HIDE_VAR(s);
>>>     
>>>     }
>>> 
>>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
>>> crypto_memneq>> 
>>> against timing analysis, is defined when using gcc as:
>>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0"
>>>     (var))
>>> 
>>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
>>> from optimizing out memset (i.e. secrets remain in memory).
>>> 
>>> Two things that do work:
>>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>> 
>> You are correct, volatile signature should be added to
>> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
>> allowed to check if it is needed and may remove the asm statement.
>> Another option would be to just use var as an input variable - asm
>> blocks without output variables are always considered being volatile
>> by gcc.
>> 
>> Can you send a patch?
>> 
>> I don't think it is security critical, as Daniel pointed out, the
>> call
>> will happen because the function is an external call to the crypto
>> functions, thus the compiler has to flush memory on return.
>
>Just had a look.
>
>$ gdb vmlinux
>(gdb) disassemble memzero_explicit
>Dump of assembler code for function memzero_explicit:
>    0xffffffff813a18b0 <+0>:	push   %rbp
>    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
>    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
>    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
>    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 <memset>
>    0xffffffff813a18be <+14>:	pop    %rbp
>    0xffffffff813a18bf <+15>:	retq
>End of assembler dump.
>
>(gdb) disassemble extract_entropy
>[...]
>    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
>    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
>    0xffffffff814a500c <+316>:	mov    $0xa,%esi
>    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
><memzero_explicit> 0xffffffff814a5016 <+326>:	mov    -0x48(%rbp),%rax
>[...]
>
>I would be fine with __volatile__.

Are we sure that simply adding a __volatile__ works in any case? I just 
did a test with a simple user space app:

static inline void memset_secure(void *s, int c, size_t n)
{
        memset(s, c, n);
        //__asm__ __volatile__("": : :"memory");
        __asm__ __volatile__("" : "=r" (s) : "0" (s));
}

int main(int argc, char *argv[])
{
#define BUFLEN 20
        char buf[BUFLEN];

        snprintf(buf, (BUFLEN - 1), "teststring\n");
        printf("%s", buf);

        memset_secure(buf, 0, BUFLEN);
}

When using the discussed code of __asm__ __volatile__("" : "=r" (s) : 
"0" (s));  I do not find the code implementing memset(0) in objdump. 
Only when I enable the memory barrier, I see the following (when 
compiling with -O2):

objdump -d memset_secure:
...
0000000000400440 <main>:
...
  400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  400470:       00 
  400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
  400478:       00 00 
  40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
  400481:       00
...

>
>Thanks a lot mancha, could you send a patch?
>
>Best,
>Daniel
>--
>To unsubscribe from this list: send the line "unsubscribe linux-crypto"
>in the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 11:09     ` Stephan Mueller
@ 2015-03-18 12:02       ` Hannes Frederic Sowa
  2015-03-18 12:14         ` Stephan Mueller
  2015-03-18 12:58         ` mancha
  2015-04-10 13:25       ` Stephan Mueller
  1 sibling, 2 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-03-18 12:02 UTC (permalink / raw)
  To: Stephan Mueller, Daniel Borkmann
  Cc: mancha, tytso, linux-kernel, linux-crypto, herbert, dborkman

On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
> Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
> >On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
> >> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
> >>> Hi.
> >>> 
> >>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
> >>> protect
> >>> 
> >>> memory cleansing against things like dead store optimization:
> >>>     void memzero_explicit(void *s, size_t count)
> >>>     {
> >>>     
> >>>             memset(s, 0, count);
> >>>             OPTIMIZER_HIDE_VAR(s);
> >>>     
> >>>     }
> >>> 
> >>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
> >>> crypto_memneq>> 
> >>> against timing analysis, is defined when using gcc as:
> >>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0"
> >>>     (var))
> >>> 
> >>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
> >>> from optimizing out memset (i.e. secrets remain in memory).
> >>> 
> >>> Two things that do work:
> >>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
> >> 
> >> You are correct, volatile signature should be added to
> >> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
> >> allowed to check if it is needed and may remove the asm statement.
> >> Another option would be to just use var as an input variable - asm
> >> blocks without output variables are always considered being volatile
> >> by gcc.
> >> 
> >> Can you send a patch?
> >> 
> >> I don't think it is security critical, as Daniel pointed out, the
> >> call
> >> will happen because the function is an external call to the crypto
> >> functions, thus the compiler has to flush memory on return.
> >
> >Just had a look.
> >
> >$ gdb vmlinux
> >(gdb) disassemble memzero_explicit
> >Dump of assembler code for function memzero_explicit:
> >    0xffffffff813a18b0 <+0>:	push   %rbp
> >    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
> >    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
> >    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
> >    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 <memset>
> >    0xffffffff813a18be <+14>:	pop    %rbp
> >    0xffffffff813a18bf <+15>:	retq
> >End of assembler dump.
> >
> >(gdb) disassemble extract_entropy
> >[...]
> >    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
> >    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
> ><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
> >    0xffffffff814a500c <+316>:	mov    $0xa,%esi
> >    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
> ><memzero_explicit> 0xffffffff814a5016 <+326>:	mov    -0x48(%rbp),%rax
> >[...]
> >
> >I would be fine with __volatile__.
> 
> Are we sure that simply adding a __volatile__ works in any case? I just 
> did a test with a simple user space app:
> 
> static inline void memset_secure(void *s, int c, size_t n)
> {
>         memset(s, c, n);
>         //__asm__ __volatile__("": : :"memory");
>         __asm__ __volatile__("" : "=r" (s) : "0" (s));
> }
> 

Good point, thanks!

Of course an input or output of s does not force the memory pointed to
by s being flushed.


My proposal would be to add a

#define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : : "m"(
({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )

and use this in the code function.

This is documented in gcc manual 6.43.2.5.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 12:02       ` Hannes Frederic Sowa
@ 2015-03-18 12:14         ` Stephan Mueller
  2015-03-18 12:19           ` Hannes Frederic Sowa
  2015-03-18 12:58         ` mancha
  1 sibling, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-03-18 12:14 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Mittwoch, 18. März 2015, 13:02:12 schrieb Hannes Frederic Sowa:

Hi Hannes,

>On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
>> Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
>> >On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>> >> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>> >>> Hi.
>> >>> 
>> >>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
>> >>> protect
>> >>> 
>> >>> memory cleansing against things like dead store optimization:
>> >>>     void memzero_explicit(void *s, size_t count)
>> >>>     {
>> >>>     
>> >>>             memset(s, 0, count);
>> >>>             OPTIMIZER_HIDE_VAR(s);
>> >>>     
>> >>>     }
>> >>> 
>> >>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
>> >>> crypto_memneq>>
>> >>> 
>> >>> against timing analysis, is defined when using gcc as:
>> >>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) :
>> >>>     "0"
>> >>>     (var))
>> >>> 
>> >>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent
>> >>> gcc
>> >>> from optimizing out memset (i.e. secrets remain in memory).
>> >>> 
>> >>> Two things that do work:
>> >>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>> >> 
>> >> You are correct, volatile signature should be added to
>> >> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
>> >> allowed to check if it is needed and may remove the asm statement.
>> >> Another option would be to just use var as an input variable - asm
>> >> blocks without output variables are always considered being
>> >> volatile
>> >> by gcc.
>> >> 
>> >> Can you send a patch?
>> >> 
>> >> I don't think it is security critical, as Daniel pointed out, the
>> >> call
>> >> will happen because the function is an external call to the crypto
>> >> functions, thus the compiler has to flush memory on return.
>> >
>> >Just had a look.
>> >
>> >$ gdb vmlinux
>> >(gdb) disassemble memzero_explicit
>> >
>> >Dump of assembler code for function memzero_explicit:
>> >    0xffffffff813a18b0 <+0>:	push   %rbp
>> >    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
>> >    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
>> >    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
>> >    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 
<memset>
>> >    0xffffffff813a18be <+14>:	pop    %rbp
>> >    0xffffffff813a18bf <+15>:	retq
>> >
>> >End of assembler dump.
>> >
>> >(gdb) disassemble extract_entropy
>> >[...]
>> >
>> >    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
>> >    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
>> >
>> ><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
>> >
>> >    0xffffffff814a500c <+316>:	mov    $0xa,%esi
>> >    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
>> >
>> ><memzero_explicit> 0xffffffff814a5016 <+326>:	mov   
>> >-0x48(%rbp),%rax
>> >[...]
>> >
>> >I would be fine with __volatile__.
>> 
>> Are we sure that simply adding a __volatile__ works in any case? I
>> just did a test with a simple user space app:
>> 
>> static inline void memset_secure(void *s, int c, size_t n)
>> {
>> 
>>         memset(s, c, n);
>>         //__asm__ __volatile__("": : :"memory");
>>         __asm__ __volatile__("" : "=r" (s) : "0" (s));
>> 
>> }
>
>Good point, thanks!
>
>Of course an input or output of s does not force the memory pointed to
>by s being flushed.
>
>
>My proposal would be to add a
>
>#define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : : "m"(
>({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
>
>and use this in the code function.
>
>This is documented in gcc manual 6.43.2.5.

That one adds the zeroization instructuctions. But now there are much 
more than with the barrier.

  400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  400470:       00 
  400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
  400478:       00 00 
  40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
  400481:       00 
  400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
  400489:       00 00 
  40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
  400492:       00 00 
  400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
  40049b:       00

Any ideas?
>
>Bye,
>Hannes


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 12:14         ` Stephan Mueller
@ 2015-03-18 12:19           ` Hannes Frederic Sowa
  2015-03-18 12:20             ` Stephan Mueller
  0 siblings, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-03-18 12:19 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman



On Wed, Mar 18, 2015, at 13:14, Stephan Mueller wrote:
> Am Mittwoch, 18. März 2015, 13:02:12 schrieb Hannes Frederic Sowa:
> 
> Hi Hannes,
> 
> >On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
> >> Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
> >> >On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
> >> >> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
> >> >>> Hi.
> >> >>> 
> >> >>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
> >> >>> protect
> >> >>> 
> >> >>> memory cleansing against things like dead store optimization:
> >> >>>     void memzero_explicit(void *s, size_t count)
> >> >>>     {
> >> >>>     
> >> >>>             memset(s, 0, count);
> >> >>>             OPTIMIZER_HIDE_VAR(s);
> >> >>>     
> >> >>>     }
> >> >>> 
> >> >>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
> >> >>> crypto_memneq>>
> >> >>> 
> >> >>> against timing analysis, is defined when using gcc as:
> >> >>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) :
> >> >>>     "0"
> >> >>>     (var))
> >> >>> 
> >> >>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent
> >> >>> gcc
> >> >>> from optimizing out memset (i.e. secrets remain in memory).
> >> >>> 
> >> >>> Two things that do work:
> >> >>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
> >> >> 
> >> >> You are correct, volatile signature should be added to
> >> >> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
> >> >> allowed to check if it is needed and may remove the asm statement.
> >> >> Another option would be to just use var as an input variable - asm
> >> >> blocks without output variables are always considered being
> >> >> volatile
> >> >> by gcc.
> >> >> 
> >> >> Can you send a patch?
> >> >> 
> >> >> I don't think it is security critical, as Daniel pointed out, the
> >> >> call
> >> >> will happen because the function is an external call to the crypto
> >> >> functions, thus the compiler has to flush memory on return.
> >> >
> >> >Just had a look.
> >> >
> >> >$ gdb vmlinux
> >> >(gdb) disassemble memzero_explicit
> >> >
> >> >Dump of assembler code for function memzero_explicit:
> >> >    0xffffffff813a18b0 <+0>:	push   %rbp
> >> >    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
> >> >    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
> >> >    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
> >> >    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 
> <memset>
> >> >    0xffffffff813a18be <+14>:	pop    %rbp
> >> >    0xffffffff813a18bf <+15>:	retq
> >> >
> >> >End of assembler dump.
> >> >
> >> >(gdb) disassemble extract_entropy
> >> >[...]
> >> >
> >> >    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
> >> >    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
> >> >
> >> ><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
> >> >
> >> >    0xffffffff814a500c <+316>:	mov    $0xa,%esi
> >> >    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
> >> >
> >> ><memzero_explicit> 0xffffffff814a5016 <+326>:	mov   
> >> >-0x48(%rbp),%rax
> >> >[...]
> >> >
> >> >I would be fine with __volatile__.
> >> 
> >> Are we sure that simply adding a __volatile__ works in any case? I
> >> just did a test with a simple user space app:
> >> 
> >> static inline void memset_secure(void *s, int c, size_t n)
> >> {
> >> 
> >>         memset(s, c, n);
> >>         //__asm__ __volatile__("": : :"memory");
> >>         __asm__ __volatile__("" : "=r" (s) : "0" (s));
> >> 
> >> }
> >
> >Good point, thanks!
> >
> >Of course an input or output of s does not force the memory pointed to
> >by s being flushed.
> >
> >
> >My proposal would be to add a
> >
> >#define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : : "m"(
> >({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
> >
> >and use this in the code function.
> >
> >This is documented in gcc manual 6.43.2.5.
> 
> That one adds the zeroization instructuctions. But now there are much 
> more than with the barrier.
> 
>   400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
>   400470:       00 
>   400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
>   400478:       00 00 
>   40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
>   400481:       00 
>   400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
>   400489:       00 00 
>   40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
>   400492:       00 00 
>   400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
>   40049b:       00
> 
> Any ideas?

Hmm, correct definition of u8?

Which version of gcc do you use? I can't see any difference if I compile
your example at -O2.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 12:19           ` Hannes Frederic Sowa
@ 2015-03-18 12:20             ` Stephan Mueller
  2015-03-18 12:42               ` Daniel Borkmann
  0 siblings, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-03-18 12:20 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:

Hi Hannes,

>On Wed, Mar 18, 2015, at 13:14, Stephan Mueller wrote:
>> Am Mittwoch, 18. März 2015, 13:02:12 schrieb Hannes Frederic Sowa:
>> 
>> Hi Hannes,
>> 
>> >On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
>> >> Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
>> >> >On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>> >> >> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>> >> >>> Hi.
>> >> >>> 
>> >> >>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
>> >> >>> protect
>> >> >>> 
>> >> >>> memory cleansing against things like dead store optimization:
>> >> >>>     void memzero_explicit(void *s, size_t count)
>> >> >>>     {
>> >> >>>     
>> >> >>>             memset(s, 0, count);
>> >> >>>             OPTIMIZER_HIDE_VAR(s);
>> >> >>>     
>> >> >>>     }
>> >> >>> 
>> >> >>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
>> >> >>> crypto_memneq>>
>> >> >>> 
>> >> >>> against timing analysis, is defined when using gcc as:
>> >> >>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) :
>> >> >>>     "0"
>> >> >>>     (var))
>> >> >>> 
>> >> >>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent
>> >> >>> gcc
>> >> >>> from optimizing out memset (i.e. secrets remain in memory).
>> >> >>> 
>> >> >>> Two things that do work:
>> >> >>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>> >> >> 
>> >> >> You are correct, volatile signature should be added to
>> >> >> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc
>> >> >> is
>> >> >> allowed to check if it is needed and may remove the asm
>> >> >> statement.
>> >> >> Another option would be to just use var as an input variable -
>> >> >> asm
>> >> >> blocks without output variables are always considered being
>> >> >> volatile
>> >> >> by gcc.
>> >> >> 
>> >> >> Can you send a patch?
>> >> >> 
>> >> >> I don't think it is security critical, as Daniel pointed out,
>> >> >> the
>> >> >> call
>> >> >> will happen because the function is an external call to the
>> >> >> crypto
>> >> >> functions, thus the compiler has to flush memory on return.
>> >> >
>> >> >Just had a look.
>> >> >
>> >> >$ gdb vmlinux
>> >> >(gdb) disassemble memzero_explicit
>> >> >
>> >> >Dump of assembler code for function memzero_explicit:
>> >> >    0xffffffff813a18b0 <+0>:	push   %rbp
>> >> >    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
>> >> >    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
>> >> >    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
>> >> >    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120
>> 
>> <memset>
>> 
>> >> >    0xffffffff813a18be <+14>:	pop    %rbp
>> >> >    0xffffffff813a18bf <+15>:	retq
>> >> >
>> >> >End of assembler dump.
>> >> >
>> >> >(gdb) disassemble extract_entropy
>> >> >[...]
>> >> >
>> >> >    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
>> >> >    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
>> >> >
>> >> ><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
>> >> >
>> >> >    0xffffffff814a500c <+316>:	mov    $0xa,%esi
>> >> >    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
>> >> >
>> >> ><memzero_explicit> 0xffffffff814a5016 <+326>:	mov
>> >> >-0x48(%rbp),%rax
>> >> >[...]
>> >> >
>> >> >I would be fine with __volatile__.
>> >> 
>> >> Are we sure that simply adding a __volatile__ works in any case? I
>> >> just did a test with a simple user space app:
>> >> 
>> >> static inline void memset_secure(void *s, int c, size_t n)
>> >> {
>> >> 
>> >>         memset(s, c, n);
>> >>         //__asm__ __volatile__("": : :"memory");
>> >>         __asm__ __volatile__("" : "=r" (s) : "0" (s));
>> >> 
>> >> }
>> >
>> >Good point, thanks!
>> >
>> >Of course an input or output of s does not force the memory pointed
>> >to
>> >by s being flushed.
>> >
>> >
>> >My proposal would be to add a
>> >
>> >#define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : :
>> >"m"(
>> >({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
>> >
>> >and use this in the code function.
>> >
>> >This is documented in gcc manual 6.43.2.5.
>> 
>> That one adds the zeroization instructuctions. But now there are much
>> more than with the barrier.
>> 
>>   400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
>>   400470:       00
>>   400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
>>   400478:       00 00
>>   40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
>>   400481:       00
>>   400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
>>   400489:       00 00
>>   40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
>>   400492:       00 00
>>   400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
>>   40049b:       00
>> 
>> Any ideas?
>
>Hmm, correct definition of u8?

I use unsigned char
>
>Which version of gcc do you use? I can't see any difference if I
>compile your example at -O2.

gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)
>
>Bye,
>Hannes


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 12:20             ` Stephan Mueller
@ 2015-03-18 12:42               ` Daniel Borkmann
  2015-03-18 15:09                 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2015-03-18 12:42 UTC (permalink / raw)
  To: Stephan Mueller, Hannes Frederic Sowa
  Cc: mancha, tytso, linux-kernel, linux-crypto, herbert, dborkman

On 03/18/2015 01:20 PM, Stephan Mueller wrote:
> Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:
>
> Hi Hannes,
>
>> On Wed, Mar 18, 2015, at 13:14, Stephan Mueller wrote:
>>> Am Mittwoch, 18. März 2015, 13:02:12 schrieb Hannes Frederic Sowa:
>>>
>>> Hi Hannes,
>>>
>>>> On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
>>>>> Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
>>>>>> On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>>>>>>> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
>>>>>>>> protect
>>>>>>>>
>>>>>>>> memory cleansing against things like dead store optimization:
>>>>>>>>      void memzero_explicit(void *s, size_t count)
>>>>>>>>      {
>>>>>>>>
>>>>>>>>              memset(s, 0, count);
>>>>>>>>              OPTIMIZER_HIDE_VAR(s);
>>>>>>>>
>>>>>>>>      }
>>>>>>>>
>>>>>>>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
>>>>>>>> crypto_memneq>>
>>>>>>>>
>>>>>>>> against timing analysis, is defined when using gcc as:
>>>>>>>>      #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) :
>>>>>>>>      "0"
>>>>>>>>      (var))
>>>>>>>>
>>>>>>>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent
>>>>>>>> gcc
>>>>>>>> from optimizing out memset (i.e. secrets remain in memory).
>>>>>>>>
>>>>>>>> Two things that do work:
>>>>>>>>      __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>>>>>>>
>>>>>>> You are correct, volatile signature should be added to
>>>>>>> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc
>>>>>>> is
>>>>>>> allowed to check if it is needed and may remove the asm
>>>>>>> statement.
>>>>>>> Another option would be to just use var as an input variable -
>>>>>>> asm
>>>>>>> blocks without output variables are always considered being
>>>>>>> volatile
>>>>>>> by gcc.
>>>>>>>
>>>>>>> Can you send a patch?
>>>>>>>
>>>>>>> I don't think it is security critical, as Daniel pointed out,
>>>>>>> the
>>>>>>> call
>>>>>>> will happen because the function is an external call to the
>>>>>>> crypto
>>>>>>> functions, thus the compiler has to flush memory on return.
>>>>>>
>>>>>> Just had a look.
>>>>>>
>>>>>> $ gdb vmlinux
>>>>>> (gdb) disassemble memzero_explicit
>>>>>>
>>>>>> Dump of assembler code for function memzero_explicit:
>>>>>>     0xffffffff813a18b0 <+0>:	push   %rbp
>>>>>>     0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
>>>>>>     0xffffffff813a18b4 <+4>:	xor    %esi,%esi
>>>>>>     0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
>>>>>>     0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120
>>>
>>> <memset>
>>>
>>>>>>     0xffffffff813a18be <+14>:	pop    %rbp
>>>>>>     0xffffffff813a18bf <+15>:	retq
>>>>>>
>>>>>> End of assembler dump.
>>>>>>
>>>>>> (gdb) disassemble extract_entropy
>>>>>> [...]
>>>>>>
>>>>>>     0xffffffff814a5000 <+304>:	sub    %r15,%rbx
>>>>>>     0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
>>>>>>
>>>>>> <extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
>>>>>>
>>>>>>     0xffffffff814a500c <+316>:	mov    $0xa,%esi
>>>>>>     0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
>>>>>>
>>>>>> <memzero_explicit> 0xffffffff814a5016 <+326>:	mov
>>>>>> -0x48(%rbp),%rax
>>>>>> [...]
>>>>>>
>>>>>> I would be fine with __volatile__.
>>>>>
>>>>> Are we sure that simply adding a __volatile__ works in any case? I
>>>>> just did a test with a simple user space app:
>>>>>
>>>>> static inline void memset_secure(void *s, int c, size_t n)
>>>>> {
>>>>>
>>>>>          memset(s, c, n);
>>>>>          //__asm__ __volatile__("": : :"memory");
>>>>>          __asm__ __volatile__("" : "=r" (s) : "0" (s));
>>>>>
>>>>> }
>>>>
>>>> Good point, thanks!
>>>>
>>>> Of course an input or output of s does not force the memory pointed
>>>> to
>>>> by s being flushed.
>>>>
>>>>
>>>> My proposal would be to add a
>>>>
>>>> #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : :
>>>> "m"(
>>>> ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
>>>>
>>>> and use this in the code function.
>>>>
>>>> This is documented in gcc manual 6.43.2.5.
>>>
>>> That one adds the zeroization instructuctions. But now there are much
>>> more than with the barrier.
>>>
>>>    400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
>>>    400470:       00
>>>    400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
>>>    400478:       00 00
>>>    40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
>>>    400481:       00
>>>    400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
>>>    400489:       00 00
>>>    40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
>>>    400492:       00 00
>>>    400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
>>>    40049b:       00
>>>
>>> Any ideas?
>>
>> Hmm, correct definition of u8?
>
> I use unsigned char
>>
>> Which version of gcc do you use? I can't see any difference if I
>> compile your example at -O2.
>
> gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)

I can see the same with the gcc version I previously posted. So
it clears the 20 bytes from your example (movq, movq, movl) at
two locations, presumably buf[] and b[].

Best,
Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 12:02       ` Hannes Frederic Sowa
  2015-03-18 12:14         ` Stephan Mueller
@ 2015-03-18 12:58         ` mancha
  1 sibling, 0 replies; 36+ messages in thread
From: mancha @ 2015-03-18 12:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Stephan Mueller, Daniel Borkmann, tytso, linux-kernel,
	linux-crypto, herbert, dborkman

[-- Attachment #1: Type: text/plain, Size: 3907 bytes --]

On Wed, Mar 18, 2015 at 01:02:12PM +0100, Hannes Frederic Sowa wrote:
> On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
> > Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
> > >On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
> > >> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
> > >>> Hi.
> > >>> 
> > >>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
> > >>> protect
> > >>> 
> > >>> memory cleansing against things like dead store optimization:
> > >>>     void memzero_explicit(void *s, size_t count)
> > >>>     {
> > >>>     
> > >>>             memset(s, 0, count);
> > >>>             OPTIMIZER_HIDE_VAR(s);
> > >>>     
> > >>>     }
> > >>> 
> > >>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
> > >>> crypto_memneq>> 
> > >>> against timing analysis, is defined when using gcc as:
> > >>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0"
> > >>>     (var))
> > >>> 
> > >>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
> > >>> from optimizing out memset (i.e. secrets remain in memory).
> > >>> 
> > >>> Two things that do work:
> > >>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
> > >> 
> > >> You are correct, volatile signature should be added to
> > >> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
> > >> allowed to check if it is needed and may remove the asm statement.
> > >> Another option would be to just use var as an input variable - asm
> > >> blocks without output variables are always considered being volatile
> > >> by gcc.
> > >> 
> > >> Can you send a patch?
> > >> 
> > >> I don't think it is security critical, as Daniel pointed out, the
> > >> call
> > >> will happen because the function is an external call to the crypto
> > >> functions, thus the compiler has to flush memory on return.
> > >
> > >Just had a look.
> > >
> > >$ gdb vmlinux
> > >(gdb) disassemble memzero_explicit
> > >Dump of assembler code for function memzero_explicit:
> > >    0xffffffff813a18b0 <+0>:	push   %rbp
> > >    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
> > >    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
> > >    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
> > >    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 <memset>
> > >    0xffffffff813a18be <+14>:	pop    %rbp
> > >    0xffffffff813a18bf <+15>:	retq
> > >End of assembler dump.
> > >
> > >(gdb) disassemble extract_entropy
> > >[...]
> > >    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
> > >    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
> > ><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
> > >    0xffffffff814a500c <+316>:	mov    $0xa,%esi
> > >    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
> > ><memzero_explicit> 0xffffffff814a5016 <+326>:	mov    -0x48(%rbp),%rax
> > >[...]
> > >
> > >I would be fine with __volatile__.
> > 
> > Are we sure that simply adding a __volatile__ works in any case? I just 
> > did a test with a simple user space app:
> > 
> > static inline void memset_secure(void *s, int c, size_t n)
> > {
> >         memset(s, c, n);
> >         //__asm__ __volatile__("": : :"memory");
> >         __asm__ __volatile__("" : "=r" (s) : "0" (s));
> > }
> > 
> 
> Good point, thanks!
> 
> Of course an input or output of s does not force the memory pointed to
> by s being flushed.
> 
> 
> My proposal would be to add a
> 
> #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : : "m"(
> ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
> 
> and use this in the code function.
> 
> This is documented in gcc manual 6.43.2.5.
> 
> Bye,
> Hannes
> 

Hi all.

Any reason to not use __asm__ __volatile__("": : :"memory") [aka 
barrier()]?

Or maybe __asm__ __volatile__("": :"r"(ptr) :"memory").

Cheers.

--mancha

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 12:42               ` Daniel Borkmann
@ 2015-03-18 15:09                 ` Hannes Frederic Sowa
  2015-03-18 16:02                   ` Stephan Mueller
  2015-03-18 17:41                   ` Theodore Ts'o
  0 siblings, 2 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-03-18 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, Stephan Mueller
  Cc: mancha, tytso, linux-kernel, linux-crypto, herbert, dborkman



On Wed, Mar 18, 2015, at 13:42, Daniel Borkmann wrote:
> On 03/18/2015 01:20 PM, Stephan Mueller wrote:
> > Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:
> >>>> My proposal would be to add a
> >>>>
> >>>> #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : :
> >>>> "m"(
> >>>> ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
> >>>>
> >>>> and use this in the code function.
> >>>>
> >>>> This is documented in gcc manual 6.43.2.5.
> >>>
> >>> That one adds the zeroization instructuctions. But now there are much
> >>> more than with the barrier.
> >>>
> >>>    400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
> >>>    400470:       00
> >>>    400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
> >>>    400478:       00 00
> >>>    40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
> >>>    400481:       00
> >>>    400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
> >>>    400489:       00 00
> >>>    40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
> >>>    400492:       00 00
> >>>    400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
> >>>    40049b:       00
> >>>
> >>> Any ideas?
> >>
> >> Hmm, correct definition of u8?
> >
> > I use unsigned char
> >>
> >> Which version of gcc do you use? I can't see any difference if I
> >> compile your example at -O2.
> >
> > gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)

Well, was an error on my side, I see the same behavior.

> 
> I can see the same with the gcc version I previously posted. So
> it clears the 20 bytes from your example (movq, movq, movl) at
> two locations, presumably buf[] and b[].

Yes, it looks like that. The reservation on the stack changes, too.

Seems like just using barrier() is the best and easiest option.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 15:09                 ` Hannes Frederic Sowa
@ 2015-03-18 16:02                   ` Stephan Mueller
  2015-03-18 17:14                     ` mancha
  2015-03-18 17:41                   ` Theodore Ts'o
  1 sibling, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-03-18 16:02 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Mittwoch, 18. März 2015, 16:09:34 schrieb Hannes Frederic Sowa:

Hi Hannes,

>On Wed, Mar 18, 2015, at 13:42, Daniel Borkmann wrote:
>> On 03/18/2015 01:20 PM, Stephan Mueller wrote:
>> > Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:
>> >>>> My proposal would be to add a
>> >>>> 
>> >>>> #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" :
>> >>>> :
>> >>>> "m"(
>> >>>> ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
>> >>>> 
>> >>>> and use this in the code function.
>> >>>> 
>> >>>> This is documented in gcc manual 6.43.2.5.
>> >>> 
>> >>> That one adds the zeroization instructuctions. But now there are
>> >>> much
>> >>> more than with the barrier.
>> >>> 
>> >>>    400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
>> >>>    400470:       00
>> >>>    400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
>> >>>    400478:       00 00
>> >>>    40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
>> >>>    400481:       00
>> >>>    400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
>> >>>    400489:       00 00
>> >>>    40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
>> >>>    400492:       00 00
>> >>>    400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
>> >>>    40049b:       00
>> >>> 
>> >>> Any ideas?
>> >> 
>> >> Hmm, correct definition of u8?
>> > 
>> > I use unsigned char
>> > 
>> >> Which version of gcc do you use? I can't see any difference if I
>> >> compile your example at -O2.
>> > 
>> > gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)
>
>Well, was an error on my side, I see the same behavior.
>
>> I can see the same with the gcc version I previously posted. So
>> it clears the 20 bytes from your example (movq, movq, movl) at
>> two locations, presumably buf[] and b[].
>
>Yes, it looks like that. The reservation on the stack changes, too.
>
>Seems like just using barrier() is the best and easiest option.

Would you prepare a patch for that?
>
>Thanks,
>Hannes


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 16:02                   ` Stephan Mueller
@ 2015-03-18 17:14                     ` mancha
  2015-03-18 17:49                       ` Daniel Borkmann
  2015-03-18 23:53                       ` Cesar Eduardo Barros
  0 siblings, 2 replies; 36+ messages in thread
From: mancha @ 2015-03-18 17:14 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Hannes Frederic Sowa, Daniel Borkmann, tytso, linux-kernel,
	linux-crypto, herbert, dborkman, cesarb


[-- Attachment #1.1: Type: text/plain, Size: 2740 bytes --]

On Wed, Mar 18, 2015 at 05:02:01PM +0100, Stephan Mueller wrote:
> Am Mittwoch, 18. März 2015, 16:09:34 schrieb Hannes Frederic Sowa:
> 
> Hi Hannes,
> 
> >On Wed, Mar 18, 2015, at 13:42, Daniel Borkmann wrote:
> >> On 03/18/2015 01:20 PM, Stephan Mueller wrote:
> >> > Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:
> >> >>>> My proposal would be to add a
> >> >>>> 
> >> >>>> #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" :
> >> >>>> :
> >> >>>> "m"(
> >> >>>> ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
> >> >>>> 
> >> >>>> and use this in the code function.
> >> >>>> 
> >> >>>> This is documented in gcc manual 6.43.2.5.
> >> >>> 
> >> >>> That one adds the zeroization instructuctions. But now there are
> >> >>> much
> >> >>> more than with the barrier.
> >> >>> 
> >> >>>    400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
> >> >>>    400470:       00
> >> >>>    400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
> >> >>>    400478:       00 00
> >> >>>    40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
> >> >>>    400481:       00
> >> >>>    400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
> >> >>>    400489:       00 00
> >> >>>    40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
> >> >>>    400492:       00 00
> >> >>>    400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
> >> >>>    40049b:       00
> >> >>> 
> >> >>> Any ideas?
> >> >> 
> >> >> Hmm, correct definition of u8?
> >> > 
> >> > I use unsigned char
> >> > 
> >> >> Which version of gcc do you use? I can't see any difference if I
> >> >> compile your example at -O2.
> >> > 
> >> > gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)
> >
> >Well, was an error on my side, I see the same behavior.
> >
> >> I can see the same with the gcc version I previously posted. So
> >> it clears the 20 bytes from your example (movq, movq, movl) at
> >> two locations, presumably buf[] and b[].
> >
> >Yes, it looks like that. The reservation on the stack changes, too.
> >
> >Seems like just using barrier() is the best and easiest option.
> 
> Would you prepare a patch for that?
> >
> >Thanks,
> >Hannes
> 
> 
> Ciao
> Stephan

Hi.

Patch 0001 fixes the dead store issue in memzero_explicit().

However, if the idea is to use barrier() instead of OPTIMIZER_HIDE_VAR()
in crypto_memneq() as well, then patch 0002 is the one to use. Please
review and keep in mind my analysis was limited to memzero_explicit().

Cesar, were there reasons you didn't use the gcc version of barrier()
for crypto_memneq()?

Please let me know if I can be of any further assistance.

--mancha

[-- Attachment #1.2: 0001-memzero_explicit.patch --]
[-- Type: text/plain, Size: 754 bytes --]

From cd9e882cd1d684f50c52471d83f9ecf55427c360 Mon Sep 17 00:00:00 2001
From: mancha security <mancha1@zoho.com>
Date: Wed, 18 Mar 2015 16:14:34 +0000
Subject: [PATCH] lib/string.c: use barrier() to protect memzero_explicit()

OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient
to ensure protection from dead store optimization.
---
 lib/string.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/string.c b/lib/string.c
index ce81aae..a579201 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	OPTIMIZER_HIDE_VAR(s);
+	barrier();
 }
 EXPORT_SYMBOL(memzero_explicit);
 
-- 
2.1.4


[-- Attachment #1.3: 0002-OPTIMIZER_HIDE_VAR_purge.patch --]
[-- Type: text/plain, Size: 6288 bytes --]

From bca436d73ee5388be26488e3d2decdad1c6ba322 Mon Sep 17 00:00:00 2001
From: mancha security <mancha1@zoho.com>
Date: Wed, 18 Mar 2015 16:26:11 +0000
Subject: [PATCH] crypto: cyrpto_memneq and memzero_explicit

OPTIMIZER_HIDE_VAR(), as defined for gcc, is insufficient to protect
against certain compiler optimizations. Use barrier() instead.
---
 crypto/memneq.c                | 48 +++++++++++++++++++++---------------------
 include/linux/compiler-gcc.h   |  3 ---
 include/linux/compiler-intel.h |  7 ------
 include/linux/compiler.h       |  4 ----
 lib/string.c                   |  2 +-
 5 files changed, 25 insertions(+), 39 deletions(-)

diff --git a/crypto/memneq.c b/crypto/memneq.c
index afed1bd..efa7750 100644
--- a/crypto/memneq.c
+++ b/crypto/memneq.c
@@ -72,7 +72,7 @@ __crypto_memneq_generic(const void *a, const void *b, size_t size)
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
 	while (size >= sizeof(unsigned long)) {
 		neq |= *(unsigned long *)a ^ *(unsigned long *)b;
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		a += sizeof(unsigned long);
 		b += sizeof(unsigned long);
 		size -= sizeof(unsigned long);
@@ -80,7 +80,7 @@ __crypto_memneq_generic(const void *a, const void *b, size_t size)
 #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
 	while (size > 0) {
 		neq |= *(unsigned char *)a ^ *(unsigned char *)b;
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		a += 1;
 		b += 1;
 		size -= 1;
@@ -96,53 +96,53 @@ static inline unsigned long __crypto_memneq_16(const void *a, const void *b)
 #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 	if (sizeof(unsigned long) == 8) {
 		neq |= *(unsigned long *)(a)   ^ *(unsigned long *)(b);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned long *)(a+8) ^ *(unsigned long *)(b+8);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 	} else if (sizeof(unsigned int) == 4) {
 		neq |= *(unsigned int *)(a)    ^ *(unsigned int *)(b);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned int *)(a+4)  ^ *(unsigned int *)(b+4);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned int *)(a+8)  ^ *(unsigned int *)(b+8);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned int *)(a+12) ^ *(unsigned int *)(b+12);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 	} else
 #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
 	{
 		neq |= *(unsigned char *)(a)    ^ *(unsigned char *)(b);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+1)  ^ *(unsigned char *)(b+1);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+2)  ^ *(unsigned char *)(b+2);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+3)  ^ *(unsigned char *)(b+3);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+4)  ^ *(unsigned char *)(b+4);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+5)  ^ *(unsigned char *)(b+5);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+6)  ^ *(unsigned char *)(b+6);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+7)  ^ *(unsigned char *)(b+7);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+8)  ^ *(unsigned char *)(b+8);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+9)  ^ *(unsigned char *)(b+9);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+10) ^ *(unsigned char *)(b+10);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+11) ^ *(unsigned char *)(b+11);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+12) ^ *(unsigned char *)(b+12);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+13) ^ *(unsigned char *)(b+13);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+14) ^ *(unsigned char *)(b+14);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 		neq |= *(unsigned char *)(a+15) ^ *(unsigned char *)(b+15);
-		OPTIMIZER_HIDE_VAR(neq);
+		barrier();
 	}
 
 	return neq;
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index cdf13ca..d4c15f0 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,9 +37,6 @@
     __asm__ ("" : "=r"(__ptr) : "0"(ptr));		\
     (typeof(ptr)) (__ptr + (off)); })
 
-/* Make the optimizer believe the variable can be manipulated arbitrarily. */
-#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
-
 #ifdef __CHECKER__
 #define __must_be_array(arr) 0
 #else
diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h
index ba147a1..140211e 100644
--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -14,19 +14,12 @@
  * It uses intrinsics to do the equivalent things.
  */
 #undef RELOC_HIDE
-#undef OPTIMIZER_HIDE_VAR
 
 #define RELOC_HIDE(ptr, off)					\
   ({ unsigned long __ptr;					\
      __ptr = (unsigned long) (ptr);				\
     (typeof(ptr)) (__ptr + (off)); })
 
-/* This should act as an optimization barrier on var.
- * Given that this compiler does not have inline assembly, a compiler barrier
- * is the best we can do.
- */
-#define OPTIMIZER_HIDE_VAR(var) barrier()
-
 /* Intel ECC compiler doesn't support __builtin_types_compatible_p() */
 #define __must_be_array(a) 0
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 1b45e4a..b70f303 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -181,10 +181,6 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
     (typeof(ptr)) (__ptr + (off)); })
 #endif
 
-#ifndef OPTIMIZER_HIDE_VAR
-#define OPTIMIZER_HIDE_VAR(var) barrier()
-#endif
-
 /* Not-quite-unique ID. */
 #ifndef __UNIQUE_ID
 # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
diff --git a/lib/string.c b/lib/string.c
index ce81aae..a579201 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	OPTIMIZER_HIDE_VAR(s);
+	barrier();
 }
 EXPORT_SYMBOL(memzero_explicit);
 
-- 
2.1.4


[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 15:09                 ` Hannes Frederic Sowa
  2015-03-18 16:02                   ` Stephan Mueller
@ 2015-03-18 17:41                   ` Theodore Ts'o
  2015-03-18 17:56                     ` Hannes Frederic Sowa
  1 sibling, 1 reply; 36+ messages in thread
From: Theodore Ts'o @ 2015-03-18 17:41 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Stephan Mueller, mancha, linux-kernel,
	linux-crypto, herbert, dborkman

Maybe we should add a kernel self-test that automatically checks
whether or not memset_explicit() gets optimized away?  Otherwise we
might not notice when gcc or how we implement barrier() or whatever
else we end up using ends up changing.

It shold be something that is really fast, so it might be a good idea
to simply automatically run it as part of an __initcall()
unconditionally.  We can debate where the __initcall() lives, but I'd
prefer that it be run even if the crypto layer isn't configured for
some reason.  Hopefully such an self-test is small enough that the
kernel bloat people won't complain.  :-)

							 -Ted

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 17:14                     ` mancha
@ 2015-03-18 17:49                       ` Daniel Borkmann
  2015-03-18 19:09                         ` mancha
  2015-03-18 23:53                       ` Cesar Eduardo Barros
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2015-03-18 17:49 UTC (permalink / raw)
  To: mancha, Stephan Mueller
  Cc: Hannes Frederic Sowa, tytso, linux-kernel, linux-crypto, herbert,
	dborkman, cesarb

On 03/18/2015 06:14 PM, mancha wrote:
...
> Patch 0001 fixes the dead store issue in memzero_explicit().

Thanks! I have issued the fix for the memzero bug to Herbert in
your authorship as discussed, also giving some more context.

For the 2nd issue, lets wait for Cesar.

Thanks again!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 17:41                   ` Theodore Ts'o
@ 2015-03-18 17:56                     ` Hannes Frederic Sowa
  2015-03-18 17:58                       ` Theodore Ts'o
  0 siblings, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-03-18 17:56 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Daniel Borkmann, Stephan Mueller, mancha, linux-kernel,
	linux-crypto, herbert, dborkman

On Wed, Mar 18, 2015, at 18:41, Theodore Ts'o wrote:
> Maybe we should add a kernel self-test that automatically checks
> whether or not memset_explicit() gets optimized away?  Otherwise we
> might not notice when gcc or how we implement barrier() or whatever
> else we end up using ends up changing.
> 
> It shold be something that is really fast, so it might be a good idea
> to simply automatically run it as part of an __initcall()
> unconditionally.  We can debate where the __initcall() lives, but I'd
> prefer that it be run even if the crypto layer isn't configured for
> some reason.  Hopefully such an self-test is small enough that the
> kernel bloat people won't complain.  :-)
> 
> 							 -Ted

Maybe a BUILD_BUGON: ;)

__label__ l1, l2;
char buffer[1024];
l1:
    memset(buffer, 0, 1024);
l2:
  BUILD_BUGON(&&l1 == &&l2);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 17:56                     ` Hannes Frederic Sowa
@ 2015-03-18 17:58                       ` Theodore Ts'o
  0 siblings, 0 replies; 36+ messages in thread
From: Theodore Ts'o @ 2015-03-18 17:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Stephan Mueller, mancha, linux-kernel,
	linux-crypto, herbert, dborkman

On Wed, Mar 18, 2015 at 06:56:19PM +0100, Hannes Frederic Sowa wrote:
> 
> Maybe a BUILD_BUGON: ;)

Even better!  :-)

				- Ted

> 
> __label__ l1, l2;
> char buffer[1024];
> l1:
>     memset(buffer, 0, 1024);
> l2:
>   BUILD_BUGON(&&l1 == &&l2);
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 17:49                       ` Daniel Borkmann
@ 2015-03-18 19:09                         ` mancha
  0 siblings, 0 replies; 36+ messages in thread
From: mancha @ 2015-03-18 19:09 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Stephan Mueller, Hannes Frederic Sowa, tytso, linux-kernel,
	linux-crypto, herbert, dborkman, cesarb

[-- Attachment #1: Type: text/plain, Size: 420 bytes --]

On Wed, Mar 18, 2015 at 06:49:55PM +0100, Daniel Borkmann wrote:
> On 03/18/2015 06:14 PM, mancha wrote:
> ...
> >Patch 0001 fixes the dead store issue in memzero_explicit().
> 
> Thanks! I have issued the fix for the memzero bug to Herbert in
> your authorship as discussed, also giving some more context.
> 
> For the 2nd issue, lets wait for Cesar.
> 
> Thanks again!

Excellent! 

Cheers.

--mancha

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 17:14                     ` mancha
  2015-03-18 17:49                       ` Daniel Borkmann
@ 2015-03-18 23:53                       ` Cesar Eduardo Barros
  1 sibling, 0 replies; 36+ messages in thread
From: Cesar Eduardo Barros @ 2015-03-18 23:53 UTC (permalink / raw)
  To: mancha, Stephan Mueller
  Cc: Hannes Frederic Sowa, Daniel Borkmann, tytso, linux-kernel,
	linux-crypto, herbert, dborkman

On 18-03-2015 14:14, mancha wrote:
> On Wed, Mar 18, 2015 at 05:02:01PM +0100, Stephan Mueller wrote:
>> Am Mittwoch, 18. März 2015, 16:09:34 schrieb Hannes Frederic Sowa:
>>> Seems like just using barrier() is the best and easiest option.
>
> However, if the idea is to use barrier() instead of OPTIMIZER_HIDE_VAR()
> in crypto_memneq() as well, then patch 0002 is the one to use. Please
> review and keep in mind my analysis was limited to memzero_explicit().
>
> Cesar, were there reasons you didn't use the gcc version of barrier()
> for crypto_memneq()?

Yes. Two reasons.

Take a look at how barrier() is defined:

#define barrier() __asm__ __volatile__("": : :"memory")

It tells gcc that the dummy assembly "instruction" touches memory (so 
the compiler can't assume anything about the memory), and that nothing 
should be moved from before to after the barrier and vice versa.

It mentions nothing about registers. Therefore, as far as I know gcc can 
assume that the dummy "instruction" touches no integer registers (or 
restores their values). I can imagine a sufficiently perverse compiler 
using that fact to introduce timing-dependent computations. For 
instance, it could load the values using more than one register and 
combine them at the end, after the barriers; there, it could exit early 
in case one of the registers is all-ones. My definition of 
OPTIMIZER_HIDE_VAR introduces a data dependency to prevent that:

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

The second reason is that barrier() is too strong. For crypto_memneq, 
only the or-chain is critical; the order or width of the loads makes no 
difference. The compiler could, if it wishes, do all the loads and xors 
first and do the or-chain at the end, or whenever it can see a pipeline 
bubble; it doesn't matter as long as it does *all* the "or" operations, 
in sequence.

I would be comfortable with a stronger OPTIMIZER_HIDE_VAR (adding 
"memory" or volatile), even though it could limit optimization 
opportunities, but I wouldn't be comfortable with a weaker 
OPTIMIZER_HIDE_VAR (removing the data dependency), unless the gcc and 
clang guys promise that our definition of barrier() will always prevent 
undesired optimization of register-only operations.

There was a third reason for the exact definition of OPTIMIZER_HIDE_VAR: 
it was copied from RELOC_HIDE, which is a longstanding "hide this 
variable from gcc" operation, and thus known to work as expected.

-- 
Cesar Eduardo Barros
cesarb@cesarb.eti.br

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-03-18 11:09     ` Stephan Mueller
  2015-03-18 12:02       ` Hannes Frederic Sowa
@ 2015-04-10 13:25       ` Stephan Mueller
  2015-04-10 14:00         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-04-10 13:25 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Mittwoch, 18. März 2015, 12:09:45 schrieb Stephan Mueller:

Hi,

>Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
>
>Hi Daniel,
>
>>On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>>> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>>>> Hi.
>>>> 
>>>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
>>>> protect
>>>> 
>>>> memory cleansing against things like dead store optimization:
>>>>     void memzero_explicit(void *s, size_t count)
>>>>     {
>>>>     
>>>>             memset(s, 0, count);
>>>>             OPTIMIZER_HIDE_VAR(s);
>>>>     
>>>>     }
>>>> 
>>>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
>>>> crypto_memneq>>
>>>> 
>>>> against timing analysis, is defined when using gcc as:
>>>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0"
>>>>     (var))
>>>> 
>>>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
>>>> from optimizing out memset (i.e. secrets remain in memory).
>>>> 
>>>> Two things that do work:
>>>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>>> 
>>> You are correct, volatile signature should be added to
>>> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
>>> allowed to check if it is needed and may remove the asm statement.
>>> Another option would be to just use var as an input variable - asm
>>> blocks without output variables are always considered being volatile
>>> by gcc.
>>> 
>>> Can you send a patch?
>>> 
>>> I don't think it is security critical, as Daniel pointed out, the
>>> call
>>> will happen because the function is an external call to the crypto
>>> functions, thus the compiler has to flush memory on return.
>>
>>Just had a look.
>>
>>$ gdb vmlinux
>>(gdb) disassemble memzero_explicit
>>
>>Dump of assembler code for function memzero_explicit:
>>    0xffffffff813a18b0 <+0>:	push   %rbp
>>    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
>>    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
>>    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
>>    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 <memset>
>>    0xffffffff813a18be <+14>:	pop    %rbp
>>    0xffffffff813a18bf <+15>:	retq
>>
>>End of assembler dump.
>>
>>(gdb) disassemble extract_entropy
>>[...]
>>
>>    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
>>    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
>>
>><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
>>
>>    0xffffffff814a500c <+316>:	mov    $0xa,%esi
>>    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
>>
>><memzero_explicit> 0xffffffff814a5016 <+326>:	mov    -0x48(%rbp),%rax
>>[...]
>>
>>I would be fine with __volatile__.
>
>Are we sure that simply adding a __volatile__ works in any case? I just
>did a test with a simple user space app:
>
>static inline void memset_secure(void *s, int c, size_t n)
>{
>        memset(s, c, n);
>        //__asm__ __volatile__("": : :"memory");
>        __asm__ __volatile__("" : "=r" (s) : "0" (s));
>}
>
>int main(int argc, char *argv[])
>{
>#define BUFLEN 20
>        char buf[BUFLEN];
>
>        snprintf(buf, (BUFLEN - 1), "teststring\n");
>        printf("%s", buf);
>
>        memset_secure(buf, 0, BUFLEN);
>}
>
>When using the discussed code of __asm__ __volatile__("" : "=r" (s) :
>"0" (s));  I do not find the code implementing memset(0) in objdump.
>Only when I enable the memory barrier, I see the following (when
>compiling with -O2):
>
>objdump -d memset_secure:
>...
>0000000000400440 <main>:
>...
>  400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
>  400470:       00
>  400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
>  400478:       00 00
>  40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
>  400481:       00
>...

I would like to bring up that topic again as I did some more analyses:

For testing I used the following code:

static inline void memset_secure(void *s, int c, size_t n)
{
        memset(s, c, n);
	BARRIER
}

where BARRIER is defined as:

(1) __asm__ __volatile__("" : "=r" (s) : "0" (s));

(2) __asm__ __volatile__("": : :"memory");

(3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");

I tested the code with gcc and clang, considering that there is effort 
underway to compile the kernel with clang too.

The following table marks an X when the aforementioned movq/movl code is 
present (or an invocation of memset@plt) in the object code (i.e. the code we 
want). Contrary the table marks - where the code is not present (i.e. the code 
we do not want):

         | BARRIER  | (1) | (2) | (3)
---------+----------+     |     |
Compiler |          |     |     |
=========+==========+==================
                    |     |     |
gcc -O0             |  X  |  X  |  X
                    |     |     |
gcc -O2             |  -  |  X  |  X
                    |     |     |
gcc -O3             |  -  |  X  |  X
                    |     |     |
clang -00           |  X  |  X  |  X
                    |     |     |
clang -02           |  X  |  -  |  X
                    |     |     |
clang -03           |  -  |  -  |  X

As the kernel is compiled with -O2, clang folks would still be left uncovered 
with the current solution (i.e. BARRIER option (2)).

Thus, may I propose to update the patch to use option (3) instead as (i) it 
does not cost anything extra on gcc and (ii) it covers clang too?

Ciao
Stephan 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 13:25       ` Stephan Mueller
@ 2015-04-10 14:00         ` Hannes Frederic Sowa
  2015-04-10 14:09           ` Stephan Mueller
  0 siblings, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-04-10 14:00 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

On Fr, 2015-04-10 at 15:25 +0200, Stephan Mueller wrote:
> I would like to bring up that topic again as I did some more analyses:
> 
> For testing I used the following code:
> 
> static inline void memset_secure(void *s, int c, size_t n)
> {
>         memset(s, c, n);
> 	BARRIER
> }
> 
> where BARRIER is defined as:
> 
> (1) __asm__ __volatile__("" : "=r" (s) : "0" (s));
> 
> (2) __asm__ __volatile__("": : :"memory");
> 
> (3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");

Hm, I wonder a little bit...

Could you quickly test if you replace (s) with (n) just for the fun of
it? I don't know if we should ask clang people about that, at least it
is their goal to be as highly compatible with gcc inline asm.

Thanks for looking into this!

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:00         ` Hannes Frederic Sowa
@ 2015-04-10 14:09           ` Stephan Mueller
  2015-04-10 14:22             ` mancha security
  2015-04-10 14:26             ` Hannes Frederic Sowa
  0 siblings, 2 replies; 36+ messages in thread
From: Stephan Mueller @ 2015-04-10 14:09 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Freitag, 10. April 2015, 16:00:03 schrieb Hannes Frederic Sowa:

Hi Hannes,

>On Fr, 2015-04-10 at 15:25 +0200, Stephan Mueller wrote:
>> I would like to bring up that topic again as I did some more analyses:
>> 
>> For testing I used the following code:
>> 
>> static inline void memset_secure(void *s, int c, size_t n)
>> {
>> 
>>         memset(s, c, n);
>> 	
>> 	BARRIER
>> 
>> }
>> 
>> where BARRIER is defined as:
>> 
>> (1) __asm__ __volatile__("" : "=r" (s) : "0" (s));
>> 
>> (2) __asm__ __volatile__("": : :"memory");
>> 
>> (3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");
>
>Hm, I wonder a little bit...
>
>Could you quickly test if you replace (s) with (n) just for the fun of
>it? I don't know if we should ask clang people about that, at least it
>is their goal to be as highly compatible with gcc inline asm.

Using 

 __asm__ __volatile__("" : "=r" (n) : "0" (n) : "memory");

clang O2/3: no mov

gcc O2/3: mov present

==> not good


Using
 __asm__ __volatile__("" : "=r" (n) : "0" (n));

clang O2/3: no mov

gcc O2/3: no mov


==> not good


What do you expect that change shall do?

>
>Thanks for looking into this!
>
>Bye,
>Hannes


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:09           ` Stephan Mueller
@ 2015-04-10 14:22             ` mancha security
  2015-04-10 14:33               ` Stephan Mueller
  2015-04-10 14:26             ` Hannes Frederic Sowa
  1 sibling, 1 reply; 36+ messages in thread
From: mancha security @ 2015-04-10 14:22 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Hannes Frederic Sowa, Daniel Borkmann, tytso, linux-kernel,
	linux-crypto, herbert, dborkman

[-- Attachment #1: Type: text/plain, Size: 1795 bytes --]

On Fri, Apr 10, 2015 at 04:09:10PM +0200, Stephan Mueller wrote:
> Am Freitag, 10. April 2015, 16:00:03 schrieb Hannes Frederic Sowa:
> 
> Hi Hannes,
> 
> >On Fr, 2015-04-10 at 15:25 +0200, Stephan Mueller wrote:
> >> I would like to bring up that topic again as I did some more analyses:
> >> 
> >> For testing I used the following code:
> >> 
> >> static inline void memset_secure(void *s, int c, size_t n)
> >> {
> >> 
> >>         memset(s, c, n);
> >> 	
> >> 	BARRIER
> >> 
> >> }
> >> 
> >> where BARRIER is defined as:
> >> 
> >> (1) __asm__ __volatile__("" : "=r" (s) : "0" (s));
> >> 
> >> (2) __asm__ __volatile__("": : :"memory");
> >> 
> >> (3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");
> >
> >Hm, I wonder a little bit...
> >
> >Could you quickly test if you replace (s) with (n) just for the fun of
> >it? I don't know if we should ask clang people about that, at least it
> >is their goal to be as highly compatible with gcc inline asm.
> 
> Using 
> 
>  __asm__ __volatile__("" : "=r" (n) : "0" (n) : "memory");
> 
> clang O2/3: no mov
> 
> gcc O2/3: mov present
> 
> ==> not good
> 
> 
> Using
>  __asm__ __volatile__("" : "=r" (n) : "0" (n));
> 
> clang O2/3: no mov
> 
> gcc O2/3: no mov
> 
> 
> ==> not good
> 
> 
> What do you expect that change shall do?
> 
> >
> >Thanks for looking into this!
> >
> >Bye,
> >Hannes
> 
> 
> Ciao
> Stephan

Thanks for the comprehensive testing! Clang 3.3 and was giving me good
results; didn't try newer versions.

I wonder what your tests give with an earlier suggestion of mine:

#define barrier(p) __asm__ __volatile__("": :"r"(p) :"memory")

void memzero_explicit(void *s, size_t count)
{
  memset(s, 0, count);
  barrier(s);
}

--mancha

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:09           ` Stephan Mueller
  2015-04-10 14:22             ` mancha security
@ 2015-04-10 14:26             ` Hannes Frederic Sowa
  2015-04-10 14:36               ` Stephan Mueller
  1 sibling, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-04-10 14:26 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

On Fr, 2015-04-10 at 16:09 +0200, Stephan Mueller wrote:
> Am Freitag, 10. April 2015, 16:00:03 schrieb Hannes Frederic Sowa:
> 
> Hi Hannes,
> 
> >On Fr, 2015-04-10 at 15:25 +0200, Stephan Mueller wrote:
> >> I would like to bring up that topic again as I did some more analyses:
> >> 
> >> For testing I used the following code:
> >> 
> >> static inline void memset_secure(void *s, int c, size_t n)
> >> {
> >> 
> >>         memset(s, c, n);
> >> 	
> >> 	BARRIER
> >> 
> >> }
> >> 
> >> where BARRIER is defined as:
> >> 
> >> (1) __asm__ __volatile__("" : "=r" (s) : "0" (s));
> >> 
> >> (2) __asm__ __volatile__("": : :"memory");
> >> 
> >> (3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");
> >
> >Hm, I wonder a little bit...
> >
> >Could you quickly test if you replace (s) with (n) just for the fun of
> >it? I don't know if we should ask clang people about that, at least it
> >is their goal to be as highly compatible with gcc inline asm.
> 
> Using 
> 
>  __asm__ __volatile__("" : "=r" (n) : "0" (n) : "memory");
> 
> clang O2/3: no mov
> 
> gcc O2/3: mov present
> 
> ==> not good

I suspected a problem in how volatile with non-present output args could
be different, but this seems not to be the case.

I would contact llvm/clang mailing list and ask. Maybe there is a
problem? It seems kind of strange to me...

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:22             ` mancha security
@ 2015-04-10 14:33               ` Stephan Mueller
  2015-04-10 20:09                 ` mancha security
  0 siblings, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-04-10 14:33 UTC (permalink / raw)
  To: mancha security
  Cc: Hannes Frederic Sowa, Daniel Borkmann, tytso, linux-kernel,
	linux-crypto, herbert, dborkman

Am Freitag, 10. April 2015, 14:22:08 schrieb mancha security:

Hi mancha,

>__asm__ __volatile__("": :"r"(p) :"memory")

gcc -O2/3: mov present

clang -O2/3: mov present

==> approach would be good too.

Note, the assembly code does not seem to change whether to use this approach 
or the one I initially tested.


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:26             ` Hannes Frederic Sowa
@ 2015-04-10 14:36               ` Stephan Mueller
  2015-04-10 14:45                 ` Hannes Frederic Sowa
  2015-04-10 14:46                 ` Daniel Borkmann
  0 siblings, 2 replies; 36+ messages in thread
From: Stephan Mueller @ 2015-04-10 14:36 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Freitag, 10. April 2015, 16:26:00 schrieb Hannes Frederic Sowa:

Hi Hannes,

>On Fr, 2015-04-10 at 16:09 +0200, Stephan Mueller wrote:
>> Am Freitag, 10. April 2015, 16:00:03 schrieb Hannes Frederic Sowa:
>> 
>> Hi Hannes,
>> 
>> >On Fr, 2015-04-10 at 15:25 +0200, Stephan Mueller wrote:
>> >> I would like to bring up that topic again as I did some more analyses:
>> >> 
>> >> For testing I used the following code:
>> >> 
>> >> static inline void memset_secure(void *s, int c, size_t n)
>> >> {
>> >> 
>> >>         memset(s, c, n);
>> >> 	
>> >> 	BARRIER
>> >> 
>> >> }
>> >> 
>> >> where BARRIER is defined as:
>> >> 
>> >> (1) __asm__ __volatile__("" : "=r" (s) : "0" (s));
>> >> 
>> >> (2) __asm__ __volatile__("": : :"memory");
>> >> 
>> >> (3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");
>> >
>> >Hm, I wonder a little bit...
>> >
>> >Could you quickly test if you replace (s) with (n) just for the fun of
>> >it? I don't know if we should ask clang people about that, at least it
>> >is their goal to be as highly compatible with gcc inline asm.
>> 
>> Using
>> 
>>  __asm__ __volatile__("" : "=r" (n) : "0" (n) : "memory");
>> 
>> clang O2/3: no mov
>> 
>> gcc O2/3: mov present
>> 
>> ==> not good
>
>I suspected a problem in how volatile with non-present output args could
>be different, but this seems not to be the case.
>
>I would contact llvm/clang mailing list and ask. Maybe there is a
>problem? It seems kind of strange to me...

Do you really think this is a compiler issue? I would rather think it is how 
to interpret the pure "memory" asm option. Thus, I would rather think that 
both, gcc and clang are right and we just need to use the code that fits both.
>
>Thanks,
>Hannes
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:36               ` Stephan Mueller
@ 2015-04-10 14:45                 ` Hannes Frederic Sowa
  2015-04-10 14:46                 ` Daniel Borkmann
  1 sibling, 0 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-04-10 14:45 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

On Fr, 2015-04-10 at 16:36 +0200, Stephan Mueller wrote:
> Am Freitag, 10. April 2015, 16:26:00 schrieb Hannes Frederic Sowa:
> 
> Hi Hannes,
> 
> >On Fr, 2015-04-10 at 16:09 +0200, Stephan Mueller wrote:
> >> Am Freitag, 10. April 2015, 16:00:03 schrieb Hannes Frederic Sowa:
> >> 
> >> Hi Hannes,
> >> 
> >> >On Fr, 2015-04-10 at 15:25 +0200, Stephan Mueller wrote:
> >> >> I would like to bring up that topic again as I did some more analyses:
> >> >> 
> >> >> For testing I used the following code:
> >> >> 
> >> >> static inline void memset_secure(void *s, int c, size_t n)
> >> >> {
> >> >> 
> >> >>         memset(s, c, n);
> >> >> 	
> >> >> 	BARRIER
> >> >> 
> >> >> }
> >> >> 
> >> >> where BARRIER is defined as:
> >> >> 
> >> >> (1) __asm__ __volatile__("" : "=r" (s) : "0" (s));
> >> >> 
> >> >> (2) __asm__ __volatile__("": : :"memory");
> >> >> 
> >> >> (3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");
> >> >
> >> >Hm, I wonder a little bit...
> >> >
> >> >Could you quickly test if you replace (s) with (n) just for the fun of
> >> >it? I don't know if we should ask clang people about that, at least it
> >> >is their goal to be as highly compatible with gcc inline asm.
> >> 
> >> Using
> >> 
> >>  __asm__ __volatile__("" : "=r" (n) : "0" (n) : "memory");
> >> 
> >> clang O2/3: no mov
> >> 
> >> gcc O2/3: mov present
> >> 
> >> ==> not good
> >
> >I suspected a problem in how volatile with non-present output args could
> >be different, but this seems not to be the case.
> >
> >I would contact llvm/clang mailing list and ask. Maybe there is a
> >problem? It seems kind of strange to me...
> 
> Do you really think this is a compiler issue? I would rather think it is how 
> to interpret the pure "memory" asm option. Thus, I would rather think that 
> both, gcc and clang are right and we just need to use the code that fits both.

Clang docs state that they want to be highly compatible with gcc inline
asm. Also, kernel code also uses barrier() in other places and in my
opinion, the compiler cannot make any assumptions about memory and
registers when using volatile asm with memory clobbers. But somehow
clang+llvm seems it does, no?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:36               ` Stephan Mueller
  2015-04-10 14:45                 ` Hannes Frederic Sowa
@ 2015-04-10 14:46                 ` Daniel Borkmann
  2015-04-10 14:50                   ` Stephan Mueller
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2015-04-10 14:46 UTC (permalink / raw)
  To: Stephan Mueller, Hannes Frederic Sowa
  Cc: mancha, tytso, linux-kernel, linux-crypto, herbert, dborkman

On 04/10/2015 04:36 PM, Stephan Mueller wrote:
> Am Freitag, 10. April 2015, 16:26:00 schrieb Hannes Frederic Sowa:
...
>> I suspected a problem in how volatile with non-present output args could
>> be different, but this seems not to be the case.
>>
>> I would contact llvm/clang mailing list and ask. Maybe there is a
>> problem? It seems kind of strange to me...

+1

> Do you really think this is a compiler issue?

If clang/LLVM advertises "GCC compatibility", then this would
certainly be a different behavior.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:46                 ` Daniel Borkmann
@ 2015-04-10 14:50                   ` Stephan Mueller
  2015-04-10 14:54                     ` Daniel Borkmann
  2015-04-27 19:10                     ` Stephan Mueller
  0 siblings, 2 replies; 36+ messages in thread
From: Stephan Mueller @ 2015-04-10 14:50 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Freitag, 10. April 2015, 16:46:04 schrieb Daniel Borkmann:

Hi Daniel,

>On 04/10/2015 04:36 PM, Stephan Mueller wrote:
>> Am Freitag, 10. April 2015, 16:26:00 schrieb Hannes Frederic Sowa:
>...
>
>>> I suspected a problem in how volatile with non-present output args could
>>> be different, but this seems not to be the case.
>>> 
>>> I would contact llvm/clang mailing list and ask. Maybe there is a
>>> problem? It seems kind of strange to me...
>
>+1
>
>> Do you really think this is a compiler issue?
>
>If clang/LLVM advertises "GCC compatibility", then this would
>certainly be a different behavior.

As you wish. I will contact the clang folks. As the proposed fix is not super 
urgend, I think we can leave it until I got word from clang.

>--
>To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:50                   ` Stephan Mueller
@ 2015-04-10 14:54                     ` Daniel Borkmann
  2015-04-27 19:10                     ` Stephan Mueller
  1 sibling, 0 replies; 36+ messages in thread
From: Daniel Borkmann @ 2015-04-10 14:54 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

On 04/10/2015 04:50 PM, Stephan Mueller wrote:
> Am Freitag, 10. April 2015, 16:46:04 schrieb Daniel Borkmann:
>
> Hi Daniel,
>
>> On 04/10/2015 04:36 PM, Stephan Mueller wrote:
>>> Am Freitag, 10. April 2015, 16:26:00 schrieb Hannes Frederic Sowa:
>> ...
>>
>>>> I suspected a problem in how volatile with non-present output args could
>>>> be different, but this seems not to be the case.
>>>>
>>>> I would contact llvm/clang mailing list and ask. Maybe there is a
>>>> problem? It seems kind of strange to me...
>>
>> +1
>>
>>> Do you really think this is a compiler issue?
>>
>> If clang/LLVM advertises "GCC compatibility", then this would
>> certainly be a different behavior.
>
> As you wish. I will contact the clang folks. As the proposed fix is not super
> urgend, I think we can leave it until I got word from clang.

Okay, that would be good, please let us know.

I believe there's certainly effort in the direction of an official
kernel clang/LLVM support, but not officially supported yet. If we
could get some clarification on the issue, even better.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:33               ` Stephan Mueller
@ 2015-04-10 20:09                 ` mancha security
  0 siblings, 0 replies; 36+ messages in thread
From: mancha security @ 2015-04-10 20:09 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Hannes Frederic Sowa, Daniel Borkmann, tytso, linux-kernel,
	linux-crypto, herbert, dborkman

[-- Attachment #1: Type: text/plain, Size: 496 bytes --]

On Fri, Apr 10, 2015 at 04:33:17PM +0200, Stephan Mueller wrote:
> Am Freitag, 10. April 2015, 14:22:08 schrieb mancha security:
> 
> Hi mancha,
> 
> >__asm__ __volatile__("": :"r"(p) :"memory")
> 
> gcc -O2/3: mov present
> 
> clang -O2/3: mov present
> 
> ==> approach would be good too.
> 
> Note, the assembly code does not seem to change whether to use this approach 
> or the one I initially tested.
> 
> 
> Ciao
> Stephan

Hi Stephan.

Many thanks for confirmation.

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-10 14:50                   ` Stephan Mueller
  2015-04-10 14:54                     ` Daniel Borkmann
@ 2015-04-27 19:10                     ` Stephan Mueller
  2015-04-27 20:34                       ` Daniel Borkmann
  1 sibling, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-04-27 19:10 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Freitag, 10. April 2015, 16:50:22 schrieb Stephan Mueller:

Hi Stephan,

>Am Freitag, 10. April 2015, 16:46:04 schrieb Daniel Borkmann:
>
>Hi Daniel,
>
>>On 04/10/2015 04:36 PM, Stephan Mueller wrote:
>>> Am Freitag, 10. April 2015, 16:26:00 schrieb Hannes Frederic Sowa:
>>...
>>
>>>> I suspected a problem in how volatile with non-present output args could
>>>> be different, but this seems not to be the case.
>>>> 
>>>> I would contact llvm/clang mailing list and ask. Maybe there is a
>>>> problem? It seems kind of strange to me...
>>
>>+1
>>
>>> Do you really think this is a compiler issue?
>>
>>If clang/LLVM advertises "GCC compatibility", then this would
>>certainly be a different behavior.
>
>As you wish. I will contact the clang folks. As the proposed fix is not super
>urgend, I think we can leave it until I got word from clang.

I posted the issue on the clang mailing list on April 10 -- no word so far. I 
would interpret this as a sign that it is a no-issue for them.

Thus, I propose we update our memzero_explicit implementation to use

__asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");

Concerns?

Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-27 19:10                     ` Stephan Mueller
@ 2015-04-27 20:34                       ` Daniel Borkmann
  2015-04-27 20:41                         ` Stephan Mueller
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2015-04-27 20:34 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

On 04/27/2015 09:10 PM, Stephan Mueller wrote:
...
> I posted the issue on the clang mailing list on April 10 -- no word so far. I
> would interpret this as a sign that it is a no-issue for them.

Hm. ;)

Here's a bug report on the topic, gcc vs llvm:

   https://llvm.org/bugs/show_bug.cgi?id=15495

Lets add a new barrier macro to linux/compiler{,-gcc}.h, f.e.

   #define barrier_data(ptr) __asm__ __volatile__("" : : "r" (ptr) : "memory")

or the version Mancha proposed. You could wrap that ...

   #define OPTIMIZER_HIDE(ptr)   barrier_data(ptr)

... and use that one for memzero_explicit() instead:

   void memzero_explicit(void *s, size_t count)
   {
      memset(s, 0, count);
      OPTIMIZER_HIDE(s);
   }

It certainly needs comments explaining in what situations to use
which OPTIMIZER_HIDE* variants, etc.

Do you want to send a patch?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-27 20:34                       ` Daniel Borkmann
@ 2015-04-27 20:41                         ` Stephan Mueller
  2015-04-27 20:53                           ` Daniel Borkmann
  0 siblings, 1 reply; 36+ messages in thread
From: Stephan Mueller @ 2015-04-27 20:41 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

Am Montag, 27. April 2015, 22:34:30 schrieb Daniel Borkmann:

Hi Daniel,

> On 04/27/2015 09:10 PM, Stephan Mueller wrote:
> ...
> 
> > I posted the issue on the clang mailing list on April 10 -- no word so
> > far. I would interpret this as a sign that it is a no-issue for them.
> 
> Hm. ;)
> 
> Here's a bug report on the topic, gcc vs llvm:
> 
>    https://llvm.org/bugs/show_bug.cgi?id=15495
> 
> Lets add a new barrier macro to linux/compiler{,-gcc}.h, f.e.
> 
>    #define barrier_data(ptr) __asm__ __volatile__("" : : "r" (ptr) :
> "memory")
> 
> or the version Mancha proposed. You could wrap that ...
> 
>    #define OPTIMIZER_HIDE(ptr)   barrier_data(ptr)
> 
> ... and use that one for memzero_explicit() instead:
> 
>    void memzero_explicit(void *s, size_t count)
>    {
>       memset(s, 0, count);
>       OPTIMIZER_HIDE(s);
>    }
> 
> It certainly needs comments explaining in what situations to use
> which OPTIMIZER_HIDE* variants, etc.
> 
> Do you want to send a patch?

It seems you have the code already in mind, so please if you could write it 
:-)

-- 
Ciao
Stephan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [BUG/PATCH] kernel RNG and its secrets
  2015-04-27 20:41                         ` Stephan Mueller
@ 2015-04-27 20:53                           ` Daniel Borkmann
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Borkmann @ 2015-04-27 20:53 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Hannes Frederic Sowa, mancha, tytso, linux-kernel, linux-crypto,
	herbert, dborkman

On 04/27/2015 10:41 PM, Stephan Mueller wrote:
...
> It seems you have the code already in mind, so please if you could write it
> :-)

Ok, sure. I'll cook something by tomorrow morning.

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-04-27 20:53 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-18  9:53 [BUG/PATCH] kernel RNG and its secrets mancha
2015-03-18 10:30 ` Daniel Borkmann
2015-03-18 10:50 ` Hannes Frederic Sowa
2015-03-18 10:56   ` Daniel Borkmann
2015-03-18 11:09     ` Stephan Mueller
2015-03-18 12:02       ` Hannes Frederic Sowa
2015-03-18 12:14         ` Stephan Mueller
2015-03-18 12:19           ` Hannes Frederic Sowa
2015-03-18 12:20             ` Stephan Mueller
2015-03-18 12:42               ` Daniel Borkmann
2015-03-18 15:09                 ` Hannes Frederic Sowa
2015-03-18 16:02                   ` Stephan Mueller
2015-03-18 17:14                     ` mancha
2015-03-18 17:49                       ` Daniel Borkmann
2015-03-18 19:09                         ` mancha
2015-03-18 23:53                       ` Cesar Eduardo Barros
2015-03-18 17:41                   ` Theodore Ts'o
2015-03-18 17:56                     ` Hannes Frederic Sowa
2015-03-18 17:58                       ` Theodore Ts'o
2015-03-18 12:58         ` mancha
2015-04-10 13:25       ` Stephan Mueller
2015-04-10 14:00         ` Hannes Frederic Sowa
2015-04-10 14:09           ` Stephan Mueller
2015-04-10 14:22             ` mancha security
2015-04-10 14:33               ` Stephan Mueller
2015-04-10 20:09                 ` mancha security
2015-04-10 14:26             ` Hannes Frederic Sowa
2015-04-10 14:36               ` Stephan Mueller
2015-04-10 14:45                 ` Hannes Frederic Sowa
2015-04-10 14:46                 ` Daniel Borkmann
2015-04-10 14:50                   ` Stephan Mueller
2015-04-10 14:54                     ` Daniel Borkmann
2015-04-27 19:10                     ` Stephan Mueller
2015-04-27 20:34                       ` Daniel Borkmann
2015-04-27 20:41                         ` Stephan Mueller
2015-04-27 20:53                           ` Daniel Borkmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.