linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf_fuzzer causes reboot
@ 2014-02-21 20:25 Vince Weaver
  2014-02-21 22:13 ` perf_fuzzer compiled for x32 " Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-21 20:25 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Peter Zijlstra, Ingo Molnar


So I'm not sure who exactly to report this to.  Some perf people CC'd as
I trigger it while using the perf_fuzzer.

This is with 3.14-rc3 on a core2 machine, although I've had the reboots 
happen throughout at least 3.14-rc*

I'm having a hard time coming up with a reproducible test case.  Using the 
random seed that caused the below will cause the perf_fuzzer to segfault 
but not reboot.

The log isn't very helpful, it reboots so fast that the oops doesn't 
finish printing and the serial log just moves to the bootloader...

[ 4466.804123] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 4466.808014] IP: [<ffffffff81111783>] cache_reap+0x5e/0x1c5
[ 4466.808014] PGD 0 
[ 4466.808014] Oops: 0000 [#1] GNU GRUB  version 2.00-17


Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-21 20:25 perf_fuzzer causes reboot Vince Weaver
@ 2014-02-21 22:13 ` Vince Weaver
  2014-02-21 22:34   ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-21 22:13 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, H.J. Lu


cc'ing x32 people

On Fri, 21 Feb 2014, Vince Weaver wrote:

> So I'm not sure who exactly to report this to.  Some perf people CC'd as
> I trigger it while using the perf_fuzzer.
> 
> This is with 3.14-rc3 on a core2 machine, although I've had the reboots 
> happen throughout at least 3.14-rc*
> 
> I'm having a hard time coming up with a reproducible test case.  Using the 
> random seed that caused the below will cause the perf_fuzzer to segfault 
> but not reboot.
> 
> The log isn't very helpful, it reboots so fast that the oops doesn't 
> finish printing and the serial log just moves to the bootloader...
> 
> [ 4466.804123] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
> [ 4466.808014] IP: [<ffffffff81111783>] cache_reap+0x5e/0x1c5
> [ 4466.808014] PGD 0 
> [ 4466.808014] Oops: 0000 [#1] GNU GRUB  version 2.00-17

Maybe related, this is on an x32-compiled binary.

When trying to reproduce the perf_fuzzer myseriously segfaults on what 
appears to be perfectly valid mmap'd perf ring-buffers.

(running under gdb)

Program received signal SIGSEGV, Segmentation fault.
0x0041efbb in __memset_sse2 ()

=> 0x0041efbb <+2203>:  movdqa %xmm0,(%rdi)

rdi            0xf7f61000       4160098304

f7f61000-f7f72000 rw-s 00000000 00:08 4475  anon_inode:[perf_event]

So I'm not sure if somehow something is wrong with the page mapping, that 
makes a valid write fail and sometimes (possibly due to address space 
randomization) reboot the system?

Vince


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-21 22:13 ` perf_fuzzer compiled for x32 " Vince Weaver
@ 2014-02-21 22:34   ` Vince Weaver
  2014-02-22  4:50     ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-21 22:34 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, H.J. Lu

On Fri, 21 Feb 2014, Vince Weaver wrote:

> On Fri, 21 Feb 2014, Vince Weaver wrote:
> 
> > So I'm not sure who exactly to report this to.  Some perf people CC'd as
> > I trigger it while using the perf_fuzzer.
> > 
> > This is with 3.14-rc3 on a core2 machine, although I've had the reboots 
> > happen throughout at least 3.14-rc*
> > 
> > I'm having a hard time coming up with a reproducible test case.  Using the 
> > random seed that caused the below will cause the perf_fuzzer to segfault 
> > but not reboot.
> > 
> > The log isn't very helpful, it reboots so fast that the oops doesn't 
> > finish printing and the serial log just moves to the bootloader...
> > 
> > [ 4466.804123] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
> > [ 4466.808014] IP: [<ffffffff81111783>] cache_reap+0x5e/0x1c5
> > [ 4466.808014] PGD 0 
> > [ 4466.808014] Oops: 0000 [#1] GNU GRUB  version 2.00-17
> 
> Maybe related, this is on an x32-compiled binary.
> 
> When trying to reproduce the perf_fuzzer myseriously segfaults on what 
> appears to be perfectly valid mmap'd perf ring-buffers.
> 
> (running under gdb)
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0041efbb in __memset_sse2 ()
> 
> => 0x0041efbb <+2203>:  movdqa %xmm0,(%rdi)
> 
> rdi            0xf7f61000       4160098304
> 
> f7f61000-f7f72000 rw-s 00000000 00:08 4475  anon_inode:[perf_event]
> 
> So I'm not sure if somehow something is wrong with the page mapping, that 
> makes a valid write fail and sometimes (possibly due to address space 
> randomization) reboot the system?

also strange, when I look at the core dumps it always shows the bad memory
address happening at the beginning of an mmap page as expected for the
ip listed, but the segfault listed in the kernel happens at some 
completely unrelated address that isn't even page aligned and shouldn't be 
possible based on the gdb/coredump results?

[ 1560.313863] perf_fuzzer[2826]: segfault at 503283ff ip 000000000041efbb 
sp 00000000ffd367d8 error 6 in perf_fuzzer[400000+d1000]
[ 1704.673245] perf_fuzzer[2835]: segfault at 503283ff ip 000000000041efbb 
sp 00000000ff972be8 error 6 in perf_fuzzer[400000+d1000]
[ 2978.101276] perf_fuzzer[2841]: segfault at 503283ff ip 000000000041efbb 
sp 00000000ff92ba68 error 6 in perf_fuzzer[400000+d1000]
[ 4907.185366] perf_fuzzer[2868]: segfault at 503283ff ip 000000000041efbb 
sp 00000000ffadcd28 error 6 in perf_fuzzer[400000+d1000]
[ 9570.793746] perf_fuzzer[6183]: segfault at 4d0bf28e ip 000000000041efbb 
sp 00000000ff83f688 error 6 in perf_fuzzer[400000+d1000]
[ 9743.888431] perf_fuzzer[6187]: segfault at 91734d5 ip 000000000041efbb 
sp 00000000ffb4e288 error 6 in perf_fuzzer[400000+d1000]

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-21 22:34   ` Vince Weaver
@ 2014-02-22  4:50     ` Vince Weaver
  2014-02-22  5:03       ` H. Peter Anvin
  2014-02-22  6:26       ` H. Peter Anvin
  0 siblings, 2 replies; 115+ messages in thread
From: Vince Weaver @ 2014-02-22  4:50 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, H.J. Lu


So I changed the perf_fuzzer so when it randomly stomps all over the
perf_event_mmap_page, it uses a constant value of 0xdeadbeef rather
than a random value.

The result is below.  The segfaults make a bit more sense now, it
almost looks like what is happening is we are corrupting an address
value somehow (head? tail?) and the kernel then uses the corrupt address 
and writes to memory outside of the mmap ring buffer.

I still haven't figured out how to trigger this exactly, but you can
see when over-written with 0xdeadbeef the memory address written to is
consistently some small multiple of 0x120.

I imagine it would be a bad thing if it turned out to be possible to 
select what memory address got written to.  Although since I've
only reproduced this on x32 maybe it won't be possible to over-write
the kernel; but I have seen this bug cause a reboot when the
wrong thing got over-written.

[28002.850192] perf_fuzzer[7083]: segfault at 2be0 ip 000000000041efab sp 00000000ff826748 error 6 in perf_fuzzer[400000+d1000]
[28639.769869] perf_fuzzer[7100]: segfault at 1320 ip 000000000041efab sp 00000000ffa65038 error 6 in perf_fuzzer[400000+d1000]
[29396.986242] perf_fuzzer[7120]: segfault at 10e0 ip 000000000041efab sp 00000000ffd48e68 error 6 in perf_fuzzer[400000+d1000]
[29738.892931] perf_fuzzer[7128]: segfault at 18c0 ip 000000000041efab sp 00000000ffcdcd88 error 6 in perf_fuzzer[400000+d1000]
[29815.550210] perf_fuzzer[7132]: segfault at 120 ip 000000000041efab sp 00000000ffe673b8 error 6 in perf_fuzzer[400000+d1000]
[30173.455348] perf_fuzzer[7141]: segfault at 120 ip 000000000041efab sp 00000000ffda1948 error 6 in perf_fuzzer[400000+d1000]
[30570.625642] perf_fuzzer[7156]: segfault at 1680 ip 000000000041efab sp 00000000ffaad028 error 6 in perf_fuzzer[400000+d1000]
[31047.887784] perf_fuzzer[7169]: segfault at 60c0 ip 000000000041efab sp 00000000ffaa86e8 error 6 in perf_fuzzer[400000+d1000]
[31300.168714] perf_fuzzer[7175]: segfault at 3a80 ip 000000000041efab sp 00000000ffd83228 error 6 in perf_fuzzer[400000+d1000]
[31984.727278] perf_fuzzer[7193]: segfault at 7e0 ip 000000000041efab sp 00000000ff9db1f8 error 6 in perf_fuzzer[400000+d1000]

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-22  4:50     ` Vince Weaver
@ 2014-02-22  5:03       ` H. Peter Anvin
  2014-02-22  6:26       ` H. Peter Anvin
  1 sibling, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-22  5:03 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

Those are segfaults in user space, though?

On February 21, 2014 8:50:38 PM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>
>So I changed the perf_fuzzer so when it randomly stomps all over the
>perf_event_mmap_page, it uses a constant value of 0xdeadbeef rather
>than a random value.
>
>The result is below.  The segfaults make a bit more sense now, it
>almost looks like what is happening is we are corrupting an address
>value somehow (head? tail?) and the kernel then uses the corrupt
>address 
>and writes to memory outside of the mmap ring buffer.
>
>I still haven't figured out how to trigger this exactly, but you can
>see when over-written with 0xdeadbeef the memory address written to is
>consistently some small multiple of 0x120.
>
>I imagine it would be a bad thing if it turned out to be possible to 
>select what memory address got written to.  Although since I've
>only reproduced this on x32 maybe it won't be possible to over-write
>the kernel; but I have seen this bug cause a reboot when the
>wrong thing got over-written.
>
>[28002.850192] perf_fuzzer[7083]: segfault at 2be0 ip 000000000041efab
>sp 00000000ff826748 error 6 in perf_fuzzer[400000+d1000]
>[28639.769869] perf_fuzzer[7100]: segfault at 1320 ip 000000000041efab
>sp 00000000ffa65038 error 6 in perf_fuzzer[400000+d1000]
>[29396.986242] perf_fuzzer[7120]: segfault at 10e0 ip 000000000041efab
>sp 00000000ffd48e68 error 6 in perf_fuzzer[400000+d1000]
>[29738.892931] perf_fuzzer[7128]: segfault at 18c0 ip 000000000041efab
>sp 00000000ffcdcd88 error 6 in perf_fuzzer[400000+d1000]
>[29815.550210] perf_fuzzer[7132]: segfault at 120 ip 000000000041efab
>sp 00000000ffe673b8 error 6 in perf_fuzzer[400000+d1000]
>[30173.455348] perf_fuzzer[7141]: segfault at 120 ip 000000000041efab
>sp 00000000ffda1948 error 6 in perf_fuzzer[400000+d1000]
>[30570.625642] perf_fuzzer[7156]: segfault at 1680 ip 000000000041efab
>sp 00000000ffaad028 error 6 in perf_fuzzer[400000+d1000]
>[31047.887784] perf_fuzzer[7169]: segfault at 60c0 ip 000000000041efab
>sp 00000000ffaa86e8 error 6 in perf_fuzzer[400000+d1000]
>[31300.168714] perf_fuzzer[7175]: segfault at 3a80 ip 000000000041efab
>sp 00000000ffd83228 error 6 in perf_fuzzer[400000+d1000]
>[31984.727278] perf_fuzzer[7193]: segfault at 7e0 ip 000000000041efab
>sp 00000000ff9db1f8 error 6 in perf_fuzzer[400000+d1000]
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-22  4:50     ` Vince Weaver
  2014-02-22  5:03       ` H. Peter Anvin
@ 2014-02-22  6:26       ` H. Peter Anvin
  2014-02-23  5:18         ` Vince Weaver
  1 sibling, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-22  6:26 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On 02/21/2014 08:50 PM, Vince Weaver wrote:
>
> So I changed the perf_fuzzer so when it randomly stomps all over the
> perf_event_mmap_page, it uses a constant value of 0xdeadbeef rather
> than a random value.
>
> The result is below.  The segfaults make a bit more sense now, it
> almost looks like what is happening is we are corrupting an address
> value somehow (head? tail?) and the kernel then uses the corrupt address
> and writes to memory outside of the mmap ring buffer.
>

That seems unlikely:

	handle->page = (offset >> page_shift) & (rb->nr_pages - 1);
	offset &= (1UL << page_shift) - 1;

The masking to the number of pages should make that not possible, even 
if a completely bogus value is written.

> I still haven't figured out how to trigger this exactly, but you can
> see when over-written with 0xdeadbeef the memory address written to is
> consistently some small multiple of 0x120.
>
> I imagine it would be a bad thing if it turned out to be possible to
> select what memory address got written to.  Although since I've
> only reproduced this on x32 maybe it won't be possible to over-write
> the kernel; but I have seen this bug cause a reboot when the
> wrong thing got over-written.
>
> [28002.850192] perf_fuzzer[7083]: segfault at 2be0 ip 000000000041efab sp 00000000ff826748 error 6 in perf_fuzzer[400000+d1000]
> [28639.769869] perf_fuzzer[7100]: segfault at 1320 ip 000000000041efab sp 00000000ffa65038 error 6 in perf_fuzzer[400000+d1000]
> [29396.986242] perf_fuzzer[7120]: segfault at 10e0 ip 000000000041efab sp 00000000ffd48e68 error 6 in perf_fuzzer[400000+d1000]
> [29738.892931] perf_fuzzer[7128]: segfault at 18c0 ip 000000000041efab sp 00000000ffcdcd88 error 6 in perf_fuzzer[400000+d1000]
> [29815.550210] perf_fuzzer[7132]: segfault at 120 ip 000000000041efab sp 00000000ffe673b8 error 6 in perf_fuzzer[400000+d1000]
> [30173.455348] perf_fuzzer[7141]: segfault at 120 ip 000000000041efab sp 00000000ffda1948 error 6 in perf_fuzzer[400000+d1000]
> [30570.625642] perf_fuzzer[7156]: segfault at 1680 ip 000000000041efab sp 00000000ffaad028 error 6 in perf_fuzzer[400000+d1000]
> [31047.887784] perf_fuzzer[7169]: segfault at 60c0 ip 000000000041efab sp 00000000ffaa86e8 error 6 in perf_fuzzer[400000+d1000]
> [31300.168714] perf_fuzzer[7175]: segfault at 3a80 ip 000000000041efab sp 00000000ffd83228 error 6 in perf_fuzzer[400000+d1000]
> [31984.727278] perf_fuzzer[7193]: segfault at 7e0 ip 000000000041efab sp 00000000ff9db1f8 error 6 in perf_fuzzer[400000+d1000]

Error 6 reflects a write in userspace to a not-present page.

Since your previous trace indicates that the value of the register in 
question is a different one, I'm guessing that what we have here is PEBS 
getting activated.  0x120 is 2*0x90, and 0x90 is the size of a 64-bit 
PEBS record.

	-hpa

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-22  6:26       ` H. Peter Anvin
@ 2014-02-23  5:18         ` Vince Weaver
  2014-02-23  5:24           ` H. Peter Anvin
  2014-02-23  6:07           ` H. Peter Anvin
  0 siblings, 2 replies; 115+ messages in thread
From: Vince Weaver @ 2014-02-23  5:18 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On Fri, 21 Feb 2014, H. Peter Anvin wrote:

> Error 6 reflects a write in userspace to a not-present page.
> 
> Since your previous trace indicates that the value of the register in question
> is a different one, I'm guessing that what we have here is PEBS getting
> activated.  0x120 is 2*0x90, and 0x90 is the size of a 64-bit PEBS record.

I'm having problems generating a replayable syscall trace that exhibits 
the problem.

It turns out that the segfault address listed (the multiple of 0x120) 
happens to be the value in the RBP register at the time of the segfault.

That's odd, as the instruction is
	movdqa %xmm0,(%rdi)
and rdi is the valid mmap address of the perf ring buffer
	rdi            0xf7768000	4151738368

so I'm not sure why RBP is involved at all.

In all of the cases I've investigated the precise_ip value has been set 
for the problem event... but none of the events have been hardware events
(software and breakpoint so far).  So probably not PEBS related?

Vince


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-23  5:18         ` Vince Weaver
@ 2014-02-23  5:24           ` H. Peter Anvin
  2014-02-23  6:07           ` H. Peter Anvin
  1 sibling, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-23  5:24 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

What is the instructions around it, by any chance?

On February 22, 2014 9:18:17 PM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Fri, 21 Feb 2014, H. Peter Anvin wrote:
>
>> Error 6 reflects a write in userspace to a not-present page.
>> 
>> Since your previous trace indicates that the value of the register in
>question
>> is a different one, I'm guessing that what we have here is PEBS
>getting
>> activated.  0x120 is 2*0x90, and 0x90 is the size of a 64-bit PEBS
>record.
>
>I'm having problems generating a replayable syscall trace that exhibits
>
>the problem.
>
>It turns out that the segfault address listed (the multiple of 0x120) 
>happens to be the value in the RBP register at the time of the
>segfault.
>
>That's odd, as the instruction is
>	movdqa %xmm0,(%rdi)
>and rdi is the valid mmap address of the perf ring buffer
>	rdi            0xf7768000	4151738368
>
>so I'm not sure why RBP is involved at all.
>
>In all of the cases I've investigated the precise_ip value has been set
>
>for the problem event... but none of the events have been hardware
>events
>(software and breakpoint so far).  So probably not PEBS related?
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-23  5:18         ` Vince Weaver
  2014-02-23  5:24           ` H. Peter Anvin
@ 2014-02-23  6:07           ` H. Peter Anvin
  2014-02-23 14:05             ` Vince Weaver
  1 sibling, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-23  6:07 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

I'd be interested in how rbp gets set, too.  It might just be a coincidence and the value in rbp has some other meaning here.

On February 22, 2014 9:18:17 PM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Fri, 21 Feb 2014, H. Peter Anvin wrote:
>
>> Error 6 reflects a write in userspace to a not-present page.
>> 
>> Since your previous trace indicates that the value of the register in
>question
>> is a different one, I'm guessing that what we have here is PEBS
>getting
>> activated.  0x120 is 2*0x90, and 0x90 is the size of a 64-bit PEBS
>record.
>
>I'm having problems generating a replayable syscall trace that exhibits
>
>the problem.
>
>It turns out that the segfault address listed (the multiple of 0x120) 
>happens to be the value in the RBP register at the time of the
>segfault.
>
>That's odd, as the instruction is
>	movdqa %xmm0,(%rdi)
>and rdi is the valid mmap address of the perf ring buffer
>	rdi            0xf7768000	4151738368
>
>so I'm not sure why RBP is involved at all.
>
>In all of the cases I've investigated the precise_ip value has been set
>
>for the problem event... but none of the events have been hardware
>events
>(software and breakpoint so far).  So probably not PEBS related?
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-23  6:07           ` H. Peter Anvin
@ 2014-02-23 14:05             ` Vince Weaver
  2014-02-24  3:02               ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-23 14:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On Sat, 22 Feb 2014, H. Peter Anvin wrote:

> I'd be interested in how rbp gets set, too.  It might just be a 
> coincidence and the value in rbp has some other meaning here.

The code in question does this:

        i=find_random_active_event();
        if (i<0) return;
        if ((event_data[i].mmap)) {
                value=0xdeadbeef;
                memset(event_data[i].mmap,value,getpagesize());


  [New LWP 10526]
  Core was generated by `./perf_fuzzer -t OCIRMQWPpAi -r 1392938876'.
  Program terminated with signal 11, Segmentation fault.
  #0  0x0041efab in __memset_sse2 ()
  (gdb) bt
  #0  0x0041efab in __memset_sse2 ()
  #1  0x004017ec in trash_random_mmap () at perf_fuzzer.c:808
  #2  main (argc=<optimized out>, argv=<optimized out>) at perf_fuzzer.c:1604


So rbp is set by the imul below, it is the offset into the
event_data[i] array where the elements have size of 0x120

   0x004017bd <+3085>:  callq  0x402ee0 <find_random_active_event>
   0x004017c2 <+3090>:  test   %eax,%eax
   0x004017c4 <+3092>:  js     0x4011e8 <main+1592>
   0x004017ca <+3098>:  imul   $0x120,%eax,%ebp
   0x004017d0 <+3104>:  mov    0x756b2c(%ebp),%eax

   0x004017d7 <+3111>:  test   %eax,%eax
   0x004017d9 <+3113>:  je     0x40183b <main+3211>

   0x004017db <+3115>:  mov    0xc(%esp),%edx
   0x004017e0 <+3120>:  mov    %eax,%edi
   0x004017e2 <+3122>:  mov    $0xdeadbeef,%esi
   0x004017e7 <+3127>:  callq  0x400260
   0x004017ec <+3132>:  testb  $0x20,0x353e76(%rip)        # 0x755669 <logging+$

   400260:       ff 25 ce 0e 2d 00       jmpq   *0x2d0ece(%rip)        # 6d1134 $

   0x6d1134:       0x0041e710

  Dump of assembler code for function __memset_sse2:

   0x0041e710 <+0>:     cmp    $0x1,%rdx
   0x0041e714 <+4>:     mov    %rdi,%rax
   0x0041e717 <+7>:     jne    0x41e71d <__memset_sse2+13>
   0x0041e719 <+9>:     mov    %sil,(%rdi)

and as far as I can tell nothing touches rbp again until the segfault.
Nothing in _memset_sse2 does as far as I can tell.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-23 14:05             ` Vince Weaver
@ 2014-02-24  3:02               ` Vince Weaver
  2014-02-24  5:22                 ` H. Peter Anvin
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24  3:02 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On Sun, 23 Feb 2014, Vince Weaver wrote:
> 
> and as far as I can tell nothing touches rbp again until the segfault.
> Nothing in _memset_sse2 does as far as I can tell.

I only know enough about ftrace to be dangerous, but here is what I think 
is the trace of the problem:

     perf_fuzzer-11492 [000] 197077.488363: function:             intel_get_event_constraints
     perf_fuzzer-11492 [000] 197077.488363: function:                intel_pebs_constraints
     perf_fuzzer-11492 [000] 197077.488365: function:             intel_put_event_constraints
     perf_fuzzer-11492 [000] 197077.488365: function:             intel_pmu_enable_all
     perf_fuzzer-11492 [000] 197077.488366: function:                intel_pmu_pebs_enable_all
     perf_fuzzer-11492 [000] 197077.488367: function:                intel_pmu_lbr_enable_all
     perf_fuzzer-11492 [000] 197077.488366: function:                intel_pmu_pebs_enable_all
     perf_fuzzer-11492 [000] 197077.488367: function:                intel_pmu_lbr_enable_all
     perf_fuzzer-11492 [000] 197077.488368: function:             mutex_unlock
     perf_fuzzer-11492 [000] 197077.488368: function:             mutex_lock
     perf_fuzzer-11492 [000] 197077.488369: function:                _cond_resched
     perf_fuzzer-11492 [000] 197077.488370: function:             _raw_spin_lock_irq
     perf_fuzzer-11492 [000] 197077.488370: function:             mutex_unlock
     perf_fuzzer-11492 [000] 197077.488371: function:             mutex_lock
     perf_fuzzer-11492 [000] 197077.488371: function:                _cond_resched
     perf_fuzzer-11492 [000] 197077.488372: function:             _raw_spin_lock_irq
     perf_fuzzer-11492 [000] 197077.488373: function:             mutex_unlock
     perf_fuzzer-11492 [000] 197077.488373: function:             mutex_lock
     perf_fuzzer-11492 [000] 197077.488374: function:                _cond_resched
     perf_fuzzer-11492 [000] 197077.488374: function:             _raw_spin_lock_irq
     perf_fuzzer-11492 [000] 197077.488375: function:             smp_call_function_single
     perf_fuzzer-11492 [000] 197077.488376: function:             _raw_spin_lock
     perf_fuzzer-11492 [000] 197077.488377: function:             mutex_unlock
     perf_fuzzer-11492 [000] 197077.488378: function:             mutex_lock
     perf_fuzzer-11492 [000] 197077.488378: function:                _cond_resched
     perf_fuzzer-11492 [000] 197077.488379: function:             _raw_spin_lock_irq
     perf_fuzzer-11492 [000] 197077.488380: function:             smp_call_function_single
     perf_fuzzer-11492 [000] 197077.488380: function:             _raw_spin_lock
     perf_fuzzer-11492 [000] 197077.488381: function:             mutex_unlock
     perf_fuzzer-11492 [000] 197077.488382: function:             mutex_unlock
     perf_fuzzer-11492 [000] 197077.488383: function:             syscall_trace_leave
     perf_fuzzer-11492 [000] 197077.488383: sys_exit:             NR 1073741981 = 0
     perf_fuzzer-11492 [000] 197077.488387: function:             do_device_not_available
     perf_fuzzer-11492 [000] 197077.488387: function:                math_state_restore
     perf_fuzzer-11492 [000] 197077.488390: function:             trace_do_page_fault
     perf_fuzzer-11492 [000] 197077.488391: page_fault_user:      address=__per_cpu_end ip=__per_cpu_end error_code=0x6
     perf_fuzzer-11492 [000] 197077.488395: function:             perf_callchain
     perf_fuzzer-11492 [000] 197077.488396: function:             copy_from_user_nmi
     perf_fuzzer-11492 [000] 197077.488397: function:             trace_do_page_fault
     perf_fuzzer-11492 [000] 197077.488398: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
     perf_fuzzer-11492 [000] 197077.488399: function:                __do_page_fault
     perf_fuzzer-11492 [000] 197077.488400: function:                   bad_area_nosemaphore
     perf_fuzzer-11492 [000] 197077.488401: function:                      __bad_area_nosemaphore
     perf_fuzzer-11492 [000] 197077.488401: function:                         no_context
     perf_fuzzer-11492 [000] 197077.488402: function:                            fixup_exception
     perf_fuzzer-11492 [000] 197077.488403: function:                               search_exception_tables
     perf_fuzzer-11492 [000] 197077.488403: function:                                  search_extable
     perf_fuzzer-11492 [000] 197077.488405: function:             copy_user_handle_tail
     perf_fuzzer-11492 [000] 197077.488406: function:             trace_do_page_fault
     perf_fuzzer-11492 [000] 197077.488406: page_fault_kernel:    address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
     perf_fuzzer-11492 [000] 197077.488407: function:                __do_page_fault
     perf_fuzzer-11492 [000] 197077.488408: function:                   bad_area_nosemaphore
     perf_fuzzer-11492 [000] 197077.488409: function:                      __bad_area_nosemaphore
     perf_fuzzer-11492 [000] 197077.488409: function:                         no_context
     perf_fuzzer-11492 [000] 197077.488410: function:                            fixup_exception
     perf_fuzzer-11492 [000] 197077.488410: function:                               search_exception_tables
     perf_fuzzer-11492 [000] 197077.488411: function:                                  search_extable
     perf_fuzzer-11492 [000] 197077.488413: function:             perf_output_begin
     perf_fuzzer-11492 [000] 197077.488414: function:             perf_output_copy
     perf_fuzzer-11492 [000] 197077.488415: function:             perf_output_copy
     perf_fuzzer-11492 [000] 197077.488415: function:             perf_output_copy
     perf_fuzzer-11492 [000] 197077.488416: function:             perf_output_copy
     perf_fuzzer-11492 [000] 197077.488418: function:             perf_output_copy
     perf_fuzzer-11492 [000] 197077.488419: function:             perf_output_copy
     perf_fuzzer-11492 [000] 197077.488419: function:             perf_output_end
     perf_fuzzer-11492 [000] 197077.488420: function:                perf_output_put_handle
     perf_fuzzer-11492 [000] 197077.488421: function:             __do_page_fault
     perf_fuzzer-11492 [000] 197077.488422: function:                down_read_trylock
     perf_fuzzer-11492 [000] 197077.488423: function:                _cond_resched
     perf_fuzzer-11492 [000] 197077.488423: function:                find_vma
     perf_fuzzer-11492 [000] 197077.488424: function:                bad_area
     perf_fuzzer-11492 [000] 197077.488425: function:                   up_read
     perf_fuzzer-11492 [000] 197077.488426: function:                   __bad_area_nosemaphore
     perf_fuzzer-11492 [000] 197077.488426: function:                      is_prefetch
     perf_fuzzer-11492 [000] 197077.488427: function:                         convert_ip_to_linear
     perf_fuzzer-11492 [000] 197077.488428: function:                      unhandled_signal
     perf_fuzzer-11492 [000] 197077.488429: function:                      __printk_ratelimit
     perf_fuzzer-11492 [000] 197077.488430: function:             _raw_spin_trylock
     perf_fuzzer-11492 [000] 197077.488430: function:             _raw_spin_unlock_irqrestore
     perf_fuzzer-11492 [000] 197077.488431: function:                      printk
     perf_fuzzer-11492 [000] 197077.488432: function:                         vprintk_emit
     perf_fuzzer-11492 [000] 197077.488434: function:                            _raw_spin_lock
     perf_fuzzer-11492 [000] 197077.488443: function:                            cont_add
     perf_fuzzer-11492 [000] 197077.488444: function:                            console_trylock
     perf_fuzzer-11492 [000] 197077.488445: function:                               down_trylock
     perf_fuzzer-11492 [000] 197077.488445: function:                                  _raw_spin_lock_irqsave
     perf_fuzzer-11492 [000] 197077.488446: function:                                  _raw_spin_unlock_irqrestore
     perf_fuzzer-11492 [000] 197077.488447: function:                            console_unlock
     perf_fuzzer-11492 [000] 197077.488447: function:                               _raw_spin_lock_irqsave
     perf_fuzzer-11492 [000] 197077.488449: function:                               print_time
     perf_fuzzer-11492 [000] 197077.488452: function:                               T.950
     perf_fuzzer-11492 [000] 197077.488453: console:              [197179.420735] perf_fuzzer[11492]: segfault at 22e0 ip 000000000041efab sp 00000000ffda0938 error 6

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24  3:02               ` Vince Weaver
@ 2014-02-24  5:22                 ` H. Peter Anvin
  2014-02-24 15:35                   ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24  5:22 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On 02/23/2014 07:02 PM, Vince Weaver wrote:
> On Sun, 23 Feb 2014, Vince Weaver wrote:
>>
>> and as far as I can tell nothing touches rbp again until the segfault.
>> Nothing in _memset_sse2 does as far as I can tell.
> 
> I only know enough about ftrace to be dangerous, but here is what I think 
> is the trace of the problem:
> 
>      perf_fuzzer-11492 [000] 197077.488420: function:                perf_output_put_handle
>      perf_fuzzer-11492 [000] 197077.488421: function:             __do_page_fault

So we do a write to the buffer rather immediately before this happens,
and in particular that will update the head:

	rb->user_page->data_head = head;

However, that doesn't explain what is going on and in particular the
write to whatever address was in %rbp.  The rest pretty much seems to be
the page fault logic.

Incidentally, I doubt that this is x32-related in any way; there seems
to be absolutely no difference between x86-64 perf and x32 perf; more
likely it just makes the error more reproducible because the address
space is so much smaller.

	-hpa



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24  5:22                 ` H. Peter Anvin
@ 2014-02-24 15:35                   ` Vince Weaver
  2014-02-24 16:34                     ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 15:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On Sun, 23 Feb 2014, H. Peter Anvin wrote:

> So we do a write to the buffer rather immediately before this happens,
> and in particular that will update the head:
> 
> 	rb->user_page->data_head = head;
> 
> However, that doesn't explain what is going on and in particular the
> write to whatever address was in %rbp.  The rest pretty much seems to be
> the page fault logic.

It turns out you don't even have to over-write rb->user_page->data_head.
Just touching the mmap page with a write of a single byte (it doesn't 
matter where) is enough to trigger the bug.

This is a pain to track down, it would be easier if I could get a 
replayable syscall trace, but even though the segfault is very 
reproducible with my fuzzer, it's very sensitive to extra syscalls in the 
trace path and the fuzzer logger/replayer path has a different number of 
write syscalls and won't trigger the problem.

> Incidentally, I doubt that this is x32-related in any way; there seems
> to be absolutely no difference between x86-64 perf and x32 perf; more
> likely it just makes the error more reproducible because the address
> space is so much smaller.

quite possibly.  I only began chasing the problem because when compiled 
for x32 this bug apparently will reboot the machine now and then (not just 
segfault the program).  I never saw that failure mode with x86_64, but 
again maybe it's just easier to hit with the reduced address space as you 
say.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 15:35                   ` Vince Weaver
@ 2014-02-24 16:34                     ` Vince Weaver
  2014-02-24 16:47                       ` H. Peter Anvin
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 16:34 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On Mon, 24 Feb 2014, Vince Weaver wrote:

> Just touching the mmap page with a write of a single byte (it doesn't 
> matter where) is enough to trigger the bug.

OK, investigating this more.

     perf_fuzzer-2971  [000]   154.944114: page_fault_user:       address=0xf7729000 ip=0x41efab error_code=0x6
     perf_fuzzer-2971  [000]   154.944118: function:              ip=0xffffffff810d40e7 parent_ip=0xffffffff810d0840
     perf_fuzzer-2971  [000]   154.944119: function:              ip=0xffffffff812a91a5 parent_ip=0xffffffff81013ff5
     perf_fuzzer-2971  [000]   154.944120: function:              ip=0xffffffff8153837c parent_ip=0xffffffff81535432
     perf_fuzzer-2971  [000]   154.944121: page_fault_kernel:     address=0x22e0 ip=0xffffffff812a7d5c error_code=0x0

It looks like there are two page faults.  The first is caused by the user
code accessing the mmap'd page.  It looks sort of normal and what you'd
expect if the perf_event mmap ring buffer is being accessed for the first
time.

What follows is a kernel page fault, and this is the one where for 
whatever reason CR2 has obtained the value of the userspace RBP register.

Vince



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 16:34                     ` Vince Weaver
@ 2014-02-24 16:47                       ` H. Peter Anvin
  2014-02-24 17:10                         ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24 16:47 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

Ok, so the obvious question is what is at that kernel address?

On February 24, 2014 8:34:30 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Mon, 24 Feb 2014, Vince Weaver wrote:
>
>> Just touching the mmap page with a write of a single byte (it doesn't
>
>> matter where) is enough to trigger the bug.
>
>OK, investigating this more.
>
>perf_fuzzer-2971  [000]   154.944114: page_fault_user:      
>address=0xf7729000 ip=0x41efab error_code=0x6
>perf_fuzzer-2971  [000]   154.944118: function:             
>ip=0xffffffff810d40e7 parent_ip=0xffffffff810d0840
>perf_fuzzer-2971  [000]   154.944119: function:             
>ip=0xffffffff812a91a5 parent_ip=0xffffffff81013ff5
>perf_fuzzer-2971  [000]   154.944120: function:             
>ip=0xffffffff8153837c parent_ip=0xffffffff81535432
>perf_fuzzer-2971  [000]   154.944121: page_fault_kernel:    
>address=0x22e0 ip=0xffffffff812a7d5c error_code=0x0
>
>It looks like there are two page faults.  The first is caused by the
>user
>code accessing the mmap'd page.  It looks sort of normal and what you'd
>expect if the perf_event mmap ring buffer is being accessed for the
>first
>time.
>
>What follows is a kernel page fault, and this is the one where for 
>whatever reason CR2 has obtained the value of the userspace RBP
>register.
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 16:47                       ` H. Peter Anvin
@ 2014-02-24 17:10                         ` Vince Weaver
  2014-02-24 17:25                           ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 17:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Linux Kernel, Peter Zijlstra, Ingo Molnar, H.J. Lu

On Mon, 24 Feb 2014, H. Peter Anvin wrote:

> On February 24, 2014 8:34:30 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
> >On Mon, 24 Feb 2014, Vince Weaver wrote:
> >
> >> Just touching the mmap page with a write of a single byte (it doesn't
> >
> >> matter where) is enough to trigger the bug.
> >
> >OK, investigating this more.
> >
> >perf_fuzzer-2971  [000]   154.944114: page_fault_user:      
> >address=0xf7729000 ip=0x41efab error_code=0x6
> >perf_fuzzer-2971  [000]   154.944118: function:             
> >ip=0xffffffff810d40e7 parent_ip=0xffffffff810d0840
> >perf_fuzzer-2971  [000]   154.944119: function:             
> >ip=0xffffffff812a91a5 parent_ip=0xffffffff81013ff5
> >perf_fuzzer-2971  [000]   154.944120: function:             
> >ip=0xffffffff8153837c parent_ip=0xffffffff81535432
> >perf_fuzzer-2971  [000]   154.944121: page_fault_kernel:    
> >address=0x22e0 ip=0xffffffff812a7d5c error_code=0x0

> Ok, so the obvious question is what is at that kernel address?
> 

It's in copy_user_generic_string()
	rep movsq %ds:(%rsi),%es:(%rdi)

And looking at the ftrace:
     perf_fuzzer-2979  [000]   161.475920: page_fault_user:      address=__per_cpu_end ip=__per_cpu_end error_code=0x6
     perf_fuzzer-2979  [000]   161.475922: function:             perf_callchain
     perf_fuzzer-2979  [000]   161.475922: function:             copy_from_user_nmi
     perf_fuzzer-2979  [000]   161.475923: function:             trace_do_page_fault
     perf_fuzzer-2979  [000]   161.475924: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0

What is likely happening is the user page fault is triggering 
code to do a "perf_callchain" dump, which is calling copy_from_user_nmi()
which calls copy_user_generic_string() which is somehow getting the user
RBP in the RDI register somehow?

Vince


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:10                         ` Vince Weaver
@ 2014-02-24 17:25                           ` Peter Zijlstra
  2014-02-24 17:32                             ` Vince Weaver
  2014-02-24 17:52                             ` H. Peter Anvin
  0 siblings, 2 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-24 17:25 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Linux Kernel, Ingo Molnar, H.J. Lu, Steven Rostedt

On Mon, Feb 24, 2014 at 12:10:44PM -0500, Vince Weaver wrote:
> On Mon, 24 Feb 2014, H. Peter Anvin wrote:
> 
> > On February 24, 2014 8:34:30 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
> > >On Mon, 24 Feb 2014, Vince Weaver wrote:
> > >
> > >> Just touching the mmap page with a write of a single byte (it doesn't
> > >
> > >> matter where) is enough to trigger the bug.
> > >
> > >OK, investigating this more.
> > >
> > >perf_fuzzer-2971  [000]   154.944114: page_fault_user:      
> > >address=0xf7729000 ip=0x41efab error_code=0x6
> > >perf_fuzzer-2971  [000]   154.944118: function:             
> > >ip=0xffffffff810d40e7 parent_ip=0xffffffff810d0840
> > >perf_fuzzer-2971  [000]   154.944119: function:             
> > >ip=0xffffffff812a91a5 parent_ip=0xffffffff81013ff5
> > >perf_fuzzer-2971  [000]   154.944120: function:             
> > >ip=0xffffffff8153837c parent_ip=0xffffffff81535432
> > >perf_fuzzer-2971  [000]   154.944121: page_fault_kernel:    
> > >address=0x22e0 ip=0xffffffff812a7d5c error_code=0x0
> 
> > Ok, so the obvious question is what is at that kernel address?
> > 
> 
> It's in copy_user_generic_string()
> 	rep movsq %ds:(%rsi),%es:(%rdi)
> 
> And looking at the ftrace:
>      perf_fuzzer-2979  [000]   161.475920: page_fault_user:      address=__per_cpu_end ip=__per_cpu_end error_code=0x6
>      perf_fuzzer-2979  [000]   161.475922: function:             perf_callchain
>      perf_fuzzer-2979  [000]   161.475922: function:             copy_from_user_nmi
>      perf_fuzzer-2979  [000]   161.475923: function:             trace_do_page_fault
>      perf_fuzzer-2979  [000]   161.475924: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
> 
> What is likely happening is the user page fault is triggering 
> code to do a "perf_callchain" dump, which is calling copy_from_user_nmi()
> which calls copy_user_generic_string() which is somehow getting the user
> RBP in the RDI register somehow?

So that code very much relies on the 'recursive' NMI/iret magic from
Steve, patch 3f3c8b8c4b2a3 (and assorted fixes later).

If CR2 is getting corrupted; 7fbb98c5cb075 seems relevant.

Peter, does x32 have a slightly different ABI/calling convention that
would make any of these patches just slightly 'off'?



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:25                           ` Peter Zijlstra
@ 2014-02-24 17:32                             ` Vince Weaver
  2014-02-24 17:40                               ` H. Peter Anvin
                                                 ` (2 more replies)
  2014-02-24 17:52                             ` H. Peter Anvin
  1 sibling, 3 replies; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 17:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar, H.J. Lu,
	Steven Rostedt

On Mon, 24 Feb 2014, Peter Zijlstra wrote:

> On Mon, Feb 24, 2014 at 12:10:44PM -0500, Vince Weaver wrote:
> > On Mon, 24 Feb 2014, H. Peter Anvin wrote:
> > 
> > > On February 24, 2014 8:34:30 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
> > > >On Mon, 24 Feb 2014, Vince Weaver wrote:
> > > >
> > > >> Just touching the mmap page with a write of a single byte (it doesn't
> > > >
> > > >> matter where) is enough to trigger the bug.
> > > >
> > > >OK, investigating this more.
> > > >
> > > >perf_fuzzer-2971  [000]   154.944114: page_fault_user:      
> > > >address=0xf7729000 ip=0x41efab error_code=0x6
> > > >perf_fuzzer-2971  [000]   154.944118: function:             
> > > >ip=0xffffffff810d40e7 parent_ip=0xffffffff810d0840
> > > >perf_fuzzer-2971  [000]   154.944119: function:             
> > > >ip=0xffffffff812a91a5 parent_ip=0xffffffff81013ff5
> > > >perf_fuzzer-2971  [000]   154.944120: function:             
> > > >ip=0xffffffff8153837c parent_ip=0xffffffff81535432
> > > >perf_fuzzer-2971  [000]   154.944121: page_fault_kernel:    
> > > >address=0x22e0 ip=0xffffffff812a7d5c error_code=0x0
> > 
> > > Ok, so the obvious question is what is at that kernel address?
> > > 
> > 
> > It's in copy_user_generic_string()
> > 	rep movsq %ds:(%rsi),%es:(%rdi)
> > 
> > And looking at the ftrace:
> >      perf_fuzzer-2979  [000]   161.475920: page_fault_user:      address=__per_cpu_end ip=__per_cpu_end error_code=0x6
> >      perf_fuzzer-2979  [000]   161.475922: function:             perf_callchain
> >      perf_fuzzer-2979  [000]   161.475922: function:             copy_from_user_nmi
> >      perf_fuzzer-2979  [000]   161.475923: function:             trace_do_page_fault
> >      perf_fuzzer-2979  [000]   161.475924: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
> > 
> > What is likely happening is the user page fault is triggering 
> > code to do a "perf_callchain" dump, which is calling copy_from_user_nmi()
> > which calls copy_user_generic_string() which is somehow getting the user
> > RBP in the RDI register somehow?
> 
> So that code very much relies on the 'recursive' NMI/iret magic from
> Steve, patch 3f3c8b8c4b2a3 (and assorted fixes later).
> 
> If CR2 is getting corrupted; 7fbb98c5cb075 seems relevant.
> 
> Peter, does x32 have a slightly different ABI/calling convention that
> would make any of these patches just slightly 'off'?

I do note that 
	perf_callchain_user();

Does
	fp = (void __user *)regs->bp;
	
	...

	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));


And in my particular executable RBP has nothing to do with a frame 
pointer, but is instead being used as a general purpose register.

Am I missing something here?  Though in that case I'm not sure why this 
wouldn't be easier to trigger.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:32                             ` Vince Weaver
@ 2014-02-24 17:40                               ` H. Peter Anvin
  2014-02-24 18:00                                 ` Vince Weaver
  2014-02-24 17:40                               ` Peter Zijlstra
  2014-02-24 17:41                               ` Vince Weaver
  2 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24 17:40 UTC (permalink / raw)
  To: Vince Weaver, Peter Zijlstra
  Cc: Linux Kernel, Ingo Molnar, H.J. Lu, Steven Rostedt

On 02/24/2014 09:32 AM, Vince Weaver wrote:
>>
>> Peter, does x32 have a slightly different ABI/calling convention that
>> would make any of these patches just slightly 'off'?
> 
> I do note that 
> 	perf_callchain_user();
> 
> Does
> 	fp = (void __user *)regs->bp;
> 	
> 	...
> 
> 	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
> 
> 
> And in my particular executable RBP has nothing to do with a frame 
> pointer, but is instead being used as a general purpose register.
> 
> Am I missing something here?  Though in that case I'm not sure why this 
> wouldn't be easier to trigger.
> 

Neither x86-64 nor x32 are typically compiled with fixed frame pointers
(which would be %rbp if they are).  So I'm guessing the perf_callchain
logic is only applicable to a user-space binary explicitly compiled with
frame pointers turned on.

So copy_from_user_nmi() stumbles onto a nonexistent page and takes a
page fault.  This isn't a big deal, because perf_callchain_user() is set
up to handle that (and just terminates the trace), *except* now CR2 is
corrupt, and we took this event while handling a page fault already...
and apparently before we even did read_cr2() in __do_page_fault.

The description of copy_from_user_nmi() states:

/*
 * We rely on the nested NMI work to allow atomic faults from the NMI
path; the
 * nested NMI paths are careful to preserve CR2.
 */

... but that doesn't seem to happen here for whatever reason.

There is no hint in your trace what happens after the kernel page fault
so that makes it hard to know.

	-hpa


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:32                             ` Vince Weaver
  2014-02-24 17:40                               ` H. Peter Anvin
@ 2014-02-24 17:40                               ` Peter Zijlstra
  2014-02-24 17:41                               ` Vince Weaver
  2 siblings, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-24 17:40 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Linux Kernel, Ingo Molnar, H.J. Lu, Steven Rostedt

On Mon, Feb 24, 2014 at 12:32:39PM -0500, Vince Weaver wrote:
> I do note that 
> 	perf_callchain_user();
> 
> Does
> 	fp = (void __user *)regs->bp;
> 	
> 	...
> 
> 	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
> 
> 
> And in my particular executable RBP has nothing to do with a frame 
> pointer, but is instead being used as a general purpose register.
> 
> Am I missing something here?  Though in that case I'm not sure why this 
> wouldn't be easier to trigger.

Ah, in case the frame doesn't actually exist we would expect to fault
and get the fixup treatment, returning a short copy (the return value
being bytes _NOT_ copied).

When that happens;

  if (bytes != 0)
  	break;

At which point we'll terminate the stack frame iteration.

This is where we rely on being able to take a fault from NMI context,
the fault iret will re-enable NMIs, necessitating all the magic Steve
did.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:32                             ` Vince Weaver
  2014-02-24 17:40                               ` H. Peter Anvin
  2014-02-24 17:40                               ` Peter Zijlstra
@ 2014-02-24 17:41                               ` Vince Weaver
  2014-02-24 17:42                                 ` H. Peter Anvin
  2 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 17:41 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, H. Peter Anvin, Linux Kernel, Ingo Molnar,
	H.J. Lu, Steven Rostedt

On Mon, 24 Feb 2014, Vince Weaver wrote:

> I do note that 
> 	perf_callchain_user();
> 
> Does
> 	fp = (void __user *)regs->bp;
> 	
> 	...
> 
> 	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
> 
> 
> And in my particular executable RBP has nothing to do with a frame 
> pointer, but is instead being used as a general purpose register.

and as a reminder, I'm seeing this on an x32 executable, so 
perf_callchain_user32() is probably coming into play.

So maybe it is an x32 issue after all.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:41                               ` Vince Weaver
@ 2014-02-24 17:42                                 ` H. Peter Anvin
  0 siblings, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24 17:42 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Linux Kernel, Ingo Molnar, H.J. Lu, Steven Rostedt

On 02/24/2014 09:41 AM, Vince Weaver wrote:
> On Mon, 24 Feb 2014, Vince Weaver wrote:
> 
>> I do note that 
>> 	perf_callchain_user();
>>
>> Does
>> 	fp = (void __user *)regs->bp;
>> 	
>> 	...
>>
>> 	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
>>
>>
>> And in my particular executable RBP has nothing to do with a frame 
>> pointer, but is instead being used as a general purpose register.
> 
> and as a reminder, I'm seeing this on an x32 executable, so 
> perf_callchain_user32() is probably coming into play.
> 
> So maybe it is an x32 issue after all.
> 

No.

	if (!test_thread_flag(TIF_IA32))
		return 0;

TIF_IA32 is clear for an x32 process.

	-hpa


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:25                           ` Peter Zijlstra
  2014-02-24 17:32                             ` Vince Weaver
@ 2014-02-24 17:52                             ` H. Peter Anvin
  1 sibling, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24 17:52 UTC (permalink / raw)
  To: Peter Zijlstra, Vince Weaver
  Cc: Linux Kernel, Ingo Molnar, H.J. Lu, Steven Rostedt

On 02/24/2014 09:25 AM, Peter Zijlstra wrote:
>>
>> What is likely happening is the user page fault is triggering 
>> code to do a "perf_callchain" dump, which is calling copy_from_user_nmi()
>> which calls copy_user_generic_string() which is somehow getting the user
>> RBP in the RDI register somehow?
> 
> So that code very much relies on the 'recursive' NMI/iret magic from
> Steve, patch 3f3c8b8c4b2a3 (and assorted fixes later).
> 
> If CR2 is getting corrupted; 7fbb98c5cb075 seems relevant.
> 
> Peter, does x32 have a slightly different ABI/calling convention that
> would make any of these patches just slightly 'off'?
> 

As long as we're talking kernel code, x32 isn't even involved (we do not
support compiling the kernel as x32 and most likely never will.)

	-hpa



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 17:40                               ` H. Peter Anvin
@ 2014-02-24 18:00                                 ` Vince Weaver
  2014-02-24 18:07                                   ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 18:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar, H.J. Lu,
	Steven Rostedt

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1855 bytes --]

On Mon, 24 Feb 2014, H. Peter Anvin wrote:

> On 02/24/2014 09:32 AM, Vince Weaver wrote:
> >>
> >> Peter, does x32 have a slightly different ABI/calling convention that
> >> would make any of these patches just slightly 'off'?
> > 
> > I do note that 
> > 	perf_callchain_user();
> > 
> > Does
> > 	fp = (void __user *)regs->bp;
> > 	
> > 	...
> > 
> > 	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
> > 
> > 
> > And in my particular executable RBP has nothing to do with a frame 
> > pointer, but is instead being used as a general purpose register.
> > 
> > Am I missing something here?  Though in that case I'm not sure why this 
> > wouldn't be easier to trigger.
> > 
> 
> Neither x86-64 nor x32 are typically compiled with fixed frame pointers
> (which would be %rbp if they are).  So I'm guessing the perf_callchain
> logic is only applicable to a user-space binary explicitly compiled with
> frame pointers turned on.
> 
> So copy_from_user_nmi() stumbles onto a nonexistent page and takes a
> page fault.  This isn't a big deal, because perf_callchain_user() is set
> up to handle that (and just terminates the trace), *except* now CR2 is
> corrupt, and we took this event while handling a page fault already...
> and apparently before we even did read_cr2() in __do_page_fault.
> 
> The description of copy_from_user_nmi() states:
> 
> /*
>  * We rely on the nested NMI work to allow atomic faults from the NMI
> path; the
>  * nested NMI paths are careful to preserve CR2.
>  */
> 
> ... but that doesn't seem to happen here for whatever reason.
> 
> There is no hint in your trace what happens after the kernel page fault
> so that makes it hard to know.

Ahh, ftrace, the cause of and solution to all my perf_fuzzing problems.

Anyway I've attached the full tail end of the trace if you want to see 
everything that happens.

Vince

[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 61434 bytes --]

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 18:00                                 ` Vince Weaver
@ 2014-02-24 18:07                                   ` Vince Weaver
  2014-02-24 18:34                                     ` H. Peter Anvin
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-24 18:07 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar,
	H.J. Lu, Steven Rostedt

On Mon, 24 Feb 2014, Vince Weaver wrote:

> On Mon, 24 Feb 2014, H. Peter Anvin wrote:
> 
> > On 02/24/2014 09:32 AM, Vince Weaver wrote:
> > >>
> > >> Peter, does x32 have a slightly different ABI/calling convention that
> > >> would make any of these patches just slightly 'off'?
> > > 
> > > I do note that 
> > > 	perf_callchain_user();
> > > 
> > > Does
> > > 	fp = (void __user *)regs->bp;
> > > 	
> > > 	...
> > > 
> > > 	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
> > > 
> > > 
> > > And in my particular executable RBP has nothing to do with a frame 
> > > pointer, but is instead being used as a general purpose register.
> > > 
> > > Am I missing something here?  Though in that case I'm not sure why this 
> > > wouldn't be easier to trigger.
> > > 
> > 
> > Neither x86-64 nor x32 are typically compiled with fixed frame pointers
> > (which would be %rbp if they are).  So I'm guessing the perf_callchain
> > logic is only applicable to a user-space binary explicitly compiled with
> > frame pointers turned on.
> > 
> > So copy_from_user_nmi() stumbles onto a nonexistent page and takes a
> > page fault.  This isn't a big deal, because perf_callchain_user() is set
> > up to handle that (and just terminates the trace), *except* now CR2 is
> > corrupt, and we took this event while handling a page fault already...
> > and apparently before we even did read_cr2() in __do_page_fault.
> > 
> > The description of copy_from_user_nmi() states:
> > 
> > /*
> >  * We rely on the nested NMI work to allow atomic faults from the NMI
> > path; the
> >  * nested NMI paths are careful to preserve CR2.
> >  */
> > 
> > ... but that doesn't seem to happen here for whatever reason.
> > 
> > There is no hint in your trace what happens after the kernel page fault
> > so that makes it hard to know.
> 
> Ahh, ftrace, the cause of and solution to all my perf_fuzzing problems.
> 
> Anyway I've attached the full tail end of the trace if you want to see 
> everything that happens.

and then I note there are *two* kernel page faults.

     perf_fuzzer-2979  [000]   161.475924: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
address=0x1 ip=0xffffffff812a7d9c error_code=0x0
     perf_fuzzer-2979  [000]   161.475924: function:                __do_page_fault
     perf_fuzzer-2979  [000]   161.475924: function:                   bad_area_nosemaphore
     perf_fuzzer-2979  [000]   161.475925: function:                      __bad_area_nosemaphore
     perf_fuzzer-2979  [000]   161.475925: function:                         no_context
     perf_fuzzer-2979  [000]   161.475925: function:                            fixup_exception
     perf_fuzzer-2979  [000]   161.475926: function:                               search_exception_tables
     perf_fuzzer-2979  [000]   161.475926: function:                                  search_extable
     perf_fuzzer-2979  [000]   161.475927: function:             copy_user_handle_tail
     perf_fuzzer-2979  [000]   161.475927: function:             trace_do_page_fault
     perf_fuzzer-2979  [000]   161.475928: page_fault_kernel:    address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
address=0x1 ip=0xffffffff812a92bb error_code=0x0
     perf_fuzzer-2979  [000]   161.475928: function:                __do_page_fault
     perf_fuzzer-2979  [000]   161.475928: function:                   bad_area_nosemaphore
     perf_fuzzer-2979  [000]   161.475929: function:                      __bad_area_nosemaphore
     perf_fuzzer-2979  [000]   161.475929: function:                         no_context
     perf_fuzzer-2979  [000]   161.475929: function:                            fixup_exception
     perf_fuzzer-2979  [000]   161.475929: function:                               search_exception_tables
     perf_fuzzer-2979  [000]   161.475930: function:                                  search_extable
     perf_fuzzer-2979  [000]   161.475931: function:             perf_output_begin
     perf_fuzzer-2979  [000]   161.475931: function:             perf_output_copy

That second one is in copy_user_handle_tail()

Sorry for the sloppy analysis here, I did most of the initial tracing last 
night at 1am typing one-handed with a sick crying baby draped over one
shoulder, so not really operating at my best.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 18:07                                   ` Vince Weaver
@ 2014-02-24 18:34                                     ` H. Peter Anvin
  2014-02-24 19:13                                       ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24 18:34 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Linux Kernel, Ingo Molnar, H.J. Lu, Steven Rostedt

On 02/24/2014 10:07 AM, Vince Weaver wrote:
>>
>> Anyway I've attached the full tail end of the trace if you want to see 
>> everything that happens.
> 
> and then I note there are *two* kernel page faults.
> 
>      perf_fuzzer-2979  [000]   161.475924: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
> address=0x1 ip=0xffffffff812a7d9c error_code=0x0
>      perf_fuzzer-2979  [000]   161.475924: function:                __do_page_fault
>      perf_fuzzer-2979  [000]   161.475924: function:                   bad_area_nosemaphore
>      perf_fuzzer-2979  [000]   161.475925: function:                      __bad_area_nosemaphore
>      perf_fuzzer-2979  [000]   161.475925: function:                         no_context
>      perf_fuzzer-2979  [000]   161.475925: function:                            fixup_exception
>      perf_fuzzer-2979  [000]   161.475926: function:                               search_exception_tables
>      perf_fuzzer-2979  [000]   161.475926: function:                                  search_extable
>      perf_fuzzer-2979  [000]   161.475927: function:             copy_user_handle_tail
>      perf_fuzzer-2979  [000]   161.475927: function:             trace_do_page_fault
>      perf_fuzzer-2979  [000]   161.475928: page_fault_kernel:    address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
> address=0x1 ip=0xffffffff812a92bb error_code=0x0
>      perf_fuzzer-2979  [000]   161.475928: function:                __do_page_fault
>      perf_fuzzer-2979  [000]   161.475928: function:                   bad_area_nosemaphore
>      perf_fuzzer-2979  [000]   161.475929: function:                      __bad_area_nosemaphore
>      perf_fuzzer-2979  [000]   161.475929: function:                         no_context
>      perf_fuzzer-2979  [000]   161.475929: function:                            fixup_exception
>      perf_fuzzer-2979  [000]   161.475929: function:                               search_exception_tables
>      perf_fuzzer-2979  [000]   161.475930: function:                                  search_extable
>      perf_fuzzer-2979  [000]   161.475931: function:             perf_output_begin
>      perf_fuzzer-2979  [000]   161.475931: function:             perf_output_copy
> 
> That second one is in copy_user_handle_tail()
> 

Either way, it really seems like we have a case of CR2 leakage out of
the NMI context.

	-hpa


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 18:34                                     ` H. Peter Anvin
@ 2014-02-24 19:13                                       ` Steven Rostedt
  2014-02-24 19:15                                         ` H. Peter Anvin
  2014-02-24 19:30                                         ` Peter Zijlstra
  0 siblings, 2 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-24 19:13 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar, H.J. Lu

On Mon, 24 Feb 2014 10:34:13 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 02/24/2014 10:07 AM, Vince Weaver wrote:
> >>
> >> Anyway I've attached the full tail end of the trace if you want to see 
> >> everything that happens.
> > 
> > and then I note there are *two* kernel page faults.
> > 
> >      perf_fuzzer-2979  [000]   161.475924: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
> > address=0x1 ip=0xffffffff812a7d9c error_code=0x0
> >      perf_fuzzer-2979  [000]   161.475924: function:                __do_page_fault
> >      perf_fuzzer-2979  [000]   161.475924: function:                   bad_area_nosemaphore
> >      perf_fuzzer-2979  [000]   161.475925: function:                      __bad_area_nosemaphore
> >      perf_fuzzer-2979  [000]   161.475925: function:                         no_context
> >      perf_fuzzer-2979  [000]   161.475925: function:                            fixup_exception
> >      perf_fuzzer-2979  [000]   161.475926: function:                               search_exception_tables
> >      perf_fuzzer-2979  [000]   161.475926: function:                                  search_extable
> >      perf_fuzzer-2979  [000]   161.475927: function:             copy_user_handle_tail
> >      perf_fuzzer-2979  [000]   161.475927: function:             trace_do_page_fault
> >      perf_fuzzer-2979  [000]   161.475928: page_fault_kernel:    address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
> > address=0x1 ip=0xffffffff812a92bb error_code=0x0
> >      perf_fuzzer-2979  [000]   161.475928: function:                __do_page_fault
> >      perf_fuzzer-2979  [000]   161.475928: function:                   bad_area_nosemaphore
> >      perf_fuzzer-2979  [000]   161.475929: function:                      __bad_area_nosemaphore
> >      perf_fuzzer-2979  [000]   161.475929: function:                         no_context
> >      perf_fuzzer-2979  [000]   161.475929: function:                            fixup_exception
> >      perf_fuzzer-2979  [000]   161.475929: function:                               search_exception_tables
> >      perf_fuzzer-2979  [000]   161.475930: function:                                  search_extable
> >      perf_fuzzer-2979  [000]   161.475931: function:             perf_output_begin
> >      perf_fuzzer-2979  [000]   161.475931: function:             perf_output_copy
> > 
> > That second one is in copy_user_handle_tail()
> > 
> 
> Either way, it really seems like we have a case of CR2 leakage out of
> the NMI context.

Ah, and x86_64 saves off the cr2 register when entering NMI and restores
it before returning. But it seems to be missing from the i386 code.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 19:13                                       ` Steven Rostedt
@ 2014-02-24 19:15                                         ` H. Peter Anvin
  2014-02-24 19:30                                         ` Peter Zijlstra
  1 sibling, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-24 19:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar, H.J. Lu

On 02/24/2014 11:13 AM, Steven Rostedt wrote:
>>
>> Either way, it really seems like we have a case of CR2 leakage out of
>> the NMI context.
> 
> Ah, and x86_64 saves off the cr2 register when entering NMI and restores
> it before returning. But it seems to be missing from the i386 code.
> 

OK, that might be a problem, but this is the 64-bit kernel.

	-hpa



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 19:13                                       ` Steven Rostedt
  2014-02-24 19:15                                         ` H. Peter Anvin
@ 2014-02-24 19:30                                         ` Peter Zijlstra
  2014-02-24 19:32                                           ` Steven Rostedt
  2014-02-25  3:49                                           ` H. Peter Anvin
  1 sibling, 2 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-24 19:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar, H.J. Lu

On Mon, Feb 24, 2014 at 02:13:29PM -0500, Steven Rostedt wrote:
> Ah, and x86_64 saves off the cr2 register when entering NMI and restores
> it before returning. But it seems to be missing from the i386 code.

arch/x86/kernel/nmi.c:

#define nmi_nesting_preprocess(regs)					\
	do {								\
		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
			this_cpu_write(nmi_state, NMI_LATCHED);		\
			return;						\
		}							\
		this_cpu_write(nmi_state, NMI_EXECUTING);		\
		this_cpu_write(nmi_cr2, read_cr2());			\
	} while (0);							\
	nmi_restart:

#define nmi_nesting_postprocess()					\
	do {								\
		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
			write_cr2(this_cpu_read(nmi_cr2));		\
		if (this_cpu_dec_return(nmi_state))			\
			goto nmi_restart;				\
	} while (0)

That very much looks like saving/restoring CR2 to me.

FWIW; I hate how the x86_64 and i386 versions of this NMI nesting magic
are so completely different.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 19:30                                         ` Peter Zijlstra
@ 2014-02-24 19:32                                           ` Steven Rostedt
  2014-02-25  3:49                                           ` H. Peter Anvin
  1 sibling, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-24 19:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar, H.J. Lu

On Mon, 24 Feb 2014 20:30:43 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 24, 2014 at 02:13:29PM -0500, Steven Rostedt wrote:
> > Ah, and x86_64 saves off the cr2 register when entering NMI and restores
> > it before returning. But it seems to be missing from the i386 code.
> 
> arch/x86/kernel/nmi.c:
> 
> #define nmi_nesting_preprocess(regs)					\
> 	do {								\
> 		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
> 			this_cpu_write(nmi_state, NMI_LATCHED);		\
> 			return;						\
> 		}							\
> 		this_cpu_write(nmi_state, NMI_EXECUTING);		\
> 		this_cpu_write(nmi_cr2, read_cr2());			\
> 	} while (0);							\
> 	nmi_restart:
> 
> #define nmi_nesting_postprocess()					\
> 	do {								\
> 		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
> 			write_cr2(this_cpu_read(nmi_cr2));		\
> 		if (this_cpu_dec_return(nmi_state))			\
> 			goto nmi_restart;				\
> 	} while (0)
> 
> That very much looks like saving/restoring CR2 to me.
> 

Ah, I forgot to look in the C file.

> FWIW; I hate how the x86_64 and i386 versions of this NMI nesting magic
> are so completely different.

Yeah, it sucks. But that's because the architecture on handling NMIs is
completely different. x86_64 swaps the stack automatically, where as
i386 uses the same stack if you are already in the kernel.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-24 19:30                                         ` Peter Zijlstra
  2014-02-24 19:32                                           ` Steven Rostedt
@ 2014-02-25  3:49                                           ` H. Peter Anvin
  2014-02-25 14:07                                             ` Vince Weaver
  1 sibling, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-25  3:49 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: Vince Weaver, Linux Kernel, Ingo Molnar, H.J. Lu

On 02/24/2014 11:30 AM, Peter Zijlstra wrote:
> On Mon, Feb 24, 2014 at 02:13:29PM -0500, Steven Rostedt wrote:
>> Ah, and x86_64 saves off the cr2 register when entering NMI and restores
>> it before returning. But it seems to be missing from the i386 code.
> 
> arch/x86/kernel/nmi.c:
> 
> #define nmi_nesting_preprocess(regs)					\
> 	do {								\
> 		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
> 			this_cpu_write(nmi_state, NMI_LATCHED);		\
> 			return;						\
> 		}							\
> 		this_cpu_write(nmi_state, NMI_EXECUTING);		\
> 		this_cpu_write(nmi_cr2, read_cr2());			\
> 	} while (0);							\
> 	nmi_restart:
> 
> #define nmi_nesting_postprocess()					\
> 	do {								\
> 		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
> 			write_cr2(this_cpu_read(nmi_cr2));		\
> 		if (this_cpu_dec_return(nmi_state))			\
> 			goto nmi_restart;				\
> 	} while (0)
> 
> That very much looks like saving/restoring CR2 to me.
> 
> FWIW; I hate how the x86_64 and i386 versions of this NMI nesting magic
> are so completely different.

Is there any way that nmi_cr2 can end up getting overwritten by multiple
nestings of some kind?  I would have thought it would have made more
sense to spill cr2 onto the stack after the stack has been properly set up.

	-hpa



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-25  3:49                                           ` H. Peter Anvin
@ 2014-02-25 14:07                                             ` Vince Weaver
  2014-02-25 14:34                                               ` H. Peter Anvin
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-25 14:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Steven Rostedt, Vince Weaver, Linux Kernel, Ingo Molnar

On Mon, 24 Feb 2014, H. Peter Anvin wrote:

> On 02/24/2014 11:30 AM, Peter Zijlstra wrote:
> > On Mon, Feb 24, 2014 at 02:13:29PM -0500, Steven Rostedt wrote:
> >> Ah, and x86_64 saves off the cr2 register when entering NMI and restores
> >> it before returning. But it seems to be missing from the i386 code.
> > 
> > arch/x86/kernel/nmi.c:
> > 
> > #define nmi_nesting_preprocess(regs)					\
> > 	do {								\
> > 		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
> > 			this_cpu_write(nmi_state, NMI_LATCHED);		\
> > 			return;						\
> > 		}							\
> > 		this_cpu_write(nmi_state, NMI_EXECUTING);		\
> > 		this_cpu_write(nmi_cr2, read_cr2());			\
> > 	} while (0);							\
> > 	nmi_restart:
> > 
> > #define nmi_nesting_postprocess()					\
> > 	do {								\
> > 		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
> > 			write_cr2(this_cpu_read(nmi_cr2));		\
> > 		if (this_cpu_dec_return(nmi_state))			\
> > 			goto nmi_restart;				\
> > 	} while (0)
> > 
> > That very much looks like saving/restoring CR2 to me.
> > 
> > FWIW; I hate how the x86_64 and i386 versions of this NMI nesting magic
> > are so completely different.
> 
> Is there any way that nmi_cr2 can end up getting overwritten by multiple
> nestings of some kind?  I would have thought it would have made more
> sense to spill cr2 onto the stack after the stack has been properly set up.

So how can I help with debugging this?

While the missing cr2 issue made debugging frustrating, I find the other 
aspects of the bug more serious:

  1.  Programs that are doing valid memory accesses can segfault
and worse
  2.  This bug can cause an instant-reboot of the system (I assume somehow
      with the right combination of memory accesses  it causes a 
      triple-fault?)

#2 is why I spent all of this time tracking this down, I couldn't leave a 
machine fuzzing overnight without the machine rebooting.

Vince


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-25 14:07                                             ` Vince Weaver
@ 2014-02-25 14:34                                               ` H. Peter Anvin
  2014-02-25 14:43                                                 ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-25 14:34 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Steven Rostedt, Linux Kernel, Ingo Molnar

#2 is what I really don't understand.

I worry something else is going on there

On February 25, 2014 6:07:51 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Mon, 24 Feb 2014, H. Peter Anvin wrote:
>
>> On 02/24/2014 11:30 AM, Peter Zijlstra wrote:
>> > On Mon, Feb 24, 2014 at 02:13:29PM -0500, Steven Rostedt wrote:
>> >> Ah, and x86_64 saves off the cr2 register when entering NMI and
>restores
>> >> it before returning. But it seems to be missing from the i386
>code.
>> > 
>> > arch/x86/kernel/nmi.c:
>> > 
>> > #define nmi_nesting_preprocess(regs)					\
>> > 	do {								\
>> > 		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
>> > 			this_cpu_write(nmi_state, NMI_LATCHED);		\
>> > 			return;						\
>> > 		}							\
>> > 		this_cpu_write(nmi_state, NMI_EXECUTING);		\
>> > 		this_cpu_write(nmi_cr2, read_cr2());			\
>> > 	} while (0);							\
>> > 	nmi_restart:
>> > 
>> > #define nmi_nesting_postprocess()					\
>> > 	do {								\
>> > 		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
>> > 			write_cr2(this_cpu_read(nmi_cr2));		\
>> > 		if (this_cpu_dec_return(nmi_state))			\
>> > 			goto nmi_restart;				\
>> > 	} while (0)
>> > 
>> > That very much looks like saving/restoring CR2 to me.
>> > 
>> > FWIW; I hate how the x86_64 and i386 versions of this NMI nesting
>magic
>> > are so completely different.
>> 
>> Is there any way that nmi_cr2 can end up getting overwritten by
>multiple
>> nestings of some kind?  I would have thought it would have made more
>> sense to spill cr2 onto the stack after the stack has been properly
>set up.
>
>So how can I help with debugging this?
>
>While the missing cr2 issue made debugging frustrating, I find the
>other 
>aspects of the bug more serious:
>
>  1.  Programs that are doing valid memory accesses can segfault
>and worse
>2.  This bug can cause an instant-reboot of the system (I assume
>somehow
>      with the right combination of memory accesses  it causes a 
>      triple-fault?)
>
>#2 is why I spent all of this time tracking this down, I couldn't leave
>a 
>machine fuzzing overnight without the machine rebooting.
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-25 14:34                                               ` H. Peter Anvin
@ 2014-02-25 14:43                                                 ` Steven Rostedt
  2014-02-25 15:33                                                   ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-25 14:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Tue, 25 Feb 2014 06:34:55 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> #2 is what I really don't understand.
> 
> I worry something else is going on there

Yeah, me too.

-- Steve

> 
> >
> >While the missing cr2 issue made debugging frustrating, I find the
> >other 
> >aspects of the bug more serious:
> >
> >  1.  Programs that are doing valid memory accesses can segfault
> >and worse
> >2.  This bug can cause an instant-reboot of the system (I assume
> >somehow
> >      with the right combination of memory accesses  it causes a 
> >      triple-fault?)
> >
> >#2 is why I spent all of this time tracking this down, I couldn't leave
> >a 
> >machine fuzzing overnight without the machine rebooting.
> >
> >Vince
> 


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-25 14:43                                                 ` Steven Rostedt
@ 2014-02-25 15:33                                                   ` Vince Weaver
  2014-02-26 15:06                                                     ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-25 15:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Tue, 25 Feb 2014, Steven Rostedt wrote:

> On Tue, 25 Feb 2014 06:34:55 -0800
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
> > #2 is what I really don't understand.
> > 
> > I worry something else is going on there
> 
> Yeah, me too.
> 

OK, well I'll work on isolating that next, I was hoping the segfault issue 
was related.  It's a lot harder getting traces out of a machine that 
insta-reboots so fast that it can't even finish printing the oops message.

I need my testing machine for some unrelated experiments so I might not 
get a chance to work on this further for a day or two.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-25 15:33                                                   ` Vince Weaver
@ 2014-02-26 15:06                                                     ` Vince Weaver
  2014-02-27 22:06                                                       ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-26 15:06 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Steven Rostedt, H. Peter Anvin, Peter Zijlstra, Linux Kernel,
	Ingo Molnar

On Tue, 25 Feb 2014, Vince Weaver wrote:

> On Tue, 25 Feb 2014, Steven Rostedt wrote:
> 
> > On Tue, 25 Feb 2014 06:34:55 -0800
> > "H. Peter Anvin" <hpa@zytor.com> wrote:
> > 
> > > #2 is what I really don't understand.
> > > 
> > > I worry something else is going on there
> > 
> > Yeah, me too.
> > 
> 
> OK, well I'll work on isolating that next, I was hoping the segfault issue 
> was related.  It's a lot harder getting traces out of a machine that 
> insta-reboots so fast that it can't even finish printing the oops message.

OK, dug up the random seeds that are causing the system to reboot, and it 
turns out that it's hard to reproduce the actual reboot, but in the cases 
where it doesn't reboot it segfaults in the same code path that was 
causing the other segfaults in this thread.  So I think it's all the same 
bug, although I can investigate further if needed.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-26 15:06                                                     ` Vince Weaver
@ 2014-02-27 22:06                                                       ` Vince Weaver
  2014-02-27 22:31                                                         ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-27 22:06 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Steven Rostedt, H. Peter Anvin, Peter Zijlstra, Linux Kernel,
	Ingo Molnar


I spent some more time on this.
I managed to get a trace that exhibited the bug practically right 
away, but still unable to generate a reproducible trace :(  

So instead I'm adding WARN's and trace_printks to see what I can find out.

Here's a summary of what I think is happneing.  Please let me know if I'm 
wildly wrong in analyzing this.



Userspace accesses the perf 
ring-buffer-user-page of an event for the first time (in this trace
it's a SW_CPU_CLOCK event).

This triggers a page_fault to bring in the page.

This triggers a TRACEPOINT event (task/task_newtask) which
has PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION;
and exclude_callchain_kernel=1, triggering a perf_callchain().

Here is a dump of the stack right after we enter perf_callchain()

vince@core2:~$ [  202.320444] ------------[ cut here ]------------
[  202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 perf_callchain+0x67/0x211()
hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common
[  202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24
[  202.324001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[  202.324001]  00000000000000a8 ffff880119ec1b48 ffffffff81531dc1 00000000000000a8
[  202.324001]  0000000000000000 ffff880119ec1b88 ffffffff81040ce4 ffff880119ec1b88
[  202.324001]  ffffffff810d416f ffff880119ec1e38 ffff8800cba67800 0000000000000000
[  202.324001] Call Trace:
[  202.324001]  [<ffffffff81531dc1>] dump_stack+0x49/0x60
[  202.324001]  [<ffffffff81040ce4>] warn_slowpath_common+0x81/0x9b
[  202.324001]  [<ffffffff810d416f>] ? perf_callchain+0x67/0x211
[  202.324001]  [<ffffffff81040d18>] warn_slowpath_null+0x1a/0x1c
[  202.324001]  [<ffffffff810d416f>] perf_callchain+0x67/0x211
[  202.324001]  [<ffffffff8106b0ab>] ? local_clock+0x1b/0x24
[  202.324001]  [<ffffffff810d0872>] perf_prepare_sample+0x7b/0x304
[  202.324001]  [<ffffffff810d1079>] __perf_event_overflow+0x156/0x1c1
[  202.324001]  [<ffffffff810f4613>] ? free_pgtables+0xa7/0xc9
[  202.324001]  [<ffffffff810d125e>] perf_swevent_overflow+0x41/0x5b
[  202.324001]  [<ffffffff810d12ea>] perf_swevent_event+0x72/0x74
[  202.324001]  [<ffffffff810d1478>] perf_tp_event+0xea/0x1ef
[  202.324001]  [<ffffffff8103515a>] ? perf_trace_x86_exceptions+0x4c/0xba
[  202.324001]  [<ffffffff810351b6>] perf_trace_x86_exceptions+0xa8/0xba
[  202.324001]  [<ffffffff8103515a>] ? perf_trace_x86_exceptions+0x4c/0xba
[  202.324001]  [<ffffffff81538439>] trace_do_page_fault+0x48/0x99
[  202.324001]  [<ffffffff815354b2>] trace_page_fault+0x22/0x30
[  202.324001] ---[ end trace 31ccd31b4e82cb42 ]---

This triggers a kernel pagefault because the BP register is not valid and
copy_from_user_nmi() tries to copy the user stack area from there.

This *should* be OK?  And it maybe looks like it works, but then

	copy_user_handle_tail()

causes another page fault again at the invaid BP register.

But then the code continues to
	perf_output_begin()


237.265689: page_fault_kernel:    address=0x17a0 ip=copy_user_handle_tail error_code=0x0
237.265689: function:                __do_page_fault
237.265689: function:                   bad_area_nosemaphore
237.265690: function:                      __bad_area_nosemaphore
237.265690: function:                         no_context
237.265690: function:                            fixup_exception
237.265690: function:                               search_exception_tables
237.265690: function:                                  search_extable
237.265691: function:             perf_output_begin
237.265692: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e7

how are we back in __do_page_fault again here?

237.265692: function:             __do_page_fault
237.265692: function:                down_read_trylock
237.265692: function:                _cond_resched
237.265693: function:                find_vma
237.265693: function:                bad_area
237.265693: function:                   up_read
237.265693: function:                   __bad_area_nosemaphore
237.265694: function:                      is_prefetch
237.265694: function:                         convert_ip_to_linear
237.265695: function:                      unhandled_signal
237.265695: function:                      __printk_ratelimit
237.265695: function:             _raw_spin_trylock
237.265695: function:             _raw_spin_unlock_irqrestore
237.265696: function:                      printk
237.265696: function:                         vprintk_emit

[  202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 00000000004017fd sp 00000000ffd19d10 error 6 in perf_fuzzer[400000+d1000]


Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 22:06                                                       ` Vince Weaver
@ 2014-02-27 22:31                                                         ` Steven Rostedt
  2014-02-27 22:52                                                           ` H. Peter Anvin
  2014-02-28  9:39                                                           ` Peter Zijlstra
  0 siblings, 2 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-27 22:31 UTC (permalink / raw)
  To: Vince Weaver; +Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Thu, 27 Feb 2014 17:06:36 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:

> 
> I spent some more time on this.
> I managed to get a trace that exhibited the bug practically right 
> away, but still unable to generate a reproducible trace :(  
> 
> So instead I'm adding WARN's and trace_printks to see what I can find out.
> 
> Here's a summary of what I think is happneing.  Please let me know if I'm 
> wildly wrong in analyzing this.
> 
> 
> 
> Userspace accesses the perf 
> ring-buffer-user-page of an event for the first time (in this trace
> it's a SW_CPU_CLOCK event).
> 
> This triggers a page_fault to bring in the page.
> 
> This triggers a TRACEPOINT event (task/task_newtask) which
> has PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_CALLCHAIN|PERF_SAMPLE_ID|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_WEIGHT|PERF_SAMPLE_DATA_SRC|PERF_SAMPLE_TRANSACTION;
> and exclude_callchain_kernel=1, triggering a perf_callchain().
> 
> Here is a dump of the stack right after we enter perf_callchain()
> 
> vince@core2:~$ [  202.320444] ------------[ cut here ]------------
> [  202.324001] WARNING: CPU: 1 PID: 2873 at kernel/events/callchain.c:168 perf_callchain+0x67/0x211()
> hermal_sys ehci_pci ehci_hcd sg sd_mod usbcore usb_common
> [  202.324001] CPU: 1 PID: 2873 Comm: perf_fuzzer Not tainted 3.14.0-rc3+ #24
> [  202.324001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
> [  202.324001]  00000000000000a8 ffff880119ec1b48 ffffffff81531dc1 00000000000000a8
> [  202.324001]  0000000000000000 ffff880119ec1b88 ffffffff81040ce4 ffff880119ec1b88
> [  202.324001]  ffffffff810d416f ffff880119ec1e38 ffff8800cba67800 0000000000000000
> [  202.324001] Call Trace:
> [  202.324001]  [<ffffffff81531dc1>] dump_stack+0x49/0x60
> [  202.324001]  [<ffffffff81040ce4>] warn_slowpath_common+0x81/0x9b
> [  202.324001]  [<ffffffff810d416f>] ? perf_callchain+0x67/0x211
> [  202.324001]  [<ffffffff81040d18>] warn_slowpath_null+0x1a/0x1c
> [  202.324001]  [<ffffffff810d416f>] perf_callchain+0x67/0x211
> [  202.324001]  [<ffffffff8106b0ab>] ? local_clock+0x1b/0x24
> [  202.324001]  [<ffffffff810d0872>] perf_prepare_sample+0x7b/0x304
> [  202.324001]  [<ffffffff810d1079>] __perf_event_overflow+0x156/0x1c1
> [  202.324001]  [<ffffffff810f4613>] ? free_pgtables+0xa7/0xc9
> [  202.324001]  [<ffffffff810d125e>] perf_swevent_overflow+0x41/0x5b
> [  202.324001]  [<ffffffff810d12ea>] perf_swevent_event+0x72/0x74
> [  202.324001]  [<ffffffff810d1478>] perf_tp_event+0xea/0x1ef
> [  202.324001]  [<ffffffff8103515a>] ? perf_trace_x86_exceptions+0x4c/0xba
> [  202.324001]  [<ffffffff810351b6>] perf_trace_x86_exceptions+0xa8/0xba
> [  202.324001]  [<ffffffff8103515a>] ? perf_trace_x86_exceptions+0x4c/0xba
> [  202.324001]  [<ffffffff81538439>] trace_do_page_fault+0x48/0x99
> [  202.324001]  [<ffffffff815354b2>] trace_page_fault+0x22/0x30
> [  202.324001] ---[ end trace 31ccd31b4e82cb42 ]---
> 
> This triggers a kernel pagefault because the BP register is not valid and
> copy_from_user_nmi() tries to copy the user stack area from there.
> 
> This *should* be OK?  And it maybe looks like it works, but then
> 
> 	copy_user_handle_tail()
> 
> causes another page fault again at the invaid BP register.
> 
> But then the code continues to
> 	perf_output_begin()
> 
> 
> 237.265689: page_fault_kernel:    address=0x17a0 ip=copy_user_handle_tail error_code=0x0
> 237.265689: function:                __do_page_fault
> 237.265689: function:                   bad_area_nosemaphore
> 237.265690: function:                      __bad_area_nosemaphore
> 237.265690: function:                         no_context
> 237.265690: function:                            fixup_exception
> 237.265690: function:                               search_exception_tables
> 237.265690: function:                                  search_extable
> 237.265691: function:             perf_output_begin

You can also allow perf to be traced. Remove the CFLAGS_REMOVE_core.o
line from kernel/events/Makefile.

> 237.265692: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
> 
> how are we back in __do_page_fault again here?

Well, the perf ring buffer is vmalloced, right? That can cause a page
fault too.

> 
> 237.265692: function:             __do_page_fault

Try adding a trace_printk() to see all the page faults?

> 237.265692: function:                down_read_trylock
> 237.265692: function:                _cond_resched
> 237.265693: function:                find_vma
> 237.265693: function:                bad_area
> 237.265693: function:                   up_read
> 237.265693: function:                   __bad_area_nosemaphore
> 237.265694: function:                      is_prefetch
> 237.265694: function:                         convert_ip_to_linear
> 237.265695: function:                      unhandled_signal
> 237.265695: function:                      __printk_ratelimit
> 237.265695: function:             _raw_spin_trylock
> 237.265695: function:             _raw_spin_unlock_irqrestore
> 237.265696: function:                      printk
> 237.265696: function:                         vprintk_emit
> 
> [  202.877004] perf_fuzzer[2873]: segfault at 17a0 ip 00000000004017fd sp 00000000ffd19d10 error 6 in perf_fuzzer[400000+d1000]
> 

Yeah, something is getting mesed up.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 22:31                                                         ` Steven Rostedt
@ 2014-02-27 22:52                                                           ` H. Peter Anvin
  2014-02-27 23:30                                                             ` Steven Rostedt
  2014-02-28  1:34                                                             ` Vince Weaver
  2014-02-28  9:39                                                           ` Peter Zijlstra
  1 sibling, 2 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-27 22:52 UTC (permalink / raw)
  To: Steven Rostedt, Vince Weaver; +Cc: Peter Zijlstra, Linux Kernel, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> 
> Yeah, something is getting mesed up.
> 

What it *looks* like to me is that we try to nest the cr2 save/restore,
which doesn't nest because it is a percpu variable.

... except in the x86-64 case, we *ALSO* save/restore cr2 inside
entry_64.S, which makes the stuff in do_nmi completely redundant and
there for no good reason.

I would actually suggest we do the equivalent on i386 as well.

Vince, could you try this patch as an experiment?



[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 952 bytes --]

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..938e45c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -10,6 +10,8 @@
 #include <asm/word-at-a-time.h>
 #include <linux/sched.h>
 
+#include <asm/processor.h>
+
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
@@ -18,6 +20,7 @@ unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
 	unsigned long ret;
+	unsigned long cr2;
 
 	if (__range_not_ok(from, n, TASK_SIZE))
 		return 0;
@@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	 * disable pagefaults so that its behaviour is consistent even when
 	 * called form other contexts.
 	 */
+	cr2 = read_cr2();
 	pagefault_disable();
 	ret = __copy_from_user_inatomic(to, from, n);
 	pagefault_enable();
+	write_cr2(cr2);
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 22:52                                                           ` H. Peter Anvin
@ 2014-02-27 23:30                                                             ` Steven Rostedt
  2014-02-27 23:46                                                               ` H. Peter Anvin
  2014-02-28  1:34                                                             ` Vince Weaver
  1 sibling, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-27 23:30 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Thu, 27 Feb 2014 14:52:54 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > 
> > Yeah, something is getting mesed up.
> > 
> 
> What it *looks* like to me is that we try to nest the cr2 save/restore,
> which doesn't nest because it is a percpu variable.
> 
> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> entry_64.S, which makes the stuff in do_nmi completely redundant and
> there for no good reason.

Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
section. That is, it isn't even executed. That's i386 code. The only
place the cr2 is saved for x86_64 is in entry_64.S.

-- Steve

> 
> I would actually suggest we do the equivalent on i386 as well.
> 
> Vince, could you try this patch as an experiment?
> 
> 


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 23:30                                                             ` Steven Rostedt
@ 2014-02-27 23:46                                                               ` H. Peter Anvin
  2014-02-28  1:00                                                                 ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-27 23:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 14:52:54 -0800
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>>>
>>> Yeah, something is getting mesed up.
>>>
>>
>> What it *looks* like to me is that we try to nest the cr2 save/restore,
>> which doesn't nest because it is a percpu variable.
>>
>> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
>> entry_64.S, which makes the stuff in do_nmi completely redundant and
>> there for no good reason.
> 
> Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> section. That is, it isn't even executed. That's i386 code. The only
> place the cr2 is saved for x86_64 is in entry_64.S.
> 

Right, egg on my face.  However, I still think it would make more sense
for it to nest the way entry_64.S does if at all possible.

That makes this even more confusing, though.  I would still like to see
what happens with the patch I sent Vince.

	-hpa



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 23:46                                                               ` H. Peter Anvin
@ 2014-02-28  1:00                                                                 ` Vince Weaver
  2014-02-28 20:34                                                                   ` Paul E. McKenney
  0 siblings, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-28  1:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Thu, 27 Feb 2014, H. Peter Anvin wrote:

> On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > On Thu, 27 Feb 2014 14:52:54 -0800
> > "H. Peter Anvin" <hpa@zytor.com> wrote:
> > 
> >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> >>>
> >>> Yeah, something is getting mesed up.
> >>>
> >>
> >> What it *looks* like to me is that we try to nest the cr2 save/restore,
> >> which doesn't nest because it is a percpu variable.
> >>
> >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> >> entry_64.S, which makes the stuff in do_nmi completely redundant and
> >> there for no good reason.
> > 
> > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> > section. That is, it isn't even executed. That's i386 code. The only
> > place the cr2 is saved for x86_64 is in entry_64.S.
> > 
> 
> Right, egg on my face.  However, I still think it would make more sense
> for it to nest the way entry_64.S does if at all possible.
> 
> That makes this even more confusing, though.  I would still like to see
> what happens with the patch I sent Vince.

I'll try your patch momentarily, first I had some other changes I started 
running before I left work (for some reason it recompiled the whole 
kernel).

8: function:             perf_output_begin
8: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e
8: bputs:                perf_output_begin: VMW: before rcu_dereference
9: function:             __do_page_fault
9: function:                down_read_trylock
9: function:                _cond_resched
9: function:                find_vma

so it looks like the fault happens 

rcu_read_lock();

116         /*
117          * For inherited events we send all the output towards the parent.
118          */
119         if (event->parent)
120                 event = event->parent;
121 

somewhere between here

122         rb = rcu_dereference(event->rb);
123         if (unlikely(!rb))
124                 goto out;

and here

125 
126         if (unlikely(!rb->nr_pages))
127                 goto out;

although if rcu locks do anything to turn off tracing then this could be 
suspect.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 22:52                                                           ` H. Peter Anvin
  2014-02-27 23:30                                                             ` Steven Rostedt
@ 2014-02-28  1:34                                                             ` Vince Weaver
  2014-02-28  2:17                                                               ` H. Peter Anvin
  2014-02-28  2:57                                                               ` Steven Rostedt
  1 sibling, 2 replies; 115+ messages in thread
From: Vince Weaver @ 2014-02-28  1:34 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Thu, 27 Feb 2014, H. Peter Anvin wrote:

> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > 
> > Yeah, something is getting mesed up.
> > 
> 
> What it *looks* like to me is that we try to nest the cr2 save/restore,
> which doesn't nest because it is a percpu variable.
> 
> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> entry_64.S, which makes the stuff in do_nmi completely redundant and
> there for no good reason.
> 
> I would actually suggest we do the equivalent on i386 as well.
> 
> Vince, could you try this patch as an experiment?

OK with your patch applied it does not segfault.

The trace looks like you'd expect too:

     perf_fuzzer-2910  [000]   255.355331: page_fault_kernel:    address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
     perf_fuzzer-2910  [000]   255.355331: function:                __do_page_fault
     perf_fuzzer-2910  [000]   255.355331: function:                   bad_area_nosemaphore
     perf_fuzzer-2910  [000]   255.355331: function:                      __bad_area_nosemaphore
     perf_fuzzer-2910  [000]   255.355332: function:                         no_context
     perf_fuzzer-2910  [000]   255.355332: function:                            fixup_exception
     perf_fuzzer-2910  [000]   255.355332: function:                               search_exception_tables
     perf_fuzzer-2910  [000]   255.355332: function:                                  search_extable
     perf_fuzzer-2910  [000]   255.355333: function:             perf_output_begin
     perf_fuzzer-2910  [000]   255.355333: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
     perf_fuzzer-2910  [000]   255.355333: bprint:               perf_output_begin: VMW: before event->parent (nil)
     perf_fuzzer-2910  [000]   255.355334: bprint:               perf_output_begin: VMW: before rcu_dereference (nil)
     perf_fuzzer-2910  [000]   255.355334: function:             __do_page_fault
     perf_fuzzer-2910  [000]   255.355334: function:                down_read_trylock
     perf_fuzzer-2910  [000]   255.355334: function:                _cond_resched
     perf_fuzzer-2910  [000]   255.355335: function:                find_vma
     perf_fuzzer-2910  [000]   255.355335: function:                handle_mm_fault
     perf_fuzzer-2910  [000]   255.355335: function:                   __do_fault
     perf_fuzzer-2910  [000]   255.355335: bputs:                perf_mmap_fault: VMW: perf_mmap_fault
     perf_fuzzer-2910  [000]   255.355336: bprint:               perf_mmap_fault: VMW: perf_mmap_fault 0xffff8800cba6a980
     perf_fuzzer-2910  [000]   255.355336: function:             perf_mmap_to_page
     perf_fuzzer-2910  [000]   255.355336: function:             _cond_resched
     perf_fuzzer-2910  [000]   255.355336: function:                      unlock_page
     perf_fuzzer-2910  [000]   255.355336: function:                         page_waitqueue
     perf_fuzzer-2910  [000]   255.355337: function:                         __wake_up_bit
     perf_fuzzer-2910  [000]   255.355337: bputs:                perf_mmap_fault: VMW: perf_mmap_fault
     perf_fuzzer-2910  [000]   255.355337: function:             _cond_resched
     perf_fuzzer-2910  [000]   255.355337: function:                      _raw_spin_lock
     perf_fuzzer-2910  [000]   255.355337: function:                      page_add_file_rmap
     perf_fuzzer-2910  [000]   255.355338: function:                         __inc_zone_page_state
     perf_fuzzer-2910  [000]   255.355338: function:                            __inc_zone_state
     perf_fuzzer-2910  [000]   255.355338: function:                      set_page_dirty
     perf_fuzzer-2910  [000]   255.355338: function:                         page_mapping
     perf_fuzzer-2910  [000]   255.355338: function:                         anon_set_page_dirty
     perf_fuzzer-2910  [000]   255.355339: function:                      unlock_page


Vince


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28  1:34                                                             ` Vince Weaver
@ 2014-02-28  2:17                                                               ` H. Peter Anvin
  2014-02-28  2:57                                                               ` Steven Rostedt
  1 sibling, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-28  2:17 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Steven Rostedt, Peter Zijlstra, Linux Kernel, Ingo Molnar

Ok... I think we're definitely talking about a cr2 leak.  The reboot might be a race condition in the NMI nesting handling maybe?

On February 27, 2014 5:34:34 PM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Thu, 27 Feb 2014, H. Peter Anvin wrote:
>
>> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
>> > 
>> > Yeah, something is getting mesed up.
>> > 
>> 
>> What it *looks* like to me is that we try to nest the cr2
>save/restore,
>> which doesn't nest because it is a percpu variable.
>> 
>> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
>> entry_64.S, which makes the stuff in do_nmi completely redundant and
>> there for no good reason.
>> 
>> I would actually suggest we do the equivalent on i386 as well.
>> 
>> Vince, could you try this patch as an experiment?
>
>OK with your patch applied it does not segfault.
>
>The trace looks like you'd expect too:
>
>perf_fuzzer-2910  [000]   255.355331: page_fault_kernel:   
>address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
>perf_fuzzer-2910  [000]   255.355331: function:               
>__do_page_fault
>perf_fuzzer-2910  [000]   255.355331: function:                  
>bad_area_nosemaphore
>perf_fuzzer-2910  [000]   255.355331: function:                     
>__bad_area_nosemaphore
>perf_fuzzer-2910  [000]   255.355332: function:                        
>no_context
>perf_fuzzer-2910  [000]   255.355332: function:                        
>   fixup_exception
>perf_fuzzer-2910  [000]   255.355332: function:                        
>      search_exception_tables
>perf_fuzzer-2910  [000]   255.355332: function:                        
>         search_extable
>perf_fuzzer-2910  [000]   255.355333: function:            
>perf_output_begin
>perf_fuzzer-2910  [000]   255.355333: bprint:              
>perf_output_begin: VMW: event type 2 config 2a st: 2c3e7
>perf_fuzzer-2910  [000]   255.355333: bprint:              
>perf_output_begin: VMW: before event->parent (nil)
>perf_fuzzer-2910  [000]   255.355334: bprint:              
>perf_output_begin: VMW: before rcu_dereference (nil)
>perf_fuzzer-2910  [000]   255.355334: function:            
>__do_page_fault
>perf_fuzzer-2910  [000]   255.355334: function:               
>down_read_trylock
>perf_fuzzer-2910  [000]   255.355334: function:               
>_cond_resched
>perf_fuzzer-2910  [000]   255.355335: function:                find_vma
>perf_fuzzer-2910  [000]   255.355335: function:               
>handle_mm_fault
>perf_fuzzer-2910  [000]   255.355335: function:                  
>__do_fault
>perf_fuzzer-2910  [000]   255.355335: bputs:               
>perf_mmap_fault: VMW: perf_mmap_fault
>perf_fuzzer-2910  [000]   255.355336: bprint:              
>perf_mmap_fault: VMW: perf_mmap_fault 0xffff8800cba6a980
>perf_fuzzer-2910  [000]   255.355336: function:            
>perf_mmap_to_page
>perf_fuzzer-2910  [000]   255.355336: function:            
>_cond_resched
>perf_fuzzer-2910  [000]   255.355336: function:                     
>unlock_page
>perf_fuzzer-2910  [000]   255.355336: function:                        
>page_waitqueue
>perf_fuzzer-2910  [000]   255.355337: function:                        
>__wake_up_bit
>perf_fuzzer-2910  [000]   255.355337: bputs:               
>perf_mmap_fault: VMW: perf_mmap_fault
>perf_fuzzer-2910  [000]   255.355337: function:            
>_cond_resched
>perf_fuzzer-2910  [000]   255.355337: function:                     
>_raw_spin_lock
>perf_fuzzer-2910  [000]   255.355337: function:                     
>page_add_file_rmap
>perf_fuzzer-2910  [000]   255.355338: function:                        
>__inc_zone_page_state
>perf_fuzzer-2910  [000]   255.355338: function:                        
>   __inc_zone_state
>perf_fuzzer-2910  [000]   255.355338: function:                     
>set_page_dirty
>perf_fuzzer-2910  [000]   255.355338: function:                        
>page_mapping
>perf_fuzzer-2910  [000]   255.355338: function:                        
>anon_set_page_dirty
>perf_fuzzer-2910  [000]   255.355339: function:                     
>unlock_page
>
>
>Vince

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28  1:34                                                             ` Vince Weaver
  2014-02-28  2:17                                                               ` H. Peter Anvin
@ 2014-02-28  2:57                                                               ` Steven Rostedt
  2014-02-28 11:11                                                                 ` Peter Zijlstra
  2014-02-28 14:15                                                                 ` Vince Weaver
  1 sibling, 2 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28  2:57 UTC (permalink / raw)
  To: Vince Weaver; +Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:


> > I would actually suggest we do the equivalent on i386 as well.
> > 
> > Vince, could you try this patch as an experiment?
> 
> OK with your patch applied it does not segfault.
> 

Vince, Great! Can you remove Peter's patch, and try this one. It
removes the crud to save the cr2 from entry_64.S and makes both i386
and x86_64 do the same thing in regards to cr2 handling.

-- Steve

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c36..937cb8d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1854,29 +1854,11 @@ end_repeat_nmi:
 	call save_paranoid
 	DEFAULT_FRAME 0
 
-	/*
-	 * Save off the CR2 register. If we take a page fault in the NMI then
-	 * it could corrupt the CR2 value. If the NMI preempts a page fault
-	 * handler before it was able to read the CR2 register, and then the
-	 * NMI itself takes a page fault, the page fault that was preempted
-	 * will read the information from the NMI page fault and not the
-	 * origin fault. Save it off and restore it if it changes.
-	 * Use the r12 callee-saved register.
-	 */
-	movq %cr2, %r12
-
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq %rsp,%rdi
 	movq $-1,%rsi
 	call do_nmi
 
-	/* Did the NMI take a page fault? Restore cr2 if it did */
-	movq %cr2, %rcx
-	cmpq %rcx, %r12
-	je 1f
-	movq %r12, %cr2
-1:
-	
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz nmi_restore
 nmi_swapgs:
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..f1a6294 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -443,7 +443,6 @@ enum nmi_states {
 	NMI_LATCHED,
 };
 static DEFINE_PER_CPU(enum nmi_states, nmi_state);
-static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
 #define nmi_nesting_preprocess(regs)					\
 	do {								\
@@ -452,14 +451,11 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 			return;						\
 		}							\
 		this_cpu_write(nmi_state, NMI_EXECUTING);		\
-		this_cpu_write(nmi_cr2, read_cr2());			\
 	} while (0);							\
 	nmi_restart:
 
 #define nmi_nesting_postprocess()					\
 	do {								\
-		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
-			write_cr2(this_cpu_read(nmi_cr2));		\
 		if (this_cpu_dec_return(nmi_state))			\
 			goto nmi_restart;				\
 	} while (0)
@@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
 dotraplinkage notrace __kprobes void
 do_nmi(struct pt_regs *regs, long error_code)
 {
+	unsigned long cr2;
+	
 	nmi_nesting_preprocess(regs);
 
+	/*
+	 * Save off the CR2 register. If we take a page fault in the NMI then
+	 * it could corrupt the CR2 value. If the NMI preempts a page fault
+	 * handler before it was able to read the CR2 register, and then the
+	 * NMI itself takes a page fault, the page fault that was preempted
+	 * will read the information from the NMI page fault and not the
+	 * origin fault. Save it off and restore it if it changes.
+	 * Use the r12 callee-saved register.
+	 */
+	cr2 = read_cr2();
+
 	nmi_enter();
 
 	inc_irq_stat(__nmi_count);
@@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code)
 
 	nmi_exit();
 
+	/* Reads are cheaper than writes */
+	if (unlikely(cr2 != read_cr2()))
+		write_cr2(cr2);
+
 	/* On i386, may loop back to preprocess */
 	nmi_nesting_postprocess();
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-27 22:31                                                         ` Steven Rostedt
  2014-02-27 22:52                                                           ` H. Peter Anvin
@ 2014-02-28  9:39                                                           ` Peter Zijlstra
  1 sibling, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28  9:39 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Thu, Feb 27, 2014 at 05:31:50PM -0500, Steven Rostedt wrote:
> Well, the perf ring buffer is vmalloced, right? That can cause a page
> fault too.

On x86 they're typically not -- although we have a debug CONFIG option
to test that code on x86 too. On SPARC/ARM etc.. we have to use
vmalloc_user() because of cache aliasing (yay for VIPT).

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28  2:57                                                               ` Steven Rostedt
@ 2014-02-28 11:11                                                                 ` Peter Zijlstra
  2014-02-28 13:37                                                                   ` Steven Rostedt
  2014-02-28 14:15                                                                 ` Vince Weaver
  1 sibling, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 11:11 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
> @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
>  dotraplinkage notrace __kprobes void
>  do_nmi(struct pt_regs *regs, long error_code)
>  {
> +	unsigned long cr2;
> +	
>  	nmi_nesting_preprocess(regs);
>  
> +	/*
> +	 * Save off the CR2 register. If we take a page fault in the NMI then
> +	 * it could corrupt the CR2 value. If the NMI preempts a page fault
> +	 * handler before it was able to read the CR2 register, and then the
> +	 * NMI itself takes a page fault, the page fault that was preempted
> +	 * will read the information from the NMI page fault and not the
> +	 * origin fault. Save it off and restore it if it changes.

> +	 * Use the r12 callee-saved register.

You might want to make that line go away :-)

> +	 */
> +	cr2 = read_cr2();
> +
>  	nmi_enter();
>  
>  	inc_irq_stat(__nmi_count);
> @@ -523,6 +532,10 @@ do_nmi(struct pt_regs *regs, long error_code)
>  
>  	nmi_exit();
>  
> +	/* Reads are cheaper than writes */
> +	if (unlikely(cr2 != read_cr2()))
> +		write_cr2(cr2);
> +
>  	/* On i386, may loop back to preprocess */
>  	nmi_nesting_postprocess();
>  }

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 11:11                                                                 ` Peter Zijlstra
@ 2014-02-28 13:37                                                                   ` Steven Rostedt
  0 siblings, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 13:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014 12:11:11 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Feb 27, 2014 at 09:57:26PM -0500, Steven Rostedt wrote:
> > @@ -512,8 +508,21 @@ static inline void nmi_nesting_postprocess(void)
> >  dotraplinkage notrace __kprobes void
> >  do_nmi(struct pt_regs *regs, long error_code)
> >  {
> > +	unsigned long cr2;
> > +	
> >  	nmi_nesting_preprocess(regs);
> >  
> > +	/*
> > +	 * Save off the CR2 register. If we take a page fault in the NMI then
> > +	 * it could corrupt the CR2 value. If the NMI preempts a page fault
> > +	 * handler before it was able to read the CR2 register, and then the
> > +	 * NMI itself takes a page fault, the page fault that was preempted
> > +	 * will read the information from the NMI page fault and not the
> > +	 * origin fault. Save it off and restore it if it changes.
> 
> > +	 * Use the r12 callee-saved register.
> 
> You might want to make that line go away :-)
> 

Gah! I noticed that before sending and deleted the line, recreated the
patch and sent that out. The one step I forgot to do was to save the
file after deleting it :-p

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28  2:57                                                               ` Steven Rostedt
  2014-02-28 11:11                                                                 ` Peter Zijlstra
@ 2014-02-28 14:15                                                                 ` Vince Weaver
  2014-02-28 14:23                                                                   ` Steven Rostedt
  1 sibling, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-28 14:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vince Weaver, H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Thu, 27 Feb 2014, Steven Rostedt wrote:

> On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
> Vince Weaver <vincent.weaver@maine.edu> wrote:
> 
> 
> > > I would actually suggest we do the equivalent on i386 as well.
> > > 
> > > Vince, could you try this patch as an experiment?
> > 
> > OK with your patch applied it does not segfault.
> > 
> 
> Vince, Great! Can you remove Peter's patch, and try this one. It
> removes the crud to save the cr2 from entry_64.S and makes both i386
> and x86_64 do the same thing in regards to cr2 handling.

no, with only this patch applied it segfaults as per previous:

[  126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 00000000004017fd sp 00000000ffaff3f0 error 6 in perf_fuzzer[400000+d1000]

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 14:15                                                                 ` Vince Weaver
@ 2014-02-28 14:23                                                                   ` Steven Rostedt
  2014-02-28 15:07                                                                     ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 14:23 UTC (permalink / raw)
  To: Vince Weaver; +Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014 09:15:33 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Thu, 27 Feb 2014, Steven Rostedt wrote:
> 
> > On Thu, 27 Feb 2014 20:34:34 -0500 (EST)
> > Vince Weaver <vincent.weaver@maine.edu> wrote:
> > 
> > 
> > > > I would actually suggest we do the equivalent on i386 as well.
> > > > 
> > > > Vince, could you try this patch as an experiment?
> > > 
> > > OK with your patch applied it does not segfault.
> > > 
> > 
> > Vince, Great! Can you remove Peter's patch, and try this one. It
> > removes the crud to save the cr2 from entry_64.S and makes both i386
> > and x86_64 do the same thing in regards to cr2 handling.
> 
> no, with only this patch applied it segfaults as per previous:
> 
> [  126.396049] perf_fuzzer[2904]: segfault at 17a0 ip 00000000004017fd sp 00000000ffaff3f0 error 6 in perf_fuzzer[400000+d1000]

Interesting. Are you doing a perf function trace?

And just in case, can you add this patch and make sure the copy is
called by NMI.

-- Steve

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..ca943cd 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -29,6 +29,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	 */
 	pagefault_disable();
 	ret = __copy_from_user_inatomic(to, from, n);
+	trace_dump_stack(2)
 	pagefault_enable();
 
 	return ret;

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 14:23                                                                   ` Steven Rostedt
@ 2014-02-28 15:07                                                                     ` Vince Weaver
  2014-02-28 15:13                                                                       ` H. Peter Anvin
                                                                                         ` (2 more replies)
  0 siblings, 3 replies; 115+ messages in thread
From: Vince Weaver @ 2014-02-28 15:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vince Weaver, H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014, Steven Rostedt wrote:

> Interesting. Are you doing a perf function trace?
> 
> And just in case, can you add this patch and make sure the copy is
> called by NMI.

199.900682: function:             trace_do_page_fault
199.900683: page_fault_user:      address=__per_cpu_end ip=__per_cpu_end error_code=0x6
199.900683: function:             perf_swevent_get_recursion_context
199.900684: function:             perf_tp_event
199.900684: function:                perf_swevent_event
199.900684: function:                   perf_swevent_overflow
199.900684: function:                      __perf_event_overflow
199.900685: function:                         perf_prepare_sample
199.900685: function:                            __perf_event_header__init_id
199.900685: function:                               task_tgid_nr_ns
199.900685: function:                               perf_event_tid
199.900686: function:                                  __task_pid_nr_ns
199.900686: function:                            perf_callchain
199.900687: function:             copy_from_user_nmi
199.900687: function:             trace_do_page_fault
199.900687: page_fault_kernel:    address=irq_stack_union ip=copy_user_generic_string error_code=0x0
199.900688: function:                __do_page_fault
199.900688: function:                   bad_area_nosemaphore
199.900688: function:                      __bad_area_nosemaphore
199.900689: function:                         no_context
199.900689: function:                            fixup_exception
199.900689: function:                               search_exception_tables
199.900689: function:                                  search_extable
199.900691: function:             copy_user_handle_tail
199.900691: function:             trace_do_page_fault
199.900691: page_fault_kernel:    address=irq_stack_union ip=copy_user_handle_tail error_code=0x0
199.900691: function:                __do_page_fault
199.900692: function:                   bad_area_nosemaphore
199.900692: function:                      __bad_area_nosemaphore
199.900692: function:                         no_context
199.900692: function:                            fixup_exception
199.900692: function:                               search_exception_tables
199.900692: function:                                  search_extable
199.900693: function:             save_stack_trace
199.900693: function:                dump_trace
199.900694: function:                   print_context_stack
199.900694: function:                      __kernel_text_address
199.900694: function:                         is_module_text_address
199.900695: function:                            __module_text_address
199.900695: function:                               __module_address
199.900695: function:                      __kernel_text_address
199.900695: function:                         is_module_text_address
199.900696: function:                            __module_text_address
199.900696: function:                               __module_address
...
199.900705: function:                      __kernel_text_address
199.900809: kernel_stack:         <stack trace>
=> perf_callchain (ffffffff810d35a2)
=> perf_prepare_sample (ffffffff810cfae3)
=> __perf_event_overflow (ffffffff810d02f4)
=> perf_swevent_overflow (ffffffff810d04e3)
=> perf_swevent_event (ffffffff810d0574)
=> perf_tp_event (ffffffff810d070c)
=> perf_trace_x86_exceptions (ffffffff810341b6)
=> trace_do_page_fault (ffffffff81537702)
=> trace_page_fault (ffffffff81534772)
199.900810: function:             perf_output_begin
199.900810: function:             __do_page_fault
199.900810: function:                __perf_sw_event
199.900810: function:                   perf_swevent_get_recursion_context
199.900811: function:                down_read_trylock
199.900811: function:                _cond_resched
199.900811: function:                find_vma
199.900811: function:                bad_area
199.900812: function:                   up_read
199.900812: function:                   __bad_area_nosemaphore
199.900812: function:                      is_prefetch
199.900812: function:                         convert_ip_to_linear
199.900813: function:                      unhandled_signal
199.900813: function:                      __printk_ratelimit
199.900813: function:             _raw_spin_trylock
199.900813: function:             _raw_spin_unlock_irqrestore
199.900814: function:                      printk
199.900814: function:                         vprintk_emit


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 15:07                                                                     ` Vince Weaver
@ 2014-02-28 15:13                                                                       ` H. Peter Anvin
  2014-02-28 15:40                                                                         ` Peter Zijlstra
  2014-02-28 15:20                                                                       ` Steven Rostedt
  2014-02-28 20:38                                                                       ` H. Peter Anvin
  2 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-28 15:13 UTC (permalink / raw)
  To: Vince Weaver, Steven Rostedt; +Cc: Peter Zijlstra, Linux Kernel, Ingo Molnar

If I'm reading this right we end up going from the page fault tracepoint to copy_from_user_nmi() without going through NMI, and the cr2 corruption is obvious.  I guess the assumption that only the NMI path needed to save cr2 is flawed?

On February 28, 2014 7:07:29 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Fri, 28 Feb 2014, Steven Rostedt wrote:
>
>> Interesting. Are you doing a perf function trace?
>> 
>> And just in case, can you add this patch and make sure the copy is
>> called by NMI.
>
>199.900682: function:             trace_do_page_fault
>199.900683: page_fault_user:      address=__per_cpu_end
>ip=__per_cpu_end error_code=0x6
>199.900683: function:             perf_swevent_get_recursion_context
>199.900684: function:             perf_tp_event
>199.900684: function:                perf_swevent_event
>199.900684: function:                   perf_swevent_overflow
>199.900684: function:                      __perf_event_overflow
>199.900685: function:                         perf_prepare_sample
>199.900685: function:                           
>__perf_event_header__init_id
>199.900685: function:                               task_tgid_nr_ns
>199.900685: function:                               perf_event_tid
>199.900686: function:                                  __task_pid_nr_ns
>199.900686: function:                            perf_callchain
>199.900687: function:             copy_from_user_nmi
>199.900687: function:             trace_do_page_fault
>199.900687: page_fault_kernel:    address=irq_stack_union
>ip=copy_user_generic_string error_code=0x0
>199.900688: function:                __do_page_fault
>199.900688: function:                   bad_area_nosemaphore
>199.900688: function:                      __bad_area_nosemaphore
>199.900689: function:                         no_context
>199.900689: function:                            fixup_exception
>199.900689: function:                              
>search_exception_tables
>199.900689: function:                                  search_extable
>199.900691: function:             copy_user_handle_tail
>199.900691: function:             trace_do_page_fault
>199.900691: page_fault_kernel:    address=irq_stack_union
>ip=copy_user_handle_tail error_code=0x0
>199.900691: function:                __do_page_fault
>199.900692: function:                   bad_area_nosemaphore
>199.900692: function:                      __bad_area_nosemaphore
>199.900692: function:                         no_context
>199.900692: function:                            fixup_exception
>199.900692: function:                              
>search_exception_tables
>199.900692: function:                                  search_extable
>199.900693: function:             save_stack_trace
>199.900693: function:                dump_trace
>199.900694: function:                   print_context_stack
>199.900694: function:                      __kernel_text_address
>199.900694: function:                         is_module_text_address
>199.900695: function:                            __module_text_address
>199.900695: function:                               __module_address
>199.900695: function:                      __kernel_text_address
>199.900695: function:                         is_module_text_address
>199.900696: function:                            __module_text_address
>199.900696: function:                               __module_address
>...
>199.900705: function:                      __kernel_text_address
>199.900809: kernel_stack:         <stack trace>
>=> perf_callchain (ffffffff810d35a2)
>=> perf_prepare_sample (ffffffff810cfae3)
>=> __perf_event_overflow (ffffffff810d02f4)
>=> perf_swevent_overflow (ffffffff810d04e3)
>=> perf_swevent_event (ffffffff810d0574)
>=> perf_tp_event (ffffffff810d070c)
>=> perf_trace_x86_exceptions (ffffffff810341b6)
>=> trace_do_page_fault (ffffffff81537702)
>=> trace_page_fault (ffffffff81534772)
>199.900810: function:             perf_output_begin
>199.900810: function:             __do_page_fault
>199.900810: function:                __perf_sw_event
>199.900810: function:                  
>perf_swevent_get_recursion_context
>199.900811: function:                down_read_trylock
>199.900811: function:                _cond_resched
>199.900811: function:                find_vma
>199.900811: function:                bad_area
>199.900812: function:                   up_read
>199.900812: function:                   __bad_area_nosemaphore
>199.900812: function:                      is_prefetch
>199.900812: function:                         convert_ip_to_linear
>199.900813: function:                      unhandled_signal
>199.900813: function:                      __printk_ratelimit
>199.900813: function:             _raw_spin_trylock
>199.900813: function:             _raw_spin_unlock_irqrestore
>199.900814: function:                      printk
>199.900814: function:                         vprintk_emit

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 15:07                                                                     ` Vince Weaver
  2014-02-28 15:13                                                                       ` H. Peter Anvin
@ 2014-02-28 15:20                                                                       ` Steven Rostedt
  2014-02-28 15:30                                                                         ` Steven Rostedt
  2014-02-28 20:38                                                                       ` H. Peter Anvin
  2 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 15:20 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar, Seiji Aguchi

On Fri, 28 Feb 2014 10:07:29 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Fri, 28 Feb 2014, Steven Rostedt wrote:

> 199.900696: function:                               __module_address
> ...
> 199.900705: function:                      __kernel_text_address
> 199.900809: kernel_stack:         <stack trace>
> => perf_callchain (ffffffff810d35a2)
> => perf_prepare_sample (ffffffff810cfae3)
> => __perf_event_overflow (ffffffff810d02f4)
> => perf_swevent_overflow (ffffffff810d04e3)
> => perf_swevent_event (ffffffff810d0574)
> => perf_tp_event (ffffffff810d070c)
> => perf_trace_x86_exceptions (ffffffff810341b6)
> => trace_do_page_fault (ffffffff81537702)
> => trace_page_fault (ffffffff81534772)

Thank you!!! You just found the bug :-)

The bug was caused by:

commit 25c74b10bacead867478480170083f69cfc0db48
x86, trace: Register exception handler to trace IDT

With this code:

dotraplinkage void __kprobes
trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
	enum ctx_state prev_state;

	prev_state = exception_enter();
	trace_page_fault_entries(regs, error_code);
	__do_page_fault(regs, error_code);
	exception_exit(prev_state);
}

The trace_page_fault_entries() which is called before the cr2 is saved
can fault by perf doing a userspace stack trace. But the cr2 is not
restored when calling __do_page_fault() and that gets the wrong cr2.

Below is a patch that should fix this. Please remove all other patches
and try this out.

Thanks,

-- Steve

> 199.900810: function:             perf_output_begin
> 199.900810: function:             __do_page_fault
> 199.900810: function:                __perf_sw_event
> 199.900810: function:                   perf_swevent_get_recursion_context
> 199.900811: function:                down_read_trylock
> 199.900811: function:                _cond_resched
> 199.900811: function:                find_vma
> 199.900811: function:                bad_area
> 199.900812: function:                   up_read
> 199.900812: function:                   __bad_area_nosemaphore
> 199.900812: function:                      is_prefetch
> 199.900812: function:                         convert_ip_to_linear
> 199.900813: function:                      unhandled_signal
> 199.900813: function:                      __printk_ratelimit
> 199.900813: function:             _raw_spin_trylock
> 199.900813: function:             _raw_spin_unlock_irqrestore
> 199.900814: function:                      printk
> 199.900814: function:                         vprintk_emit

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..66b636d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	unsigned long cr2;
 
 	prev_state = exception_enter();
+	/* The trace might fault, save the cr2 register */
+	cr2 = read_cr2();
 	trace_page_fault_entries(regs, error_code);
+	/* Put back the original cr2 if needed */
+	if (cr2 != read_cr2())
+		write_cr2(cr2);
 	__do_page_fault(regs, error_code);
 	exception_exit(prev_state);
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 15:20                                                                       ` Steven Rostedt
@ 2014-02-28 15:30                                                                         ` Steven Rostedt
  0 siblings, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 15:30 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar, Seiji Aguchi

On Fri, 28 Feb 2014 10:20:00 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Below is a patch that should fix this. Please remove all other patches
> and try this out.

Updated patch, as Peter Zijlstra on IRC asked me if the
exception_enter() can be traced. And looking at it, it sure can be.

-- Steve

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..f606b67 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1271,9 +1271,15 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	unsigned long cr2;
 
+	/* The trace might fault, save the cr2 register */
+	cr2 = read_cr2();
 	prev_state = exception_enter();
 	trace_page_fault_entries(regs, error_code);
+	/* Put back the original cr2 if needed */
+	if (cr2 != read_cr2())
+		write_cr2(cr2);
 	__do_page_fault(regs, error_code);
 	exception_exit(prev_state);
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH] x86 trace: Fix page fault tracing bug
@ 2014-02-28 15:33 Jiri Olsa
  2014-02-28 15:47 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 115+ messages in thread
From: Jiri Olsa @ 2014-02-28 15:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi

The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.

This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.

This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:

  # perf record -g -e exceptions:page_fault_user ls

Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
---
 arch/x86/mm/fault.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d591c8..52fad6c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1016,11 +1016,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * routines.
  */
 static void __kprobes
-__do_page_fault(struct pt_regs *regs, unsigned long error_code)
+__do_page_fault(struct pt_regs *regs, unsigned long error_code,
+		unsigned long address)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
-	unsigned long address;
 	struct mm_struct *mm;
 	int fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
@@ -1028,9 +1028,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	tsk = current;
 	mm = tsk->mm;
 
-	/* Get the faulting address: */
-	address = read_cr2();
-
 	/*
 	 * Detect and handle instructions that would cause a page fault for
 	 * both a tracked kernel page and a userspace page.
@@ -1248,9 +1245,14 @@ dotraplinkage void __kprobes
 do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	unsigned long address;
 
 	prev_state = exception_enter();
-	__do_page_fault(regs, error_code);
+
+	/* Get the faulting address: */
+	address = read_cr2();
+
+	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
@@ -1267,9 +1269,18 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	unsigned long address;
 
 	prev_state = exception_enter();
+
+	/*
+	 * The tracepoint processing could trigger another page
+	 * fault (user space callchain reading) and destroy the
+	 * original cr2 value, so read the faulting address now.
+	 */
+	address = read_cr2();
+
 	trace_page_fault_entries(regs, error_code);
-	__do_page_fault(regs, error_code);
+	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 15:13                                                                       ` H. Peter Anvin
@ 2014-02-28 15:40                                                                         ` Peter Zijlstra
  2014-02-28 16:15                                                                           ` H. Peter Anvin
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 15:40 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Vince Weaver, Steven Rostedt, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
> If I'm reading this right we end up going from the page fault
> tracepoint to copy_from_user_nmi() without going through NMI, and the
> cr2 corruption is obvious.  I guess the assumption that only the NMI
> path needed to save cr2 is flawed?

It was never assumed it would only go through NMI, but that it would be
NMI safe -- and as it turns out, it is that.

What I did assume was that any other callsites would be safe, seeing how
they'd already be running in 'normal' contexts.

I had not considered people putting tracepoints _that_ early in the
exception paths.

Note that there's more tracepoints there than the one mentioned.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86 trace: Fix page fault tracing bug
  2014-02-28 15:33 [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
@ 2014-02-28 15:47 ` Peter Zijlstra
  2014-02-28 16:05   ` [PATCHv2] " Jiri Olsa
  2014-02-28 15:47 ` [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
  2014-02-28 16:01 ` Steven Rostedt
  2 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 15:47 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi

On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:

While I like the idea of just pushing up the CR2 read; the below does
the read too late still, exception_enter() also has a tracepoint in.

> @@ -1267,9 +1269,18 @@ dotraplinkage void __kprobes
>  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	unsigned long address;
>  
>  	prev_state = exception_enter();
> +
> +	/*
> +	 * The tracepoint processing could trigger another page
> +	 * fault (user space callchain reading) and destroy the
> +	 * original cr2 value, so read the faulting address now.
> +	 */
> +	address = read_cr2();
> +
>  	trace_page_fault_entries(regs, error_code);
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
> -- 
> 1.7.11.7
> 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86 trace: Fix page fault tracing bug
  2014-02-28 15:33 [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
  2014-02-28 15:47 ` Peter Zijlstra
@ 2014-02-28 15:47 ` Jiri Olsa
  2014-02-28 16:00   ` Steven Rostedt
  2014-02-28 16:01 ` Steven Rostedt
  2 siblings, 1 reply; 115+ messages in thread
From: Jiri Olsa @ 2014-02-28 15:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi

On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:
> The trace_do_page_fault function trigger tracepoint
> and then handles the actual page fault.
> 
> This could lead to error if the tracepoint caused page
> fault. The original cr2 value gets lost and the original
> page fault handler kills current process with SIGSEGV.
> 
> This happens if you record page faults with callchain
> data, the user part of it will cause tracepoint handler
> to page fault:
> 
>   # perf record -g -e exceptions:page_fault_user ls
> 
> Fixing this by saving the original cr2 value
> and using it after tracepoint handler is done.
> 

sry, forgot..

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

jirka

> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Seiji Aguchi <seiji.aguchi@hds.com>
> ---
>  arch/x86/mm/fault.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9d591c8..52fad6c 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1016,11 +1016,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
>   * routines.
>   */
>  static void __kprobes
> -__do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +__do_page_fault(struct pt_regs *regs, unsigned long error_code,
> +		unsigned long address)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> -	unsigned long address;
>  	struct mm_struct *mm;
>  	int fault;
>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> @@ -1028,9 +1028,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  	tsk = current;
>  	mm = tsk->mm;
>  
> -	/* Get the faulting address: */
> -	address = read_cr2();
> -
>  	/*
>  	 * Detect and handle instructions that would cause a page fault for
>  	 * both a tracked kernel page and a userspace page.
> @@ -1248,9 +1245,14 @@ dotraplinkage void __kprobes
>  do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	unsigned long address;
>  
>  	prev_state = exception_enter();
> -	__do_page_fault(regs, error_code);
> +
> +	/* Get the faulting address: */
> +	address = read_cr2();
> +
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
>  
> @@ -1267,9 +1269,18 @@ dotraplinkage void __kprobes
>  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	unsigned long address;
>  
>  	prev_state = exception_enter();
> +
> +	/*
> +	 * The tracepoint processing could trigger another page
> +	 * fault (user space callchain reading) and destroy the
> +	 * original cr2 value, so read the faulting address now.
> +	 */
> +	address = read_cr2();
> +
>  	trace_page_fault_entries(regs, error_code);
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
> -- 
> 1.7.11.7
> 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86 trace: Fix page fault tracing bug
  2014-02-28 15:47 ` [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
@ 2014-02-28 16:00   ` Steven Rostedt
  0 siblings, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 16:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi,
	Vince Weaver

Vince, can you test this patch instead. Seems that the bug you found was
found by others. You can remove all patches again, and modify this patch such
that the read of cr2 is before the exception_enter() call (in both locations)

On Fri, Feb 28, 2014 at 04:47:15PM +0100, Jiri Olsa wrote:
> On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:
> > The trace_do_page_fault function trigger tracepoint
> > and then handles the actual page fault.
> > 
> > This could lead to error if the tracepoint caused page
> > fault. The original cr2 value gets lost and the original
> > page fault handler kills current process with SIGSEGV.
> > 
> > This happens if you record page faults with callchain
> > data, the user part of it will cause tracepoint handler
> > to page fault:
> > 
> >   # perf record -g -e exceptions:page_fault_user ls
> > 
> > Fixing this by saving the original cr2 value
> > and using it after tracepoint handler is done.
> > 
> 
> sry, forgot..
> 
> Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

Please also add:

Reported-by: Vince Weaver <vincent.weaver@maine.edu>

And after verificiation, add a Tested-by for him too. He's been doing a
lot to help us solve this bug (from a different angle).

You may also add:

Link: http://lkml.kernel.org/r/20140224172536.GD9987@twins.programming.kicks-ass.net

-- Steve


> 
> jirka
> 
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Seiji Aguchi <seiji.aguchi@hds.com>
> > ---
> >  arch/x86/mm/fault.c | 25 ++++++++++++++++++-------
> >  1 file changed, 18 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 9d591c8..52fad6c 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1016,11 +1016,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
> >   * routines.
> >   */
> >  static void __kprobes
> > -__do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > +__do_page_fault(struct pt_regs *regs, unsigned long error_code,
> > +		unsigned long address)
> >  {
> >  	struct vm_area_struct *vma;
> >  	struct task_struct *tsk;
> > -	unsigned long address;
> >  	struct mm_struct *mm;
> >  	int fault;
> >  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> > @@ -1028,9 +1028,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  	tsk = current;
> >  	mm = tsk->mm;
> >  
> > -	/* Get the faulting address: */
> > -	address = read_cr2();
> > -
> >  	/*
> >  	 * Detect and handle instructions that would cause a page fault for
> >  	 * both a tracked kernel page and a userspace page.
> > @@ -1248,9 +1245,14 @@ dotraplinkage void __kprobes
> >  do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  {
> >  	enum ctx_state prev_state;
> > +	unsigned long address;
> >  
> >  	prev_state = exception_enter();
> > -	__do_page_fault(regs, error_code);
> > +
> > +	/* Get the faulting address: */
> > +	address = read_cr2();
> > +
> > +	__do_page_fault(regs, error_code, address);
> >  	exception_exit(prev_state);
> >  }
> >  
> > @@ -1267,9 +1269,18 @@ dotraplinkage void __kprobes
> >  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  {
> >  	enum ctx_state prev_state;
> > +	unsigned long address;
> >  
> >  	prev_state = exception_enter();
> > +
> > +	/*
> > +	 * The tracepoint processing could trigger another page
> > +	 * fault (user space callchain reading) and destroy the
> > +	 * original cr2 value, so read the faulting address now.
> > +	 */
> > +	address = read_cr2();
> > +
> >  	trace_page_fault_entries(regs, error_code);
> > -	__do_page_fault(regs, error_code);
> > +	__do_page_fault(regs, error_code, address);
> >  	exception_exit(prev_state);
> >  }
> > -- 
> > 1.7.11.7
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86 trace: Fix page fault tracing bug
  2014-02-28 15:33 [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
  2014-02-28 15:47 ` Peter Zijlstra
  2014-02-28 15:47 ` [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
@ 2014-02-28 16:01 ` Steven Rostedt
  2 siblings, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 16:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi,
	Vince Weaver

On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:
> The trace_do_page_fault function trigger tracepoint
> and then handles the actual page fault.
> 
> This could lead to error if the tracepoint caused page
> fault. The original cr2 value gets lost and the original
> page fault handler kills current process with SIGSEGV.
> 
> This happens if you record page faults with callchain
> data, the user part of it will cause tracepoint handler
> to page fault:
> 
>   # perf record -g -e exceptions:page_fault_user ls
> 
> Fixing this by saving the original cr2 value
> and using it after tracepoint handler is done.
> 
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Seiji Aguchi <seiji.aguchi@hds.com>
> ---
>  arch/x86/mm/fault.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9d591c8..52fad6c 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1016,11 +1016,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
>   * routines.
>   */
>  static void __kprobes
> -__do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +__do_page_fault(struct pt_regs *regs, unsigned long error_code,
> +		unsigned long address)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> -	unsigned long address;
>  	struct mm_struct *mm;
>  	int fault;
>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> @@ -1028,9 +1028,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  	tsk = current;
>  	mm = tsk->mm;
>  
> -	/* Get the faulting address: */
> -	address = read_cr2();
> -
>  	/*
>  	 * Detect and handle instructions that would cause a page fault for
>  	 * both a tracked kernel page and a userspace page.
> @@ -1248,9 +1245,14 @@ dotraplinkage void __kprobes
>  do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	unsigned long address;
>  
>  	prev_state = exception_enter();
> -	__do_page_fault(regs, error_code);
> +
> +	/* Get the faulting address: */
> +	address = read_cr2();
> +

As Peter already stated, the address = read_cr2() needs to go before the
exception_enter() call.

> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
>  
> @@ -1267,9 +1269,18 @@ dotraplinkage void __kprobes
>  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	unsigned long address;
>  
>  	prev_state = exception_enter();
> +
> +	/*
> +	 * The tracepoint processing could trigger another page
> +	 * fault (user space callchain reading) and destroy the
> +	 * original cr2 value, so read the faulting address now.
> +	 */
> +	address = read_cr2();

Same here.

After that, you can add:

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> +
>  	trace_page_fault_entries(regs, error_code);
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCHv2] x86 trace: Fix page fault tracing bug
  2014-02-28 15:47 ` Peter Zijlstra
@ 2014-02-28 16:05   ` Jiri Olsa
  2014-02-28 16:11     ` H. Peter Anvin
                       ` (2 more replies)
  0 siblings, 3 replies; 115+ messages in thread
From: Jiri Olsa @ 2014-02-28 16:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi,
	Steven Rostedt, Vince Weaver

On Fri, Feb 28, 2014 at 04:47:08PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:
> 
> While I like the idea of just pushing up the CR2 read; the below does
> the read too late still, exception_enter() also has a tracepoint in.

please check v2, thanks

jirka


---
The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.

This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.

This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:

  # perf record -g -e exceptions:page_fault_user ls

Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.

v2: Moving the cr2 read before exception_enter, because
    it could trigger tracepoint as well.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/mm/fault.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d591c8..dd59031 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1016,11 +1016,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * routines.
  */
 static void __kprobes
-__do_page_fault(struct pt_regs *regs, unsigned long error_code)
+__do_page_fault(struct pt_regs *regs, unsigned long error_code,
+		unsigned long address)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
-	unsigned long address;
 	struct mm_struct *mm;
 	int fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
@@ -1028,9 +1028,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	tsk = current;
 	mm = tsk->mm;
 
-	/* Get the faulting address: */
-	address = read_cr2();
-
 	/*
 	 * Detect and handle instructions that would cause a page fault for
 	 * both a tracked kernel page and a userspace page.
@@ -1248,9 +1245,11 @@ dotraplinkage void __kprobes
 do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	/* Get the faulting address: */
+	unsigned long address = read_cr2();
 
 	prev_state = exception_enter();
-	__do_page_fault(regs, error_code);
+	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
@@ -1267,9 +1266,16 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	/*
+	 * The exception_enter and tracepoint processing could
+	 * trigger another page faults (user space callchain
+	 * reading) and destroy the original cr2 value, so read
+	 * the faulting address now.
+	 */
+	unsigned long address = read_cr2();
 
 	prev_state = exception_enter();
 	trace_page_fault_entries(regs, error_code);
-	__do_page_fault(regs, error_code);
+	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCHv2] x86 trace: Fix page fault tracing bug
  2014-02-28 16:05   ` [PATCHv2] " Jiri Olsa
@ 2014-02-28 16:11     ` H. Peter Anvin
  2014-02-28 16:23       ` Steven Rostedt
  2014-02-28 16:15     ` Steven Rostedt
  2014-03-05  0:03     ` [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults tip-bot for Jiri Olsa
  2 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-28 16:11 UTC (permalink / raw)
  To: Jiri Olsa, Peter Zijlstra
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Seiji Aguchi, Steven Rostedt,
	Vince Weaver

On 02/28/2014 08:05 AM, Jiri Olsa wrote:
> On Fri, Feb 28, 2014 at 04:47:08PM +0100, Peter Zijlstra wrote:
>> On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:
>>
>> While I like the idea of just pushing up the CR2 read; the below does
>> the read too late still, exception_enter() also has a tracepoint in.
> 
> please check v2, thanks

Is there any way something could come in even earlier (perhaps via
__fentry__)?

If so, do we need to hoist the reading of %cr2 all the way into assembly
or something else?

	-hpa


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 15:40                                                                         ` Peter Zijlstra
@ 2014-02-28 16:15                                                                           ` H. Peter Anvin
  2014-02-28 16:29                                                                             ` Steven Rostedt
  2014-02-28 20:55                                                                             ` Peter Zijlstra
  0 siblings, 2 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-28 16:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, Steven Rostedt, Linux Kernel, Ingo Molnar

On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
>> If I'm reading this right we end up going from the page fault
>> tracepoint to copy_from_user_nmi() without going through NMI, and the
>> cr2 corruption is obvious.  I guess the assumption that only the NMI
>> path needed to save cr2 is flawed?
> 
> It was never assumed it would only go through NMI, but that it would be
> NMI safe -- and as it turns out, it is that.
> 
> What I did assume was that any other callsites would be safe, seeing how
> they'd already be running in 'normal' contexts.
> 
> I had not considered people putting tracepoints _that_ early in the
> exception paths.
> 
> Note that there's more tracepoints there than the one mentioned.
> 

Well, I was talking about the assumption spelled out in the comment
above copy_from_user_nmi() which pretty much states "cr2 is safe because
cr2 is saved/restored in the NMI wrappers."

	-hpa

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCHv2] x86 trace: Fix page fault tracing bug
  2014-02-28 16:05   ` [PATCHv2] " Jiri Olsa
  2014-02-28 16:11     ` H. Peter Anvin
@ 2014-02-28 16:15     ` Steven Rostedt
  2014-03-05  0:03     ` [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults tip-bot for Jiri Olsa
  2 siblings, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 16:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Seiji Aguchi,
	Vince Weaver

Vince, you probably missed my other emails, as I sent from mutt, and
not my normal email client. I see that mutt uses my own box to send,
and I got these error messages (need to put back sending via my ISP
instead of my local mail server):

>>> vincent.weaver@maine.edu (after MAIL FROM): 550 5.5.4 <rostedt@home.goodmis.org>... Domain of sender address rostedt@home.goodmis.org refused. MX points to hosts without valid addresses. This violates RFC 1035/2181.  

-- Steve


On Fri, 28 Feb 2014 17:05:26 +0100
Jiri Olsa <jolsa@redhat.com> wrote:

> On Fri, Feb 28, 2014 at 04:47:08PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 04:33:40PM +0100, Jiri Olsa wrote:
> > 
> > While I like the idea of just pushing up the CR2 read; the below does
> > the read too late still, exception_enter() also has a tracepoint in.
> 
> please check v2, thanks
> 
> jirka
> 
> 
> ---
> The trace_do_page_fault function trigger tracepoint
> and then handles the actual page fault.
> 
> This could lead to error if the tracepoint caused page
> fault. The original cr2 value gets lost and the original
> page fault handler kills current process with SIGSEGV.
> 
> This happens if you record page faults with callchain
> data, the user part of it will cause tracepoint handler
> to page fault:
> 
>   # perf record -g -e exceptions:page_fault_user ls
> 
> Fixing this by saving the original cr2 value
> and using it after tracepoint handler is done.
> 
> v2: Moving the cr2 read before exception_enter, because
>     it could trigger tracepoint as well.
> 
> Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Seiji Aguchi <seiji.aguchi@hds.com>
> Cc: Vince Weaver <vincent.weaver@maine.edu>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
>  arch/x86/mm/fault.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9d591c8..dd59031 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1016,11 +1016,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
>   * routines.
>   */
>  static void __kprobes
> -__do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +__do_page_fault(struct pt_regs *regs, unsigned long error_code,
> +		unsigned long address)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> -	unsigned long address;
>  	struct mm_struct *mm;
>  	int fault;
>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> @@ -1028,9 +1028,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  	tsk = current;
>  	mm = tsk->mm;
>  
> -	/* Get the faulting address: */
> -	address = read_cr2();
> -
>  	/*
>  	 * Detect and handle instructions that would cause a page fault for
>  	 * both a tracked kernel page and a userspace page.
> @@ -1248,9 +1245,11 @@ dotraplinkage void __kprobes
>  do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	/* Get the faulting address: */
> +	unsigned long address = read_cr2();
>  
>  	prev_state = exception_enter();
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
>  
> @@ -1267,9 +1266,16 @@ dotraplinkage void __kprobes
>  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	/*
> +	 * The exception_enter and tracepoint processing could
> +	 * trigger another page faults (user space callchain
> +	 * reading) and destroy the original cr2 value, so read
> +	 * the faulting address now.
> +	 */
> +	unsigned long address = read_cr2();
>  
>  	prev_state = exception_enter();
>  	trace_page_fault_entries(regs, error_code);
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCHv2] x86 trace: Fix page fault tracing bug
  2014-02-28 16:11     ` H. Peter Anvin
@ 2014-02-28 16:23       ` Steven Rostedt
  0 siblings, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 16:23 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jiri Olsa, Peter Zijlstra, linux-kernel, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, Seiji Aguchi,
	Vince Weaver

On Fri, 28 Feb 2014 08:11:59 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> If so, do we need to hoist the reading of %cr2 all the way into assembly
> or something else?

Function tracing code should not fault. Peter and Jiri were discussing
on IRC to make sure that perf could not enable userspace stack tracing
for function tracing.

At this point, I would say, lets get rid of any users of function
tracing callbacks that can fault. There callbacks are limited in
context even more so than any other context (faults, interrupts, and
NMIs).

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 16:15                                                                           ` H. Peter Anvin
@ 2014-02-28 16:29                                                                             ` Steven Rostedt
  2014-02-28 19:33                                                                               ` [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace() Steven Rostedt
  2014-02-28 20:56                                                                               ` perf_fuzzer compiled for x32 causes reboot Peter Zijlstra
  2014-02-28 20:55                                                                             ` Peter Zijlstra
  1 sibling, 2 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 16:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, 28 Feb 2014 08:15:11 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> Well, I was talking about the assumption spelled out in the comment
> above copy_from_user_nmi() which pretty much states "cr2 is safe because
> cr2 is saved/restored in the NMI wrappers."

Yeah, it seems that the name "copy_from_user_nmi()" is a misnomer. As
it can be called outside of nmi context. Perhaps we should have a
copy_from_user_trace() that does the save and restore of the cr2.

As that's the only place that faults, it may be the best answer.

Arnaldo,

Can you test this patch and see if it fixes the bug for you too. Vince
already said it fixes it for him.

I'm attaching it below (it's from H. Peter).

Peter Z., as both Jiri's and my patch fixed the callers of the problem
area, and as we have been discussing, there may be more problem areas,
I'm thinking the best solution is to just use H. Peter's patch instead.
And then we should rename it to copy_from_user_trace().

-- Steve


diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..938e45c 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -10,6 +10,8 @@
 #include <asm/word-at-a-time.h>
 #include <linux/sched.h>
 
+#include <asm/processor.h>
+
 /*
  * We rely on the nested NMI work to allow atomic faults from the NMI path; the
  * nested NMI paths are careful to preserve CR2.
@@ -18,6 +20,7 @@ unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
 	unsigned long ret;
+	unsigned long cr2;
 
 	if (__range_not_ok(from, n, TASK_SIZE))
 		return 0;
@@ -27,9 +30,11 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	 * disable pagefaults so that its behaviour is consistent even when
 	 * called form other contexts.
 	 */
+	cr2 = read_cr2();
 	pagefault_disable();
 	ret = __copy_from_user_inatomic(to, from, n);
 	pagefault_enable();
+	write_cr2(cr2);
 
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace()
  2014-02-28 16:29                                                                             ` Steven Rostedt
@ 2014-02-28 19:33                                                                               ` Steven Rostedt
  2014-02-28 20:46                                                                                 ` Peter Zijlstra
  2014-02-28 20:56                                                                               ` perf_fuzzer compiled for x32 causes reboot Peter Zijlstra
  1 sibling, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 19:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

[ H. Peter, Here's the rename patch. I did not include your update. You
can add that first and then massage this patch on top. But this isn't
critical for mainline or stable, where as I believe your patch is. ]

The tracing utilities sometimes need to read from userspace (stack tracing),
and to do this it has as special copy_from_user function called
copy_from_user_nmi(). Well, as tracers can call this from outside of
nmi context, the "_nmi" part is a misnomer and "_trace" is a better
name.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/perf_event.h          |  2 +-
 arch/x86/include/asm/uaccess.h             |  2 +-
 arch/x86/kernel/cpu/perf_event.c           |  4 ++--
 arch/x86/kernel/cpu/perf_event_intel_ds.c  |  2 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |  2 +-
 arch/x86/lib/usercopy.c                    | 10 ++++++----
 arch/x86/oprofile/backtrace.c              |  4 ++--
 7 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8249df4..7bf4b25 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -274,6 +274,6 @@ static inline void perf_check_microcode(void) { }
  static inline void amd_pmu_disable_virt(void) { }
 #endif
 
-#define arch_perf_out_copy_user copy_from_user_nmi
+#define arch_perf_out_copy_user copy_from_user_trace
 
 #endif /* _ASM_X86_PERF_EVENT_H */
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 8ec57c0..d734baf 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -515,7 +515,7 @@ struct __large_struct { unsigned long buf[100]; };
 	__put_user_size_ex((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
 extern unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n);
+copy_from_user_trace(void *to, const void __user *from, unsigned long n);
 extern __must_check long
 strncpy_from_user(char *dst, const char __user *src, long count);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8e13293..7eda4af 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1988,7 +1988,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 		frame.next_frame     = 0;
 		frame.return_address = 0;
 
-		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		bytes = copy_from_user_trace(&frame, fp, sizeof(frame));
 		if (bytes != 0)
 			break;
 
@@ -2040,7 +2040,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 		frame.next_frame	     = NULL;
 		frame.return_address = 0;
 
-		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		bytes = copy_from_user_trace(&frame, fp, sizeof(frame));
 		if (bytes != 0)
 			break;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index ae96cfa..95d76c6 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -788,7 +788,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
 		u8 *buf = this_cpu_read(insn_buffer);
 
 		size = ip - to; /* Must fit our buffer, see above */
-		bytes = copy_from_user_nmi(buf, (void __user *)to, size);
+		bytes = copy_from_user_trace(buf, (void __user *)to, size);
 		if (bytes != 0)
 			return 0;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..0505ada 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -490,7 +490,7 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
 			return X86_BR_NONE;
 
 		/* may fail if text not present */
-		bytes = copy_from_user_nmi(buf, (void __user *)from, size);
+		bytes = copy_from_user_trace(buf, (void __user *)from, size);
 		if (bytes != 0)
 			return X86_BR_NONE;
 
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ddf9ecb..131ff16e 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -11,11 +11,13 @@
 #include <linux/sched.h>
 
 /*
- * We rely on the nested NMI work to allow atomic faults from the NMI path; the
- * nested NMI paths are careful to preserve CR2.
+ * Used by tracing, needs to restore state of the cr2 register if
+ * the copy triggered a page fault. That's because tracing can happen
+ * between the time a normal page fault occurred and the cr2 is read
+ * by the handler.
  */
 unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+copy_from_user_trace(void *to, const void __user *from, unsigned long n)
 {
 	unsigned long ret;
 
@@ -33,4 +35,4 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(copy_from_user_nmi);
+EXPORT_SYMBOL_GPL(copy_from_user_trace);
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 5d04be5..7831650 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -46,7 +46,7 @@ dump_user_backtrace_32(struct stack_frame_ia32 *head)
 	struct stack_frame_ia32 *fp;
 	unsigned long bytes;
 
-	bytes = copy_from_user_nmi(bufhead, head, sizeof(bufhead));
+	bytes = copy_from_user_trace(bufhead, head, sizeof(bufhead));
 	if (bytes != 0)
 		return NULL;
 
@@ -92,7 +92,7 @@ static struct stack_frame *dump_user_backtrace(struct stack_frame *head)
 	struct stack_frame bufhead[2];
 	unsigned long bytes;
 
-	bytes = copy_from_user_nmi(bufhead, head, sizeof(bufhead));
+	bytes = copy_from_user_trace(bufhead, head, sizeof(bufhead));
 	if (bytes != 0)
 		return NULL;
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28  1:00                                                                 ` Vince Weaver
@ 2014-02-28 20:34                                                                   ` Paul E. McKenney
  2014-02-28 20:47                                                                     ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: Paul E. McKenney @ 2014-02-28 20:34 UTC (permalink / raw)
  To: Vince Weaver
  Cc: H. Peter Anvin, Steven Rostedt, Peter Zijlstra, Linux Kernel,
	Ingo Molnar

On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
> On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> 
> > On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > > On Thu, 27 Feb 2014 14:52:54 -0800
> > > "H. Peter Anvin" <hpa@zytor.com> wrote:
> > > 
> > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > >>>
> > >>> Yeah, something is getting mesed up.
> > >>>
> > >>
> > >> What it *looks* like to me is that we try to nest the cr2 save/restore,
> > >> which doesn't nest because it is a percpu variable.
> > >>
> > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> > >> entry_64.S, which makes the stuff in do_nmi completely redundant and
> > >> there for no good reason.
> > > 
> > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> > > section. That is, it isn't even executed. That's i386 code. The only
> > > place the cr2 is saved for x86_64 is in entry_64.S.
> > > 
> > 
> > Right, egg on my face.  However, I still think it would make more sense
> > for it to nest the way entry_64.S does if at all possible.
> > 
> > That makes this even more confusing, though.  I would still like to see
> > what happens with the patch I sent Vince.
> 
> I'll try your patch momentarily, first I had some other changes I started 
> running before I left work (for some reason it recompiled the whole 
> kernel).
> 
> 8: function:             perf_output_begin
> 8: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e
> 8: bputs:                perf_output_begin: VMW: before rcu_dereference
> 9: function:             __do_page_fault
> 9: function:                down_read_trylock
> 9: function:                _cond_resched
> 9: function:                find_vma
> 
> so it looks like the fault happens 
> 
> rcu_read_lock();
> 
> 116         /*
> 117          * For inherited events we send all the output towards the parent.
> 118          */
> 119         if (event->parent)
> 120                 event = event->parent;
> 121 
> 
> somewhere between here
> 
> 122         rb = rcu_dereference(event->rb);
> 123         if (unlikely(!rb))
> 124                 goto out;
> 
> and here
> 
> 125 
> 126         if (unlikely(!rb->nr_pages))
> 127                 goto out;
> 
> although if rcu locks do anything to turn off tracing then this could be 
> suspect.

The most likely suspect is of course event->rb in the rcu_dereference.
I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
currently interact with tracing.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 15:07                                                                     ` Vince Weaver
  2014-02-28 15:13                                                                       ` H. Peter Anvin
  2014-02-28 15:20                                                                       ` Steven Rostedt
@ 2014-02-28 20:38                                                                       ` H. Peter Anvin
  2014-02-28 20:46                                                                         ` Steven Rostedt
  2014-02-28 21:18                                                                         ` Vince Weaver
  2 siblings, 2 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-02-28 20:38 UTC (permalink / raw)
  To: Vince Weaver, Steven Rostedt; +Cc: Peter Zijlstra, Linux Kernel, Ingo Molnar

Now we need to figure out if the reboot problem and the segfault problem are actually the same... I have a nasty feeling they might be different problems.

On February 28, 2014 7:07:29 AM PST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>On Fri, 28 Feb 2014, Steven Rostedt wrote:
>
>> Interesting. Are you doing a perf function trace?
>> 
>> And just in case, can you add this patch and make sure the copy is
>> called by NMI.
>
>199.900682: function:             trace_do_page_fault
>199.900683: page_fault_user:      address=__per_cpu_end
>ip=__per_cpu_end error_code=0x6
>199.900683: function:             perf_swevent_get_recursion_context
>199.900684: function:             perf_tp_event
>199.900684: function:                perf_swevent_event
>199.900684: function:                   perf_swevent_overflow
>199.900684: function:                      __perf_event_overflow
>199.900685: function:                         perf_prepare_sample
>199.900685: function:                           
>__perf_event_header__init_id
>199.900685: function:                               task_tgid_nr_ns
>199.900685: function:                               perf_event_tid
>199.900686: function:                                  __task_pid_nr_ns
>199.900686: function:                            perf_callchain
>199.900687: function:             copy_from_user_nmi
>199.900687: function:             trace_do_page_fault
>199.900687: page_fault_kernel:    address=irq_stack_union
>ip=copy_user_generic_string error_code=0x0
>199.900688: function:                __do_page_fault
>199.900688: function:                   bad_area_nosemaphore
>199.900688: function:                      __bad_area_nosemaphore
>199.900689: function:                         no_context
>199.900689: function:                            fixup_exception
>199.900689: function:                              
>search_exception_tables
>199.900689: function:                                  search_extable
>199.900691: function:             copy_user_handle_tail
>199.900691: function:             trace_do_page_fault
>199.900691: page_fault_kernel:    address=irq_stack_union
>ip=copy_user_handle_tail error_code=0x0
>199.900691: function:                __do_page_fault
>199.900692: function:                   bad_area_nosemaphore
>199.900692: function:                      __bad_area_nosemaphore
>199.900692: function:                         no_context
>199.900692: function:                            fixup_exception
>199.900692: function:                              
>search_exception_tables
>199.900692: function:                                  search_extable
>199.900693: function:             save_stack_trace
>199.900693: function:                dump_trace
>199.900694: function:                   print_context_stack
>199.900694: function:                      __kernel_text_address
>199.900694: function:                         is_module_text_address
>199.900695: function:                            __module_text_address
>199.900695: function:                               __module_address
>199.900695: function:                      __kernel_text_address
>199.900695: function:                         is_module_text_address
>199.900696: function:                            __module_text_address
>199.900696: function:                               __module_address
>...
>199.900705: function:                      __kernel_text_address
>199.900809: kernel_stack:         <stack trace>
>=> perf_callchain (ffffffff810d35a2)
>=> perf_prepare_sample (ffffffff810cfae3)
>=> __perf_event_overflow (ffffffff810d02f4)
>=> perf_swevent_overflow (ffffffff810d04e3)
>=> perf_swevent_event (ffffffff810d0574)
>=> perf_tp_event (ffffffff810d070c)
>=> perf_trace_x86_exceptions (ffffffff810341b6)
>=> trace_do_page_fault (ffffffff81537702)
>=> trace_page_fault (ffffffff81534772)
>199.900810: function:             perf_output_begin
>199.900810: function:             __do_page_fault
>199.900810: function:                __perf_sw_event
>199.900810: function:                  
>perf_swevent_get_recursion_context
>199.900811: function:                down_read_trylock
>199.900811: function:                _cond_resched
>199.900811: function:                find_vma
>199.900811: function:                bad_area
>199.900812: function:                   up_read
>199.900812: function:                   __bad_area_nosemaphore
>199.900812: function:                      is_prefetch
>199.900812: function:                         convert_ip_to_linear
>199.900813: function:                      unhandled_signal
>199.900813: function:                      __printk_ratelimit
>199.900813: function:             _raw_spin_trylock
>199.900813: function:             _raw_spin_unlock_irqrestore
>199.900814: function:                      printk
>199.900814: function:                         vprintk_emit

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace()
  2014-02-28 19:33                                                                               ` [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace() Steven Rostedt
@ 2014-02-28 20:46                                                                                 ` Peter Zijlstra
  2014-02-28 20:51                                                                                   ` Steven Rostedt
  2014-02-28 21:01                                                                                   ` Steven Rostedt
  0 siblings, 2 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, Feb 28, 2014 at 02:33:16PM -0500, Steven Rostedt wrote:
> [ H. Peter, Here's the rename patch. I did not include your update. You
> can add that first and then massage this patch on top. But this isn't
> critical for mainline or stable, where as I believe your patch is. ]
> 
> The tracing utilities sometimes need to read from userspace (stack tracing),
> and to do this it has as special copy_from_user function called
> copy_from_user_nmi(). Well, as tracers can call this from outside of
> nmi context, the "_nmi" part is a misnomer and "_trace" is a better
> name.

NAK, spin_lock_irq() is very much an IRQ safe lock. Similarly
copy_from_user_nmi() is an NMI safe copy from user.

Furthermore, there's exactly 0 trace users, so the proposed name is
actively worse.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 20:38                                                                       ` H. Peter Anvin
@ 2014-02-28 20:46                                                                         ` Steven Rostedt
  2014-02-28 21:18                                                                         ` Vince Weaver
  1 sibling, 0 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 20:46 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Vince Weaver, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014 12:38:52 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> Now we need to figure out if the reboot problem and the segfault problem are actually the same... I have a nasty feeling they might be different problems.

I wonder if there was any recursion problem. Although, I believe perf
has recursion checks. But perhaps something recursed before the checks?

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 20:34                                                                   ` Paul E. McKenney
@ 2014-02-28 20:47                                                                     ` Steven Rostedt
  2014-02-28 20:54                                                                       ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 20:47 UTC (permalink / raw)
  To: paulmck
  Cc: Vince Weaver, H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014 12:34:05 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Thu, Feb 27, 2014 at 08:00:04PM -0500, Vince Weaver wrote:
> > On Thu, 27 Feb 2014, H. Peter Anvin wrote:
> > 
> > > On 02/27/2014 03:30 PM, Steven Rostedt wrote:
> > > > On Thu, 27 Feb 2014 14:52:54 -0800
> > > > "H. Peter Anvin" <hpa@zytor.com> wrote:
> > > > 
> > > >> On 02/27/2014 02:31 PM, Steven Rostedt wrote:
> > > >>>
> > > >>> Yeah, something is getting mesed up.
> > > >>>
> > > >>
> > > >> What it *looks* like to me is that we try to nest the cr2 save/restore,
> > > >> which doesn't nest because it is a percpu variable.
> > > >>
> > > >> ... except in the x86-64 case, we *ALSO* save/restore cr2 inside
> > > >> entry_64.S, which makes the stuff in do_nmi completely redundant and
> > > >> there for no good reason.
> > > > 
> > > > Peter, look at the code. That percpu cr2 is in a #ifdef CONFIG_X86_32
> > > > section. That is, it isn't even executed. That's i386 code. The only
> > > > place the cr2 is saved for x86_64 is in entry_64.S.
> > > > 
> > > 
> > > Right, egg on my face.  However, I still think it would make more sense
> > > for it to nest the way entry_64.S does if at all possible.
> > > 
> > > That makes this even more confusing, though.  I would still like to see
> > > what happens with the patch I sent Vince.
> > 
> > I'll try your patch momentarily, first I had some other changes I started 
> > running before I left work (for some reason it recompiled the whole 
> > kernel).
> > 
> > 8: function:             perf_output_begin
> > 8: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e
> > 8: bputs:                perf_output_begin: VMW: before rcu_dereference
> > 9: function:             __do_page_fault
> > 9: function:                down_read_trylock
> > 9: function:                _cond_resched
> > 9: function:                find_vma
> > 
> > so it looks like the fault happens 
> > 
> > rcu_read_lock();
> > 
> > 116         /*
> > 117          * For inherited events we send all the output towards the parent.
> > 118          */
> > 119         if (event->parent)
> > 120                 event = event->parent;
> > 121 
> > 
> > somewhere between here
> > 
> > 122         rb = rcu_dereference(event->rb);
> > 123         if (unlikely(!rb))
> > 124                 goto out;
> > 
> > and here
> > 
> > 125 
> > 126         if (unlikely(!rb->nr_pages))
> > 127                 goto out;
> > 
> > although if rcu locks do anything to turn off tracing then this could be 
> > suspect.
> 
> The most likely suspect is of course event->rb in the rcu_dereference.
> I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
> currently interact with tracing.  ;-)

These are all perf related. You'll need to defer to Peter Zijlstra ;-)

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace()
  2014-02-28 20:46                                                                                 ` Peter Zijlstra
@ 2014-02-28 20:51                                                                                   ` Steven Rostedt
  2014-02-28 20:58                                                                                     ` Peter Zijlstra
  2014-02-28 21:01                                                                                   ` Steven Rostedt
  1 sibling, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 20:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, 28 Feb 2014 21:46:21 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 28, 2014 at 02:33:16PM -0500, Steven Rostedt wrote:
> > [ H. Peter, Here's the rename patch. I did not include your update. You
> > can add that first and then massage this patch on top. But this isn't
> > critical for mainline or stable, where as I believe your patch is. ]
> > 
> > The tracing utilities sometimes need to read from userspace (stack tracing),
> > and to do this it has as special copy_from_user function called
> > copy_from_user_nmi(). Well, as tracers can call this from outside of
> > nmi context, the "_nmi" part is a misnomer and "_trace" is a better
> > name.
> 
> NAK, spin_lock_irq() is very much an IRQ safe lock. Similarly
> copy_from_user_nmi() is an NMI safe copy from user.
> 
> Furthermore, there's exactly 0 trace users, so the proposed name is
> actively worse.

Heh, I consider perf and oprofile special tracers ;-)

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 20:47                                                                     ` Steven Rostedt
@ 2014-02-28 20:54                                                                       ` Peter Zijlstra
  2014-02-28 21:17                                                                         ` Paul E. McKenney
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 20:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: paulmck, Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
> > > I'll try your patch momentarily, first I had some other changes I started 
> > > running before I left work (for some reason it recompiled the whole 
> > > kernel).
> > > 
> > > 8: function:             perf_output_begin
> > > 8: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e
> > > 8: bputs:                perf_output_begin: VMW: before rcu_dereference
> > > 9: function:             __do_page_fault
> > > 9: function:                down_read_trylock
> > > 9: function:                _cond_resched
> > > 9: function:                find_vma
> > > 
> > > so it looks like the fault happens 
> > > 
> > > rcu_read_lock();
> > > 
> > > 116         /*
> > > 117          * For inherited events we send all the output towards the parent.
> > > 118          */
> > > 119         if (event->parent)
> > > 120                 event = event->parent;
> > > 121 
> > > 
> > > somewhere between here
> > > 
> > > 122         rb = rcu_dereference(event->rb);
> > > 123         if (unlikely(!rb))
> > > 124                 goto out;
> > > 
> > > and here
> > > 
> > > 125 
> > > 126         if (unlikely(!rb->nr_pages))
> > > 127                 goto out;
> > > 
> > > although if rcu locks do anything to turn off tracing then this could be 
> > > suspect.
> > 
> > The most likely suspect is of course event->rb in the rcu_dereference.
> > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
> > currently interact with tracing.  ;-)
> 
> These are all perf related. You'll need to defer to Peter Zijlstra ;-)

I'm lost.. :/

pretty much all perf objects are RCU freed. 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 16:15                                                                           ` H. Peter Anvin
  2014-02-28 16:29                                                                             ` Steven Rostedt
@ 2014-02-28 20:55                                                                             ` Peter Zijlstra
  1 sibling, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 20:55 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Vince Weaver, Steven Rostedt, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 08:15:11AM -0800, H. Peter Anvin wrote:
> On 02/28/2014 07:40 AM, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 07:13:06AM -0800, H. Peter Anvin wrote:
> >> If I'm reading this right we end up going from the page fault
> >> tracepoint to copy_from_user_nmi() without going through NMI, and the
> >> cr2 corruption is obvious.  I guess the assumption that only the NMI
> >> path needed to save cr2 is flawed?
> > 
> > It was never assumed it would only go through NMI, but that it would be
> > NMI safe -- and as it turns out, it is that.
> > 
> > What I did assume was that any other callsites would be safe, seeing how
> > they'd already be running in 'normal' contexts.
> > 
> > I had not considered people putting tracepoints _that_ early in the
> > exception paths.
> > 
> > Note that there's more tracepoints there than the one mentioned.
> > 
> 
> Well, I was talking about the assumption spelled out in the comment
> above copy_from_user_nmi() which pretty much states "cr2 is safe because
> cr2 is saved/restored in the NMI wrappers."

That is because we assumed NMI was the only thing that could interrupt a
fault handler before it would read CR2. Never thinking someone would put
a tracepoint there.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 16:29                                                                             ` Steven Rostedt
  2014-02-28 19:33                                                                               ` [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace() Steven Rostedt
@ 2014-02-28 20:56                                                                               ` Peter Zijlstra
  2014-02-28 21:06                                                                                 ` Steven Rostedt
  1 sibling, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 20:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, Feb 28, 2014 at 11:29:46AM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 08:15:11 -0800
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
> > Well, I was talking about the assumption spelled out in the comment
> > above copy_from_user_nmi() which pretty much states "cr2 is safe because
> > cr2 is saved/restored in the NMI wrappers."
> 
> Yeah, it seems that the name "copy_from_user_nmi()" is a misnomer. As
> it can be called outside of nmi context. Perhaps we should have a
> copy_from_user_trace() that does the save and restore of the cr2.
> 
> As that's the only place that faults, it may be the best answer.
> 
> Arnaldo,
> 
> Can you test this patch and see if it fixes the bug for you too. Vince
> already said it fixes it for him.
> 
> I'm attaching it below (it's from H. Peter).
> 
> Peter Z., as both Jiri's and my patch fixed the callers of the problem
> area, and as we have been discussing, there may be more problem areas,
> I'm thinking the best solution is to just use H. Peter's patch instead.
> And then we should rename it to copy_from_user_trace().

Like already said; _trace is an absolutely abysmal name. Also you
_really_ don't want an unconditional CR2 write in there, that's just
stupidly expensive.

Also, this function is called a _LOT_ under certain workloads, I don't
know how cheap a CR2 read is, but it had better be really cheap.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace()
  2014-02-28 20:51                                                                                   ` Steven Rostedt
@ 2014-02-28 20:58                                                                                     ` Peter Zijlstra
  0 siblings, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 20:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, Feb 28, 2014 at 03:51:30PM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 21:46:21 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Feb 28, 2014 at 02:33:16PM -0500, Steven Rostedt wrote:
> > > [ H. Peter, Here's the rename patch. I did not include your update. You
> > > can add that first and then massage this patch on top. But this isn't
> > > critical for mainline or stable, where as I believe your patch is. ]
> > > 
> > > The tracing utilities sometimes need to read from userspace (stack tracing),
> > > and to do this it has as special copy_from_user function called
> > > copy_from_user_nmi(). Well, as tracers can call this from outside of
> > > nmi context, the "_nmi" part is a misnomer and "_trace" is a better
> > > name.
> > 
> > NAK, spin_lock_irq() is very much an IRQ safe lock. Similarly
> > copy_from_user_nmi() is an NMI safe copy from user.
> > 
> > Furthermore, there's exactly 0 trace users, so the proposed name is
> > actively worse.
> 
> Heh, I consider perf and oprofile special tracers ;-)

They're first and foremost profilers; which is an entirely different
thing. But other people seem to get confused on this distinction too.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace()
  2014-02-28 20:46                                                                                 ` Peter Zijlstra
  2014-02-28 20:51                                                                                   ` Steven Rostedt
@ 2014-02-28 21:01                                                                                   ` Steven Rostedt
  2014-02-28 21:17                                                                                     ` Peter Zijlstra
  1 sibling, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 21:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, 28 Feb 2014 21:46:21 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> NAK, spin_lock_irq() is very much an IRQ safe lock. Similarly

Actually, that's not true. It doesn't mean it is an IRQ safe lock. That
would mean it is safe to use in IRQs. What it does mean is that it
disables irqs and spin_unlock_irq() enables irqs. And we have
spin_lock_irqsave() which disables irqs and saves the flags for a
spin_lock_irqrestore()

> copy_from_user_nmi() is an NMI safe copy from user.
> 
> Furthermore, there's exactly 0 trace users, so the proposed name is
> actively worse.

OK, I'll go with that. So instead of an _nmi() which it has nothing to
do with NMIs, how about making it like spin_lock_irq() and a more
descriptive name about what it does?

 copy_from_user_savecr2()

?

-- Steve


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 20:56                                                                               ` perf_fuzzer compiled for x32 causes reboot Peter Zijlstra
@ 2014-02-28 21:06                                                                                 ` Steven Rostedt
  2014-03-01  9:16                                                                                   ` Ingo Molnar
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, 28 Feb 2014 21:56:38 +0100
Peter Zijlstra <peterz@infradead.org> wrote:


> Like already said; _trace is an absolutely abysmal name. Also you
> _really_ don't want an unconditional CR2 write in there, that's just
> stupidly expensive.

But a read isn't. Which is why we only do a write if the copy caused a
page fault (which is already expensive).

The proposed patch will have:

	if (cr2 != read_cr2())
		write_cr2(cr2);

> 
> Also, this function is called a _LOT_ under certain workloads, I don't
> know how cheap a CR2 read is, but it had better be really cheap.

That's a HPA question.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace()
  2014-02-28 21:01                                                                                   ` Steven Rostedt
@ 2014-02-28 21:17                                                                                     ` Peter Zijlstra
  0 siblings, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 21:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Vince Weaver, Linux Kernel, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Fri, Feb 28, 2014 at 04:01:39PM -0500, Steven Rostedt wrote:
> OK, I'll go with that. So instead of an _nmi() which it has nothing to
> do with NMIs, 

Well, how is 'safe to use from NMI' nothing to do with NMIs?

> how about making it like spin_lock_irq() and a more
> descriptive name about what it does?
> 
>  copy_from_user_savecr2()

So how expensive are these CR2 reads? Because if they're more expensive
than one or two cycles (normal instructions) I'd really rather not put
them in there.

I seem to have missed what was wrong with Jiri's patch. Are we worried
about the function trace events?

So IF we're going to push them into this side of things; I'd really
rather see the CR2 foo in perf_callchain_user() and make
arch_perf_out_copy_user() a real function which includes the CR2 goo
too.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 20:54                                                                       ` Peter Zijlstra
@ 2014-02-28 21:17                                                                         ` Paul E. McKenney
  2014-02-28 21:27                                                                           ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Paul E. McKenney @ 2014-02-28 21:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 09:54:09PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 03:47:16PM -0500, Steven Rostedt wrote:
> > > > I'll try your patch momentarily, first I had some other changes I started 
> > > > running before I left work (for some reason it recompiled the whole 
> > > > kernel).
> > > > 
> > > > 8: function:             perf_output_begin
> > > > 8: bprint:               perf_output_begin: VMW: event type 2 config 2a st: 2c3e
> > > > 8: bputs:                perf_output_begin: VMW: before rcu_dereference
> > > > 9: function:             __do_page_fault
> > > > 9: function:                down_read_trylock
> > > > 9: function:                _cond_resched
> > > > 9: function:                find_vma
> > > > 
> > > > so it looks like the fault happens 
> > > > 
> > > > rcu_read_lock();
> > > > 
> > > > 116         /*
> > > > 117          * For inherited events we send all the output towards the parent.
> > > > 118          */
> > > > 119         if (event->parent)
> > > > 120                 event = event->parent;
> > > > 121 
> > > > 
> > > > somewhere between here
> > > > 
> > > > 122         rb = rcu_dereference(event->rb);
> > > > 123         if (unlikely(!rb))
> > > > 124                 goto out;
> > > > 
> > > > and here
> > > > 
> > > > 125 
> > > > 126         if (unlikely(!rb->nr_pages))
> > > > 127                 goto out;
> > > > 
> > > > although if rcu locks do anything to turn off tracing then this could be 
> > > > suspect.
> > > 
> > > The most likely suspect is of course event->rb in the rcu_dereference.
> > > I have to defer to Steven on how rcu_read_lock() and rcu_read_unlock()
> > > currently interact with tracing.  ;-)
> > 
> > These are all perf related. You'll need to defer to Peter Zijlstra ;-)
> 
> I'm lost.. :/
> 
> pretty much all perf objects are RCU freed. 

This code isn't running in idle context is it?  If so, RCU will happily
free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
though.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 20:38                                                                       ` H. Peter Anvin
  2014-02-28 20:46                                                                         ` Steven Rostedt
@ 2014-02-28 21:18                                                                         ` Vince Weaver
  2014-02-28 21:30                                                                           ` Steven Rostedt
  1 sibling, 1 reply; 115+ messages in thread
From: Vince Weaver @ 2014-02-28 21:18 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vince Weaver, Steven Rostedt, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014, H. Peter Anvin wrote:

> Now we need to figure out if the reboot problem and the segfault problem 
> are actually the same... I have a nasty feeling they might be different 
> problems.

I'm currently running a script that tries setting EBP to all possible 
32-bit pages and running the test to see if that triggers anything.

If I look at my notes the original reboot crash might have happened when I 
had the fuzzer also generating overflow signals (my current tests do not) 
so I'm not sure if having all this mess triggered from inside a signal 
handler could make it reboot somehow.

I was away from the computer this afternoon and of course I have scores of 
e-mails on this topic now with lots of competing patches.  Is there one 
in particular I'm supposed to be testing?

Thanks,

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:17                                                                         ` Paul E. McKenney
@ 2014-02-28 21:27                                                                           ` Peter Zijlstra
  2014-02-28 21:51                                                                             ` Paul E. McKenney
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 21:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> This code isn't running in idle context is it?  If so, RCU will happily
> free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> though.

Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
entry code deals with the idle state AFAIK.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:18                                                                         ` Vince Weaver
@ 2014-02-28 21:30                                                                           ` Steven Rostedt
  2014-02-28 23:34                                                                             ` Vince Weaver
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 21:30 UTC (permalink / raw)
  To: Vince Weaver; +Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:

> I was away from the computer this afternoon and of course I have scores of 
> e-mails on this topic now with lots of competing patches.  Is there one 
> in particular I'm supposed to be testing?

I was poking fun at you on IRC for this exact reason:

<rostedt> poor Vince, I keep sending him new patches. "No, don't test this patch, now test this one. Oh wait, try this one instead"
* peterz sees Vince thinking: "stop... sending.. me.. damn... patches... already... !!!11!"
<rostedt> or at least, "Let me finish this test before I cancel it again for another damn patch"
<rostedt> then he's probably doing "I'm not going to run any tests now, until I wait a while to see if there's a new patch to test"

 :-)

Anyway, I would say wait a bit while we sort this out. At least we have
a strong idea of what the bug is. Now we need to agree on the solution.

Vince, you have been really terrific in helping us figure out what was
wrong. I don't want to waste your time until we can all agree on what
the proper fix is. When we do, it would be great if you can test it.

But for now, have a great weekend and thanks for all your hard work in
narrowing down this issue.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:27                                                                           ` Peter Zijlstra
@ 2014-02-28 21:51                                                                             ` Paul E. McKenney
  2014-02-28 21:55                                                                               ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Paul E. McKenney @ 2014-02-28 21:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > This code isn't running in idle context is it?  If so, RCU will happily
> > free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> > though.
> 
> Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> entry code deals with the idle state AFAIK.

Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
about this being code invoked from some energy-efficiency driver or
another within the idle loop.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:51                                                                             ` Paul E. McKenney
@ 2014-02-28 21:55                                                                               ` Peter Zijlstra
  2014-02-28 22:05                                                                                 ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-02-28 21:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > This code isn't running in idle context is it?  If so, RCU will happily
> > > free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> > > though.
> > 
> > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> > entry code deals with the idle state AFAIK.
> 
> Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
> about this being code invoked from some energy-efficiency driver or
> another within the idle loop.

Right, so any tracepoint can end up there; but I thought there was
already the rule that tracepoints needed RCU enabled.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:55                                                                               ` Peter Zijlstra
@ 2014-02-28 22:05                                                                                 ` Steven Rostedt
  2014-02-28 22:23                                                                                   ` Paul E. McKenney
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-02-28 22:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Vince Weaver, H. Peter Anvin, Linux Kernel,
	Ingo Molnar

On Fri, 28 Feb 2014 22:55:11 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > > This code isn't running in idle context is it?  If so, RCU will happily
> > > > free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> > > > though.
> > > 
> > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> > > entry code deals with the idle state AFAIK.
> > 
> > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
> > about this being code invoked from some energy-efficiency driver or
> > another within the idle loop.
> 
> Right, so any tracepoint can end up there; but I thought there was
> already the rule that tracepoints needed RCU enabled.

There is and we have special tracepoint caller for those cases we want a
tracepoint out of RCU scope. These will reactivate rcu in the
tracepoint code.

  trace_<tp_name>_rcuidle(...)

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 22:05                                                                                 ` Steven Rostedt
@ 2014-02-28 22:23                                                                                   ` Paul E. McKenney
  0 siblings, 0 replies; 115+ messages in thread
From: Paul E. McKenney @ 2014-02-28 22:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Vince Weaver, H. Peter Anvin, Linux Kernel, Ingo Molnar

On Fri, Feb 28, 2014 at 05:05:53PM -0500, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 22:55:11 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Feb 28, 2014 at 01:51:50PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2014 at 10:27:00PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 28, 2014 at 01:17:33PM -0800, Paul E. McKenney wrote:
> > > > > This code isn't running in idle context is it?  If so, RCU will happily
> > > > > free out from under it.  CONFIG_PROVE_RCU should detect this sort of thing,
> > > > > though.
> > > > 
> > > > Well, interrupts/NMIs can happen when idle, but the interrupt/NMI
> > > > entry code deals with the idle state AFAIK.
> > > 
> > > Yep, rcu_irq_enter() and rcu_nmi_enter() deal with that.  More worried
> > > about this being code invoked from some energy-efficiency driver or
> > > another within the idle loop.
> > 
> > Right, so any tracepoint can end up there; but I thought there was
> > already the rule that tracepoints needed RCU enabled.
> 
> There is and we have special tracepoint caller for those cases we want a
> tracepoint out of RCU scope. These will reactivate rcu in the
> tracepoint code.
> 
>   trace_<tp_name>_rcuidle(...)

OK, I finally looked at the emails leading up to this in the thread...
I believe that I am doing premature debugging.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:30                                                                           ` Steven Rostedt
@ 2014-02-28 23:34                                                                             ` Vince Weaver
  2014-03-01  0:43                                                                               ` H. Peter Anvin
  2014-03-01  3:36                                                                               ` Steven Rostedt
  0 siblings, 2 replies; 115+ messages in thread
From: Vince Weaver @ 2014-02-28 23:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vince Weaver, H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014, Steven Rostedt wrote:

> On Fri, 28 Feb 2014 16:18:23 -0500 (EST)
> Vince Weaver <vincent.weaver@maine.edu> wrote:
> 
> > I was away from the computer this afternoon and of course I have scores of 
> > e-mails on this topic now with lots of competing patches.  Is there one 
> > in particular I'm supposed to be testing?
> 
> I was poking fun at you on IRC for this exact reason:
> 
> <rostedt> poor Vince, I keep sending him new patches. "No, don't test this patch, now test this one. Oh wait, try this one instead"
> * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... already... !!!11!"
> <rostedt> or at least, "Let me finish this test before I cancel it again for another damn patch"
> <rostedt> then he's probably doing "I'm not going to run any tests now, until I wait a while to see if there's a new patch to test"

Well while it might appear that I spend all of my days finding perf_event 
bugs, I actually am a college professor so I do occasionally have to run 
off to teach a class, meet with students, or write papers/grants for other 
academics to reject.

It's nice others can reproduce the issue now, it would have saved me a lot 
of trouble, although now in theory I have a much better handle of how to 
use/abuse ftrace so I guess it was worth it.

Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
know if there are any other issues left.  I do look forward to the day 
when I can leave it running overnight and have a clean syslog the next 
morning.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 23:34                                                                             ` Vince Weaver
@ 2014-03-01  0:43                                                                               ` H. Peter Anvin
  2014-03-01  3:36                                                                               ` Steven Rostedt
  1 sibling, 0 replies; 115+ messages in thread
From: H. Peter Anvin @ 2014-03-01  0:43 UTC (permalink / raw)
  To: Vince Weaver, Steven Rostedt; +Cc: Peter Zijlstra, Linux Kernel, Ingo Molnar

On 02/28/2014 03:34 PM, Vince Weaver wrote:
> 
> Well while it might appear that I spend all of my days finding perf_event 
> bugs, I actually am a college professor so I do occasionally have to run 
> off to teach a class, meet with students, or write papers/grants for other 
> academics to reject.
> 

We really appreciate your help.  This has been really critical.

>
> It's nice others can reproduce the issue now, it would have saved me a lot 
> of trouble, although now in theory I have a much better handle of how to 
> use/abuse ftrace so I guess it was worth it.
> 
> Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
> know if there are any other issues left.  I do look forward to the day 
> when I can leave it running overnight and have a clean syslog the next 
> morning.
> 

We all do, definitely, and your help has been a huge step in that direction.

	-hpa



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 23:34                                                                             ` Vince Weaver
  2014-03-01  0:43                                                                               ` H. Peter Anvin
@ 2014-03-01  3:36                                                                               ` Steven Rostedt
  2014-03-01 16:24                                                                                 ` Andi Kleen
  2014-03-02 16:02                                                                                 ` Vince Weaver
  1 sibling, 2 replies; 115+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:36 UTC (permalink / raw)
  To: Vince Weaver; +Cc: H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:

> > I was poking fun at you on IRC for this exact reason:
> > 
> > <rostedt> poor Vince, I keep sending him new patches. "No, don't test this patch, now test this one. Oh wait, try this one instead"
> > * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... already... !!!11!"
> > <rostedt> or at least, "Let me finish this test before I cancel it again for another damn patch"
> > <rostedt> then he's probably doing "I'm not going to run any tests now, until I wait a while to see if there's a new patch to test"

I hope you were not offended by this. It was actually a testament to
the work you've given us.

> 
> Well while it might appear that I spend all of my days finding perf_event 
> bugs, I actually am a college professor so I do occasionally have to run 
> off to teach a class, meet with students, or write papers/grants for other 
> academics to reject.

But perf_event bug finder is a much more prestigious title than
"college professor" ;-)

> 
> It's nice others can reproduce the issue now, it would have saved me a lot 
> of trouble, although now in theory I have a much better handle of how to 
> use/abuse ftrace so I guess it was worth it.

I always enjoy finding bugs in other people's code. That's usually the
best way to learn what their code does.

> 
> Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
> know if there are any other issues left.  I do look forward to the day 
> when I can leave it running overnight and have a clean syslog the next 
> morning.

BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
really useful for us to do our own testing too.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-02-28 21:06                                                                                 ` Steven Rostedt
@ 2014-03-01  9:16                                                                                   ` Ingo Molnar
  2014-03-01  9:50                                                                                     ` Borislav Petkov
  2014-03-03  9:16                                                                                     ` Peter Zijlstra
  0 siblings, 2 replies; 115+ messages in thread
From: Ingo Molnar @ 2014-03-01  9:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, H. Peter Anvin, Vince Weaver, Linux Kernel,
	Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > Also, this function is called a _LOT_ under certain workloads, I 
> > don't know how cheap a CR2 read is, but it had better be really 
> > cheap.
> 
> That's a HPA question.

We read CR2 in the page fault hot path, so it's on the top of CPU 
architects' minds and it's reasonably optimized. A couple of cycles 
IIRC, but would be nice to hear actual numbers.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01  9:16                                                                                   ` Ingo Molnar
@ 2014-03-01  9:50                                                                                     ` Borislav Petkov
  2014-03-01 16:50                                                                                       ` H. Peter Anvin
  2014-03-03  9:16                                                                                     ` Peter Zijlstra
  1 sibling, 1 reply; 115+ messages in thread
From: Borislav Petkov @ 2014-03-01  9:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Peter Zijlstra, H. Peter Anvin, Vince Weaver,
	Linux Kernel, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo

On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > Also, this function is called a _LOT_ under certain workloads, I 
> > > don't know how cheap a CR2 read is, but it had better be really 
> > > cheap.
> > 
> > That's a HPA question.
> 
> We read CR2 in the page fault hot path, so it's on the top of CPU 
> architects' minds and it's reasonably optimized. A couple of cycles 
> IIRC, but would be nice to hear actual numbers.

Yeah, we were discussing this last night on IRC.

And hpa actually meant that the optimization potential was there but no
one did do it, except maybe Transmeta. :-)

So the expensive thing is writing to CR2 because it is a serializing
instruction. In fact, all writes to control registers except CR8 are
serializing.

The reading from CR2 should be cheaper but not as cheap as a normal
MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so
definitely at least a couple of cycles and I'd guess it is not a trivial
MOV on Intel too.

Maybe a way to hide this cost is the OoO, as hpa suggested, depending on
how much parallelism that particular code region can offer (serializing
instructions close by).

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01  3:36                                                                               ` Steven Rostedt
@ 2014-03-01 16:24                                                                                 ` Andi Kleen
  2014-03-02 15:34                                                                                   ` Vince Weaver
  2014-03-02 16:02                                                                                 ` Vince Weaver
  1 sibling, 1 reply; 115+ messages in thread
From: Andi Kleen @ 2014-03-01 16:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vince Weaver, H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

Steven Rostedt <rostedt@goodmis.org> writes:
>
> BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> really useful for us to do our own testing too.

I believe it's part of trinity.

http://codemonkey.org.uk/projects/trinity/

Perhaps it should have a "ftracer fuzzer" too?

-Andi

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01  9:50                                                                                     ` Borislav Petkov
@ 2014-03-01 16:50                                                                                       ` H. Peter Anvin
  2014-03-04 23:05                                                                                         ` Borislav Petkov
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-03-01 16:50 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar
  Cc: Steven Rostedt, Peter Zijlstra, Vince Weaver, Linux Kernel,
	Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo

The bottom line is that if we want hard numbers we probably have to measure.

Hoisting the cr2 read is a no-brainer, might even help performance...

On March 1, 2014 1:50:42 AM PST, Borislav Petkov <bp@alien8.de> wrote:
>On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
>> 
>> * Steven Rostedt <rostedt@goodmis.org> wrote:
>> 
>> > > Also, this function is called a _LOT_ under certain workloads, I 
>> > > don't know how cheap a CR2 read is, but it had better be really 
>> > > cheap.
>> > 
>> > That's a HPA question.
>> 
>> We read CR2 in the page fault hot path, so it's on the top of CPU 
>> architects' minds and it's reasonably optimized. A couple of cycles 
>> IIRC, but would be nice to hear actual numbers.
>
>Yeah, we were discussing this last night on IRC.
>
>And hpa actually meant that the optimization potential was there but no
>one did do it, except maybe Transmeta. :-)
>
>So the expensive thing is writing to CR2 because it is a serializing
>instruction. In fact, all writes to control registers except CR8 are
>serializing.
>
>The reading from CR2 should be cheaper but not as cheap as a normal
>MOV %reg %reg is. On AMD, MOV %reg, %cr2 is done with microcode so
>definitely at least a couple of cycles and I'd guess it is not a
>trivial
>MOV on Intel too.
>
>Maybe a way to hide this cost is the OoO, as hpa suggested, depending
>on
>how much parallelism that particular code region can offer (serializing
>instructions close by).

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01 16:24                                                                                 ` Andi Kleen
@ 2014-03-02 15:34                                                                                   ` Vince Weaver
  0 siblings, 0 replies; 115+ messages in thread
From: Vince Weaver @ 2014-03-02 15:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Steven Rostedt, Vince Weaver, H. Peter Anvin, Peter Zijlstra,
	Linux Kernel, Ingo Molnar

On Sat, 1 Mar 2014, Andi Kleen wrote:

> Steven Rostedt <rostedt@goodmis.org> writes:
> >
> > BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> > really useful for us to do our own testing too.
> 
> I believe it's part of trinity.
> 
> http://codemonkey.org.uk/projects/trinity/
> 
> Perhaps it should have a "ftracer fuzzer" too?

It's not part of trinity, it's a separate project that re-uses some 
of trinity's codebase.

http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/

You can get the source as part of the perf_event_tests package
	https://github.com/deater/perf_event_tests

go into the fuzzer directory, run make, then "./perf_fuzzer".
You can also run the ./fast_repro.sh script instead, it's what I've
been using to track this kernel issue, although it tripped over the issue
a lot faster when compiled with -mx32 than it did when complied as native 
x86_64.

I think the code as found in the git tree should be working, I made a lot 
of changes when tracking this problem and I need to make sure I only check 
in the proper ones and not let any weird debugging patches slip in.

I'm aware of at least two other WARN messages and possibly one or two 
hard-lockup bugs in addition to the potential system reboot that can be 
triggered by the fuzzer, it just takes a while to isolate these things.

Vince



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01  3:36                                                                               ` Steven Rostedt
  2014-03-01 16:24                                                                                 ` Andi Kleen
@ 2014-03-02 16:02                                                                                 ` Vince Weaver
  1 sibling, 0 replies; 115+ messages in thread
From: Vince Weaver @ 2014-03-02 16:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vince Weaver, H. Peter Anvin, Peter Zijlstra, Linux Kernel, Ingo Molnar

On Fri, 28 Feb 2014, Steven Rostedt wrote:
> On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
> Vince Weaver <vincent.weaver@maine.edu> wrote:
> 
> But perf_event bug finder is a much more prestigious title than
> "college professor" ;-)

yes, it's something to fall back on if/when I get denied tenure :)

I do enjoy tracking down bugs like this, it's why I spend a lot of time on 
it when I should probably be doing other things.

> BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
> really useful for us to do our own testing too.

yes, code is fully available though possibly not publicized well.  I 
posted the full info later in the thread as I seem to be responding to 
e-mails in reverse order today for some reason.

Vince

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01  9:16                                                                                   ` Ingo Molnar
  2014-03-01  9:50                                                                                     ` Borislav Petkov
@ 2014-03-03  9:16                                                                                     ` Peter Zijlstra
  1 sibling, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-03  9:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, H. Peter Anvin, Vince Weaver, Linux Kernel,
	Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo

On Sat, Mar 01, 2014 at 10:16:50AM +0100, Ingo Molnar wrote:
> We read CR2 in the page fault hot path, so it's on the top of CPU 
> architects' minds and it's reasonably optimized. A couple of cycles 
> IIRC, but would be nice to hear actual numbers.

Well, going with what Linus found; it looks like they made page faults
more expensive on Haswell as compared to previous generations. So its
not _that_ high on their list :/

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: perf_fuzzer compiled for x32 causes reboot
  2014-03-01 16:50                                                                                       ` H. Peter Anvin
@ 2014-03-04 23:05                                                                                         ` Borislav Petkov
  0 siblings, 0 replies; 115+ messages in thread
From: Borislav Petkov @ 2014-03-04 23:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Steven Rostedt, Peter Zijlstra, Vince Weaver,
	Linux Kernel, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo

On Sat, Mar 01, 2014 at 08:50:17AM -0800, H. Peter Anvin wrote:
> The bottom line is that if we want hard numbers we probably have to
> measure.
>
> Hoisting the cr2 read is a no-brainer, might even help performance...

Btw, I just got word that on AMD, a read from CR2 is 4 cycles on family
0x15 and 3 cycles on family 0x10. If we're in the same ballpark on
Intel, I'd say we're fine with those numbers even in hot paths. The OoO
is an additional bonus which should hide latency even more.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-02-28 16:05   ` [PATCHv2] " Jiri Olsa
  2014-02-28 16:11     ` H. Peter Anvin
  2014-02-28 16:15     ` Steven Rostedt
@ 2014-03-05  0:03     ` tip-bot for Jiri Olsa
  2014-03-05 11:14       ` Peter Zijlstra
  2 siblings, 1 reply; 115+ messages in thread
From: tip-bot for Jiri Olsa @ 2014-03-05  0:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, acme,
	seiji.aguchi, jolsa, vincent.weaver, rostedt, tglx, hpa

Commit-ID:  0ac09f9f8cd1fb028a48330edba6023d347d3cea
Gitweb:     http://git.kernel.org/tip/0ac09f9f8cd1fb028a48330edba6023d347d3cea
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Fri, 28 Feb 2014 17:05:26 +0100
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 4 Mar 2014 16:00:14 -0800

x86, trace: Fix CR2 corruption when tracing page faults

The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.

This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.

This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:

  # perf record -g -e exceptions:page_fault_user ls

Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.

v2: Moving the cr2 read before exception_enter, because
    it could trigger tracepoint as well.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1402211701380.6395@vincent-weaver-1.um.maine.edu
Link: http://lkml.kernel.org/r/20140228160526.GD1133@krava.brq.redhat.com
---
 arch/x86/mm/fault.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6dea040..e7fa28b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1022,11 +1022,11 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * routines.
  */
 static void __kprobes
-__do_page_fault(struct pt_regs *regs, unsigned long error_code)
+__do_page_fault(struct pt_regs *regs, unsigned long error_code,
+		unsigned long address)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
-	unsigned long address;
 	struct mm_struct *mm;
 	int fault;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
@@ -1034,9 +1034,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	tsk = current;
 	mm = tsk->mm;
 
-	/* Get the faulting address: */
-	address = read_cr2();
-
 	/*
 	 * Detect and handle instructions that would cause a page fault for
 	 * both a tracked kernel page and a userspace page.
@@ -1252,9 +1249,11 @@ dotraplinkage void __kprobes
 do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	/* Get the faulting address: */
+	unsigned long address = read_cr2();
 
 	prev_state = exception_enter();
-	__do_page_fault(regs, error_code);
+	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
@@ -1271,9 +1270,16 @@ dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
+	/*
+	 * The exception_enter and tracepoint processing could
+	 * trigger another page faults (user space callchain
+	 * reading) and destroy the original cr2 value, so read
+	 * the faulting address now.
+	 */
+	unsigned long address = read_cr2();
 
 	prev_state = exception_enter();
 	trace_page_fault_entries(regs, error_code);
-	__do_page_fault(regs, error_code);
+	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05  0:03     ` [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults tip-bot for Jiri Olsa
@ 2014-03-05 11:14       ` Peter Zijlstra
  2014-03-05 12:20         ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 11:14 UTC (permalink / raw)
  To: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, rostedt, tglx, hpa
  Cc: linux-tip-commits

On Tue, Mar 04, 2014 at 04:03:25PM -0800, tip-bot for Jiri Olsa wrote:

FWIW I also prefer this patch.

> @@ -1252,9 +1249,11 @@ dotraplinkage void __kprobes
>  do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	/* Get the faulting address: */
> +	unsigned long address = read_cr2();
>  
>  	prev_state = exception_enter();
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
>  
> @@ -1271,9 +1270,16 @@ dotraplinkage void __kprobes
>  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> +	/*
> +	 * The exception_enter and tracepoint processing could
> +	 * trigger another page faults (user space callchain
> +	 * reading) and destroy the original cr2 value, so read
> +	 * the faulting address now.
> +	 */
> +	unsigned long address = read_cr2();
>  
>  	prev_state = exception_enter();
>  	trace_page_fault_entries(regs, error_code);
> -	__do_page_fault(regs, error_code);
> +	__do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }

How about also marking these two functions as notrace? That would also
avoid getting __mcount calls from before we read CR2.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 11:14       ` Peter Zijlstra
@ 2014-03-05 12:20         ` Steven Rostedt
  2014-03-05 12:25           ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-03-05 12:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, 5 Mar 2014 12:14:15 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 04, 2014 at 04:03:25PM -0800, tip-bot for Jiri Olsa wrote:
> 
> FWIW I also prefer this patch.
> 
> > @@ -1252,9 +1249,11 @@ dotraplinkage void __kprobes
> >  do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  {
> >  	enum ctx_state prev_state;
> > +	/* Get the faulting address: */
> > +	unsigned long address = read_cr2();
> >  
> >  	prev_state = exception_enter();
> > -	__do_page_fault(regs, error_code);
> > +	__do_page_fault(regs, error_code, address);
> >  	exception_exit(prev_state);
> >  }
> >  
> > @@ -1271,9 +1270,16 @@ dotraplinkage void __kprobes
> >  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  {
> >  	enum ctx_state prev_state;
> > +	/*
> > +	 * The exception_enter and tracepoint processing could
> > +	 * trigger another page faults (user space callchain
> > +	 * reading) and destroy the original cr2 value, so read
> > +	 * the faulting address now.
> > +	 */
> > +	unsigned long address = read_cr2();
> >  
> >  	prev_state = exception_enter();
> >  	trace_page_fault_entries(regs, error_code);
> > -	__do_page_fault(regs, error_code);
> > +	__do_page_fault(regs, error_code, address);
> >  	exception_exit(prev_state);
> >  }
> 
> How about also marking these two functions as notrace? That would also
> avoid getting __mcount calls from before we read CR2.

Please no! I used tracing of the do_page_fault function all the time.
It is very useful.

Call backs from function tracing are to be very limited. They are more
restrictive than NMI contexts. One thing you shouldn't do is to have
page faults. Thus, fix the callback.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 12:20         ` Steven Rostedt
@ 2014-03-05 12:25           ` Peter Zijlstra
  2014-03-05 12:33             ` Steven Rostedt
  2014-03-05 12:36             ` Peter Zijlstra
  0 siblings, 2 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 12:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 07:20:22AM -0500, Steven Rostedt wrote:
> On Wed, 5 Mar 2014 12:14:15 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> Please no! I used tracing of the do_page_fault function all the time.
> It is very useful.

Why do you trace do_page_fault and not the __do_page_fault() function?
do_page_fault() is a minimal wrapper which doesn't actually do all that
much?

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 12:25           ` Peter Zijlstra
@ 2014-03-05 12:33             ` Steven Rostedt
  2014-03-05 12:54               ` Peter Zijlstra
  2014-03-05 12:36             ` Peter Zijlstra
  1 sibling, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-03-05 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, 5 Mar 2014 13:25:35 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Mar 05, 2014 at 07:20:22AM -0500, Steven Rostedt wrote:
> > On Wed, 5 Mar 2014 12:14:15 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > Please no! I used tracing of the do_page_fault function all the time.
> > It is very useful.
> 
> Why do you trace do_page_fault and not the __do_page_fault() function?
> do_page_fault() is a minimal wrapper which doesn't actually do all that
> much?

Then we better make sure that __do_page_fault() is never inlined.
Otherwise, it wont be available to trace.

I'm fine with adding "notrace" to do_page_fault() and to
trace_do_page_fault() as long as we also include a "noinline" to
__do_page_fault(). Would need a comment stating why that noinline is
there though.

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 12:25           ` Peter Zijlstra
  2014-03-05 12:33             ` Steven Rostedt
@ 2014-03-05 12:36             ` Peter Zijlstra
  2014-03-05 13:00               ` Steven Rostedt
  1 sibling, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 12:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 01:25:35PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 05, 2014 at 07:20:22AM -0500, Steven Rostedt wrote:
> > On Wed, 5 Mar 2014 12:14:15 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > Please no! I used tracing of the do_page_fault function all the time.
> > It is very useful.
> 
> Why do you trace do_page_fault and not the __do_page_fault() function?
> do_page_fault() is a minimal wrapper which doesn't actually do all that
> much?

Here, something like so; then you can still trace your do_page_fault().

It also fixes up trace_page_fault_entries() to not re-read the cr2.

---
 arch/x86/include/asm/traps.h |  2 +-
 arch/x86/kernel/entry_32.S   |  2 +-
 arch/x86/kernel/entry_64.S   |  2 +-
 arch/x86/kernel/kvm.c        |  2 +-
 arch/x86/mm/fault.c          | 21 ++++++++++-----------
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 58d66fe06b61..1280f72deea8 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -71,7 +71,7 @@ dotraplinkage void do_double_fault(struct pt_regs *, long);
 asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *, long);
-dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
+dotraplinkage void normal_do_page_fault(struct pt_regs *, unsigned long);
 #ifdef CONFIG_TRACING
 dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
 #endif
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index a2a4f4697889..9a9f64755da8 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1257,7 +1257,7 @@ END(trace_page_fault)
 ENTRY(page_fault)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_page_fault
+	pushl_cfi $normal_do_page_fault
 	ALIGN
 error_code:
 	/* the function address is in %gs's slot on the stack */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c3628bf2..7d49812741ac 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1491,7 +1491,7 @@ zeroentry xen_int3 do_int3
 errorentry xen_stack_segment do_stack_segment
 #endif
 errorentry general_protection do_general_protection
-trace_errorentry page_fault do_page_fault
+trace_errorentry page_fault normal_do_page_fault
 #ifdef CONFIG_KVM_GUEST
 errorentry async_page_fault do_async_page_fault
 #endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 713f1b3bad52..9e7db22ec437 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -259,7 +259,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 
 	switch (kvm_read_and_reset_pf_reason()) {
 	default:
-		do_page_fault(regs, error_code);
+		normal_do_page_fault(regs, error_code);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e7fa28bf3262..576cfc3d0086 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1022,8 +1022,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * routines.
  */
 static void __kprobes
-__do_page_fault(struct pt_regs *regs, unsigned long error_code,
-		unsigned long address)
+do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -1245,28 +1244,28 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	up_read(&mm->mmap_sem);
 }
 
-dotraplinkage void __kprobes
-do_page_fault(struct pt_regs *regs, unsigned long error_code)
+dotraplinkage void __kprobes notrace
+normal_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
 	/* Get the faulting address: */
 	unsigned long address = read_cr2();
 
 	prev_state = exception_enter();
-	__do_page_fault(regs, error_code, address);
+	do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
-static void trace_page_fault_entries(struct pt_regs *regs,
+static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
 				     unsigned long error_code)
 {
 	if (user_mode(regs))
-		trace_page_fault_user(read_cr2(), regs, error_code);
+		trace_page_fault_user(address, regs, error_code);
 	else
-		trace_page_fault_kernel(read_cr2(), regs, error_code);
+		trace_page_fault_kernel(address, regs, error_code);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;
@@ -1279,7 +1278,7 @@ trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	unsigned long address = read_cr2();
 
 	prev_state = exception_enter();
-	trace_page_fault_entries(regs, error_code);
-	__do_page_fault(regs, error_code, address);
+	trace_page_fault_entries(address, regs, error_code);
+	do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 12:33             ` Steven Rostedt
@ 2014-03-05 12:54               ` Peter Zijlstra
  2014-03-05 13:02                 ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 12:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 07:33:44AM -0500, Steven Rostedt wrote:
> Then we better make sure that __do_page_fault() is never inlined.
> Otherwise, it wont be available to trace.
> 
> I'm fine with adding "notrace" to do_page_fault() and to
> trace_do_page_fault() as long as we also include a "noinline" to
> __do_page_fault(). Would need a comment stating why that noinline is
> there though.

When CONFIG_TRACING there's two callers, which makes it highly unlikely
GCC would inline the massive __do_page_fault() function, but sure.

How about something like so then; still has the normal_do_page_fault()
thing, although I suppose we could drop that.

It also puts trace_page_fault_entries() and trace_do_page_fault() under
CONFIG_TRACING. I could only find the entry_32.S user; I suppose the
64bit one is hidden by CPP goo somewhere?

---
 arch/x86/include/asm/traps.h |  2 +-
 arch/x86/kernel/entry_32.S   |  2 +-
 arch/x86/kernel/entry_64.S   |  2 +-
 arch/x86/kernel/kvm.c        |  2 +-
 arch/x86/mm/fault.c          | 42 +++++++++++++++++++++++++++---------------
 5 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 58d66fe06b61..1280f72deea8 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -71,7 +71,7 @@ dotraplinkage void do_double_fault(struct pt_regs *, long);
 asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *, long);
-dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
+dotraplinkage void normal_do_page_fault(struct pt_regs *, unsigned long);
 #ifdef CONFIG_TRACING
 dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
 #endif
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index a2a4f4697889..9a9f64755da8 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1257,7 +1257,7 @@ END(trace_page_fault)
 ENTRY(page_fault)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_page_fault
+	pushl_cfi $normal_do_page_fault
 	ALIGN
 error_code:
 	/* the function address is in %gs's slot on the stack */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c3628bf2..7d49812741ac 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1491,7 +1491,7 @@ zeroentry xen_int3 do_int3
 errorentry xen_stack_segment do_stack_segment
 #endif
 errorentry general_protection do_general_protection
-trace_errorentry page_fault do_page_fault
+trace_errorentry page_fault normal_do_page_fault
 #ifdef CONFIG_KVM_GUEST
 errorentry async_page_fault do_async_page_fault
 #endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 713f1b3bad52..9e7db22ec437 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -259,7 +259,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 
 	switch (kvm_read_and_reset_pf_reason()) {
 	default:
-		do_page_fault(regs, error_code);
+		normal_do_page_fault(regs, error_code);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e7fa28bf3262..8134e5ada329 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1020,10 +1020,13 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
  * routines.
+ *
+ * This function must have noinline because both callers
+ * {normal,trace}_do_page_fault() have notrace on. Having this an actual function
+ * guarantees there's a function trace entry.
  */
-static void __kprobes
-__do_page_fault(struct pt_regs *regs, unsigned long error_code,
-		unsigned long address)
+static void __kprobes noinline
+do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -1245,31 +1248,38 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	up_read(&mm->mmap_sem);
 }
 
-dotraplinkage void __kprobes
-do_page_fault(struct pt_regs *regs, unsigned long error_code)
+dotraplinkage void __kprobes notrace
+normal_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
+	unsigned long address = read_cr2(); /* Get the faulting address */
 	enum ctx_state prev_state;
-	/* Get the faulting address: */
-	unsigned long address = read_cr2();
+
+	/*
+	 * We must have this function tagged with __kprobes, notrace and call
+	 * read_cr2() before calling anything else. To avoid calling any kind
+	 * of tracing machinery before we've observed the CR2 value.
+	 *
+	 * exception_{enter,exit}() contain all sorts of tracepoints.
+	 */
 
 	prev_state = exception_enter();
-	__do_page_fault(regs, error_code, address);
+	do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
-static void trace_page_fault_entries(struct pt_regs *regs,
+#ifdef CONFIG_TRACING
+static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
 				     unsigned long error_code)
 {
 	if (user_mode(regs))
-		trace_page_fault_user(read_cr2(), regs, error_code);
+		trace_page_fault_user(address, regs, error_code);
 	else
-		trace_page_fault_kernel(read_cr2(), regs, error_code);
+		trace_page_fault_kernel(address, regs, error_code);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-	enum ctx_state prev_state;
 	/*
 	 * The exception_enter and tracepoint processing could
 	 * trigger another page faults (user space callchain
@@ -1277,9 +1287,11 @@ trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 * the faulting address now.
 	 */
 	unsigned long address = read_cr2();
+	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
-	trace_page_fault_entries(regs, error_code);
-	__do_page_fault(regs, error_code, address);
+	trace_page_fault_entries(address, regs, error_code);
+	do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
+#endif

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 12:36             ` Peter Zijlstra
@ 2014-03-05 13:00               ` Steven Rostedt
  2014-03-05 13:08                 ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-03-05 13:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, 5 Mar 2014 13:36:35 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Mar 05, 2014 at 01:25:35PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 05, 2014 at 07:20:22AM -0500, Steven Rostedt wrote:
> > > On Wed, 5 Mar 2014 12:14:15 +0100
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > Please no! I used tracing of the do_page_fault function all the time.
> > > It is very useful.
> > 
> > Why do you trace do_page_fault and not the __do_page_fault() function?
> > do_page_fault() is a minimal wrapper which doesn't actually do all that
> > much?
> 
> Here, something like so; then you can still trace your do_page_fault().
> 
> It also fixes up trace_page_fault_entries() to not re-read the cr2.
> 
> ---
>  arch/x86/include/asm/traps.h |  2 +-
>  arch/x86/kernel/entry_32.S   |  2 +-
>  arch/x86/kernel/entry_64.S   |  2 +-
>  arch/x86/kernel/kvm.c        |  2 +-
>  arch/x86/mm/fault.c          | 21 ++++++++++-----------
>  5 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index 58d66fe06b61..1280f72deea8 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -71,7 +71,7 @@ dotraplinkage void do_double_fault(struct pt_regs *, long);
>  asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *);
>  #endif
>  dotraplinkage void do_general_protection(struct pt_regs *, long);
> -dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
> +dotraplinkage void normal_do_page_fault(struct pt_regs *, unsigned long);

Why not call it "do_do_page_fault()" ;-)

>  #ifdef CONFIG_TRACING
>  dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
>  #endif
> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> index a2a4f4697889..9a9f64755da8 100644
> --- a/arch/x86/kernel/entry_32.S
> +++ b/arch/x86/kernel/entry_32.S
> @@ -1257,7 +1257,7 @@ END(trace_page_fault)
>  ENTRY(page_fault)
>  	RING0_EC_FRAME
>  	ASM_CLAC
> -	pushl_cfi $do_page_fault
> +	pushl_cfi $normal_do_page_fault
>  	ALIGN
>  error_code:
>  	/* the function address is in %gs's slot on the stack */
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1e96c3628bf2..7d49812741ac 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1491,7 +1491,7 @@ zeroentry xen_int3 do_int3
>  errorentry xen_stack_segment do_stack_segment
>  #endif
>  errorentry general_protection do_general_protection
> -trace_errorentry page_fault do_page_fault
> +trace_errorentry page_fault normal_do_page_fault
>  #ifdef CONFIG_KVM_GUEST
>  errorentry async_page_fault do_async_page_fault
>  #endif
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 713f1b3bad52..9e7db22ec437 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -259,7 +259,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>  
>  	switch (kvm_read_and_reset_pf_reason()) {
>  	default:
> -		do_page_fault(regs, error_code);
> +		normal_do_page_fault(regs, error_code);
>  		break;
>  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
>  		/* page is swapped out by the host. */
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index e7fa28bf3262..576cfc3d0086 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1022,8 +1022,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
>   * routines.
>   */
>  static void __kprobes

Still requires the "noinline" above

-- Steve

> -__do_page_fault(struct pt_regs *regs, unsigned long error_code,
> -		unsigned long address)
> +do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> @@ -1245,28 +1244,28 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  	up_read(&mm->mmap_sem);
>  }
>  
> -dotraplinkage void __kprobes
> -do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +dotraplinkage void __kprobes notrace
> +normal_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
>  	/* Get the faulting address: */
>  	unsigned long address = read_cr2();
>  
>  	prev_state = exception_enter();
> -	__do_page_fault(regs, error_code, address);
> +	do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }
>  
> -static void trace_page_fault_entries(struct pt_regs *regs,
> +static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
>  				     unsigned long error_code)
>  {
>  	if (user_mode(regs))
> -		trace_page_fault_user(read_cr2(), regs, error_code);
> +		trace_page_fault_user(address, regs, error_code);
>  	else
> -		trace_page_fault_kernel(read_cr2(), regs, error_code);
> +		trace_page_fault_kernel(address, regs, error_code);
>  }
>  
> -dotraplinkage void __kprobes
> +dotraplinkage void __kprobes notrace
>  trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>  	enum ctx_state prev_state;
> @@ -1279,7 +1278,7 @@ trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  	unsigned long address = read_cr2();
>  
>  	prev_state = exception_enter();
> -	trace_page_fault_entries(regs, error_code);
> -	__do_page_fault(regs, error_code, address);
> +	trace_page_fault_entries(address, regs, error_code);
> +	do_page_fault(regs, error_code, address);
>  	exception_exit(prev_state);
>  }


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 12:54               ` Peter Zijlstra
@ 2014-03-05 13:02                 ` Peter Zijlstra
  2014-03-05 13:07                   ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 13:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 01:54:21PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 05, 2014 at 07:33:44AM -0500, Steven Rostedt wrote:
> > Then we better make sure that __do_page_fault() is never inlined.
> > Otherwise, it wont be available to trace.
> > 
> > I'm fine with adding "notrace" to do_page_fault() and to
> > trace_do_page_fault() as long as we also include a "noinline" to
> > __do_page_fault(). Would need a comment stating why that noinline is
> > there though.
> 
> When CONFIG_TRACING there's two callers, which makes it highly unlikely
> GCC would inline the massive __do_page_fault() function, but sure.
> 
> How about something like so then; still has the normal_do_page_fault()
> thing, although I suppose we could drop that.
> 
> It also puts trace_page_fault_entries() and trace_do_page_fault() under
> CONFIG_TRACING. I could only find the entry_32.S user; I suppose the
> 64bit one is hidden by CPP goo somewhere?

> +trace_errorentry page_fault normal_do_page_fault

ah found it; its in there. That also means the normal_do_page_fault()
thing won't actually compile proper.

Lemme do one with that.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 13:02                 ` Peter Zijlstra
@ 2014-03-05 13:07                   ` Peter Zijlstra
  0 siblings, 0 replies; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 13:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 02:02:24PM +0100, Peter Zijlstra wrote:
> > It also puts trace_page_fault_entries() and trace_do_page_fault() under
> > CONFIG_TRACING. I could only find the entry_32.S user; I suppose the
> > 64bit one is hidden by CPP goo somewhere?
> 
> > +trace_errorentry page_fault normal_do_page_fault
> 
> ah found it; its in there. That also means the normal_do_page_fault()
> thing won't actually compile proper.
> 
> Lemme do one with that.

---
 arch/x86/mm/fault.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e7fa28bf3262..a10c8c792161 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1020,8 +1020,12 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
  * routines.
+ *
+ * This function must have noinline because both callers
+ * {,trace_}do_page_fault() have notrace on. Having this an actual function
+ * guarantees there's a function trace entry.
  */
-static void __kprobes
+static void __kprobes noinline
 __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address)
 {
@@ -1245,31 +1249,38 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	up_read(&mm->mmap_sem);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
+	unsigned long address = read_cr2(); /* Get the faulting address */
 	enum ctx_state prev_state;
-	/* Get the faulting address: */
-	unsigned long address = read_cr2();
+
+	/*
+	 * We must have this function tagged with __kprobes, notrace and call
+	 * read_cr2() before calling anything else. To avoid calling any kind
+	 * of tracing machinery before we've observed the CR2 value.
+	 *
+	 * exception_{enter,exit}() contain all sorts of tracepoints.
+	 */
 
 	prev_state = exception_enter();
 	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
-static void trace_page_fault_entries(struct pt_regs *regs,
+#ifdef CONFIG_TRACING
+static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
 				     unsigned long error_code)
 {
 	if (user_mode(regs))
-		trace_page_fault_user(read_cr2(), regs, error_code);
+		trace_page_fault_user(address, regs, error_code);
 	else
-		trace_page_fault_kernel(read_cr2(), regs, error_code);
+		trace_page_fault_kernel(address, regs, error_code);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-	enum ctx_state prev_state;
 	/*
 	 * The exception_enter and tracepoint processing could
 	 * trigger another page faults (user space callchain
@@ -1277,9 +1288,11 @@ trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 * the faulting address now.
 	 */
 	unsigned long address = read_cr2();
+	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
-	trace_page_fault_entries(regs, error_code);
+	trace_page_fault_entries(address, regs, error_code);
 	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
+#endif /* CONFIG_TRACING */

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 13:00               ` Steven Rostedt
@ 2014-03-05 13:08                 ` Peter Zijlstra
  2014-03-05 21:37                   ` H. Peter Anvin
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-05 13:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, hpa, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 08:00:18AM -0500, Steven Rostedt wrote:
> >  static void __kprobes
> 
> Still requires the "noinline" above

Trim trim trim trim your emails...

also, you're like 3 version behind.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 13:08                 ` Peter Zijlstra
@ 2014-03-05 21:37                   ` H. Peter Anvin
  2014-03-06  8:40                     ` Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: H. Peter Anvin @ 2014-03-05 21:37 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: mingo, paulus, linux-kernel, acme, seiji.aguchi, jolsa,
	vincent.weaver, tglx, hpa, linux-tip-commits

On 03/05/2014 05:08 AM, Peter Zijlstra wrote:
> On Wed, Mar 05, 2014 at 08:00:18AM -0500, Steven Rostedt wrote:
>>>  static void __kprobes
>>
>> Still requires the "noinline" above
> 
> Trim trim trim trim your emails...
> 
> also, you're like 3 version behind.
> 

OK, I have to admit to having lost track of this thread.  Please let me
know when there is anything actionable for me.

	-hpa


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-05 21:37                   ` H. Peter Anvin
@ 2014-03-06  8:40                     ` Peter Zijlstra
  2014-03-06 11:02                       ` Steven Rostedt
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-06  8:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, mingo, paulus, linux-kernel, acme, seiji.aguchi,
	jolsa, vincent.weaver, tglx, hpa, linux-tip-commits

On Wed, Mar 05, 2014 at 01:37:25PM -0800, H. Peter Anvin wrote:

> OK, I have to admit to having lost track of this thread.  Please let me
> know when there is anything actionable for me.

lkml.kernel.org/r/20140305130749.GR3104@twins.programming.kicks-ass.net

Was the latest; Rostedt are you good with that one? If so I suppose I
should make it a proper patch and get hpa to look at it :-)

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults
  2014-03-06  8:40                     ` Peter Zijlstra
@ 2014-03-06 11:02                       ` Steven Rostedt
  2014-03-06 14:53                         ` [PATCH] x86: Further robustify CR2 handling vs tracing Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Steven Rostedt @ 2014-03-06 11:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, mingo, paulus, linux-kernel, acme, seiji.aguchi,
	jolsa, vincent.weaver, tglx, hpa, linux-tip-commits

On Thu, 6 Mar 2014 09:40:50 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Mar 05, 2014 at 01:37:25PM -0800, H. Peter Anvin wrote:
> 
> > OK, I have to admit to having lost track of this thread.  Please let me
> > know when there is anything actionable for me.
> 
> lkml.kernel.org/r/20140305130749.GR3104@twins.programming.kicks-ass.net
> 
> Was the latest; Rostedt are you good with that one? If so I suppose I
> should make it a proper patch and get hpa to look at it :-)

Yeah, looks good.

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH] x86: Further robustify CR2 handling vs tracing
  2014-03-06 11:02                       ` Steven Rostedt
@ 2014-03-06 14:53                         ` Peter Zijlstra
  2014-03-07 23:07                           ` [tip:x86/urgent] x86, trace: " tip-bot for Peter Zijlstra
  0 siblings, 1 reply; 115+ messages in thread
From: Peter Zijlstra @ 2014-03-06 14:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, mingo, paulus, linux-kernel, acme, seiji.aguchi,
	jolsa, vincent.weaver, tglx, hpa, linux-tip-commits

Subject: x86: Further robustify CR2 handling vs tracing
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 5 Mar 2014 14:07:49 +0100

Building on commit 0ac09f9f8cd1 ("x86, trace: Fix CR2 corruption when
tracing page faults") this patch addresses another few issues:

 - Now that read_cr2() is lifted into trace_do_page_fault(), we should
   pass the address to trace_page_fault_entries() to avoid it
   re-reading a potentially changed cr2.

 - Put both trace_do_page_fault() and trace_page_fault_entries() under
   CONFIG_TRACING.

 - Mark both fault entry functions {,trace_}do_page_fault() as notrace
   to avoid getting __mcount or other function entry trace callbacks
   before we've observed CR2.

 - Mark __do_page_fault() as noinline to guarantee the function tracer
   does get to see the fault.

Cc: jolsa@redhat.com
Cc: vincent.weaver@maine.edu
Cc: tglx@linutronix.de
Cc: hpa@linux.intel.com
Cc: mingo@kernel.org
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/mm/fault.c |   33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1020,8 +1020,12 @@ static inline bool smap_violation(int er
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
  * routines.
+ *
+ * This function must have noinline because both callers
+ * {,trace_}do_page_fault() have notrace on. Having this an actual function
+ * guarantees there's a function trace entry.
  */
-static void __kprobes
+static void __kprobes noinline
 __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address)
 {
@@ -1245,31 +1249,38 @@ __do_page_fault(struct pt_regs *regs, un
 	up_read(&mm->mmap_sem);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
+	unsigned long address = read_cr2(); /* Get the faulting address */
 	enum ctx_state prev_state;
-	/* Get the faulting address: */
-	unsigned long address = read_cr2();
+
+	/*
+	 * We must have this function tagged with __kprobes, notrace and call
+	 * read_cr2() before calling anything else. To avoid calling any kind
+	 * of tracing machinery before we've observed the CR2 value.
+	 *
+	 * exception_{enter,exit}() contain all sorts of tracepoints.
+	 */
 
 	prev_state = exception_enter();
 	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
-static void trace_page_fault_entries(struct pt_regs *regs,
+#ifdef CONFIG_TRACING
+static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
 				     unsigned long error_code)
 {
 	if (user_mode(regs))
-		trace_page_fault_user(read_cr2(), regs, error_code);
+		trace_page_fault_user(address, regs, error_code);
 	else
-		trace_page_fault_kernel(read_cr2(), regs, error_code);
+		trace_page_fault_kernel(address, regs, error_code);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-	enum ctx_state prev_state;
 	/*
 	 * The exception_enter and tracepoint processing could
 	 * trigger another page faults (user space callchain
@@ -1277,9 +1288,11 @@ trace_do_page_fault(struct pt_regs *regs
 	 * the faulting address now.
 	 */
 	unsigned long address = read_cr2();
+	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
-	trace_page_fault_entries(regs, error_code);
+	trace_page_fault_entries(address, regs, error_code);
 	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
+#endif /* CONFIG_TRACING */

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [tip:x86/urgent] x86, trace: Further robustify CR2 handling vs tracing
  2014-03-06 14:53                         ` [PATCH] x86: Further robustify CR2 handling vs tracing Peter Zijlstra
@ 2014-03-07 23:07                           ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 115+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-03-07 23:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, jolsa, vincent.weaver, rostedt,
	tglx, hpa

Commit-ID:  d4078e232267ff53f3b030b9698a3c001db4dbec
Gitweb:     http://git.kernel.org/tip/d4078e232267ff53f3b030b9698a3c001db4dbec
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 5 Mar 2014 14:07:49 +0100
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 6 Mar 2014 10:58:18 -0800

x86, trace: Further robustify CR2 handling vs tracing

Building on commit 0ac09f9f8cd1 ("x86, trace: Fix CR2 corruption when
tracing page faults") this patch addresses another few issues:

 - Now that read_cr2() is lifted into trace_do_page_fault(), we should
   pass the address to trace_page_fault_entries() to avoid it
   re-reading a potentially changed cr2.

 - Put both trace_do_page_fault() and trace_page_fault_entries() under
   CONFIG_TRACING.

 - Mark both fault entry functions {,trace_}do_page_fault() as notrace
   to avoid getting __mcount or other function entry trace callbacks
   before we've observed CR2.

 - Mark __do_page_fault() as noinline to guarantee the function tracer
   does get to see the fault.

Cc: <jolsa@redhat.com>
Cc: <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/fault.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e7fa28b..a10c8c7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1020,8 +1020,12 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
  * routines.
+ *
+ * This function must have noinline because both callers
+ * {,trace_}do_page_fault() have notrace on. Having this an actual function
+ * guarantees there's a function trace entry.
  */
-static void __kprobes
+static void __kprobes noinline
 __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address)
 {
@@ -1245,31 +1249,38 @@ good_area:
 	up_read(&mm->mmap_sem);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
+	unsigned long address = read_cr2(); /* Get the faulting address */
 	enum ctx_state prev_state;
-	/* Get the faulting address: */
-	unsigned long address = read_cr2();
+
+	/*
+	 * We must have this function tagged with __kprobes, notrace and call
+	 * read_cr2() before calling anything else. To avoid calling any kind
+	 * of tracing machinery before we've observed the CR2 value.
+	 *
+	 * exception_{enter,exit}() contain all sorts of tracepoints.
+	 */
 
 	prev_state = exception_enter();
 	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
 
-static void trace_page_fault_entries(struct pt_regs *regs,
+#ifdef CONFIG_TRACING
+static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
 				     unsigned long error_code)
 {
 	if (user_mode(regs))
-		trace_page_fault_user(read_cr2(), regs, error_code);
+		trace_page_fault_user(address, regs, error_code);
 	else
-		trace_page_fault_kernel(read_cr2(), regs, error_code);
+		trace_page_fault_kernel(address, regs, error_code);
 }
 
-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-	enum ctx_state prev_state;
 	/*
 	 * The exception_enter and tracepoint processing could
 	 * trigger another page faults (user space callchain
@@ -1277,9 +1288,11 @@ trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 * the faulting address now.
 	 */
 	unsigned long address = read_cr2();
+	enum ctx_state prev_state;
 
 	prev_state = exception_enter();
-	trace_page_fault_entries(regs, error_code);
+	trace_page_fault_entries(address, regs, error_code);
 	__do_page_fault(regs, error_code, address);
 	exception_exit(prev_state);
 }
+#endif /* CONFIG_TRACING */

^ permalink raw reply related	[flat|nested] 115+ messages in thread

end of thread, other threads:[~2014-03-07 23:08 UTC | newest]

Thread overview: 115+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-21 20:25 perf_fuzzer causes reboot Vince Weaver
2014-02-21 22:13 ` perf_fuzzer compiled for x32 " Vince Weaver
2014-02-21 22:34   ` Vince Weaver
2014-02-22  4:50     ` Vince Weaver
2014-02-22  5:03       ` H. Peter Anvin
2014-02-22  6:26       ` H. Peter Anvin
2014-02-23  5:18         ` Vince Weaver
2014-02-23  5:24           ` H. Peter Anvin
2014-02-23  6:07           ` H. Peter Anvin
2014-02-23 14:05             ` Vince Weaver
2014-02-24  3:02               ` Vince Weaver
2014-02-24  5:22                 ` H. Peter Anvin
2014-02-24 15:35                   ` Vince Weaver
2014-02-24 16:34                     ` Vince Weaver
2014-02-24 16:47                       ` H. Peter Anvin
2014-02-24 17:10                         ` Vince Weaver
2014-02-24 17:25                           ` Peter Zijlstra
2014-02-24 17:32                             ` Vince Weaver
2014-02-24 17:40                               ` H. Peter Anvin
2014-02-24 18:00                                 ` Vince Weaver
2014-02-24 18:07                                   ` Vince Weaver
2014-02-24 18:34                                     ` H. Peter Anvin
2014-02-24 19:13                                       ` Steven Rostedt
2014-02-24 19:15                                         ` H. Peter Anvin
2014-02-24 19:30                                         ` Peter Zijlstra
2014-02-24 19:32                                           ` Steven Rostedt
2014-02-25  3:49                                           ` H. Peter Anvin
2014-02-25 14:07                                             ` Vince Weaver
2014-02-25 14:34                                               ` H. Peter Anvin
2014-02-25 14:43                                                 ` Steven Rostedt
2014-02-25 15:33                                                   ` Vince Weaver
2014-02-26 15:06                                                     ` Vince Weaver
2014-02-27 22:06                                                       ` Vince Weaver
2014-02-27 22:31                                                         ` Steven Rostedt
2014-02-27 22:52                                                           ` H. Peter Anvin
2014-02-27 23:30                                                             ` Steven Rostedt
2014-02-27 23:46                                                               ` H. Peter Anvin
2014-02-28  1:00                                                                 ` Vince Weaver
2014-02-28 20:34                                                                   ` Paul E. McKenney
2014-02-28 20:47                                                                     ` Steven Rostedt
2014-02-28 20:54                                                                       ` Peter Zijlstra
2014-02-28 21:17                                                                         ` Paul E. McKenney
2014-02-28 21:27                                                                           ` Peter Zijlstra
2014-02-28 21:51                                                                             ` Paul E. McKenney
2014-02-28 21:55                                                                               ` Peter Zijlstra
2014-02-28 22:05                                                                                 ` Steven Rostedt
2014-02-28 22:23                                                                                   ` Paul E. McKenney
2014-02-28  1:34                                                             ` Vince Weaver
2014-02-28  2:17                                                               ` H. Peter Anvin
2014-02-28  2:57                                                               ` Steven Rostedt
2014-02-28 11:11                                                                 ` Peter Zijlstra
2014-02-28 13:37                                                                   ` Steven Rostedt
2014-02-28 14:15                                                                 ` Vince Weaver
2014-02-28 14:23                                                                   ` Steven Rostedt
2014-02-28 15:07                                                                     ` Vince Weaver
2014-02-28 15:13                                                                       ` H. Peter Anvin
2014-02-28 15:40                                                                         ` Peter Zijlstra
2014-02-28 16:15                                                                           ` H. Peter Anvin
2014-02-28 16:29                                                                             ` Steven Rostedt
2014-02-28 19:33                                                                               ` [PATCH] x86: Rename copy_from_user_nmi() to copy_from_user_trace() Steven Rostedt
2014-02-28 20:46                                                                                 ` Peter Zijlstra
2014-02-28 20:51                                                                                   ` Steven Rostedt
2014-02-28 20:58                                                                                     ` Peter Zijlstra
2014-02-28 21:01                                                                                   ` Steven Rostedt
2014-02-28 21:17                                                                                     ` Peter Zijlstra
2014-02-28 20:56                                                                               ` perf_fuzzer compiled for x32 causes reboot Peter Zijlstra
2014-02-28 21:06                                                                                 ` Steven Rostedt
2014-03-01  9:16                                                                                   ` Ingo Molnar
2014-03-01  9:50                                                                                     ` Borislav Petkov
2014-03-01 16:50                                                                                       ` H. Peter Anvin
2014-03-04 23:05                                                                                         ` Borislav Petkov
2014-03-03  9:16                                                                                     ` Peter Zijlstra
2014-02-28 20:55                                                                             ` Peter Zijlstra
2014-02-28 15:20                                                                       ` Steven Rostedt
2014-02-28 15:30                                                                         ` Steven Rostedt
2014-02-28 20:38                                                                       ` H. Peter Anvin
2014-02-28 20:46                                                                         ` Steven Rostedt
2014-02-28 21:18                                                                         ` Vince Weaver
2014-02-28 21:30                                                                           ` Steven Rostedt
2014-02-28 23:34                                                                             ` Vince Weaver
2014-03-01  0:43                                                                               ` H. Peter Anvin
2014-03-01  3:36                                                                               ` Steven Rostedt
2014-03-01 16:24                                                                                 ` Andi Kleen
2014-03-02 15:34                                                                                   ` Vince Weaver
2014-03-02 16:02                                                                                 ` Vince Weaver
2014-02-28  9:39                                                           ` Peter Zijlstra
2014-02-24 17:40                               ` Peter Zijlstra
2014-02-24 17:41                               ` Vince Weaver
2014-02-24 17:42                                 ` H. Peter Anvin
2014-02-24 17:52                             ` H. Peter Anvin
2014-02-28 15:33 [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
2014-02-28 15:47 ` Peter Zijlstra
2014-02-28 16:05   ` [PATCHv2] " Jiri Olsa
2014-02-28 16:11     ` H. Peter Anvin
2014-02-28 16:23       ` Steven Rostedt
2014-02-28 16:15     ` Steven Rostedt
2014-03-05  0:03     ` [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults tip-bot for Jiri Olsa
2014-03-05 11:14       ` Peter Zijlstra
2014-03-05 12:20         ` Steven Rostedt
2014-03-05 12:25           ` Peter Zijlstra
2014-03-05 12:33             ` Steven Rostedt
2014-03-05 12:54               ` Peter Zijlstra
2014-03-05 13:02                 ` Peter Zijlstra
2014-03-05 13:07                   ` Peter Zijlstra
2014-03-05 12:36             ` Peter Zijlstra
2014-03-05 13:00               ` Steven Rostedt
2014-03-05 13:08                 ` Peter Zijlstra
2014-03-05 21:37                   ` H. Peter Anvin
2014-03-06  8:40                     ` Peter Zijlstra
2014-03-06 11:02                       ` Steven Rostedt
2014-03-06 14:53                         ` [PATCH] x86: Further robustify CR2 handling vs tracing Peter Zijlstra
2014-03-07 23:07                           ` [tip:x86/urgent] x86, trace: " tip-bot for Peter Zijlstra
2014-02-28 15:47 ` [PATCH] x86 trace: Fix page fault tracing bug Jiri Olsa
2014-02-28 16:00   ` Steven Rostedt
2014-02-28 16:01 ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).