linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf: record segfaults for cycles event when collecting data on a VM
@ 2012-02-08 16:55 David Ahern
  2012-02-08 17:44 ` Joerg Roedel
  0 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2012-02-08 16:55 UTC (permalink / raw)
  To: joerg.roedel, Arnaldo Carvalho de Melo, LKML

$ ps -p 21483
  PID TTY          TIME CMD
21483 pts/3    00:30:20 qemu-kvm

perf record -e cycles -p 21483 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data (~503 samples) ]
Segmentation fault

git bisect points to:

1aed2671738785e8f5aea663a6fda91aa7ef59b5 is the first bad commit
commit 1aed2671738785e8f5aea663a6fda91aa7ef59b5
Author: Joerg Roedel <joerg.roedel@amd.com>
Date:   Wed Jan 4 17:54:20 2012 +0100

    perf kvm: Do guest-only counting by default

    Make use of exclude_guest and exlude_host in perf-kvm to do only
    guest-only counting by default.

    Cc: Gleb Natapov <gleb@redhat.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Gleb Natapov <gleb@redhat.com>
    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    [ committer note: Moved perf_{guest,host} & event_attr_init to util.c ]
    [                 so as not to drag more stuff to the python binding]
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>



When processing the events at the end of the collection machine comes as
NULL. Top part of the stack trace:


#0  0x00000000004667c2 in machine__findnew_thread (self=0x0, pid=21483)
at util/thread.c:75
        p = 0x28
        parent = 0x0
        th = 0x8e51b0
#1  0x00000000004395bc in build_id__mark_dso_hit (tool=0x7802e0,
event=0x7ffff7ff9df8,
    sample=0x7fffffffe030, evsel=0x8e4c80, machine=0x0) at
util/build-id.c:27
        al = {thread = 0x7fffffffe030, map = 0x6f68f12cf0d73400, sym =
0x7fffffffdfa0,
          addr = 140737488347184, level = -8 '\370', filtered = 157,
cpumode = 255 '\377', cpu = 32767}
        cpumode = 4 '\004'
        thread = 0x8e50a0
#2  0x0000000000464aed in perf_session_deliver_event (session=0x8e50a0,
event=0x7ffff7ff9df8,
    sample=0x7fffffffe030, tool=0x7802e0, file_offset=11768) at
util/session.c:799
        evsel = 0x8e4c80
        machine = 0x0
#3  0x0000000000464fd4 in perf_session__process_event (session=0x8e50a0,
event=0x7ffff7ff9df8,
    tool=0x7802e0, file_offset=11768) at util/session.c:907
        sample = {ip = 18446744071579083331, pid = 21483, tid = 21487,
time = 18446744073709551615,
          addr = 0, id = 18446744073709551615, stream_id =
18446744073709551615, period = 10325467,
          cpu = 4294967295, raw_size = 0, raw_data = 0x0, callchain = 0x0}
        ret = 0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-08 16:55 perf: record segfaults for cycles event when collecting data on a VM David Ahern
@ 2012-02-08 17:44 ` Joerg Roedel
  2012-02-08 17:53   ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2012-02-08 17:44 UTC (permalink / raw)
  To: David Ahern; +Cc: Arnaldo Carvalho de Melo, LKML, Jason Wang

On Wed, Feb 08, 2012 at 09:55:35AM -0700, David Ahern wrote:
> perf record -e cycles -p 21483 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.012 MB perf.data (~503 samples) ]
> Segmentation fault

The problem is similar to the one Jason is seeing. I am working on a fix
right now.
Bottom line is that the perf-tool may receive samples tagged as
GUEST_KERNEL even when guest-sampling is disabled (probably a
race-condition). The perf-tool can not find a valid machine pointer for
such a sample and passes NULL down to the other functions. And some
functions don't seem to handle this.

David, can you try to change the default for perf_guest back to false
amd re-test? Not with 'sleep 1' probably, on my setup it takes a busy
guest and a few seconds to trigger.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-08 17:44 ` Joerg Roedel
@ 2012-02-08 17:53   ` David Ahern
  2012-02-08 17:57     ` Joerg Roedel
  0 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2012-02-08 17:53 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Arnaldo Carvalho de Melo, LKML, Jason Wang



On 02/08/2012 10:44 AM, Joerg Roedel wrote:
> On Wed, Feb 08, 2012 at 09:55:35AM -0700, David Ahern wrote:
>> perf record -e cycles -p 21483 -- sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.012 MB perf.data (~503 samples) ]
>> Segmentation fault
> 
> The problem is similar to the one Jason is seeing. I am working on a fix
> right now.
> Bottom line is that the perf-tool may receive samples tagged as
> GUEST_KERNEL even when guest-sampling is disabled (probably a
> race-condition). The perf-tool can not find a valid machine pointer for
> such a sample and passes NULL down to the other functions. And some
> functions don't seem to handle this.
> 
> David, can you try to change the default for perf_guest back to false
> amd re-test? Not with 'sleep 1' probably, on my setup it takes a busy
> guest and a few seconds to trigger.

The segfault is in the event processing once the collection is done
(perf-record), so the time length should not matter.

Either way, this fixes the segafult:

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 8131410..fb25d13 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -6,7 +6,7 @@
  * XXX We need to find a better place for these things...
  */
 bool perf_host  = true;
-bool perf_guest = true;
+bool perf_guest = false;

 void event_attr_init(struct perf_event_attr *attr)
 {

which makes sense. It forces perf_session__find_machine_for_cpumode() to
return the host machine always.

David

> 
> 
> 	Joerg
> 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-08 17:53   ` David Ahern
@ 2012-02-08 17:57     ` Joerg Roedel
  2012-02-09  7:30       ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2012-02-08 17:57 UTC (permalink / raw)
  To: David Ahern; +Cc: Arnaldo Carvalho de Melo, LKML, Jason Wang

On Wed, Feb 08, 2012 at 10:53:04AM -0700, David Ahern wrote:

> The segfault is in the event processing once the collection is done
> (perf-record), so the time length should not matter.

Yes, but I wanted to make sure there is a GUEST_KERNEL sample in there
to process. It usually takes some time (at least in my tests) to get a
spurious one.

> Either way, this fixes the segafult:
> 
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index 8131410..fb25d13 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -6,7 +6,7 @@
>   * XXX We need to find a better place for these things...
>   */
>  bool perf_host  = true;
> -bool perf_guest = true;
> +bool perf_guest = false;
> 
>  void event_attr_init(struct perf_event_attr *attr)
>  {
> 
> which makes sense. It forces perf_session__find_machine_for_cpumode() to
> return the host machine always.

Great, thanks. I will send two patches tomorrow to fix Jason's problem
and change the default for perf_guest.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-08 17:57     ` Joerg Roedel
@ 2012-02-09  7:30       ` Ingo Molnar
  2012-02-09 11:14         ` Joerg Roedel
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2012-02-09  7:30 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: David Ahern, Arnaldo Carvalho de Melo, LKML, Jason Wang


* Joerg Roedel <joerg.roedel@amd.com> wrote:

> > which makes sense. It forces 
> > perf_session__find_machine_for_cpumode() to return the host 
> > machine always.
> 
> Great, thanks. I will send two patches tomorrow to fix Jason's 
> problem and change the default for perf_guest.

Well, if the crash is fixed then the the default can stay, 
right?

Generally we should treat all input data in a perf.data or even 
the bits we get in the ring-buffer as external data that has to 
be checked carefully, with no assumptions made about data.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-09  7:30       ` Ingo Molnar
@ 2012-02-09 11:14         ` Joerg Roedel
  2012-02-09 13:34           ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2012-02-09 11:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David Ahern, Arnaldo Carvalho de Melo, LKML, Jason Wang

On Thu, Feb 09, 2012 at 08:30:24AM +0100, Ingo Molnar wrote:
> 
> * Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > > which makes sense. It forces 
> > > perf_session__find_machine_for_cpumode() to return the host 
> > > machine always.
> > 
> > Great, thanks. I will send two patches tomorrow to fix Jason's 
> > problem and change the default for perf_guest.
> 
> Well, if the crash is fixed then the the default can stay, 
> right?

David's crash is fixed by changing the default back to its original
value :)

> Generally we should treat all input data in a perf.data or even 
> the bits we get in the ring-buffer as external data that has to 
> be checked carefully, with no assumptions made about data.

Well, there are two options:

	1) Make sure machine == NULL does not happen. Changing the
	   default of perf_guest back to false does exactly this for
	   David's problem.
	2) Make sure that a machine == NULL pointer is never
	   dereferenced

I was going to fix it with option 1. Do you suggest option 2 is better?


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-09 11:14         ` Joerg Roedel
@ 2012-02-09 13:34           ` Ingo Molnar
  2012-02-09 14:32             ` Joerg Roedel
  2012-02-09 14:43             ` David Ahern
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2012-02-09 13:34 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: David Ahern, Arnaldo Carvalho de Melo, LKML, Jason Wang


* Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Thu, Feb 09, 2012 at 08:30:24AM +0100, Ingo Molnar wrote:
> > 
> > * Joerg Roedel <joerg.roedel@amd.com> wrote:
> > 
> > > > which makes sense. It forces 
> > > > perf_session__find_machine_for_cpumode() to return the host 
> > > > machine always.
> > > 
> > > Great, thanks. I will send two patches tomorrow to fix Jason's 
> > > problem and change the default for perf_guest.
> > 
> > Well, if the crash is fixed then the the default can stay, 
> > right?
> 
> David's crash is fixed by changing the default back to its 
> original value :)

Then that's the wrong fix really.

> > Generally we should treat all input data in a perf.data or even 
> > the bits we get in the ring-buffer as external data that has to 
> > be checked carefully, with no assumptions made about data.
> 
> Well, there are two options:
> 
> 	1) Make sure machine == NULL does not happen. Changing the
> 	   default of perf_guest back to false does exactly this for
> 	   David's problem.

So what if it's turned on by the user? Do we still crash 
occasionally?

> 	2) Make sure that a machine == NULL pointer is never
> 	   dereferenced
> 
> I was going to fix it with option 1. Do you suggest option 2 is better?

Looks like the better fix. You said:

> Bottom line is that the perf-tool may receive samples tagged 
> as GUEST_KERNEL even when guest-sampling is disabled (probably 
> a race-condition). The perf-tool can not find a valid machine 
> pointer for such a sample and passes NULL down to the other 
> functions. And some functions don't seem to handle this.

tooling should never be surprised by getting some unexpected 
sample via the perf.data or the ring-buffer - regardless of 
whether that functionality is default enabled or manually 
enabled.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-09 13:34           ` Ingo Molnar
@ 2012-02-09 14:32             ` Joerg Roedel
  2012-02-09 15:28               ` Joerg Roedel
  2012-02-09 14:43             ` David Ahern
  1 sibling, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2012-02-09 14:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David Ahern, Arnaldo Carvalho de Melo, LKML, Jason Wang

On Thu, Feb 09, 2012 at 02:34:47PM +0100, Ingo Molnar wrote:
> 
> * Joerg Roedel <joerg.roedel@amd.com> wrote:

> > Well, there are two options:
> > 
> > 	1) Make sure machine == NULL does not happen. Changing the
> > 	   default of perf_guest back to false does exactly this for
> > 	   David's problem.
> 
> So what if it's turned on by the user? Do we still crash 
> occasionally?

It is only turned on by perf-kvm, and this path should setup a machine
object for guest samples.

> 
> > 	2) Make sure that a machine == NULL pointer is never
> > 	   dereferenced
> > 
> > I was going to fix it with option 1. Do you suggest option 2 is better?
> 
> Looks like the better fix. You said:
> 
> > Bottom line is that the perf-tool may receive samples tagged 
> > as GUEST_KERNEL even when guest-sampling is disabled (probably 
> > a race-condition). The perf-tool can not find a valid machine 
> > pointer for such a sample and passes NULL down to the other 
> > functions. And some functions don't seem to handle this.
> 
> tooling should never be surprised by getting some unexpected 
> sample via the perf.data or the ring-buffer - regardless of 
> whether that functionality is default enabled or manually 
> enabled.

Yeah, right. Guest samples may also show up intentionally when the event
modifiers are used. So crashing on machine==NULL needs to be fixed.


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-09 13:34           ` Ingo Molnar
  2012-02-09 14:32             ` Joerg Roedel
@ 2012-02-09 14:43             ` David Ahern
  1 sibling, 0 replies; 11+ messages in thread
From: David Ahern @ 2012-02-09 14:43 UTC (permalink / raw)
  To: Ingo Molnar, Joerg Roedel; +Cc: Arnaldo Carvalho de Melo, LKML, Jason Wang

On 02/09/2012 06:34 AM, Ingo Molnar wrote:
> 
> * Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
>> On Thu, Feb 09, 2012 at 08:30:24AM +0100, Ingo Molnar wrote:
>>>
>>> * Joerg Roedel <joerg.roedel@amd.com> wrote:
>>>
>>>>> which makes sense. It forces 
>>>>> perf_session__find_machine_for_cpumode() to return the host 
>>>>> machine always.
>>>>
>>>> Great, thanks. I will send two patches tomorrow to fix Jason's 
>>>> problem and change the default for perf_guest.
>>>
>>> Well, if the crash is fixed then the the default can stay, 
>>> right?
>>
>> David's crash is fixed by changing the default back to its 
>> original value :)
> 
> Then that's the wrong fix really.

You can't enable perf_guest unless guest info has been configured as
done with perf-kvm.

The problem is triggered in perf_session__find_machine_for_cpumode(). If
perf_session__find_machine() is changed to
perf_session__findnew_machine() you get into machines__findnew() which
shows the root cause: the event is for a VM but symbol_conf.guestmount
is not set.

So, really perf_guest cannot be enabled unless guest info has been given.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-09 14:32             ` Joerg Roedel
@ 2012-02-09 15:28               ` Joerg Roedel
  2012-02-09 15:47                 ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2012-02-09 15:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David Ahern, Arnaldo Carvalho de Melo, LKML, Jason Wang

On Thu, Feb 09, 2012 at 03:32:53PM +0100, Joerg Roedel wrote:
> Yeah, right. Guest samples may also show up intentionally when the event
> modifiers are used. So crashing on machine==NULL needs to be fixed.

Hmm, on the other hand, there is no way to specify guest symbol
information to the regular perf subcommands except for perf-kvm. Without
that information perf can not resolve any GUEST_KERNEL samples, so in
the end there is no point in setting perf_guest to true except for
perf-kvm, no?


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: record segfaults for cycles event when collecting data on a VM
  2012-02-09 15:28               ` Joerg Roedel
@ 2012-02-09 15:47                 ` David Ahern
  0 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2012-02-09 15:47 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Jason Wang



On 02/09/2012 08:28 AM, Joerg Roedel wrote:
> On Thu, Feb 09, 2012 at 03:32:53PM +0100, Joerg Roedel wrote:
>> Yeah, right. Guest samples may also show up intentionally when the event
>> modifiers are used. So crashing on machine==NULL needs to be fixed.
> 
> Hmm, on the other hand, there is no way to specify guest symbol
> information to the regular perf subcommands except for perf-kvm. Without
> that information perf can not resolve any GUEST_KERNEL samples, so in
> the end there is no point in setting perf_guest to true except for
> perf-kvm, no?

Exactly. That was my point in my last reply.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-02-09 15:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-08 16:55 perf: record segfaults for cycles event when collecting data on a VM David Ahern
2012-02-08 17:44 ` Joerg Roedel
2012-02-08 17:53   ` David Ahern
2012-02-08 17:57     ` Joerg Roedel
2012-02-09  7:30       ` Ingo Molnar
2012-02-09 11:14         ` Joerg Roedel
2012-02-09 13:34           ` Ingo Molnar
2012-02-09 14:32             ` Joerg Roedel
2012-02-09 15:28               ` Joerg Roedel
2012-02-09 15:47                 ` David Ahern
2012-02-09 14:43             ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).