linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf evlist: fix memory corruption for Kernel PMU event
@ 2020-10-01 11:57 Barry Song
  2020-10-01 23:06 ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2020-10-01 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linuxarm, Barry Song, Andi Kleen, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Alexey Budankov

Commit 7736627b865d ("perf stat: Use affinity for closing file
descriptors") will use FD(evsel, cpu, thread) to read and write
file descriptors xyarray. For a kernel PMU event, this leads to
serious memory corruption and perf crash.
I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr
with the total number of CPUs. so xyarray which is allocated by
evlist->core.cpus->nr will get overflow. This leads to various
segmentation faults in perf tool for kernel PMU events, eg:
./perf stat -e bus_cycles  sleep 1
*** Error in `./perf': free(): invalid next size (fast): 0x00000000401e6370 ***
Aborted (core dumped)

Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 tools/perf/util/evlist.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c0768c6..3022152 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1226,10 +1226,14 @@ void evlist__close(struct evlist *evlist)
 	int cpu, i;
 
 	/*
-	 * With perf record core.cpus is usually NULL.
+	 * With perf record core.cpus is usually NULL;
+	 * For Kernel PMU event x, "perf stat -e x" will set evlist->core.cpus->nr to
+	 * 1 while evsel has cpus->nr which contains all CPUs. evsel__cpu_iter_skip()
+	 * will be false, memory corruption will happen if we use affinity to close
+	 * file descriptors;
 	 * Use the old method to handle this for now.
 	 */
-	if (!evlist->core.cpus) {
+	if (!evlist->core.cpus || evlist->core.cpus->nr == 1) {
 		evlist__for_each_entry_reverse(evlist, evsel)
 			evsel__close(evsel);
 		return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
  2020-10-01 11:57 [PATCH] perf evlist: fix memory corruption for Kernel PMU event Barry Song
@ 2020-10-01 23:06 ` Andi Kleen
  2020-10-02  3:02   ` Song Bao Hua (Barry Song)
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2020-10-01 23:06 UTC (permalink / raw)
  To: Barry Song
  Cc: linux-kernel, linuxarm, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Alexey Budankov

On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> Commit 7736627b865d ("perf stat: Use affinity for closing file
> descriptors") will use FD(evsel, cpu, thread) to read and write
> file descriptors xyarray. For a kernel PMU event, this leads to
> serious memory corruption and perf crash.
> I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr
> with the total number of CPUs. so xyarray which is allocated by
> evlist->core.cpus->nr will get overflow. This leads to various
> segmentation faults in perf tool for kernel PMU events, eg:
> ./perf stat -e bus_cycles  sleep 1
> *** Error in `./perf': free(): invalid next size (fast): 0x00000000401e6370 ***
> Aborted (core dumped)

Thanks.

I believe there is already a patch queued for this.

The problem seems to only happen on ARM64.

-Andi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
  2020-10-01 23:06 ` Andi Kleen
@ 2020-10-02  3:02   ` Song Bao Hua (Barry Song)
  2020-10-06  1:25     ` Namhyung Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Song Bao Hua (Barry Song) @ 2020-10-02  3:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Linuxarm, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Alexey Budankov



> -----Original Message-----
> From: Andi Kleen [mailto:ak@linux.intel.com]
> Sent: Friday, October 2, 2020 12:07 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Peter
> Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>; Arnaldo
> Carvalho de Melo <acme@kernel.org>; Mark Rutland
> <mark.rutland@arm.com>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@redhat.com>;
> Namhyung Kim <namhyung@kernel.org>; Adrian Hunter
> <adrian.hunter@intel.com>; Alexey Budankov
> <alexey.budankov@linux.intel.com>
> Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
> 
> On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> > Commit 7736627b865d ("perf stat: Use affinity for closing file
> > descriptors") will use FD(evsel, cpu, thread) to read and write file
> > descriptors xyarray. For a kernel PMU event, this leads to serious
> > memory corruption and perf crash.
> > I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr with
> > the total number of CPUs. so xyarray which is allocated by
> > evlist->core.cpus->nr will get overflow. This leads to various
> > segmentation faults in perf tool for kernel PMU events, eg:
> > ./perf stat -e bus_cycles  sleep 1
> > *** Error in `./perf': free(): invalid next size (fast):
> > 0x00000000401e6370 *** Aborted (core dumped)
> 
> Thanks.
> 
> I believe there is already a patch queued for this.

Andi, thanks! Could you share the link or the commit ID? I'd like to take a look at the fix.
I could still reproduce this issue in the latest linus' tree and I didn't find any commit
related to this issue in linux-next and tip/perf/core.

> 
> The problem seems to only happen on ARM64.

My platform which has this issue is really ARM64.

Thanks
Barry

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
  2020-10-02  3:02   ` Song Bao Hua (Barry Song)
@ 2020-10-06  1:25     ` Namhyung Kim
  2020-10-06  6:39       ` Song Bao Hua (Barry Song)
  0 siblings, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2020-10-06  1:25 UTC (permalink / raw)
  To: Song Bao Hua (Barry Song)
  Cc: Andi Kleen, linux-kernel, Linuxarm, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Alexey Budankov

Hello,

On Fri, Oct 2, 2020 at 12:02 PM Song Bao Hua (Barry Song)
<song.bao.hua@hisilicon.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Andi Kleen [mailto:ak@linux.intel.com]
> > Sent: Friday, October 2, 2020 12:07 PM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > Cc: linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Peter
> > Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>; Arnaldo
> > Carvalho de Melo <acme@kernel.org>; Mark Rutland
> > <mark.rutland@arm.com>; Alexander Shishkin
> > <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@redhat.com>;
> > Namhyung Kim <namhyung@kernel.org>; Adrian Hunter
> > <adrian.hunter@intel.com>; Alexey Budankov
> > <alexey.budankov@linux.intel.com>
> > Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
> >
> > On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> > > Commit 7736627b865d ("perf stat: Use affinity for closing file
> > > descriptors") will use FD(evsel, cpu, thread) to read and write file
> > > descriptors xyarray. For a kernel PMU event, this leads to serious
> > > memory corruption and perf crash.
> > > I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr with
> > > the total number of CPUs. so xyarray which is allocated by
> > > evlist->core.cpus->nr will get overflow. This leads to various
> > > segmentation faults in perf tool for kernel PMU events, eg:
> > > ./perf stat -e bus_cycles  sleep 1
> > > *** Error in `./perf': free(): invalid next size (fast):
> > > 0x00000000401e6370 *** Aborted (core dumped)
> >
> > Thanks.
> >
> > I believe there is already a patch queued for this.
>
> Andi, thanks! Could you share the link or the commit ID? I'd like to take a look at the fix.
> I could still reproduce this issue in the latest linus' tree and I didn't find any commit
> related to this issue in linux-next and tip/perf/core.

I think Andi was referring to this discussion which is not merged yet:

https://lore.kernel.org/lkml/20200922031346.15051-2-liwei391@huawei.com/

I suggested a patch at the end.  Can you please try it?

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
  2020-10-06  1:25     ` Namhyung Kim
@ 2020-10-06  6:39       ` Song Bao Hua (Barry Song)
  2020-10-06 11:11         ` Jiri Olsa
  0 siblings, 1 reply; 7+ messages in thread
From: Song Bao Hua (Barry Song) @ 2020-10-06  6:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, linux-kernel, Linuxarm, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Alexey Budankov



> -----Original Message-----
> From: Namhyung Kim [mailto:namhyung@kernel.org]
> Sent: Tuesday, October 6, 2020 2:26 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: Andi Kleen <ak@linux.intel.com>; linux-kernel@vger.kernel.org; Linuxarm
> <linuxarm@huawei.com>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> <mingo@redhat.com>; Arnaldo Carvalho de Melo <acme@kernel.org>; Mark
> Rutland <mark.rutland@arm.com>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@redhat.com>; Adrian
> Hunter <adrian.hunter@intel.com>; Alexey Budankov
> <alexey.budankov@linux.intel.com>
> Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
> 
> Hello,
> 
> On Fri, Oct 2, 2020 at 12:02 PM Song Bao Hua (Barry Song)
> <song.bao.hua@hisilicon.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Andi Kleen [mailto:ak@linux.intel.com]
> > > Sent: Friday, October 2, 2020 12:07 PM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > > Cc: linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Peter
> > > Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>;
> Arnaldo
> > > Carvalho de Melo <acme@kernel.org>; Mark Rutland
> > > <mark.rutland@arm.com>; Alexander Shishkin
> > > <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@redhat.com>;
> > > Namhyung Kim <namhyung@kernel.org>; Adrian Hunter
> > > <adrian.hunter@intel.com>; Alexey Budankov
> > > <alexey.budankov@linux.intel.com>
> > > Subject: Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU
> event
> > >
> > > On Fri, Oct 02, 2020 at 12:57:29AM +1300, Barry Song wrote:
> > > > Commit 7736627b865d ("perf stat: Use affinity for closing file
> > > > descriptors") will use FD(evsel, cpu, thread) to read and write file
> > > > descriptors xyarray. For a kernel PMU event, this leads to serious
> > > > memory corruption and perf crash.
> > > > I have seen evlist->core.cpus->nr is 1 while evsel has cpus->nr with
> > > > the total number of CPUs. so xyarray which is allocated by
> > > > evlist->core.cpus->nr will get overflow. This leads to various
> > > > segmentation faults in perf tool for kernel PMU events, eg:
> > > > ./perf stat -e bus_cycles  sleep 1
> > > > *** Error in `./perf': free(): invalid next size (fast):
> > > > 0x00000000401e6370 *** Aborted (core dumped)
> > >
> > > Thanks.
> > >
> > > I believe there is already a patch queued for this.
> >
> > Andi, thanks! Could you share the link or the commit ID? I'd like to take a
> look at the fix.
> > I could still reproduce this issue in the latest linus' tree and I didn't find any
> commit
> > related to this issue in linux-next and tip/perf/core.
> 
> I think Andi was referring to this discussion which is not merged yet:
> 
> https://lore.kernel.org/lkml/20200922031346.15051-2-liwei391@huawei.co
> m/
> 
> I suggested a patch at the end.  Can you please try it?

I tried the patch you suggested.

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 2208444ecb44..cfcdbd7be066 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
         if (!evsel->own_cpus || evlist->has_user_cpus) {
                 perf_cpu_map__put(evsel->cpus);
                evsel->cpus = perf_cpu_map__get(evlist->cpus);
+       } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
+               perf_cpu_map__put(evsel->cpus);
+               evsel->cpus = perf_cpu_map__get(evlist->cpus);
        } else if (evsel->cpus != evsel->own_cpus) {
                perf_cpu_map__put(evsel->cpus);
                evsel->cpus = perf_cpu_map__get(evsel->own_cpus);

it did fix the crash I have seen on arm64. I'd prefer you put the below fixes tag in the commit log. 
Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
Perf stat began to crash from v5.4 kernel, so the fix should be backported to stable trees.

Thanks
Barry

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
  2020-10-06  6:39       ` Song Bao Hua (Barry Song)
@ 2020-10-06 11:11         ` Jiri Olsa
  2020-10-07  7:23           ` Namhyung Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Jiri Olsa @ 2020-10-06 11:11 UTC (permalink / raw)
  To: Song Bao Hua (Barry Song)
  Cc: Namhyung Kim, Andi Kleen, linux-kernel, Linuxarm, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Adrian Hunter, Alexey Budankov

On Tue, Oct 06, 2020 at 06:39:44AM +0000, Song Bao Hua (Barry Song) wrote:

SNIP

> > > Andi, thanks! Could you share the link or the commit ID? I'd like to take a
> > look at the fix.
> > > I could still reproduce this issue in the latest linus' tree and I didn't find any
> > commit
> > > related to this issue in linux-next and tip/perf/core.
> > 
> > I think Andi was referring to this discussion which is not merged yet:
> > 
> > https://lore.kernel.org/lkml/20200922031346.15051-2-liwei391@huawei.co
> > m/
> > 
> > I suggested a patch at the end.  Can you please try it?
> 
> I tried the patch you suggested.
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 2208444ecb44..cfcdbd7be066 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
>          if (!evsel->own_cpus || evlist->has_user_cpus) {
>                  perf_cpu_map__put(evsel->cpus);
>                 evsel->cpus = perf_cpu_map__get(evlist->cpus);
> +       } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> +               perf_cpu_map__put(evsel->cpus);
> +               evsel->cpus = perf_cpu_map__get(evlist->cpus);
>         } else if (evsel->cpus != evsel->own_cpus) {
>                 perf_cpu_map__put(evsel->cpus);
>                 evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> 
> it did fix the crash I have seen on arm64. I'd prefer you put the below fixes tag in the commit log. 
> Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> Perf stat began to crash from v5.4 kernel, so the fix should be backported to stable trees.

awesome.. Namhyung, could you please send full patch?

thanks,
jirka


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf evlist: fix memory corruption for Kernel PMU event
  2020-10-06 11:11         ` Jiri Olsa
@ 2020-10-07  7:23           ` Namhyung Kim
  0 siblings, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2020-10-07  7:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Song Bao Hua (Barry Song),
	Andi Kleen, linux-kernel, Linuxarm, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Adrian Hunter, Alexey Budankov

Hello,

On Tue, Oct 6, 2020 at 8:11 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Oct 06, 2020 at 06:39:44AM +0000, Song Bao Hua (Barry Song) wrote:
>
> SNIP
>
> > > > Andi, thanks! Could you share the link or the commit ID? I'd like to take a
> > > look at the fix.
> > > > I could still reproduce this issue in the latest linus' tree and I didn't find any
> > > commit
> > > > related to this issue in linux-next and tip/perf/core.
> > >
> > > I think Andi was referring to this discussion which is not merged yet:
> > >
> > > https://lore.kernel.org/lkml/20200922031346.15051-2-liwei391@huawei.co
> > > m/
> > >
> > > I suggested a patch at the end.  Can you please try it?
> >
> > I tried the patch you suggested.
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 2208444ecb44..cfcdbd7be066 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> >          if (!evsel->own_cpus || evlist->has_user_cpus) {
> >                  perf_cpu_map__put(evsel->cpus);
> >                 evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > +       } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> > +               perf_cpu_map__put(evsel->cpus);
> > +               evsel->cpus = perf_cpu_map__get(evlist->cpus);
> >         } else if (evsel->cpus != evsel->own_cpus) {
> >                 perf_cpu_map__put(evsel->cpus);
> >                 evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> >
> > it did fix the crash I have seen on arm64. I'd prefer you put the below fixes tag in the commit log.
> > Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> > Perf stat began to crash from v5.4 kernel, so the fix should be backported to stable trees.
>
> awesome.. Namhyung, could you please send full patch?

Sure, I'll do!

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-10-07  7:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-01 11:57 [PATCH] perf evlist: fix memory corruption for Kernel PMU event Barry Song
2020-10-01 23:06 ` Andi Kleen
2020-10-02  3:02   ` Song Bao Hua (Barry Song)
2020-10-06  1:25     ` Namhyung Kim
2020-10-06  6:39       ` Song Bao Hua (Barry Song)
2020-10-06 11:11         ` Jiri Olsa
2020-10-07  7:23           ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).