From: "Jin, Yao" <yao.jin@linux.intel.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: acme@kernel.org, jolsa@kernel.org, peterz@infradead.org,
	mingo@redhat.com, alexander.shishkin@linux.intel.com,
	Linux-kernel@vger.kernel.org, ak@linux.intel.com,
	kan.liang@intel.com, yao.jin@intel.com
Subject: Re: [PATCH] perf evsel: Get group fd from CPU0 for system wide event
Date: Fri, 15 May 2020 14:04:57 +0800	[thread overview]
Message-ID: <68e53765-6f45-9483-7543-0a2f961cdc62@linux.intel.com> (raw)
In-Reply-To: <3e813227-4954-0d4b-bc7a-ca272b18454a@linux.intel.com>

Hi Jiri,

On 5/9/2020 3:37 PM, Jin, Yao wrote:
> Hi Jiri,
> 
> On 5/5/2020 8:03 AM, Jiri Olsa wrote:
>> On Sat, May 02, 2020 at 10:33:59AM +0800, Jin, Yao wrote:
>>
>> SNIP
>>
>>>>> @@ -1461,6 +1461,9 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread)
>>>>>        BUG_ON(!leader->core.fd);
>>>>>        fd = FD(leader, cpu, thread);
>>>>> +    if (fd == -1 && leader->core.system_wide)
>>>>
>>>> fd does not need to be -1 in here.. in my setup cstate_pkg/c2-residency/
>>>> has cpumask 0, so other cpus never get open and are 0, and the whole thing
>>>> ends up with:
>>>>
>>>>     sys_perf_event_open: pid -1  cpu 1  group_fd 0  flags 0
>>>>     sys_perf_event_open failed, error -9
>>>>
>>>> I actually thought we put -1 into the fd array but couldn't find it.. perhaps we should do that
>>>>
>>>>
>>>
>>> I have tested on two platforms. On a KBL desktop fd is 0 for this case, but on a
>>> Cascade Lake-X server fd is -1, so the BUG_ON(fd == -1) is triggered.
>>>
>>>>> +        fd = FD(leader, 0, thread);
>>>>> +
>>>>
>>>> so how do we group following events?
>>>>
>>>>     cstate_pkg/c2-residency/ - cpumask 0
>>>>     msr/tsc/                 - all cpus
>>>>
>>>
>>> Not sure if it's enough to only use cpumask 0 because
>>> cstate_pkg/c2-residency/ should be per-socket.
>>>
>>>> cpu 0 is fine.. the rest I have no idea ;-)
>>>>
>>>
>>> Perhaps we directly remove the BUG_ON(fd == -1) assertion?
>>
>> I think we need to make clear how to deal with grouping over
>> events that come from different CPUs
>>
>>     so how do we group following events?
>>
>>        cstate_pkg/c2-residency/ - cpumask 0
>>        msr/tsc/                 - all cpus
>>
>>
>> what's the reason/expected output of groups with above events?
>> seems to make sense only if we limit msr/tsc/ to cpumask 0 as well
>>
>> jirka
>>
> 
> On a 2-socket machine (e.g. Cascade Lake-X), "cstate_pkg/c2-residency/" is a per-socket event and the
> cpumask is 0 and 24.
> 
> root@lkp-csl-2sp5 /sys/devices/cstate_pkg# cat cpumask
> 0,24
> 
> We can't limit it to cpumask 0. It should be programmed on CPU0 and CPU24 (the first CPU on each 
> socket).
> 
> The "msr/tsc" are per-cpu event, it should be programmed on all cpus. So I don't think we can limit 
> msr/tsc to cpumask 0.
> 
> The issue is how we deal with get_group_fd().
> 
> static int get_group_fd(struct evsel *evsel, int cpu, int thread)
> {
>          struct evsel *leader = evsel->leader;
>          int fd;
> 
>          if (evsel__is_group_leader(evsel))
>                  return -1;
> 
>          /*
>           * Leader must be already processed/open,
>           * if not it's a bug.
>           */
>          BUG_ON(!leader->core.fd);
> 
>          fd = FD(leader, cpu, thread);
>          BUG_ON(fd == -1);
> 
>          return fd;
> }
> 
> When evsel is "msr/tsc/",
> 
> FD(leader, 0, 0) is 3 (3 is the fd of "cstate_pkg/c2-residency/" on CPU0)
> FD(leader, 1, 0) is -1
> BUG_ON asserted.
> 
> If we just return a group_fd of -1 for "msr/tsc/", that doesn't look like a problem, does it?
> 
> Thanks
> Jin Yao

I think I have found the root cause. It is a serious bug in get_group_fd(): an out-of-bounds access!

For a group that mixes a system-wide event with per-cpu events, where the group leader is the
system-wide event, an out-of-bounds access will happen.

perf_evsel__alloc_fd() allocates only one FD member for a system-wide event (only FD(evsel, 0, 0) is valid).

But for a per-cpu event, perf_evsel__alloc_fd() allocates N FD members (N = ncpus). For example, when
ncpus is 8, FD(evsel, 0, 0) to FD(evsel, 7, 0) are valid.

static int get_group_fd(struct evsel *evsel, int cpu, int thread)
{
     struct evsel *leader = evsel->leader;
     int fd;

     fd = FD(leader, cpu, thread);    /* out-of-bounds access may happen here */

     return fd;
}

If the leader is a system-wide event, only FD(leader, 0, 0) is valid.

When get_group_fd() accesses FD(leader, 1, 0), it reads past the end of the leader's FD allocation.
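
To make the size mismatch concrete, here is a minimal standalone C sketch. It is only an
illustration: the struct, the alloc_evsel() helper and the lookup() check below are simplified
stand-ins I made up, not the real perf xyarray/FD() code.

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for perf's xyarray-backed fd storage (not the real code). */
struct fake_evsel {
	int ncpus;      /* number of fd slots actually allocated */
	int nthreads;
	int *fd;        /* ncpus * nthreads slots */
};

static struct fake_evsel *alloc_evsel(int ncpus, int nthreads)
{
	struct fake_evsel *e = malloc(sizeof(*e));

	e->ncpus = ncpus;
	e->nthreads = nthreads;
	e->fd = malloc(sizeof(int) * ncpus * nthreads);
	for (int i = 0; i < ncpus * nthreads; i++)
		e->fd[i] = -1;
	return e;
}

/* Mimics FD(leader, cpu, thread), but reports when the index is out of range. */
static void lookup(struct fake_evsel *e, int cpu, int thread)
{
	if (cpu >= e->ncpus || thread >= e->nthreads) {
		printf("FD(leader, %d, %d): index past the %d-slot allocation "
		       "-> out-of-bounds read in the real code\n",
		       cpu, thread, e->ncpus * e->nthreads);
		return;
	}
	printf("FD(leader, %d, %d) = %d\n", cpu, thread,
	       e->fd[cpu * e->nthreads + thread]);
}

int main(void)
{
	/* System-wide leader: only one slot, i.e. only FD(leader, 0, 0) exists. */
	struct fake_evsel *leader = alloc_evsel(1, 1);

	leader->fd[0] = 3;	/* leader opened on CPU0 only */

	lookup(leader, 0, 0);	/* fine */
	lookup(leader, 1, 0);	/* what get_group_fd does for the per-cpu member on cpu 1 */

	free(leader->fd);
	free(leader);
	return 0;
}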

My fix is:

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 28683b0eb738..db05b8a1e1a8 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1440,6 +1440,9 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread)
         if (evsel__is_group_leader(evsel))
                 return -1;

+       if (leader->core.system_wide && !evsel->core.system_wide)
+               return -2;
+
         /*
          * Leader must be already processed/open,
          * if not it's a bug.
@@ -1665,6 +1668,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
                                 pid = perf_thread_map__pid(threads, thread);

                         group_fd = get_group_fd(evsel, cpu, thread);
+                       if (group_fd == -2) {
+                               errno = EINVAL;
+                               err = -EINVAL;
+                               goto out_close;
+                       }
  retry_open:
                         test_attr__ready();

This fix enables the perf_evlist__reset_weak_group() fallback, and in the second pass (second_pass in
__run_perf_stat) the events are then opened successfully.
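
For reference, a rough standalone sketch of the weak-group fallback idea: when opening a grouped
member fails, drop the group relation and reopen in a second pass. The struct and try_open() below
are hypothetical simplifications, not the actual code in __run_perf_stat / perf_evlist__reset_weak_group.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified event descriptor (not perf's struct evsel). */
struct ev {
	const char *name;
	struct ev *leader;	/* group leader; points to itself for the leader */
	bool weak_group;	/* the ":W" weak-group modifier */
};

/* Pretend open: the grouped member fails, mimicking the EINVAL case above. */
static int try_open(struct ev *e)
{
	if (e->leader != e) {
		errno = EINVAL;
		return -1;
	}
	printf("opened %s\n", e->name);
	return 0;
}

int main(void)
{
	struct ev leader = { "cstate_pkg/c2-residency/", &leader, true };
	struct ev member = { "msr/tsc/", &leader, true };
	struct ev *evs[] = { &leader, &member };
	bool second_pass = false;

	/* First pass: a weak-group member fails to open, so drop the group relation. */
	for (int i = 0; i < 2; i++) {
		if (try_open(evs[i]) < 0 && errno == EINVAL && evs[i]->weak_group) {
			for (int j = 0; j < 2; j++)
				evs[j]->leader = evs[j];	/* break the group */
			second_pass = true;
		}
	}

	/* Second pass: reopen the event that failed, now ungrouped. */
	if (second_pass)
		try_open(&member);

	return 0;
}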

I have tested this fix on Cascade Lake-X and it works.

Thanks
Jin Yao
