linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Using perf_event_open() to sample multiple events of a process
@ 2021-11-05  5:57 Nadav Amit
  2021-11-06  0:45 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Nadav Amit @ 2021-11-05  5:57 UTC (permalink / raw)
  To: kan.liang, Peter Zijlstra; +Cc: LKML, linux-perf-users

Hello Ken, Peter,

I would appreciate some help regarding the use of perf_event_open()
to have multiple samples getting into the same mmap’d memory when they
are both attached to the same process.

I am doing so (using both PERF_FLAG_FD_NO_GROUP and PERF_FLAG_FD_OUTPUT),
but it results in -EINVAL. Debugging the code shows that
perf_event_set_output() fails due to the following check:

        /*
         * If its not a per-cpu rb, it must be the same task.
         */
        if (output_event->cpu == -1 && output_event->ctx != event->ctx)
                goto out;

However, it appears that at this point, event->ctx is not initialized
(it is null) so the test fails and the whole perf_event_open() syscall
fails.

Am I missing something? If not, I am unsure, unfortunately, what the
proper way to fix it is…

I include a small test that fails on my system. The second
perf_event_open fails due to the check in perf_event_set_output():




#define _GNU_SOURCE 1

#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu,
		     int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	pid_t pid = getpid();
	int group_fd, fd;
	void *p;
	struct perf_event_attr pe = {
		.type = 4,
		.size = sizeof(struct perf_event_attr),
		.config = 0x11d0,
		.sample_type = 0x8,
		.sample_period = 1000,
		.precise_ip = 2,
	};

	group_fd = perf_event_open(&pe, pid, -1, -1, PERF_FLAG_FD_CLOEXEC | 
						     PERF_FLAG_FD_NO_GROUP |
						     PERF_FLAG_FD_OUTPUT);

	if (group_fd < 0) {
		perror("first perf_event_open");
		exit(-1);
	}

	p = mmap(NULL, 3 * 4096, PROT_READ|PROT_WRITE, MAP_SHARED, group_fd, 0);

	if (p == MAP_FAILED) {
		perror("MAP_FAILED");
		exit(-1);
	}
	
	pe.config = 0x12d0;
	
	fd = perf_event_open(&pe, pid, -1, group_fd, PERF_FLAG_FD_CLOEXEC | 
						     PERF_FLAG_FD_NO_GROUP |
						     PERF_FLAG_FD_OUTPUT);

	if (fd < 0) {
		perror("second perf_event_open");
		exit(-1);
	}

	printf("success\n");
}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Using perf_event_open() to sample multiple events of a process
  2021-11-05  5:57 Using perf_event_open() to sample multiple events of a process Nadav Amit
@ 2021-11-06  0:45 ` Peter Zijlstra
  2021-11-06  0:57   ` Peter Zijlstra
  2021-11-06  0:59   ` Nadav Amit
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2021-11-06  0:45 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kan.liang, LKML, linux-perf-users

On Thu, Nov 04, 2021 at 10:57:50PM -0700, Nadav Amit wrote:
> Hello Ken, Peter,
> 
> I would appreciate some help regarding the use of perf_event_open()
> to have multiple samples getting into the same mmap’d memory when they
> are both attached to the same process.
> 
> I am doing so (using both PERF_FLAG_FD_NO_GROUP and PERF_FLAG_FD_OUTPUT),
> but it results in -EINVAL. Debugging the code shows that
> perf_event_set_output() fails due to the following check:
> 
>         /*
>          * If its not a per-cpu rb, it must be the same task.
>          */
>         if (output_event->cpu == -1 && output_event->ctx != event->ctx)
>                 goto out;
> 
> However, it appears that at this point, event->ctx is not initialized
> (it is null) so the test fails and the whole perf_event_open() syscall
> fails.
> 
> Am I missing something? If not, I am unsure, unfortunately, what the
> proper way to fix it is…
> 
> I include a small test that fails on my system. The second
> perf_event_open fails due to the check in perf_event_set_output():
> 

Works when you use the SET_OUTPUT ioctl()...

I think something went sideways in the syscall path and things went out
of order :/ I'll try and have a look.


---
#define _GNU_SOURCE 1

#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu,
		int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	pid_t pid = getpid();
	int group_fd, fd, err;
	void *p;
	struct perf_event_attr pe = {
		.type = 4,
		.size = sizeof(struct perf_event_attr),
		.config = 0x11d0,
		.sample_type = 0x8,
		.sample_period = 1000,
		.precise_ip = 2,
	};

	group_fd = perf_event_open(&pe, pid, -1, -1, PERF_FLAG_FD_CLOEXEC |
			PERF_FLAG_FD_NO_GROUP |
			PERF_FLAG_FD_OUTPUT);

	if (group_fd < 0) {
		perror("first perf_event_open");
		exit(-1);
	}

	p = mmap(NULL, 3 * 4096, PROT_READ|PROT_WRITE, MAP_SHARED, group_fd, 0);

	if (p == MAP_FAILED) {
		perror("MAP_FAILED");
		exit(-1);
	}

	pe.config = 0x12d0;

	fd = perf_event_open(&pe, pid, -1, -1, PERF_FLAG_FD_CLOEXEC |
			PERF_FLAG_FD_NO_GROUP |
			PERF_FLAG_FD_OUTPUT);

	if (fd < 0) {
		perror("second perf_event_open");
		exit(-1);
	}

	err = ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, group_fd);
	if (err < 0) {
		perror("ioctl");
		exit(-1);
	}

	printf("success\n");
}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Using perf_event_open() to sample multiple events of a process
  2021-11-06  0:45 ` Peter Zijlstra
@ 2021-11-06  0:57   ` Peter Zijlstra
  2021-11-08 15:24     ` Peter Zijlstra
  2021-11-06  0:59   ` Nadav Amit
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2021-11-06  0:57 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kan.liang, LKML, linux-perf-users

On Sat, Nov 06, 2021 at 01:45:57AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 04, 2021 at 10:57:50PM -0700, Nadav Amit wrote:
> > Hello Ken, Peter,
> > 
> > I would appreciate some help regarding the use of perf_event_open()
> > to have multiple samples getting into the same mmap’d memory when they
> > are both attached to the same process.
> > 
> > I am doing so (using both PERF_FLAG_FD_NO_GROUP and PERF_FLAG_FD_OUTPUT),
> > but it results in -EINVAL. Debugging the code shows that
> > perf_event_set_output() fails due to the following check:
> > 
> >         /*
> >          * If its not a per-cpu rb, it must be the same task.
> >          */
> >         if (output_event->cpu == -1 && output_event->ctx != event->ctx)
> >                 goto out;
> > 
> > However, it appears that at this point, event->ctx is not initialized
> > (it is null) so the test fails and the whole perf_event_open() syscall
> > fails.
> > 
> > Am I missing something? If not, I am unsure, unfortunately, what the
> > proper way to fix it is…
> > 
> > I include a small test that fails on my system. The second
> > perf_event_open fails due to the check in perf_event_set_output():
> > 
> 
> Works when you use the SET_OUTPUT ioctl()...
> 
> I think something went sideways in the syscall path and things went out
> of order :/ I'll try and have a look.

The problem seems to be that we call perf_event_set_output() before we
set event->ctx, which is a bit of a problem.

Now, afaict it's been broken since c3f00c70276d ("perf: Separate
find_get_context() from event initialization"), which is ages ago :/

It's waaay too late to try and fix it; I'll be likely to make an even
bigger mess if I tried. Perhaps tomorrow.

Clearly FD_OUTPUT isn't much used :-(

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Using perf_event_open() to sample multiple events of a process
  2021-11-06  0:45 ` Peter Zijlstra
  2021-11-06  0:57   ` Peter Zijlstra
@ 2021-11-06  0:59   ` Nadav Amit
  1 sibling, 0 replies; 6+ messages in thread
From: Nadav Amit @ 2021-11-06  0:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: kan.liang, LKML, linux-perf-users



> On Nov 5, 2021, at 5:45 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Nov 04, 2021 at 10:57:50PM -0700, Nadav Amit wrote:
>> Hello Ken, Peter,
>> 
>> I would appreciate some help regarding the use of perf_event_open()
>> to have multiple samples getting into the same mmap’d memory when they
>> are both attached to the same process.
>> 
>> I am doing so (using both PERF_FLAG_FD_NO_GROUP and PERF_FLAG_FD_OUTPUT),
>> but it results in -EINVAL. Debugging the code shows that
>> perf_event_set_output() fails due to the following check:
>> 
>>        /*
>>         * If its not a per-cpu rb, it must be the same task.
>>         */
>>        if (output_event->cpu == -1 && output_event->ctx != event->ctx)
>>                goto out;
>> 
>> However, it appears that at this point, event->ctx is not initialized
>> (it is null) so the test fails and the whole perf_event_open() syscall
>> fails.
>> 
>> Am I missing something? If not, I am unsure, unfortunately, what the
>> proper way to fix it is…
>> 
>> I include a small test that fails on my system. The second
>> perf_event_open fails due to the check in perf_event_set_output():
>> 
> 
> Works when you use the SET_OUTPUT ioctl()...
> 
> I think something went sideways in the syscall path and things went out
> of order :/ I'll try and have a look.

Highly appreciated.

While at it, I would note that there is a mistake in Intel SDM 31.4.2.27,
Table 31-47 “Block Item Packet Definition”. The ID[5:0] takes 6 bits,
when in fact I presume it is only 5 bits (according to the table).

Perhaps you can forward it to the relevant people.

Thanks again,
Nadav

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Using perf_event_open() to sample multiple events of a process
  2021-11-06  0:57   ` Peter Zijlstra
@ 2021-11-08 15:24     ` Peter Zijlstra
  2021-11-08 19:51       ` Nadav Amit
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2021-11-08 15:24 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kan.liang, LKML, linux-perf-users

On Sat, Nov 06, 2021 at 01:57:23AM +0100, Peter Zijlstra wrote:

> The problem seems to be that we call perf_event_set_output() before we
> set event->ctx, which is a bit of a problem.
> 
> Now, afaict it's been broken since c3f00c70276d ("perf: Separate
> find_get_context() from event initialization"), which is ages ago :/
> 
> It's waaay too late to try and fix it; I'll be likely to make an even
> bigger mess if I tried. Perhaps tomorrow.
> 
> Clearly FD_OUTPUT isn't much used :-(

The below seems to fix, it's a bit of a hack, but I couldn't really come
up with anything saner.

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f2253ea729a2..dbe766663733 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5595,6 +5595,7 @@ static inline int perf_fget_light(int fd, struct fd *p)
 }
 
 static int perf_event_set_output(struct perf_event *event,
+				 struct perf_event_context *ctx,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
@@ -5647,10 +5648,10 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 			if (ret)
 				return ret;
 			output_event = output.file->private_data;
-			ret = perf_event_set_output(event, output_event);
+			ret = perf_event_set_output(event, event->ctx, output_event);
 			fdput(output);
 		} else {
-			ret = perf_event_set_output(event, NULL);
+			ret = perf_event_set_output(event, event->ctx, NULL);
 		}
 		return ret;
 	}
@@ -11830,7 +11831,9 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 }
 
 static int
-perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
+perf_event_set_output(struct perf_event *event,
+		      struct perf_event_context *event_ctx,
+		      struct perf_event *output_event)
 {
 	struct perf_buffer *rb = NULL;
 	int ret = -EINVAL;
@@ -11851,7 +11854,7 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	/*
 	 * If its not a per-cpu rb, it must be the same task.
 	 */
-	if (output_event->cpu == -1 && output_event->ctx != event->ctx)
+	if (output_event->cpu == -1 && output_event->ctx != event_ctx)
 		goto out;
 
 	/*
@@ -12232,7 +12235,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (output_event) {
-		err = perf_event_set_output(event, output_event);
+		err = perf_event_set_output(event, ctx, output_event);
 		if (err)
 			goto err_context;
 	}

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Using perf_event_open() to sample multiple events of a process
  2021-11-08 15:24     ` Peter Zijlstra
@ 2021-11-08 19:51       ` Nadav Amit
  0 siblings, 0 replies; 6+ messages in thread
From: Nadav Amit @ 2021-11-08 19:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: kan.liang, LKML, linux-perf-users


> On Nov 8, 2021, at 7:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Sat, Nov 06, 2021 at 01:57:23AM +0100, Peter Zijlstra wrote:
> 
>> The problem seems to be that we call perf_event_set_output() before we
>> set event->ctx, which is a bit of a problem.
>> 
>> Now, afaict it's been broken since c3f00c70276d ("perf: Separate
>> find_get_context() from event initialization"), which is ages ago :/
>> 
>> It's waaay too late to try and fix it; I'll be likely to make an even
>> bigger mess if I tried. Perhaps tomorrow.
>> 
>> Clearly FD_OUTPUT isn't much used :-(
> 
> The below seems to fix, it's a bit of a hack, but I couldn't really come
> up with anything saner.

I originally considered doing a similar hack. I assume it should work,
but I moved to using the ioctl workaround that you suggested.

Clearly nobody is using this feature if it is broken for 11 years.
There is always the option to deprecate it if there is an alternative.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-11-08 19:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-05  5:57 Using perf_event_open() to sample multiple events of a process Nadav Amit
2021-11-06  0:45 ` Peter Zijlstra
2021-11-06  0:57   ` Peter Zijlstra
2021-11-08 15:24     ` Peter Zijlstra
2021-11-08 19:51       ` Nadav Amit
2021-11-06  0:59   ` Nadav Amit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).