linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	David Vernet <void@manifault.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
Date: Tue, 9 May 2023 16:30:50 -0400	[thread overview]
Message-ID: <20230509163050.127d5123@rorschach.local.home> (raw)
In-Reply-To: <20230509130111.62d587f1@rorschach.local.home>

On Tue, 9 May 2023 13:01:11 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > I see no practical use case for bpf progs to be connected to user events.  
> 
> That's not a technical reason. Obviously they have a use case.

Alexei,

It was great having a chat with you during lunch at LSFMM/BPF!

Looking forward to your technical response that I believe are
legitimate requests. I'm replying here, as during our conversation, you
had the misperception that the user events had a system call when the
event was disabled. I told you I will point out the code that shows
that the kernel sets the bit, and that user space does not do a system
call when the event is disable.

From the user space side, which does:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/user_events/example.c#n60

	/* Check if anyone is listening */
	if (enabled) {
		/* Yep, trace out our data */
		writev(data_fd, (const struct iovec *)io, 2);

		/* Increase the count */
		count++;

		printf("Something was attached, wrote data\n");
	}


Where it told the kernel about that "enabled" variable:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/user_events/example.c#n47

	if (event_reg(data_fd, "test u32 count", &write, &enabled) == -1)
		return errno;

static int event_reg(int fd, const char *command, int *write, int *enabled)
{
	struct user_reg reg = {0};

	reg.size = sizeof(reg);
	reg.enable_bit = 31;
	reg.enable_size = sizeof(*enabled);
	reg.enable_addr = (__u64)enabled;
	reg.name_args = (__u64)command;

	if (ioctl(fd, DIAG_IOCSREG, &reg) == -1)
		return -1;

	*write = reg.write_index;

	return 0;
}

The above will add a trace event into tracefs. When someone does:

 # echo 1 > /sys/kernel/tracing/user_events/test/enable

The kernel will trigger the class->reg function, defined by:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/trace/trace_events_user.c#n1804

	user->class.reg = user_event_reg;

Which calls: 

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/trace/trace_events_user.c#n1555

	update_enable_bit_for(user);

Which does:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/trace/trace_events_user.c#n1465

update_enable_bit_for() { 
	[..]
	user_event_enabler_update(user);


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/trace/trace_events_user.c#n451

user_event_enabler_update() {
	[..]
	user_event_enabler_write(mm, enabler, true, &attempt);

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/trace/trace_events_user.c#n385

static int user_event_enabler_write(struct user_event_mm *mm,
				    struct user_event_enabler *enabler,
				    bool fixup_fault, int *attempt)
{
	unsigned long uaddr = enabler->addr;
	unsigned long *ptr;
	struct page *page;
	void *kaddr;
	int ret;

	lockdep_assert_held(&event_mutex);
	mmap_assert_locked(mm->mm);

	*attempt += 1;

	/* Ensure MM has tasks, cannot use after exit_mm() */
	if (refcount_read(&mm->tasks) == 0)
		return -ENOENT;

	if (unlikely(test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)) ||
		     test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))))
		return -EBUSY;

	ret = pin_user_pages_remote(mm->mm, uaddr, 1, FOLL_WRITE | FOLL_NOFAULT,
				    &page, NULL, NULL);

	if (unlikely(ret <= 0)) {
		if (!fixup_fault)
			return -EFAULT;

		if (!user_event_enabler_queue_fault(mm, enabler, *attempt))
			pr_warn("user_events: Unable to queue fault handler\n");

		return -EFAULT;
	}

	kaddr = kmap_local_page(page);
	ptr = kaddr + (uaddr & ~PAGE_MASK);

	/* Update bit atomically, user tracers must be atomic as well */
	if (enabler->event && enabler->event->status)
		set_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr);
	else
		clear_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr);

	kunmap_local(kaddr);
	unpin_user_pages_dirty_lock(&page, 1, true);

	return 0;
}

The above maps the user space address and then sets the bit that was
registered.

That is, it changes "enabled" to true, and the if statement:

	if (enabled) {
		/* Yep, trace out our data */
		writev(data_fd, (const struct iovec *)io, 2);

		/* Increase the count */
		count++;

		printf("Something was attached, wrote data\n");
	}

Is now executed.

-- Steve


  reply	other threads:[~2023-05-09 20:30 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 16:37 [PATCH] tracing/user_events: Run BPF program if attached Beau Belgrave
2023-05-09 15:24 ` Alexei Starovoitov
2023-05-09 17:01   ` Steven Rostedt
2023-05-09 20:30     ` Steven Rostedt [this message]
2023-05-09 20:42       ` Steven Rostedt
2023-05-15 16:57       ` Alexei Starovoitov
2023-05-15 18:33         ` Steven Rostedt
2023-05-15 19:35           ` Beau Belgrave
2023-05-15 21:38             ` Steven Rostedt
2023-05-15 19:24         ` Beau Belgrave
2023-05-15 21:57           ` Steven Rostedt
2023-05-17  0:36           ` Alexei Starovoitov
2023-05-17  0:56             ` Linus Torvalds
2023-05-17  1:46               ` Linus Torvalds
2023-05-17  2:29                 ` Steven Rostedt
2023-05-17  3:03                   ` Linus Torvalds
2023-05-17 17:22                     ` Beau Belgrave
2023-05-17 18:15                       ` Linus Torvalds
2023-05-17 19:07                         ` Beau Belgrave
2023-05-17 19:26                           ` Linus Torvalds
2023-05-17 19:36                             ` Beau Belgrave
2023-05-17 19:36                             ` Linus Torvalds
2023-05-17 19:37                               ` Linus Torvalds
2023-05-17 23:00                                 ` Beau Belgrave
2023-05-17 23:14                                   ` Linus Torvalds
2023-05-17 23:25                                     ` Steven Rostedt
2023-05-18  0:14                                       ` Beau Belgrave
2023-05-18  0:23                                         ` Linus Torvalds
2023-05-17 20:08                               ` Linus Torvalds
2023-05-17  1:26             ` Steven Rostedt
2023-05-17 16:50               ` Beau Belgrave
2023-05-18  0:10                 ` Alexei Starovoitov
2023-05-18  0:19                   ` Beau Belgrave
2023-05-18  0:56                     ` Alexei Starovoitov
2023-05-18  1:18                       ` Beau Belgrave
2023-05-18  2:08                         ` Steven Rostedt
2023-05-18  3:14                           ` Alexei Starovoitov
2023-05-18 13:36                             ` Steven Rostedt
2023-05-18 17:28                               ` Beau Belgrave
2023-06-01  9:46                   ` Christian Brauner
2023-06-01 15:24                     ` Beau Belgrave
2023-06-01 15:57                       ` Christian Brauner
2023-06-01 16:29                         ` Beau Belgrave
2023-06-06 13:37                           ` Masami Hiramatsu
2023-06-06 17:05                             ` Beau Belgrave
2023-06-07 14:07                               ` Masami Hiramatsu
2023-06-07 19:26                                 ` Beau Belgrave
2023-06-08  0:25                                   ` Masami Hiramatsu
2023-05-17 17:51             ` Beau Belgrave
2023-06-06 13:57             ` Masami Hiramatsu
2023-06-06 16:57               ` Andrii Nakryiko
2023-06-06 20:57                 ` Beau Belgrave

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230509163050.127d5123@rorschach.local.home \
    --to=rostedt@goodmis.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=beaub@linux.microsoft.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).