Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Ingo Molnar <mingo@kernel.org>, Robert Richter <rric@kernel.org>,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	netdev <netdev@vger.kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT
Date: Sun, 27 Jun 2021 10:10:24 +0900	[thread overview]
Message-ID: <fc5d0f90-502d-b217-0ad6-0d17cae12ff7@i-love.sakura.ne.jp> (raw)
In-Reply-To: <20210626142213.6dee5c60@rorschach.local.home>

On 2021/06/27 3:22, Steven Rostedt wrote:
>> If BPF is expected to register the same tracepoint with the same
>> callback and data more than once, then let's add a call to do that
>> without warning. Like I said, other callers expect the call to succeed
>> unless it's out of memory, which tends to cause other problems.
> 
> If BPF is OK with registering the same probe more than once if user
> space expects it, we can add this patch, which allows the caller (in
> this case BPF) to not warn if the probe being registered is already
> registered, and keeps the idea that a probe registered twice is a bug
> for all other use cases.

I think BPF will not register the same tracepoint with the same callback and
data more than once, for bpf(BPF_RAW_TRACEPOINT_OPEN) cleans the request up
by calling bpf_link_cleanup() and returns -EEXIST. But I think BPF relies on
tracepoint_add_func() returning -EEXIST without crashing the kernel.

CPU: 0 PID: 16193 Comm: syz-executor.5 Not tainted 5.13.0-rc7-syzkaller #0
RIP: 0010:tracepoint_add_func+0x1fb/0xa90 kernel/tracepoint.c:291
Call Trace:
 tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
 tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
 __bpf_probe_register kernel/trace/bpf_trace.c:1843 [inline]
 bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:1848
 bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2895
 __do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4453
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) {
  switch (cmd) {
  case BPF_RAW_TRACEPOINT_OPEN:
    err = bpf_raw_tracepoint_open(&attr) {
      err = bpf_link_prime(&link->link, &link_primer);
      if (err) {
        kfree(link);
        goto out_put_btp;
      }
      err = bpf_probe_register(link->btp, prog) {
        return __bpf_probe_register(btp, prog) {
          return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog) {
            return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO) {
              mutex_lock(&tracepoints_mutex); // Serialization start.
              ret = tracepoint_add_func(tp, &tp_func, prio) {
                old = func_add(&tp_funcs, func, prio); // Returns -EEXIST.
                if (IS_ERR(old)) {
                  WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM); // Crashes due to warn_on_paic==1.
                  return PTR_ERR(old); // Returns -EEXIST.
                }
              }
              mutex_unlock(&tracepoints_mutex); // Serialization end.
              return ret; // Returns -EEXIST.
            }
          }
        }
      }
      if (err) {
        bpf_link_cleanup(&link_primer); // Reject if func_add() returned -EEXIST.
        goto out_put_btp;
      }
      return bpf_link_settle(&link_primer);
    }
    break;
  }
  return ret; // Returns -EEXIST to the userspace.
}

On 2021/06/27 0:41, Steven Rostedt wrote:
>>   (1) func_add() can reject an attempt to add same tracepoint multiple times
>>       by returning -EEXIST to the caller.
>>   (2) But tracepoint_add_func() (the caller of func_add()) is calling WARN_ON_ONCE()
>>       if func_add() returned -EEXIST.
> 
> That's because (before BPF) there's no place in the kernel that tries
> to register the same tracepoint multiple times, and was considered a
> bug if it happened, because there's no ref counters to deal with adding
> them multiple times.

I see. But does that make sense? Since func_add() can fail with -ENOMEM,
all places (even before BPF) needs to be prepared for failures.

> 
> If the tracepoint is already registered (with the given function and
> data), then something likely went wrong.

That can be prepared on the caller side of tracepoint_add_func() rather than
tracepoint_add_func() side.

> 
>>   (3) And tracepoint_add_func() is triggerable via request from userspace.
> 
> Only via BPF correct?
> 
> I'm not sure how it works, but can't BPF catch that it is registering
> the same tracepoint again?

There is no chance to check whether some tracepoint is already registered, for
tracepoints_mutex is the only lock which gives us a chance to check whether
some tracepoint is already registered.

Should bpf() syscall hold a global lock (like tracepoints_mutex) which will serialize
the entire code in order to check whether some tracepoint is already registered?
That might severely damage concurrency.