linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* violating function pointer signature
       [not found]       ` <20201117153451.3015c5c9@gandalf.local.home>
@ 2020-11-18 13:21         ` Peter Zijlstra
  2020-11-18 13:59           ` Florian Weimer
                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Peter Zijlstra @ 2020-11-18 13:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote:

> > > Since all tracepoints callbacks have at least one parameter (__data), we
> > > could declare tp_stub_func as:
> > > 
> > > static void tp_stub_func(void *data, ...)
> > > {
> > >	return;
> > > }
> > > 
> > > And now C knows that tp_stub_func() can be called with one or more
> > > parameters, and had better be able to deal with it!  
> > 
> > AFAIU this won't work.
> > 
> > C99 6.5.2.2 Function calls
> > 
> > "If the function is defined with a type that is not compatible with the type (of the
> > expression) pointed to by the expression that denotes the called function, the behavior is
> > undefined."
> 
> But is it really a problem in practice. I'm sure we could create an objtool
> function to check to make sure we don't break anything at build time.

I think that as long as the function is completely empty (it never
touches any of the arguments) this should work in practise.

That is:

  void tp_nop_func(void) { }

can be used as an argument to any function pointer that has a void
return. In fact, I already do that, grep for __static_call_nop().

I'm not sure what the LLVM-CFI crud makes of it, but that's their
problem.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 13:21         ` violating function pointer signature Peter Zijlstra
@ 2020-11-18 13:59           ` Florian Weimer
  2020-11-18 14:12             ` Peter Zijlstra
  2020-11-18 14:22             ` violating function pointer signature Steven Rostedt
  2020-11-18 14:02           ` Steven Rostedt
  2020-11-18 16:50           ` Nick Desaulniers
  2 siblings, 2 replies; 33+ messages in thread
From: Florian Weimer @ 2020-11-18 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

* Peter Zijlstra:

> I think that as long as the function is completely empty (it never
> touches any of the arguments) this should work in practise.
>
> That is:
>
>   void tp_nop_func(void) { }
>
> can be used as an argument to any function pointer that has a void
> return. In fact, I already do that, grep for __static_call_nop().

You can pass it as a function parameter, but in general, you cannot
call the function with a different prototype.  Even trivial
differences such as variadic vs non-variadic prototypes matter.

The default Linux calling conventions are all of the cdecl family,
where the caller pops the argument off the stack.  You didn't quote
enough to context to tell whether other calling conventions matter in
your case.

> I'm not sure what the LLVM-CFI crud makes of it, but that's their
> problem.

LTO can cause problems as well, particularly with whole-program
optimization.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 13:21         ` violating function pointer signature Peter Zijlstra
  2020-11-18 13:59           ` Florian Weimer
@ 2020-11-18 14:02           ` Steven Rostedt
  2020-11-18 16:01             ` Mathieu Desnoyers
  2020-11-18 16:50           ` Nick Desaulniers
  2 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 14:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, 18 Nov 2020 14:21:36 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> I think that as long as the function is completely empty (it never
> touches any of the arguments) this should work in practise.
> 
> That is:
> 
>   void tp_nop_func(void) { }

My original version (the OP of this thread) had this:

+static void tp_stub_func(void)
+{
+	return;
+}

> 
> can be used as an argument to any function pointer that has a void
> return. In fact, I already do that, grep for __static_call_nop().
> 
> I'm not sure what the LLVM-CFI crud makes of it, but that's their
> problem.

If it is already done elsewhere in the kernel, then I will call this
precedence, and keep the original version.

This way Alexei can't complain about adding a check in the fast path of
more than one callback attached.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 13:59           ` Florian Weimer
@ 2020-11-18 14:12             ` Peter Zijlstra
  2020-11-18 14:18               ` Florian Weimer
  2020-11-18 14:22             ` violating function pointer signature Steven Rostedt
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2020-11-18 14:12 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 02:59:29PM +0100, Florian Weimer wrote:
> * Peter Zijlstra:
> 
> > I think that as long as the function is completely empty (it never
> > touches any of the arguments) this should work in practise.
> >
> > That is:
> >
> >   void tp_nop_func(void) { }
> >
> > can be used as an argument to any function pointer that has a void
> > return. In fact, I already do that, grep for __static_call_nop().
> 
> You can pass it as a function parameter, but in general, you cannot
> call the function with a different prototype.  Even trivial
> differences such as variadic vs non-variadic prototypes matter.

I don't think any tracepoint uses variadic argument.

> The default Linux calling conventions are all of the cdecl family,
> where the caller pops the argument off the stack.  You didn't quote
> enough to context to tell whether other calling conventions matter in
> your case.

This is strictly in-kernel, and I think we're all cdecl, of which the
important part is caller-cleanup. The function compiles to:

	RET

so whatever the arguments are is irrelevant.

> > I'm not sure what the LLVM-CFI crud makes of it, but that's their
> > problem.
> 
> LTO can cause problems as well, particularly with whole-program
> optimization.

I don't think LTO can de-virtualize a dynamic array of function
pointers, so there's very little risk. That said, the __static_call_nop
case, where everything is inlined, is compiled sub-optimally for both
LLVM and GCC.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 14:12             ` Peter Zijlstra
@ 2020-11-18 14:18               ` Florian Weimer
  2020-11-18 14:34                 ` [PATCH v3] tracepoint: Do not fail unregistering a probe due to memory allocation Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2020-11-18 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

* Peter Zijlstra:

>> The default Linux calling conventions are all of the cdecl family,
>> where the caller pops the argument off the stack.  You didn't quote
>> enough to context to tell whether other calling conventions matter in
>> your case.
>
> This is strictly in-kernel, and I think we're all cdecl, of which the
> important part is caller-cleanup. The function compiles to:
>
> 	RET
>
> so whatever the arguments are is irrelevant.

Yes, then the stub is ABI-compatible, as far as I know.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 13:59           ` Florian Weimer
  2020-11-18 14:12             ` Peter Zijlstra
@ 2020-11-18 14:22             ` Steven Rostedt
  2020-11-18 19:46               ` Alexei Starovoitov
  1 sibling, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 14:22 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Peter Zijlstra, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, 18 Nov 2020 14:59:29 +0100
Florian Weimer <fw@deneb.enyo.de> wrote:

> * Peter Zijlstra:
> 
> > I think that as long as the function is completely empty (it never
> > touches any of the arguments) this should work in practise.
> >
> > That is:
> >
> >   void tp_nop_func(void) { }
> >
> > can be used as an argument to any function pointer that has a void
> > return. In fact, I already do that, grep for __static_call_nop().  
> 
> You can pass it as a function parameter, but in general, you cannot
> call the function with a different prototype.  Even trivial
> differences such as variadic vs non-variadic prototypes matter.

In this case, I don't believe we need to worry about that, for either
tracepoints or static calls. As both don't have any variadic functions.

The function prototypes are defined by macros. For tracepoints, it's
TP_PROTO() and they require matching arguments. And to top it off, the
functions defined, are added to an array of indirect functions and called
separately. It would take a bit of work to even allow tracepoint callbacks
to be variadic functions. The same is true for static calls I believe.

Thus, all functions will be non-variadic in these cases.

> 
> The default Linux calling conventions are all of the cdecl family,
> where the caller pops the argument off the stack.  You didn't quote
> enough to context to tell whether other calling conventions matter in
> your case.
> 
> > I'm not sure what the LLVM-CFI crud makes of it, but that's their
> > problem.  
> 
> LTO can cause problems as well, particularly with whole-program
> optimization.

Again, for tracepoints and static calls that will likely not be an issue.
Because tracepoint callbacks are function parameters. So are static calls.
What happens is, when you update these locations, you pass in a function
you want as a callback, and it's added to an array (and this code is used
for all tracepoints with all different kinds of prototypes, as the function
is simply a void pointer). Then at the call sites, the function pointers are
typecast to the type of the callback function needed, and called.

It basically can not be optimized even when looking at the entire kernel.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3] tracepoint: Do not fail unregistering a probe due to memory allocation
  2020-11-18 14:18               ` Florian Weimer
@ 2020-11-18 14:34                 ` Steven Rostedt
  2020-11-24  5:59                   ` Matt Mullins
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 14:34 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Peter Zijlstra, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The list of tracepoint callbacks is managed by an array that is protected
by RCU. To update this array, a new array is allocated, the updates are
copied over to the new array, and then the list of functions for the
tracepoint is switched over to the new array. After a completion of an RCU
grace period, the old array is freed.

This process happens for both adding a callback as well as removing one.
But on removing a callback, if the new array fails to be allocated, the
callback is not removed, and may be used after it is freed by the clients
of the tracepoint.

There's really no reason to fail if the allocation for a new array fails
when removing a function. Instead, the function can simply be replaced by a
stub function that could be cleaned up on the next modification of the
array. That is, instead of calling the function registered to the
tracepoint, it would call a stub function in its place.

Link: https://lore.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us
Link: https://lore.kernel.org/r/20201116175107.02db396d@gandalf.local.home
Link: https://lore.kernel.org/r/20201117211836.54acaef2@oasis.local.home

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: netdev <netdev@vger.kernel.org>
Cc: bpf <bpf@vger.kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fw@deneb.enyo.de>
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Reported-by: syzbot+83aa762ef23b6f0d1991@syzkaller.appspotmail.com
Reported-by: syzbot+d29e58bb557324e55e5e@syzkaller.appspotmail.com
Reported-by: Matt Mullins <mmullins@mmlx.us>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
Changes since v2:
   - Went back to using a stub function and not touching
      the fast path.
   - Removed adding __GFP_NOFAIL from the allocation of the removal.

 kernel/tracepoint.c | 80 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 3f659f855074..3e261482296c 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -53,6 +53,12 @@ struct tp_probes {
 	struct tracepoint_func probes[];
 };
 
+/* Called in removal of a func but failed to allocate a new tp_funcs */
+static void tp_stub_func(void)
+{
+	return;
+}
+
 static inline void *allocate_probes(int count)
 {
 	struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
@@ -131,6 +137,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
 {
 	struct tracepoint_func *old, *new;
 	int nr_probes = 0;
+	int stub_funcs = 0;
 	int pos = -1;
 
 	if (WARN_ON(!tp_func->func))
@@ -147,14 +154,34 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
 			if (old[nr_probes].func == tp_func->func &&
 			    old[nr_probes].data == tp_func->data)
 				return ERR_PTR(-EEXIST);
+			if (old[nr_probes].func == tp_stub_func)
+				stub_funcs++;
 		}
 	}
-	/* + 2 : one for new probe, one for NULL func */
-	new = allocate_probes(nr_probes + 2);
+	/* + 2 : one for new probe, one for NULL func - stub functions */
+	new = allocate_probes(nr_probes + 2 - stub_funcs);
 	if (new == NULL)
 		return ERR_PTR(-ENOMEM);
 	if (old) {
-		if (pos < 0) {
+		if (stub_funcs) {
+			/* Need to copy one at a time to remove stubs */
+			int probes = 0;
+
+			pos = -1;
+			for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
+				if (old[nr_probes].func == tp_stub_func)
+					continue;
+				if (pos < 0 && old[nr_probes].prio < prio)
+					pos = probes++;
+				new[probes++] = old[nr_probes];
+			}
+			nr_probes = probes;
+			if (pos < 0)
+				pos = probes;
+			else
+				nr_probes--; /* Account for insertion */
+
+		} else if (pos < 0) {
 			pos = nr_probes;
 			memcpy(new, old, nr_probes * sizeof(struct tracepoint_func));
 		} else {
@@ -188,8 +215,9 @@ static void *func_remove(struct tracepoint_func **funcs,
 	/* (N -> M), (N > 1, M >= 0) probes */
 	if (tp_func->func) {
 		for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
-			if (old[nr_probes].func == tp_func->func &&
-			     old[nr_probes].data == tp_func->data)
+			if ((old[nr_probes].func == tp_func->func &&
+			     old[nr_probes].data == tp_func->data) ||
+			    old[nr_probes].func == tp_stub_func)
 				nr_del++;
 		}
 	}
@@ -208,14 +236,32 @@ static void *func_remove(struct tracepoint_func **funcs,
 		/* N -> M, (N > 1, M > 0) */
 		/* + 1 for NULL */
 		new = allocate_probes(nr_probes - nr_del + 1);
-		if (new == NULL)
-			return ERR_PTR(-ENOMEM);
-		for (i = 0; old[i].func; i++)
-			if (old[i].func != tp_func->func
-					|| old[i].data != tp_func->data)
-				new[j++] = old[i];
-		new[nr_probes - nr_del].func = NULL;
-		*funcs = new;
+		if (new) {
+			for (i = 0; old[i].func; i++)
+				if ((old[i].func != tp_func->func
+				     || old[i].data != tp_func->data)
+				    && old[i].func != tp_stub_func)
+					new[j++] = old[i];
+			new[nr_probes - nr_del].func = NULL;
+			*funcs = new;
+		} else {
+			/*
+			 * Failed to allocate, replace the old function
+			 * with calls to tp_stub_func.
+			 */
+			for (i = 0; old[i].func; i++)
+				if (old[i].func == tp_func->func &&
+				    old[i].data == tp_func->data) {
+					old[i].func = tp_stub_func;
+					/* Set the prio to the next event. */
+					if (old[i + 1].func)
+						old[i].prio =
+							old[i + 1].prio;
+					else
+						old[i].prio = -1;
+				}
+			*funcs = old;
+		}
 	}
 	debug_print_probes(*funcs);
 	return old;
@@ -295,10 +341,12 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 	tp_funcs = rcu_dereference_protected(tp->funcs,
 			lockdep_is_held(&tracepoints_mutex));
 	old = func_remove(&tp_funcs, func);
-	if (IS_ERR(old)) {
-		WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
+	if (WARN_ON_ONCE(IS_ERR(old)))
 		return PTR_ERR(old);
-	}
+
+	if (tp_funcs == old)
+		/* Failed allocating new tp_funcs, replaced func with stub */
+		return 0;
 
 	if (!tp_funcs) {
 		/* Removed last function */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 14:02           ` Steven Rostedt
@ 2020-11-18 16:01             ` Mathieu Desnoyers
  2020-11-18 16:19               ` David Laight
  0 siblings, 1 reply; 33+ messages in thread
From: Mathieu Desnoyers @ 2020-11-18 16:01 UTC (permalink / raw)
  To: rostedt
  Cc: Peter Zijlstra, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

----- On Nov 18, 2020, at 9:02 AM, rostedt rostedt@goodmis.org wrote:

> On Wed, 18 Nov 2020 14:21:36 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> I think that as long as the function is completely empty (it never
>> touches any of the arguments) this should work in practise.
>> 
>> That is:
>> 
>>   void tp_nop_func(void) { }
> 
> My original version (the OP of this thread) had this:
> 
> +static void tp_stub_func(void)
> +{
> +	return;
> +}
> 
>> 
>> can be used as an argument to any function pointer that has a void
>> return. In fact, I already do that, grep for __static_call_nop().
>> 
>> I'm not sure what the LLVM-CFI crud makes of it, but that's their
>> problem.
> 
> If it is already done elsewhere in the kernel, then I will call this
> precedence, and keep the original version.

It works for me. Bonus points if you can document in a comment that this
trick depends on the cdecl calling convention.

Thanks,

Mathieu

> 
> This way Alexei can't complain about adding a check in the fast path of
> more than one callback attached.
> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: violating function pointer signature
  2020-11-18 16:01             ` Mathieu Desnoyers
@ 2020-11-18 16:19               ` David Laight
  0 siblings, 0 replies; 33+ messages in thread
From: David Laight @ 2020-11-18 16:19 UTC (permalink / raw)
  To: 'Mathieu Desnoyers', rostedt
  Cc: Peter Zijlstra, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

From: Mathieu Desnoyers
> Sent: 18 November 2020 16:01
...
> > If it is already done elsewhere in the kernel, then I will call this
> > precedence, and keep the original version.
> 
> It works for me. Bonus points if you can document in a comment that this
> trick depends on the cdecl calling convention.

It has nothing to do with 'cdecl' - which IIRC is a microsoft term.

Historically C just pushed arguments on the stack (no prototypes)
The calling code knew nothing about the called code or whether a
function might expect to have a variable number of arguments.
To stop this going horribly wrong the stack is tidied up by the caller.

PASCAL (which doesn't really support linking!) didn't support
variable argument lists and would get the called code to remove
the arguments (which is why x86 has a 'ret n' instruction).
In principle this generates smaller/faster code and many of the
32bit windows functions use it - probably due to turbo-pascal).

Modern calling conventions tend to pass some arguments in registers.
All the ones that get used (by default) on linux will get the
caller to tidy the stack.
Although some may use a simpler calling convention for varargs functions.

So a common 'return constant' function can be called from any call site.
But it you actually call a real function (that looks at the arguments)
you better have a matching prototype.
(eg cast the function pointer back to the correct one before the call.)

There are calling conventions where pointer and integer parameters
and results are passed in different registers.
The usual definition of ioctl() is typically broken.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 13:21         ` violating function pointer signature Peter Zijlstra
  2020-11-18 13:59           ` Florian Weimer
  2020-11-18 14:02           ` Steven Rostedt
@ 2020-11-18 16:50           ` Nick Desaulniers
  2020-11-18 17:17             ` Steven Rostedt
  2 siblings, 1 reply; 33+ messages in thread
From: Nick Desaulniers @ 2020-11-18 16:50 UTC (permalink / raw)
  To: Peter Zijlstra, Sami Tolvanen
  Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 5:23 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote:
>
> > > > Since all tracepoints callbacks have at least one parameter (__data), we
> > > > could declare tp_stub_func as:
> > > >
> > > > static void tp_stub_func(void *data, ...)
> > > > {
> > > >   return;
> > > > }
> > > >
> > > > And now C knows that tp_stub_func() can be called with one or more
> > > > parameters, and had better be able to deal with it!
> > >
> > > AFAIU this won't work.
> > >
> > > C99 6.5.2.2 Function calls
> > >
> > > "If the function is defined with a type that is not compatible with the type (of the
> > > expression) pointed to by the expression that denotes the called function, the behavior is
> > > undefined."
> >
> > But is it really a problem in practice. I'm sure we could create an objtool
> > function to check to make sure we don't break anything at build time.
>
> I think that as long as the function is completely empty (it never
> touches any of the arguments) this should work in practise.
>
> That is:
>
>   void tp_nop_func(void) { }

or `void tp_nop_func()` if you plan to call it with different
parameter types that are all unused in the body.  If you do plan to
use them, maybe a pointer to a tagged union would be safer?

>
> can be used as an argument to any function pointer that has a void
> return. In fact, I already do that, grep for __static_call_nop().
>
> I'm not sure what the LLVM-CFI crud makes of it, but that's their
> problem.

If you have instructions on how to exercise the code in question, we
can help test it with CFI.  Better to find any potential issues before
they get committed.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 16:50           ` Nick Desaulniers
@ 2020-11-18 17:17             ` Steven Rostedt
  2020-11-18 18:12               ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 17:17 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, Sami Tolvanen, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains

On Wed, 18 Nov 2020 08:50:37 -0800
Nick Desaulniers <ndesaulniers@google.com> wrote:

> On Wed, Nov 18, 2020 at 5:23 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote:
> >  
> > > > > Since all tracepoints callbacks have at least one parameter (__data), we
> > > > > could declare tp_stub_func as:
> > > > >
> > > > > static void tp_stub_func(void *data, ...)
> > > > > {
> > > > >   return;
> > > > > }
> > > > >
> > > > > And now C knows that tp_stub_func() can be called with one or more
> > > > > parameters, and had better be able to deal with it!  
> > > >
> > > > AFAIU this won't work.
> > > >
> > > > C99 6.5.2.2 Function calls
> > > >
> > > > "If the function is defined with a type that is not compatible with the type (of the
> > > > expression) pointed to by the expression that denotes the called function, the behavior is
> > > > undefined."  
> > >
> > > But is it really a problem in practice. I'm sure we could create an objtool
> > > function to check to make sure we don't break anything at build time.  
> >
> > I think that as long as the function is completely empty (it never
> > touches any of the arguments) this should work in practise.
> >
> > That is:
> >
> >   void tp_nop_func(void) { }  
> 
> or `void tp_nop_func()` if you plan to call it with different
> parameter types that are all unused in the body.  If you do plan to
> use them, maybe a pointer to a tagged union would be safer?

This stub function will never use the parameters passed to it.

You can see the patch I have for the tracepoint issue here:

 https://lore.kernel.org/r/20201118093405.7a6d2290@gandalf.local.home

I could change the stub from (void) to () if that would be better.

> 
> >
> > can be used as an argument to any function pointer that has a void
> > return. In fact, I already do that, grep for __static_call_nop().
> >
> > I'm not sure what the LLVM-CFI crud makes of it, but that's their
> > problem.  
> 
> If you have instructions on how to exercise the code in question, we
> can help test it with CFI.  Better to find any potential issues before
> they get committed.

If you apply the patch to the Linux kernel, and then apply:

  https://lore.kernel.org/r/20201116181638.6b0de6f7@gandalf.local.home

Which will force the failed case (to use the stubs). And build and boot the
kernel with those patches applied, you can test it with:


 # mount -t tracefs nodev /sys/kernel/tracing
 # cd /sys/kernel/tracing
 # echo 1 > events/sched/sched_switch/enable
 # mkdir instances/foo
 # echo 1 > instances/foo/events/sched/sched_switch/enable
 # echo 0 > events/sched/sched_switch/enable

Which add two callbacks to the function array for the sched_switch
tracepoint. The remove the first one, which would add the stub instead.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 17:17             ` Steven Rostedt
@ 2020-11-18 18:12               ` Segher Boessenkool
  2020-11-18 18:31                 ` Florian Weimer
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-18 18:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> I could change the stub from (void) to () if that would be better.

Don't?  In a function definition they mean exactly the same thing (and
the kernel uses (void) everywhere else, which many people find clearer).

In a function declaration that is not part of a definition it means no
information about the arguments is specified, a quite different thing.

This is an obsolescent feature, too.  Many many years from now it could
perhaps mean the same as (void), just like in C++, but not yet.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 18:12               ` Segher Boessenkool
@ 2020-11-18 18:31                 ` Florian Weimer
  2020-11-18 18:55                   ` Segher Boessenkool
  2020-11-18 18:58                   ` Steven Rostedt
  0 siblings, 2 replies; 33+ messages in thread
From: Florian Weimer @ 2020-11-18 18:31 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Steven Rostedt, Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

* Segher Boessenkool:

> On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
>> I could change the stub from (void) to () if that would be better.
>
> Don't?  In a function definition they mean exactly the same thing (and
> the kernel uses (void) everywhere else, which many people find clearer).

And I think () functions expected a caller-provided parameter save
area on powerpc64le, while (void) functions do not.  It does not
matter for an empty function, but GCC prefers to use the parameter
save area instead of setting up a stack frame if it is present.  So
you get stack corruption if you call a () function as a (void)
function.  (The other way round is fine.)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 18:31                 ` Florian Weimer
@ 2020-11-18 18:55                   ` Segher Boessenkool
  2020-11-18 18:58                   ` Steven Rostedt
  1 sibling, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-18 18:55 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Steven Rostedt, Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 07:31:50PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> >> I could change the stub from (void) to () if that would be better.
> >
> > Don't?  In a function definition they mean exactly the same thing (and
> > the kernel uses (void) everywhere else, which many people find clearer).
> 
> And I think () functions expected a caller-provided parameter save
> area on powerpc64le, while (void) functions do not.

Like I said (but you cut off, didn't realise it matters I guess):

> > In a function declaration that is not part of a definition it means no
> > information about the arguments is specified, a quite different thing.

Since the caller does not know if the callee will need a save area, it
has to assume it does.  Similar is true for many ABIs.

> It does not
> matter for an empty function, but GCC prefers to use the parameter
> save area instead of setting up a stack frame if it is present.  So
> you get stack corruption if you call a () function as a (void)
> function.  (The other way round is fine.)

If you have no prototype for a function, you have to assume worst case,
yes.  Calling things "a () function" can mean two things (a declaration
that is or isn't a definition, two very different things), so it helps
to be explicit about it.

Just use (void) and do not worry :-)


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 18:31                 ` Florian Weimer
  2020-11-18 18:55                   ` Segher Boessenkool
@ 2020-11-18 18:58                   ` Steven Rostedt
  2020-11-18 18:59                     ` Steven Rostedt
  2020-11-18 19:11                     ` Segher Boessenkool
  1 sibling, 2 replies; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 18:58 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Segher Boessenkool, Nick Desaulniers, Peter Zijlstra,
	Sami Tolvanen, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, 18 Nov 2020 19:31:50 +0100
Florian Weimer <fw@deneb.enyo.de> wrote:

> * Segher Boessenkool:
> 
> > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:  
> >> I could change the stub from (void) to () if that would be better.  
> >
> > Don't?  In a function definition they mean exactly the same thing (and
> > the kernel uses (void) everywhere else, which many people find clearer).  
> 
> And I think () functions expected a caller-provided parameter save
> area on powerpc64le, while (void) functions do not.  It does not
> matter for an empty function, but GCC prefers to use the parameter
> save area instead of setting up a stack frame if it is present.  So
> you get stack corruption if you call a () function as a (void)
> function.  (The other way round is fine.)

I wonder if we should define on all architectures a void void_stub(void),
in assembly, that just does a return, an not worry about gcc messing up the
creation of the stub function.

On x86_64:

GLOBAL(void_stub)
	retq


And so on.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 18:58                   ` Steven Rostedt
@ 2020-11-18 18:59                     ` Steven Rostedt
  2020-11-18 19:11                     ` Segher Boessenkool
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 18:59 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Segher Boessenkool, Nick Desaulniers, Peter Zijlstra,
	Sami Tolvanen, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, 18 Nov 2020 13:58:23 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

>  an not worry about gcc 

or LLVM, or whatever is used to build the kernel.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 18:58                   ` Steven Rostedt
  2020-11-18 18:59                     ` Steven Rostedt
@ 2020-11-18 19:11                     ` Segher Boessenkool
  2020-11-18 19:33                       ` Steven Rostedt
  2020-11-19  8:36                       ` Peter Zijlstra
  1 sibling, 2 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-18 19:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Florian Weimer, Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 01:58:23PM -0500, Steven Rostedt wrote:
> I wonder if we should define on all architectures a void void_stub(void),
> in assembly, that just does a return, an not worry about gcc messing up the
> creation of the stub function.
> 
> On x86_64:
> 
> GLOBAL(void_stub)
> 	retq
> 
> 
> And so on.

I don't see how you imagine a compiler to mess this up?

void void_stub(void) { }

will do fine?

        .globl  void_stub
        .type   void_stub, @function
void_stub:
.LFB0:
        .cfi_startproc
        ret
        .cfi_endproc
.LFE0:
        .size   void_stub, .-void_stub


Calling this via a different declared function type is undefined
behaviour, but that is independent of how the function is *defined*.
Your program can make ducks appear from your nose even if that function
is never called, if you do that.  Just don't do UB, not even once!


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 19:11                     ` Segher Boessenkool
@ 2020-11-18 19:33                       ` Steven Rostedt
  2020-11-18 19:48                         ` Segher Boessenkool
  2020-11-19  8:36                       ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 19:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Florian Weimer, Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, 18 Nov 2020 13:11:27 -0600
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> Calling this via a different declared function type is undefined
> behaviour, but that is independent of how the function is *defined*.
> Your program can make ducks appear from your nose even if that function
> is never called, if you do that.  Just don't do UB, not even once!

But that's the whole point of this conversation. We are going to call this
from functions that are going to have some random set of parameters.

But there is a limit to that. All the callers will expect a void return,
and none of the callers will have a variable number of parameters.

The code in question is tracepoints and static calls. For this
conversation, I'll stick with tracepoints (even though static calls are
used too, but including that in the conversation is confusing).

Let me define what is happening:

We have a macro that creates a defined tracepoint with a defined set of
parameters. But each tracepoint can have a different set of parameters. All
of them will have "void *" as the first parameter, but what comes after
that is unique to each tracepoint (defined by a macro). None of them will
be a variadic function call.

The macro looks like this:

	int __traceiter_##_name(void *__data, proto)			\
	{								\
		struct tracepoint_func *it_func_ptr;			\
		void *it_func;						\
									\
		it_func_ptr =						\
			rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
		do {							\
			it_func = (it_func_ptr)->func;			\
			__data = (it_func_ptr)->data;			\
			((void(*)(void *, proto))(it_func))(__data, args); \
		} while ((++it_func_ptr)->func);			\
		return 0;						\
	}


There's an array of struct tracepoint_func pointers, which has the
definition of:

struct tracepoint_func {
	void *func;
	void *data;
	int prio;
};


And you see the above, the macro does:

	((void(*)(void *, proto))(it_func))(__data, args);

With it_func being the func from the struct tracepoint_func, which is a
void pointer, it is typecast to the function that is defined by the
tracepoint. args is defined as the arguments that match the proto.

The way the array is updated, is to use an RCU method, which is to create a
new array, copy the changes to the new array, then switch the "->funcs"
over to the new copy, and after a RCU grace period is finished, we can free
the old array.

The problem we are solving is on the removal case, if the memory is tight,
it is possible that the new array can not be allocated. But we must still
remove the called function. The idea in this case is to replace the
function saved with a stub. The above loop will call the stub and not the
removed function until another update happens.

This thread is about how safe is it to call:

void tp_stub_func(void) { return ; }

instead of the function that was removed?

Thus, we are indeed calling that stub function from a call site that is not
using the same parameters.

The question is, will this break?

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 14:22             ` violating function pointer signature Steven Rostedt
@ 2020-11-18 19:46               ` Alexei Starovoitov
  2020-11-18 20:02                 ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2020-11-18 19:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Florian Weimer, Peter Zijlstra, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains

On Wed, Nov 18, 2020 at 6:22 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Thus, all functions will be non-variadic in these cases.

That's not the only case where it will blow up.
Try this on sparc:
struct foo {
int a;
};

struct foo foo_struct(void) {
struct foo f = {};
return f;
}
int foo_int(void) {
return 0;
}
or this link:
https://godbolt.org/z/EdM47b

Notice:
jmp %i7+12
The function that returns small struct will jump to a different
instruction in the caller.

I think none of the tracepoints return structs and void foo(void) is
fine on x86.
Just pointing out that it's more than just variadic.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 19:33                       ` Steven Rostedt
@ 2020-11-18 19:48                         ` Segher Boessenkool
  2020-11-18 20:44                           ` Steven Rostedt
  2020-11-19  8:21                           ` Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-18 19:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Florian Weimer, Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 02:33:43PM -0500, Steven Rostedt wrote:
> On Wed, 18 Nov 2020 13:11:27 -0600
> Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
> > Your program can make ducks appear from your nose even if that function
> > is never called, if you do that.  Just don't do UB, not even once!
> 
> But that's the whole point of this conversation. We are going to call this
> from functions that are going to have some random set of parameters.

<snip great summary>

> And you see the above, the macro does:
> 
> 	((void(*)(void *, proto))(it_func))(__data, args);

Yup.

> With it_func being the func from the struct tracepoint_func, which is a
> void pointer, it is typecast to the function that is defined by the
> tracepoint. args is defined as the arguments that match the proto.

If you have at most four or so args, what you wnat to do will work on
all systems the kernel currently supports, as far as I can tell.  It
is not valid C, and none of the compilers have an extension for this
either.  But it will likely work.

> The problem we are solving is on the removal case, if the memory is tight,
> it is possible that the new array can not be allocated. But we must still
> remove the called function. The idea in this case is to replace the
> function saved with a stub. The above loop will call the stub and not the
> removed function until another update happens.
> 
> This thread is about how safe is it to call:
> 
> void tp_stub_func(void) { return ; }
> 
> instead of the function that was removed?

Exactly as safe as calling a stub defined in asm.  The undefined
behaviour happens if your program has such a call, it doesn't matter
how the called function is defined, it doesn't have to be C.

> Thus, we are indeed calling that stub function from a call site that is not
> using the same parameters.
> 
> The question is, will this break?

It is unlikely to break if you use just a few arguments, all of simple
scalar types.  Just hope you will never encounter a crazy ABI :-)


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 19:46               ` Alexei Starovoitov
@ 2020-11-18 20:02                 ` Steven Rostedt
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 20:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Florian Weimer, Peter Zijlstra, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains

On Wed, 18 Nov 2020 11:46:02 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Wed, Nov 18, 2020 at 6:22 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Thus, all functions will be non-variadic in these cases.  
> 
> That's not the only case where it will blow up.
> Try this on sparc:
> struct foo {
> int a;
> };
> 
> struct foo foo_struct(void) {
> struct foo f = {};
> return f;
> }
> int foo_int(void) {
> return 0;
> }
> or this link:
> https://godbolt.org/z/EdM47b
> 
> Notice:
> jmp %i7+12
> The function that returns small struct will jump to a different
> instruction in the caller.
> 
> I think none of the tracepoints return structs and void foo(void) is
> fine on x86.
> Just pointing out that it's more than just variadic.

I also said that this is limited to only functions that have void return.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 19:48                         ` Segher Boessenkool
@ 2020-11-18 20:44                           ` Steven Rostedt
  2020-11-19  8:21                           ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2020-11-18 20:44 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Florian Weimer, Nick Desaulniers, Peter Zijlstra, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, 18 Nov 2020 13:48:37 -0600
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> > With it_func being the func from the struct tracepoint_func, which is a
> > void pointer, it is typecast to the function that is defined by the
> > tracepoint. args is defined as the arguments that match the proto.  
> 
> If you have at most four or so args, what you wnat to do will work on
> all systems the kernel currently supports, as far as I can tell.  It
> is not valid C, and none of the compilers have an extension for this
> either.  But it will likely work.

Well, unfortunately, there's tracepoints with many more than 4 arguments. I
think there's one with up to 13!

> 
> > The problem we are solving is on the removal case, if the memory is tight,
> > it is possible that the new array can not be allocated. But we must still
> > remove the called function. The idea in this case is to replace the
> > function saved with a stub. The above loop will call the stub and not the
> > removed function until another update happens.
> > 
> > This thread is about how safe is it to call:
> > 
> > void tp_stub_func(void) { return ; }
> > 
> > instead of the function that was removed?  
> 
> Exactly as safe as calling a stub defined in asm.  The undefined
> behaviour happens if your program has such a call, it doesn't matter
> how the called function is defined, it doesn't have to be C.
> 
> > Thus, we are indeed calling that stub function from a call site that is not
> > using the same parameters.
> > 
> > The question is, will this break?  
> 
> It is unlikely to break if you use just a few arguments, all of simple
> scalar types.  Just hope you will never encounter a crazy ABI :-)

But in most cases, all the arguments are of scaler types, as anything else
is not recommended, because copying is always slower than just passing a
pointer, especially since it would need to be copied for every instance of
that loop. I could do an audit to see if there's any that exist, and perhaps
even add some static checker to make sure they don't.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 19:48                         ` Segher Boessenkool
  2020-11-18 20:44                           ` Steven Rostedt
@ 2020-11-19  8:21                           ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2020-11-19  8:21 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Steven Rostedt, Florian Weimer, Nick Desaulniers, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 01:48:37PM -0600, Segher Boessenkool wrote:

> If you have at most four or so args, what you wnat to do will work on
> all systems the kernel currently supports, as far as I can tell.  It
> is not valid C, and none of the compilers have an extension for this
> either.  But it will likely work.

So this is where we rely on the calling convention being caller-cleanup
(cdecl has that).

I looked at the GCC preprocessor symbols but couldn't find anything that
seems relevant to the calling convention in use, so barring that, the
best option might to be have a boot-time self-test that triggers this.
Then we'll quickly know if all architectures handle this correctly.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-18 19:11                     ` Segher Boessenkool
  2020-11-18 19:33                       ` Steven Rostedt
@ 2020-11-19  8:36                       ` Peter Zijlstra
  2020-11-19 14:37                         ` Segher Boessenkool
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2020-11-19  8:36 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Steven Rostedt, Florian Weimer, Nick Desaulniers, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote:
> Calling this via a different declared function type is undefined
> behaviour, but that is independent of how the function is *defined*.
> Your program can make ducks appear from your nose even if that function
> is never called, if you do that.  Just don't do UB, not even once!

Ah, see, here I think we disagree. UB is a flaw of the spec, but the
real world often has very sane behaviour there (sometimes also very
much not).

In this particular instance the behaviour is UB because the C spec
doesn't want to pin down the calling convention, which is something I
can understand. But once you combine the C spec with the ABI(s) at hand,
there really isn't two ways about it. This has to work, under the
premise that the ABI defines a caller cleanup calling convention.

So in the view that the compiler is a glorified assembler, I'll take UB
every day if it means I can get the thing to do what I want it to.

Obviously in the interest of co-operation and longer term viability, it
would be nice if we can agree on the behaviour and get a language
extention covering it.

Note that we have a fairly extensive tradition of defining away UB with
language extentions, -fno-strict-overflow, -fno-strict-aliasing,
-fno-delete-null-pointer-checks etc..



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19  8:36                       ` Peter Zijlstra
@ 2020-11-19 14:37                         ` Segher Boessenkool
  2020-11-19 14:59                           ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-19 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Florian Weimer, Nick Desaulniers, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Thu, Nov 19, 2020 at 09:36:48AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote:
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
> > Your program can make ducks appear from your nose even if that function
> > is never called, if you do that.  Just don't do UB, not even once!
> 
> Ah, see, here I think we disagree. UB is a flaw of the spec, but the
> real world often has very sane behaviour there (sometimes also very
> much not).

That attitude summons ducks.

> In this particular instance the behaviour is UB because the C spec
> doesn't want to pin down the calling convention, which is something I
> can understand.

How do you know?  Were you at the meetings where this was decided?

The most frequent reason something is made UB is when there are multiple
existing implementations with irreconcilable differences.

> But once you combine the C spec with the ABI(s) at hand,
> there really isn't two ways about it. This has to work, under the
> premise that the ABI defines a caller cleanup calling convention.

This is not clear at all (and what "caller cleanup calling convention"
would mean isn't either).  A function call at the C level does not
necessarily correspond at all with a function call at the ABI level, to
begin with.

> So in the view that the compiler is a glorified assembler,

But it isn't.

> Note that we have a fairly extensive tradition of defining away UB with
> language extentions, -fno-strict-overflow, -fno-strict-aliasing,

These are options to make a large swath of not correct C programs
compile (and often work) anyway.  This is useful because there are so
many such programs, because a) people did not lint; and/or b) the
problem never was obvious with some other (or older) compiler; and/or
c) people do not care about writing portable C and prefer writing in
their own non-C dialect.

> -fno-delete-null-pointer-checks etc..

This was added as a security hardening feature.  It of course also is
useful for other things -- most flags are.  It was not added to make yet
another dialect.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19 14:37                         ` Segher Boessenkool
@ 2020-11-19 14:59                           ` Steven Rostedt
  2020-11-19 16:35                             ` Segher Boessenkool
  2020-11-19 17:04                             ` Alexei Starovoitov
  0 siblings, 2 replies; 33+ messages in thread
From: Steven Rostedt @ 2020-11-19 14:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Peter Zijlstra, Florian Weimer, Nick Desaulniers, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Thu, 19 Nov 2020 08:37:35 -0600
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> > Note that we have a fairly extensive tradition of defining away UB with
> > language extentions, -fno-strict-overflow, -fno-strict-aliasing,  
> 
> These are options to make a large swath of not correct C programs
> compile (and often work) anyway.  This is useful because there are so
> many such programs, because a) people did not lint; and/or b) the
> problem never was obvious with some other (or older) compiler; and/or
> c) people do not care about writing portable C and prefer writing in
> their own non-C dialect.


Note, this is not about your average C program. This is about the Linux
kernel, which already does a lot of tricks in C. There's a lot of code in
assembly that gets called from C (and vise versa). We modify code on the
fly (which tracepoints use two methods of that - with asm-goto/jump-labels
and static functions).

As for your point c), I'm not sure what you mean about portable C (stuck to
a single compiler, or stuck to a single architecture?). Linux obviously
supports multiple architectures (more than any other OS), but it is pretty
stuck to gcc as a compiler (with LLVM just starting to work too).

We are fine with being stuck to a compiler if it gives us what we want.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19 14:59                           ` Steven Rostedt
@ 2020-11-19 16:35                             ` Segher Boessenkool
  2020-11-19 17:42                               ` David Laight
  2020-11-19 17:04                             ` Alexei Starovoitov
  1 sibling, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-19 16:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Florian Weimer, Nick Desaulniers, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Thu, Nov 19, 2020 at 09:59:51AM -0500, Steven Rostedt wrote:
> On Thu, 19 Nov 2020 08:37:35 -0600
> Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > > Note that we have a fairly extensive tradition of defining away UB with
> > > language extentions, -fno-strict-overflow, -fno-strict-aliasing,  
> > 
> > These are options to make a large swath of not correct C programs
> > compile (and often work) anyway.  This is useful because there are so
> > many such programs, because a) people did not lint; and/or b) the
> > problem never was obvious with some other (or older) compiler; and/or
> > c) people do not care about writing portable C and prefer writing in
> > their own non-C dialect.
> 
> Note, this is not about your average C program. This is about the Linux
> kernel, which already does a lot of tricks in C.

Yes, I know.  And some of that can be supported just fine (usually
because the compiler explicitly supports it); some of that will not
cause problems in practice (e.g. the scope it could cause problems in
is very limited); and some of that is just waiting to break down.  The
question is what category your situation here is in.  The middle one I
think.

> There's a lot of code in
> assembly that gets called from C (and vise versa).

That is just fine, that is what ABIs are for :-)

> We modify code on the
> fly (which tracepoints use two methods of that - with asm-goto/jump-labels
> and static functions).

And that is a lot trickier.  It can be made to work, but there are many
dark nooks and crannies problems can hide in.

> As for your point c), I'm not sure what you mean about portable C

I just meant "valid C language code as defined by the standards".  Many
people want all UB to just go away, while that is *impossible* to do for
many compilers: for example where different architectures or different
ABIs have contradictory requirements.

> (stuck to
> a single compiler, or stuck to a single architecture?). Linux obviously
> supports multiple architectures (more than any other OS), but it is pretty
> stuck to gcc as a compiler (with LLVM just starting to work too).
> 
> We are fine with being stuck to a compiler if it gives us what we want.

Right.  Just know that a compiler can make defined behaviour for *some*
things that are UB in standard C (possibly at a runtime performance
cost), but most things are not feasible.

Alexei's SPARC example (thanks!) shows that even "obvious" things are
not so obviously (or at all) implemented the way you expect on all
systems you care about.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19 14:59                           ` Steven Rostedt
  2020-11-19 16:35                             ` Segher Boessenkool
@ 2020-11-19 17:04                             ` Alexei Starovoitov
  2020-11-19 17:30                               ` Steven Rostedt
  2020-11-20  1:31                               ` Nick Desaulniers
  1 sibling, 2 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2020-11-19 17:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Segher Boessenkool, Peter Zijlstra, Florian Weimer,
	Nick Desaulniers, Sami Tolvanen, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains

On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> Linux obviously
> supports multiple architectures (more than any other OS), but it is pretty
> stuck to gcc as a compiler (with LLVM just starting to work too).
>
> We are fine with being stuck to a compiler if it gives us what we want.

I beg to disagree.
android, chrome and others changed their kernel builds to
"make LLVM=1" some time ago.
It's absolutely vital for the health of the kernel to be built with
both gcc and llvm.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19 17:04                             ` Alexei Starovoitov
@ 2020-11-19 17:30                               ` Steven Rostedt
  2020-11-20  1:31                               ` Nick Desaulniers
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2020-11-19 17:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Segher Boessenkool, Peter Zijlstra, Florian Weimer,
	Nick Desaulniers, Sami Tolvanen, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains

On Thu, 19 Nov 2020 09:04:57 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > Linux obviously
> > supports multiple architectures (more than any other OS), but it is pretty
> > stuck to gcc as a compiler (with LLVM just starting to work too).
> >
> > We are fine with being stuck to a compiler if it gives us what we want.  
> 
> I beg to disagree.

I think you misunderstood.

> android, chrome and others changed their kernel builds to
> "make LLVM=1" some time ago.
> It's absolutely vital for the health of the kernel to be built with
> both gcc and llvm.

That's what I meant with "LLVM just starting to work too".

LLVM has been working hard to make sure that it can do the same tricks that
the kernel depends on gcc for. And LLVM appears to be working fine with the
nop stub logic (it's already in 5.10-rc with with the static callers).

We can easily create a boot up test (config option) that will test it, and
if a compiler breaks it, this test would trigger a failure.

Again, both static calls and tracepoint callbacks are limited in what they
can do. Both return void, and both do are not variadic functions. Although
passing in a struct as a parameter is possible, we could add testing to
detect this, as that's rather slow to begin with.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: violating function pointer signature
  2020-11-19 16:35                             ` Segher Boessenkool
@ 2020-11-19 17:42                               ` David Laight
  2020-11-19 19:27                                 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: David Laight @ 2020-11-19 17:42 UTC (permalink / raw)
  To: 'Segher Boessenkool', Steven Rostedt
  Cc: Peter Zijlstra, Florian Weimer, Nick Desaulniers, Sami Tolvanen,
	Mathieu Desnoyers, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

From: Segher Boessenkool
> Sent: 19 November 2020 16:35
...
> I just meant "valid C language code as defined by the standards".  Many
> people want all UB to just go away, while that is *impossible* to do for
> many compilers: for example where different architectures or different
> ABIs have contradictory requirements.

Some of the UB in the C language are (probably) there because
certain (now obscure) hardware behaved that way.
For instance integer arithmetic may saturate on overflow
(or do even stranger things if the sign is a separate bit).
I'm not quite sure it was ever possible to write a C compiler
for a cpu that processed numbers in ASCII (up to 10 digits),
binary arithmetic was almost impossible.
There are also the CPU that only have 'word' addressing - so
that 'pointers to characters' take extra instructions.

ISTM that a few years ago the gcc developers started looking
at some of these 'UB' and decided they could make use of
them to make some code faster (and break other code).

One of the problems with UB is that whereas you might expect
UB arithmetic to generate an unexpected result and/or signal
it is completely open-ended and could fire an ICBM at the coder.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19 17:42                               ` David Laight
@ 2020-11-19 19:27                                 ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-11-19 19:27 UTC (permalink / raw)
  To: David Laight
  Cc: Steven Rostedt, Peter Zijlstra, Florian Weimer, Nick Desaulniers,
	Sami Tolvanen, Mathieu Desnoyers, linux-kernel, Matt Mullins,
	Ingo Molnar, Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, netdev, bpf, Kees Cook, Josh Poimboeuf,
	linux-toolchains

On Thu, Nov 19, 2020 at 05:42:34PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 19 November 2020 16:35
> > I just meant "valid C language code as defined by the standards".  Many
> > people want all UB to just go away, while that is *impossible* to do for
> > many compilers: for example where different architectures or different
> > ABIs have contradictory requirements.
> 
> Some of the UB in the C language are (probably) there because
> certain (now obscure) hardware behaved that way.

Yes.

> For instance integer arithmetic may saturate on overflow
> (or do even stranger things if the sign is a separate bit).

And some still does!

> I'm not quite sure it was ever possible to write a C compiler
> for a cpu that processed numbers in ASCII (up to 10 digits),
> binary arithmetic was almost impossible.

A machine that really stores decimal numbers?  Not BCD or the like?
Yeah wow, that will be hard.

> There are also the CPU that only have 'word' addressing - so
> that 'pointers to characters' take extra instructions.

Such machines are still made, and are programmed in C as well.

> ISTM that a few years ago the gcc developers started looking
> at some of these 'UB' and decided they could make use of
> them to make some code faster (and break other code).

When UB would happen in some situation, the compiler can simply assume
that situation does not happen.  This makes it possible to do a lot of
optimisations (many to do with loops) that cannot be done otherwise
(including those to do with signed overflow).  And many of those
optimisations are worthwhile.

> One of the problems with UB is that whereas you might expect
> UB arithmetic to generate an unexpected result and/or signal
> it is completely open-ended and could fire an ICBM at the coder.

Yes, UB is undefined behaviour.  Unspecified is something else (and C
has that as well, also implementation-defined, etc.)

In some cases GCC (and any other modern compiler) can make UB be IB
instead, with some flag for example, like -fno-strict-* does.  In other
cases it isn't so easy at all.  In cases like you have here (where the
validity of what you want to do depends on the ABI in effect) things are
not easy :-/


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: violating function pointer signature
  2020-11-19 17:04                             ` Alexei Starovoitov
  2020-11-19 17:30                               ` Steven Rostedt
@ 2020-11-20  1:31                               ` Nick Desaulniers
  1 sibling, 0 replies; 33+ messages in thread
From: Nick Desaulniers @ 2020-11-20  1:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Segher Boessenkool, Peter Zijlstra,
	Florian Weimer, Sami Tolvanen, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains, Bill Wendling

On Thu, Nov 19, 2020 at 9:05 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > Linux obviously
> > supports multiple architectures (more than any other OS), but it is pretty
> > stuck to gcc as a compiler (with LLVM just starting to work too).
> >
> > We are fine with being stuck to a compiler if it gives us what we want.
>
> I beg to disagree.
> android, chrome and others changed their kernel builds to
> "make LLVM=1" some time ago.
> It's absolutely vital for the health of the kernel to be built with
> both gcc and llvm.

Our fleet of machines in the data centers is currently mid-ramp, at
around or slightly just over 50% of kernels built with Clang.  Soon to
be 100%.  So "a good chunk of Google services," too, FWIW.

OpenMandriva is on track for their 4.2 release to use LLVM for their kernels.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] tracepoint: Do not fail unregistering a probe due to memory allocation
  2020-11-18 14:34                 ` [PATCH v3] tracepoint: Do not fail unregistering a probe due to memory allocation Steven Rostedt
@ 2020-11-24  5:59                   ` Matt Mullins
  0 siblings, 0 replies; 33+ messages in thread
From: Matt Mullins @ 2020-11-24  5:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Florian Weimer, Peter Zijlstra, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf,
	Kees Cook, Josh Poimboeuf, linux-toolchains

On Wed, Nov 18, 2020 at 09:34:05AM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> The list of tracepoint callbacks is managed by an array that is protected
> by RCU. To update this array, a new array is allocated, the updates are
> copied over to the new array, and then the list of functions for the
> tracepoint is switched over to the new array. After a completion of an RCU
> grace period, the old array is freed.
> 
> This process happens for both adding a callback as well as removing one.
> But on removing a callback, if the new array fails to be allocated, the
> callback is not removed, and may be used after it is freed by the clients
> of the tracepoint.
> 
> There's really no reason to fail if the allocation for a new array fails
> when removing a function. Instead, the function can simply be replaced by a
> stub function that could be cleaned up on the next modification of the
> array. That is, instead of calling the function registered to the
> tracepoint, it would call a stub function in its place.
> 
> Link: https://lore.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us
> Link: https://lore.kernel.org/r/20201116175107.02db396d@gandalf.local.home
> Link: https://lore.kernel.org/r/20201117211836.54acaef2@oasis.local.home
> 
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Song Liu <songliubraving@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@chromium.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: bpf <bpf@vger.kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Florian Weimer <fw@deneb.enyo.de>
> Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
> Reported-by: syzbot+83aa762ef23b6f0d1991@syzkaller.appspotmail.com
> Reported-by: syzbot+d29e58bb557324e55e5e@syzkaller.appspotmail.com
> Reported-by: Matt Mullins <mmullins@mmlx.us>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

I'm a bit late answering your initial query, but yes indeed this fixes
the bug I was hunting.  I just watched it live through the reproducer
for about a half-hour, while unpatched I get an instant "BUG: unable to
handle page fault".

Tested-by: Matt Mullins <mmullins@mmlx.us>

> ---
> Changes since v2:
>    - Went back to using a stub function and not touching
>       the fast path.
>    - Removed adding __GFP_NOFAIL from the allocation of the removal.
> 
>  kernel/tracepoint.c | 80 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 64 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 3f659f855074..3e261482296c 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -53,6 +53,12 @@ struct tp_probes {
>  	struct tracepoint_func probes[];
>  };
>  
> +/* Called in removal of a func but failed to allocate a new tp_funcs */
> +static void tp_stub_func(void)
> +{
> +	return;
> +}
> +
>  static inline void *allocate_probes(int count)
>  {
>  	struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
> @@ -131,6 +137,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  {
>  	struct tracepoint_func *old, *new;
>  	int nr_probes = 0;
> +	int stub_funcs = 0;
>  	int pos = -1;
>  
>  	if (WARN_ON(!tp_func->func))
> @@ -147,14 +154,34 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  			if (old[nr_probes].func == tp_func->func &&
>  			    old[nr_probes].data == tp_func->data)
>  				return ERR_PTR(-EEXIST);
> +			if (old[nr_probes].func == tp_stub_func)
> +				stub_funcs++;
>  		}
>  	}
> -	/* + 2 : one for new probe, one for NULL func */
> -	new = allocate_probes(nr_probes + 2);
> +	/* + 2 : one for new probe, one for NULL func - stub functions */
> +	new = allocate_probes(nr_probes + 2 - stub_funcs);
>  	if (new == NULL)
>  		return ERR_PTR(-ENOMEM);
>  	if (old) {
> -		if (pos < 0) {
> +		if (stub_funcs) {
> +			/* Need to copy one at a time to remove stubs */
> +			int probes = 0;
> +
> +			pos = -1;
> +			for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> +				if (old[nr_probes].func == tp_stub_func)
> +					continue;
> +				if (pos < 0 && old[nr_probes].prio < prio)
> +					pos = probes++;
> +				new[probes++] = old[nr_probes];
> +			}
> +			nr_probes = probes;
> +			if (pos < 0)
> +				pos = probes;
> +			else
> +				nr_probes--; /* Account for insertion */
> +
> +		} else if (pos < 0) {
>  			pos = nr_probes;
>  			memcpy(new, old, nr_probes * sizeof(struct tracepoint_func));
>  		} else {
> @@ -188,8 +215,9 @@ static void *func_remove(struct tracepoint_func **funcs,
>  	/* (N -> M), (N > 1, M >= 0) probes */
>  	if (tp_func->func) {
>  		for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> -			if (old[nr_probes].func == tp_func->func &&
> -			     old[nr_probes].data == tp_func->data)
> +			if ((old[nr_probes].func == tp_func->func &&
> +			     old[nr_probes].data == tp_func->data) ||
> +			    old[nr_probes].func == tp_stub_func)
>  				nr_del++;
>  		}
>  	}
> @@ -208,14 +236,32 @@ static void *func_remove(struct tracepoint_func **funcs,
>  		/* N -> M, (N > 1, M > 0) */
>  		/* + 1 for NULL */
>  		new = allocate_probes(nr_probes - nr_del + 1);
> -		if (new == NULL)
> -			return ERR_PTR(-ENOMEM);
> -		for (i = 0; old[i].func; i++)
> -			if (old[i].func != tp_func->func
> -					|| old[i].data != tp_func->data)
> -				new[j++] = old[i];
> -		new[nr_probes - nr_del].func = NULL;
> -		*funcs = new;
> +		if (new) {
> +			for (i = 0; old[i].func; i++)
> +				if ((old[i].func != tp_func->func
> +				     || old[i].data != tp_func->data)
> +				    && old[i].func != tp_stub_func)
> +					new[j++] = old[i];
> +			new[nr_probes - nr_del].func = NULL;
> +			*funcs = new;
> +		} else {
> +			/*
> +			 * Failed to allocate, replace the old function
> +			 * with calls to tp_stub_func.
> +			 */
> +			for (i = 0; old[i].func; i++)
> +				if (old[i].func == tp_func->func &&
> +				    old[i].data == tp_func->data) {
> +					old[i].func = tp_stub_func;
> +					/* Set the prio to the next event. */
> +					if (old[i + 1].func)
> +						old[i].prio =
> +							old[i + 1].prio;
> +					else
> +						old[i].prio = -1;
> +				}
> +			*funcs = old;
> +		}
>  	}
>  	debug_print_probes(*funcs);
>  	return old;
> @@ -295,10 +341,12 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>  	tp_funcs = rcu_dereference_protected(tp->funcs,
>  			lockdep_is_held(&tracepoints_mutex));
>  	old = func_remove(&tp_funcs, func);
> -	if (IS_ERR(old)) {
> -		WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> +	if (WARN_ON_ONCE(IS_ERR(old)))
>  		return PTR_ERR(old);
> -	}
> +
> +	if (tp_funcs == old)
> +		/* Failed allocating new tp_funcs, replaced func with stub */
> +		return 0;
>  
>  	if (!tp_funcs) {
>  		/* Removed last function */
> -- 
> 2.25.4
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-11-24  6:09 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20201116175107.02db396d@gandalf.local.home>
     [not found] ` <47463878.48157.1605640510560.JavaMail.zimbra@efficios.com>
     [not found]   ` <20201117142145.43194f1a@gandalf.local.home>
     [not found]     ` <375636043.48251.1605642440621.JavaMail.zimbra@efficios.com>
     [not found]       ` <20201117153451.3015c5c9@gandalf.local.home>
2020-11-18 13:21         ` violating function pointer signature Peter Zijlstra
2020-11-18 13:59           ` Florian Weimer
2020-11-18 14:12             ` Peter Zijlstra
2020-11-18 14:18               ` Florian Weimer
2020-11-18 14:34                 ` [PATCH v3] tracepoint: Do not fail unregistering a probe due to memory allocation Steven Rostedt
2020-11-24  5:59                   ` Matt Mullins
2020-11-18 14:22             ` violating function pointer signature Steven Rostedt
2020-11-18 19:46               ` Alexei Starovoitov
2020-11-18 20:02                 ` Steven Rostedt
2020-11-18 14:02           ` Steven Rostedt
2020-11-18 16:01             ` Mathieu Desnoyers
2020-11-18 16:19               ` David Laight
2020-11-18 16:50           ` Nick Desaulniers
2020-11-18 17:17             ` Steven Rostedt
2020-11-18 18:12               ` Segher Boessenkool
2020-11-18 18:31                 ` Florian Weimer
2020-11-18 18:55                   ` Segher Boessenkool
2020-11-18 18:58                   ` Steven Rostedt
2020-11-18 18:59                     ` Steven Rostedt
2020-11-18 19:11                     ` Segher Boessenkool
2020-11-18 19:33                       ` Steven Rostedt
2020-11-18 19:48                         ` Segher Boessenkool
2020-11-18 20:44                           ` Steven Rostedt
2020-11-19  8:21                           ` Peter Zijlstra
2020-11-19  8:36                       ` Peter Zijlstra
2020-11-19 14:37                         ` Segher Boessenkool
2020-11-19 14:59                           ` Steven Rostedt
2020-11-19 16:35                             ` Segher Boessenkool
2020-11-19 17:42                               ` David Laight
2020-11-19 19:27                                 ` Segher Boessenkool
2020-11-19 17:04                             ` Alexei Starovoitov
2020-11-19 17:30                               ` Steven Rostedt
2020-11-20  1:31                               ` Nick Desaulniers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).