linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
@ 2021-01-27 17:39 Steven Rostedt
  2021-01-27 17:54 ` Steven Rostedt
  2021-01-27 18:00 ` Mathieu Desnoyers
  0 siblings, 2 replies; 7+ messages in thread
From: Steven Rostedt @ 2021-01-27 17:39 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, Matt Mullins, paulmck, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Kees Cook, Peter Zijlstra,
	Josh Poimboeuf, Alexey Kardashevskiy

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The list of tracepoint callbacks is managed by an array that is protected
by RCU. To update this array, a new array is allocated, the updates are
copied over to the new array, and then the list of functions for the
tracepoint is switched over to the new array. After a completion of an RCU
grace period, the old array is freed.

This process happens for both adding a callback as well as removing one.
But on removing a callback, if the new array fails to be allocated, the
callback is not removed, and may be used after it is freed by the clients
of the tracepoint.

The handling of a failed allocation for removing an event can break use
cases as the error report is not propagated up to the original callers. To
make matters worse, there's some paths that can not handle error cases.

Instead of allocating a new array for removing a tracepoint, allocate twice
the needed size when adding tracepoints to the array. On removing, use the
second half of the allocated array. This removes the need to allocate memory
for removing a tracepoint, as the allocation for removals will already have
been done.

Link: https://lkml.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us
Link: https://lkml.kernel.org/r/20201116175107.02db396d@gandalf.local.home
Link: https://lkml.kennel.org/r/20201118093405.7a6d2290@gandalf.local.home

Reported-by: Matt Mullins <mmullins@mmlx.us>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---

Changes since v3:

  Scrapped the entire idea of having a stub function replace the removed
  tracepoint callback. Instead, simply allocate twice the needed array at
  addition of the tracepoint, and on removal, use the second half of the
  array. This removes the need to allocate anything on removal, which
  removes the possible failure of that allocation.

 kernel/tracepoint.c | 54 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 7261fa0f5e3c..23088f6276a4 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -129,7 +129,7 @@ static struct tracepoint_func *
 func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
 	 int prio)
 {
-	struct tracepoint_func *old, *new;
+	struct tracepoint_func *old, *new, *tp_funcs;
 	int nr_probes = 0;
 	int pos = -1;
 
@@ -149,10 +149,28 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
 				return ERR_PTR(-EEXIST);
 		}
 	}
-	/* + 2 : one for new probe, one for NULL func */
-	new = allocate_probes(nr_probes + 2);
-	if (new == NULL)
+	/*
+	 * The size of the tp_funcs will be the current size, plus
+	 * one for the new probe, one for the NULL func, and one for
+	 * the pointer to the "removal" array.
+	 * And then double the size to create the "removal" array.
+	 */
+	tp_funcs = allocate_probes((nr_probes + 3) * 2);
+	if (tp_funcs == NULL)
 		return ERR_PTR(-ENOMEM);
+	/*
+	 * When removing a probe and there are more probes left,
+	 * we cannot rely on allocation to succeed to create the new
+	 * RCU array. Thus, the array is doubled here, and on removal of
+	 * a probe with other probes still in the array, the second half
+	 * of the array is used.
+	 *
+	 * The first element of the array has its "func" field point to
+	 * the start of the other half of the array.
+	 */
+	tp_funcs->func = tp_funcs + nr_probes + 3;
+	tp_funcs[nr_probes + 3].func = tp_funcs;
+	new = tp_funcs + 1;
 	if (old) {
 		if (pos < 0) {
 			pos = nr_probes;
@@ -164,6 +182,14 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
 			memcpy(new + pos + 1, old + pos,
 			       (nr_probes - pos) * sizeof(struct tracepoint_func));
 		}
+		/* Make old point back to the allocated array */
+		old--;
+		/*
+		 * If this is the second half of the array,
+		 * make it point back to the first half.
+		 */
+		if (old->func < old)
+			old = old->func;
 	} else
 		pos = 0;
 	new[pos] = *tp_func;
@@ -202,14 +228,18 @@ static void *func_remove(struct tracepoint_func **funcs,
 		/* N -> 0, (N > 1) */
 		*funcs = NULL;
 		debug_print_probes(*funcs);
+		/* Set old back to what it was allocated to */
+		old--;
+		if (old->func < old)
+			old = old->func;
 		return old;
 	} else {
 		int j = 0;
-		/* N -> M, (N > 1, M > 0) */
-		/* + 1 for NULL */
-		new = allocate_probes(nr_probes - nr_del + 1);
-		if (new == NULL)
-			return ERR_PTR(-ENOMEM);
+
+		/* Use the other half of the array for the new probes */
+		new = old - 1;
+		new = new->func;
+		new++;
 		for (i = 0; old[i].func; i++)
 			if (old[i].func != tp_func->func
 					|| old[i].data != tp_func->data)
@@ -218,7 +248,7 @@ static void *func_remove(struct tracepoint_func **funcs,
 		*funcs = new;
 	}
 	debug_print_probes(*funcs);
-	return old;
+	return NULL;
 }
 
 static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs, bool sync)
@@ -309,8 +339,8 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 		rcu_assign_pointer(tp->funcs, tp_funcs);
 	} else {
 		rcu_assign_pointer(tp->funcs, tp_funcs);
-		tracepoint_update_call(tp, tp_funcs,
-				       tp_funcs[0].func != old[0].func);
+		/* Need to sync, if going back to a single caller */
+		tracepoint_update_call(tp, tp_funcs, tp_funcs[1].func == NULL);
 	}
 	release_probes(old);
 	return 0;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
  2021-01-27 17:39 [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure Steven Rostedt
@ 2021-01-27 17:54 ` Steven Rostedt
  2021-01-27 18:00 ` Mathieu Desnoyers
  1 sibling, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2021-01-27 17:54 UTC (permalink / raw)
  To: LKML
  Cc: Mathieu Desnoyers, Matt Mullins, paulmck, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Kees Cook, Peter Zijlstra,
	Josh Poimboeuf, Alexey Kardashevskiy

On Wed, 27 Jan 2021 12:39:51 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>  kernel/tracepoint.c | 54 +++++++++++++++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 7261fa0f5e3c..23088f6276a4 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -129,7 +129,7 @@ static struct tracepoint_func *
>  func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  	 int prio)
>  {
> -	struct tracepoint_func *old, *new;
> +	struct tracepoint_func *old, *new, *tp_funcs;
>  	int nr_probes = 0;
>  	int pos = -1;
>  
> @@ -149,10 +149,28 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  				return ERR_PTR(-EEXIST);
>  		}
>  	}
> -	/* + 2 : one for new probe, one for NULL func */
> -	new = allocate_probes(nr_probes + 2);
> -	if (new == NULL)
> +	/*
> +	 * The size of the tp_funcs will be the current size, plus
> +	 * one for the new probe, one for the NULL func, and one for
> +	 * the pointer to the "removal" array.
> +	 * And then double the size to create the "removal" array.
> +	 */
> +	tp_funcs = allocate_probes((nr_probes + 3) * 2);

Note, I realize that this double allocation is unnecessary if we add a
single probe. But to make this different for 2 or more probes, would make
the logic more complex, so I just kept the logic the same for the single
probe even though it's not needed.


> +	if (tp_funcs == NULL)
>  		return ERR_PTR(-ENOMEM);
> +	/*
> +	 * When removing a probe and there are more probes left,
> +	 * we cannot rely on allocation to succeed to create the new
> +	 * RCU array. Thus, the array is doubled here, and on removal of
> +	 * a probe with other probes still in the array, the second half
> +	 * of the array is used.
> +	 *
> +	 * The first element of the array has its "func" field point to
> +	 * the start of the other half of the array.
> +	 */
> +	tp_funcs->func = tp_funcs + nr_probes + 3;
> +	tp_funcs[nr_probes + 3].func = tp_funcs;
> +	new = tp_funcs + 1;
>  	if (old) {
>  		if (pos < 0) {
>  			pos = nr_probes;
> @@ -164,6 +182,14 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  			memcpy(new + pos + 1, old + pos,
>  			       (nr_probes - pos) * sizeof(struct tracepoint_func));
>  		}
> +		/* Make old point back to the allocated array */
> +		old--;
> +		/*
> +		 * If this is the second half of the array,
> +		 * make it point back to the first half.
> +		 */
> +		if (old->func < old)
> +			old = old->func;
>  	} else
>  		pos = 0;
>  	new[pos] = *tp_func;
> @@ -202,14 +228,18 @@ static void *func_remove(struct tracepoint_func **funcs,
>  		/* N -> 0, (N > 1) */
>  		*funcs = NULL;
>  		debug_print_probes(*funcs);
> +		/* Set old back to what it was allocated to */
> +		old--;
> +		if (old->func < old)
> +			old = old->func;
>  		return old;
>  	} else {
>  		int j = 0;
> -		/* N -> M, (N > 1, M > 0) */
> -		/* + 1 for NULL */
> -		new = allocate_probes(nr_probes - nr_del + 1);
> -		if (new == NULL)
> -			return ERR_PTR(-ENOMEM);
> +
> +		/* Use the other half of the array for the new probes */
> +		new = old - 1;
> +		new = new->func;
> +		new++;
>  		for (i = 0; old[i].func; i++)
>  			if (old[i].func != tp_func->func
>  					|| old[i].data != tp_func->data)
> @@ -218,7 +248,7 @@ static void *func_remove(struct tracepoint_func **funcs,
>  		*funcs = new;
>  	}
>  	debug_print_probes(*funcs);
> -	return old;
> +	return NULL;
>  }
>  
>  static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs, bool sync)
> @@ -309,8 +339,8 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>  		rcu_assign_pointer(tp->funcs, tp_funcs);
>  	} else {
>  		rcu_assign_pointer(tp->funcs, tp_funcs);
> -		tracepoint_update_call(tp, tp_funcs,
> -				       tp_funcs[0].func != old[0].func);
> +		/* Need to sync, if going back to a single caller */
> +		tracepoint_update_call(tp, tp_funcs, tp_funcs[1].func == NULL);

I may make this change a separate patch. As it changes the logic slightly
unrelated to the change being fixed, and was only needed for this patch, to
remove the reference to "old".

-- Steve


>  	}
>  	release_probes(old);
>  	return 0;


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
  2021-01-27 17:39 [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure Steven Rostedt
  2021-01-27 17:54 ` Steven Rostedt
@ 2021-01-27 18:00 ` Mathieu Desnoyers
  2021-01-27 18:07   ` Steven Rostedt
  1 sibling, 1 reply; 7+ messages in thread
From: Mathieu Desnoyers @ 2021-01-27 18:00 UTC (permalink / raw)
  To: rostedt, paulmck
  Cc: linux-kernel, Matt Mullins, paulmck, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Kees Cook, Peter Zijlstra,
	Josh Poimboeuf, Alexey Kardashevskiy

----- On Jan 27, 2021, at 12:39 PM, rostedt rostedt@goodmis.org wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> The list of tracepoint callbacks is managed by an array that is protected
> by RCU. To update this array, a new array is allocated, the updates are
> copied over to the new array, and then the list of functions for the
> tracepoint is switched over to the new array. After a completion of an RCU
> grace period, the old array is freed.
> 
> This process happens for both adding a callback as well as removing one.
> But on removing a callback, if the new array fails to be allocated, the
> callback is not removed, and may be used after it is freed by the clients
> of the tracepoint.
> 
> The handling of a failed allocation for removing an event can break use
> cases as the error report is not propagated up to the original callers. To
> make matters worse, there's some paths that can not handle error cases.
> 
> Instead of allocating a new array for removing a tracepoint, allocate twice
> the needed size when adding tracepoints to the array. On removing, use the
> second half of the allocated array. This removes the need to allocate memory
> for removing a tracepoint, as the allocation for removals will already have
> been done.

I don't see how this can work reliably. AFAIU, with RCU, approaches
requiring a pre-allocation of twice the size and swapping to the alternate
memory area on removal falls apart whenever you remove 2 or more elements
back-to-back without waiting for a grace period.

How is this handled by your scheme ?

Thanks,

Mathieu

> 
> Link: https://lkml.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us
> Link: https://lkml.kernel.org/r/20201116175107.02db396d@gandalf.local.home
> Link: https://lkml.kennel.org/r/20201118093405.7a6d2290@gandalf.local.home
> 
> Reported-by: Matt Mullins <mmullins@mmlx.us>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> 
> Changes since v3:
> 
>  Scrapped the entire idea of having a stub function replace the removed
>  tracepoint callback. Instead, simply allocate twice the needed array at
>  addition of the tracepoint, and on removal, use the second half of the
>  array. This removes the need to allocate anything on removal, which
>  removes the possible failure of that allocation.
> 
> kernel/tracepoint.c | 54 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 7261fa0f5e3c..23088f6276a4 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -129,7 +129,7 @@ static struct tracepoint_func *
> func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
> 	 int prio)
> {
> -	struct tracepoint_func *old, *new;
> +	struct tracepoint_func *old, *new, *tp_funcs;
> 	int nr_probes = 0;
> 	int pos = -1;
> 
> @@ -149,10 +149,28 @@ func_add(struct tracepoint_func **funcs, struct
> tracepoint_func *tp_func,
> 				return ERR_PTR(-EEXIST);
> 		}
> 	}
> -	/* + 2 : one for new probe, one for NULL func */
> -	new = allocate_probes(nr_probes + 2);
> -	if (new == NULL)
> +	/*
> +	 * The size of the tp_funcs will be the current size, plus
> +	 * one for the new probe, one for the NULL func, and one for
> +	 * the pointer to the "removal" array.
> +	 * And then double the size to create the "removal" array.
> +	 */
> +	tp_funcs = allocate_probes((nr_probes + 3) * 2);
> +	if (tp_funcs == NULL)
> 		return ERR_PTR(-ENOMEM);
> +	/*
> +	 * When removing a probe and there are more probes left,
> +	 * we cannot rely on allocation to succeed to create the new
> +	 * RCU array. Thus, the array is doubled here, and on removal of
> +	 * a probe with other probes still in the array, the second half
> +	 * of the array is used.
> +	 *
> +	 * The first element of the array has its "func" field point to
> +	 * the start of the other half of the array.
> +	 */
> +	tp_funcs->func = tp_funcs + nr_probes + 3;
> +	tp_funcs[nr_probes + 3].func = tp_funcs;
> +	new = tp_funcs + 1;
> 	if (old) {
> 		if (pos < 0) {
> 			pos = nr_probes;
> @@ -164,6 +182,14 @@ func_add(struct tracepoint_func **funcs, struct
> tracepoint_func *tp_func,
> 			memcpy(new + pos + 1, old + pos,
> 			       (nr_probes - pos) * sizeof(struct tracepoint_func));
> 		}
> +		/* Make old point back to the allocated array */
> +		old--;
> +		/*
> +		 * If this is the second half of the array,
> +		 * make it point back to the first half.
> +		 */
> +		if (old->func < old)
> +			old = old->func;
> 	} else
> 		pos = 0;
> 	new[pos] = *tp_func;
> @@ -202,14 +228,18 @@ static void *func_remove(struct tracepoint_func **funcs,
> 		/* N -> 0, (N > 1) */
> 		*funcs = NULL;
> 		debug_print_probes(*funcs);
> +		/* Set old back to what it was allocated to */
> +		old--;
> +		if (old->func < old)
> +			old = old->func;
> 		return old;
> 	} else {
> 		int j = 0;
> -		/* N -> M, (N > 1, M > 0) */
> -		/* + 1 for NULL */
> -		new = allocate_probes(nr_probes - nr_del + 1);
> -		if (new == NULL)
> -			return ERR_PTR(-ENOMEM);
> +
> +		/* Use the other half of the array for the new probes */
> +		new = old - 1;
> +		new = new->func;
> +		new++;
> 		for (i = 0; old[i].func; i++)
> 			if (old[i].func != tp_func->func
> 					|| old[i].data != tp_func->data)
> @@ -218,7 +248,7 @@ static void *func_remove(struct tracepoint_func **funcs,
> 		*funcs = new;
> 	}
> 	debug_print_probes(*funcs);
> -	return old;
> +	return NULL;
> }
> 
> static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func
> *tp_funcs, bool sync)
> @@ -309,8 +339,8 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> 		rcu_assign_pointer(tp->funcs, tp_funcs);
> 	} else {
> 		rcu_assign_pointer(tp->funcs, tp_funcs);
> -		tracepoint_update_call(tp, tp_funcs,
> -				       tp_funcs[0].func != old[0].func);
> +		/* Need to sync, if going back to a single caller */
> +		tracepoint_update_call(tp, tp_funcs, tp_funcs[1].func == NULL);
> 	}
> 	release_probes(old);
> 	return 0;
> --
> 2.25.4

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
  2021-01-27 18:00 ` Mathieu Desnoyers
@ 2021-01-27 18:07   ` Steven Rostedt
  2021-01-27 18:13     ` Mathieu Desnoyers
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2021-01-27 18:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paulmck, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Kees Cook, Peter Zijlstra,
	Josh Poimboeuf, Alexey Kardashevskiy

On Wed, 27 Jan 2021 13:00:46 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > Instead of allocating a new array for removing a tracepoint, allocate twice
> > the needed size when adding tracepoints to the array. On removing, use the
> > second half of the allocated array. This removes the need to allocate memory
> > for removing a tracepoint, as the allocation for removals will already have
> > been done.  
> 
> I don't see how this can work reliably. AFAIU, with RCU, approaches
> requiring a pre-allocation of twice the size and swapping to the alternate
> memory area on removal falls apart whenever you remove 2 or more elements
> back-to-back without waiting for a grace period.

Good point ;-)

> 
> How is this handled by your scheme ?

I believe we can detect this case using the "prio" part of extra element,
and force a rcu sync if there's back to back removals on the same
tracepoint. That case does not happen often, so I'm hoping nobody will
notice the slowdown with these syncs. I'll take a look at this.

Thanks for bringing that up.

-- Steve


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
  2021-01-27 18:07   ` Steven Rostedt
@ 2021-01-27 18:13     ` Mathieu Desnoyers
  2021-01-27 19:16       ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Mathieu Desnoyers @ 2021-01-27 18:13 UTC (permalink / raw)
  To: rostedt
  Cc: paulmck, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Kees Cook, Peter Zijlstra,
	Josh Poimboeuf, Alexey Kardashevskiy

----- On Jan 27, 2021, at 1:07 PM, rostedt rostedt@goodmis.org wrote:

> On Wed, 27 Jan 2021 13:00:46 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> > Instead of allocating a new array for removing a tracepoint, allocate twice
>> > the needed size when adding tracepoints to the array. On removing, use the
>> > second half of the allocated array. This removes the need to allocate memory
>> > for removing a tracepoint, as the allocation for removals will already have
>> > been done.
>> 
>> I don't see how this can work reliably. AFAIU, with RCU, approaches
>> requiring a pre-allocation of twice the size and swapping to the alternate
>> memory area on removal falls apart whenever you remove 2 or more elements
>> back-to-back without waiting for a grace period.
> 
> Good point ;-)
> 
>> 
>> How is this handled by your scheme ?
> 
> I believe we can detect this case using the "prio" part of extra element,
> and force a rcu sync if there's back to back removals on the same
> tracepoint. That case does not happen often, so I'm hoping nobody will
> notice the slowdown with these syncs. I'll take a look at this.
> 
> Thanks for bringing that up.

Requiring an RCU synchronize on element removal is quite intrusive, and can
be problematic if tracepoint removal is called from e.g. preempt-off context.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
  2021-01-27 18:13     ` Mathieu Desnoyers
@ 2021-01-27 19:16       ` Steven Rostedt
  2021-01-27 20:17         ` Mathieu Desnoyers
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2021-01-27 19:16 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paulmck, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Kees Cook, Peter Zijlstra,
	Josh Poimboeuf, Alexey Kardashevskiy

On Wed, 27 Jan 2021 13:13:22 -0500 (EST)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > Thanks for bringing that up.  
> 
> Requiring an RCU synchronize on element removal is quite intrusive, and can
> be problematic if tracepoint removal is called from e.g. preempt-off context.

But how often do you remove more than one callback from the same
tracepoint? Or should I say, from a lot of tracepoints?

This will only synchronize for the following case:

 Add three callbacks to a single tracepoint.
 Remove the first one.
    <rcu callback to update the counters>
 Remove the second one
   <triggers a synchronization if the counters have not been finished
    updating>
 Remove the third one.
   <no synchronization needed, because it's being freed>

And we may be able to make this work in batch too.

More to come, but I really like this approach over the others because it
does not increase the size of the kernel for a failure that should never
happen in practice.

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
  2021-01-27 19:16       ` Steven Rostedt
@ 2021-01-27 20:17         ` Mathieu Desnoyers
  0 siblings, 0 replies; 7+ messages in thread
From: Mathieu Desnoyers @ 2021-01-27 20:17 UTC (permalink / raw)
  To: rostedt, Kees Cook
  Cc: paulmck, linux-kernel, Matt Mullins, Ingo Molnar,
	Alexei Starovoitov, Daniel Borkmann, Dmitry Vyukov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Peter Zijlstra, Josh Poimboeuf,
	Alexey Kardashevskiy

----- On Jan 27, 2021, at 2:16 PM, rostedt rostedt@goodmis.org wrote:

> On Wed, 27 Jan 2021 13:13:22 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> > Thanks for bringing that up.
>> 
>> Requiring an RCU synchronize on element removal is quite intrusive, and can
>> be problematic if tracepoint removal is called from e.g. preempt-off context.
> 
> But how often do you remove more than one callback from the same
> tracepoint? Or should I say, from a lot of tracepoints?
> 
> This will only synchronize for the following case:
> 
> Add three callbacks to a single tracepoint.
> Remove the first one.
>    <rcu callback to update the counters>
> Remove the second one
>   <triggers a synchronization if the counters have not been finished
>    updating>
> Remove the third one.
>   <no synchronization needed, because it's being freed>
> 
> And we may be able to make this work in batch too.
> 
> More to come, but I really like this approach over the others because it
> does not increase the size of the kernel for a failure that should never
> happen in practice.

My concern with introducing a scheme based on synchronize_rcu to handle out-of-memory
scenarios is not about how frequently synchronize_rcu will be called, but
about the added complexity this adds to the tracepoints insertion/removal.
This has chances to explode in unlikely scenarios, which will become harder to test
for. This will also introduce constraints on which kernel context can perform
tracepoint removal.

I notice that the error report which leads to this v4 is against v2 of the patch [1],
which has known issues. I wonder whether there are any such issues with v3, which is
using a function stub ? [2]

The main concern I had about v3 of the patch was that the prototype of the
stub (void argument list) does not match the prototype of the called function. As
discussed in [2], there are other scenarios where the kernel expects this to work,
based on the expectation that all architectures supported by the Linux kernel have
a calling convention where the caller performs the clean-up.

So I would prefer the approach taken in v3 rather than adding the kind of complexity
introduced in v4. Ideally we should document, near the stub function, that this
expects the calling convention to have the caller perform the clean-up (cdecl family),
and that the returned type (void) of the function always match. It's only the arguments
which may differ.

There is still one thing that I'm not 100% sure about. It's related to this comment
from Kees Cook [3], hinting that in a CFI build the function prototype of the call
site and the function need to match. So do we need extra annotation of the stub function
to make this work in a CFI build ?

Thanks,

Mathieu

[1] https://lore.kernel.org/bpf/20201117211836.54acaef2@oasis.local.home/
[2] https://lore.kernel.org/bpf/20201118093405.7a6d2290@gandalf.local.home/
[3] https://lore.kernel.org/bpf/202011171330.94C6BA7E93@keescook/
-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-01-27 20:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-27 17:39 [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure Steven Rostedt
2021-01-27 17:54 ` Steven Rostedt
2021-01-27 18:00 ` Mathieu Desnoyers
2021-01-27 18:07   ` Steven Rostedt
2021-01-27 18:13     ` Mathieu Desnoyers
2021-01-27 19:16       ` Steven Rostedt
2021-01-27 20:17         ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).