linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] trace/kprobe: Two fixes for kretprobes
@ 2021-06-14 18:03 Naveen N. Rao
  2021-06-14 18:03 ` [PATCH 1/2] trace/kprobe: Fix count of missed kretprobes in kprobe_profile Naveen N. Rao
  2021-06-14 18:03 ` [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive Naveen N. Rao
  0 siblings, 2 replies; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-14 18:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Peter Zijlstra, Steven Rostedt, Anton Blanchard

The first patch fixes the accounting of missed kretprobes in kprobe_profile.
The second patch removes the limit on the maximum number of active
kretprobe instances when registering a kretprobe through tracefs.

- Naveen


Naveen N. Rao (2):
  trace/kprobe: Fix count of missed kretprobes in kprobe_profile
  trace/kprobe: Remove limit on kretprobe maxactive

 kernel/trace/trace_kprobe.c                           | 11 ++---------
 kernel/trace/trace_probe.h                            |  1 -
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc      |  1 -
 .../ftrace/test.d/kprobe/kretprobe_maxactive.tc       |  3 ---
 4 files changed, 2 insertions(+), 14 deletions(-)


base-commit: 0b42677e2e5d87c730ddc41544b289b88596738c
-- 
2.31.1



* [PATCH 1/2] trace/kprobe: Fix count of missed kretprobes in kprobe_profile
  2021-06-14 18:03 [PATCH 0/2] trace/kprobe: Two fixes for kretprobes Naveen N. Rao
@ 2021-06-14 18:03 ` Naveen N. Rao
  2021-06-15  5:47   ` Masami Hiramatsu
  2021-06-14 18:03 ` [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive Naveen N. Rao
  1 sibling, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-14 18:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Peter Zijlstra, Steven Rostedt, Anton Blanchard

For a kretprobe, the miss count includes the number of times the probe
on function entry was missed, as well as the number of times we ran out
of kretprobe_instance structures due to maxactive being too low.

Fixes: cd7e7bd5e44718 ("tracing: Add kprobes event profiling interface")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 kernel/trace/trace_kprobe.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ea6178cb5e334d..0475e2a6d0825e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1192,7 +1192,8 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
 	seq_printf(m, "  %-44s %15lu %15lu\n",
 		   trace_probe_name(&tk->tp),
 		   trace_kprobe_nhit(tk),
-		   tk->rp.kp.nmissed);
+		   trace_kprobe_is_return(tk) ? tk->rp.kp.nmissed + tk->rp.nmissed
+					      : tk->rp.kp.nmissed);
 
 	return 0;
 }
-- 
2.31.1



* [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-14 18:03 [PATCH 0/2] trace/kprobe: Two fixes for kretprobes Naveen N. Rao
  2021-06-14 18:03 ` [PATCH 1/2] trace/kprobe: Fix count of missed kretprobes in kprobe_profile Naveen N. Rao
@ 2021-06-14 18:03 ` Naveen N. Rao
  2021-06-15  9:35   ` Masami Hiramatsu
  1 sibling, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-14 18:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Peter Zijlstra, Steven Rostedt, Anton Blanchard

We currently limit maxactive for a kretprobe to 4096 when registering
the same through tracefs. The comment indicates that this is done so as
to keep list traversal reasonable. However, we don't ever iterate over
all kretprobe_instance structures. The core kprobes infrastructure also
imposes no such limitation.

Remove the limit from the tracefs interface. This limit is easy to hit
on large cpu machines when tracing functions that can sleep.

Reported-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 kernel/trace/trace_kprobe.c                               | 8 --------
 kernel/trace/trace_probe.h                                | 1 -
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc          | 1 -
 .../selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc | 3 ---
 4 files changed, 13 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 0475e2a6d0825e..b3e214980eed3d 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -21,7 +21,6 @@
 #include "trace_probe_tmpl.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
-#define KRETPROBE_MAXACTIVE_MAX 4096
 
 /* Kprobe early definition from command line */
 static char kprobe_boot_events_buf[COMMAND_LINE_SIZE] __initdata;
@@ -786,13 +785,6 @@ static int __trace_kprobe_create(int argc, const char *argv[])
 			trace_probe_log_err(1, BAD_MAXACT);
 			goto parse_error;
 		}
-		/* kretprobes instances are iterated over via a list. The
-		 * maximum should stay reasonable.
-		 */
-		if (maxactive > KRETPROBE_MAXACTIVE_MAX) {
-			trace_probe_log_err(1, MAXACT_TOO_BIG);
-			goto parse_error;
-		}
 	}
 
 	/* try to parse an address. if that fails, try to read the
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 227d518e5ba521..e331017dc086ed 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -389,7 +389,6 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(BAD_UPROBE_OFFS,	"Invalid uprobe offset"),		\
 	C(MAXACT_NO_KPROBE,	"Maxactive is not for kprobe"),		\
 	C(BAD_MAXACT,		"Invalid maxactive number"),		\
-	C(MAXACT_TOO_BIG,	"Maxactive is too big"),		\
 	C(BAD_PROBE_ADDR,	"Invalid probed address or symbol"),	\
 	C(BAD_RETPROBE,		"Retprobe address must be an function entry"), \
 	C(BAD_ADDR_SUFFIX,	"Invalid probed address suffix"), \
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
index fa928b431555ca..be3360a258bae8 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
@@ -10,7 +10,6 @@ check_error() { # command-with-error-pos-by-^
 if grep -q 'r\[maxactive\]' README; then
 check_error 'p^100 vfs_read'		# MAXACT_NO_KPROBE
 check_error 'r^1a111 vfs_read'		# BAD_MAXACT
-check_error 'r^100000 vfs_read'		# MAXACT_TOO_BIG
 fi
 
 check_error 'p ^non_exist_func'		# BAD_PROBE_ADDR (enoent)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
index 4f0b268c12332a..f57c95bfc5ed5a 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
@@ -6,9 +6,6 @@
 # Test if we successfully reject unknown messages
 if echo 'a:myprobeaccept inet_csk_accept' > kprobe_events; then false; else true; fi
 
-# Test if we successfully reject too big maxactive
-if echo 'r1000000:myprobeaccept inet_csk_accept' > kprobe_events; then false; else true; fi
-
 # Test if we successfully reject unparsable numbers for maxactive
 if echo 'r10fuzz:myprobeaccept inet_csk_accept' > kprobe_events; then false; else true; fi
 
-- 
2.31.1



* Re: [PATCH 1/2] trace/kprobe: Fix count of missed kretprobes in kprobe_profile
  2021-06-14 18:03 ` [PATCH 1/2] trace/kprobe: Fix count of missed kretprobes in kprobe_profile Naveen N. Rao
@ 2021-06-15  5:47   ` Masami Hiramatsu
  0 siblings, 0 replies; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-15  5:47 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: linux-kernel, Masami Hiramatsu, Peter Zijlstra, Steven Rostedt,
	Anton Blanchard

On Mon, 14 Jun 2021 23:33:28 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> For a kretprobe, the miss count includes the number of times the probe
> on function entry was missed, as well as the number of times we ran out
> of kretprobe_instance structures due to maxactive being too low.
> 
> Fixes: cd7e7bd5e44718 ("tracing: Add kprobes event profiling interface")
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Good catch!

> ---
>  kernel/trace/trace_kprobe.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index ea6178cb5e334d..0475e2a6d0825e 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1192,7 +1192,8 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
>  	seq_printf(m, "  %-44s %15lu %15lu\n",
>  		   trace_probe_name(&tk->tp),
>  		   trace_kprobe_nhit(tk),
> -		   tk->rp.kp.nmissed);
> +		   trace_kprobe_is_return(tk) ? tk->rp.kp.nmissed + tk->rp.nmissed
> +					      : tk->rp.kp.nmissed);

Can you add a static trace_kprobe_nmissed(tk) helper to wrap this?
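
For illustration, a minimal sketch of what such a wrapper might look like.
The struct definitions are simplified stand-ins for the kernel's (only the
fields used here are modelled), and the is_return flag is a hypothetical
substitute for whatever trace_kprobe_is_return() really checks. This is a
sketch under those assumptions, not the code that was eventually merged.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed for the miss count are modelled here). */
struct kprobe {
	unsigned long nmissed;	/* probe on function entry was missed */
};

struct kretprobe {
	struct kprobe kp;
	int nmissed;		/* ran out of kretprobe_instance structures */
};

struct trace_kprobe {
	struct kretprobe rp;
	bool is_return;		/* hypothetical flag for this sketch */
};

static bool trace_kprobe_is_return(const struct trace_kprobe *tk)
{
	return tk->is_return;
}

/* Total misses: entry-probe misses, plus instance exhaustion for kretprobes. */
static unsigned long trace_kprobe_nmissed(const struct trace_kprobe *tk)
{
	unsigned long nmissed = tk->rp.kp.nmissed;

	if (trace_kprobe_is_return(tk))
		nmissed += tk->rp.nmissed;

	return nmissed;
}
```

With a helper like this, the seq_printf() call could pass
trace_kprobe_nmissed(tk) instead of the open-coded conditional.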

Thank you,

>  
>  	return 0;
>  }
> -- 
> 2.31.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-14 18:03 ` [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive Naveen N. Rao
@ 2021-06-15  9:35   ` Masami Hiramatsu
  2021-06-15 17:41     ` Naveen N. Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-15  9:35 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: linux-kernel, Masami Hiramatsu, Peter Zijlstra, Steven Rostedt,
	Anton Blanchard

On Mon, 14 Jun 2021 23:33:29 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> We currently limit maxactive for a kretprobe to 4096 when registering
> the same through tracefs. The comment indicates that this is done so as
> to keep list traversal reasonable. However, we don't ever iterate over
> all kretprobe_instance structures. The core kprobes infrastructure also
> imposes no such limitation.
> 
> Remove the limit from the tracefs interface. This limit is easy to hit
> on large cpu machines when tracing functions that can sleep.
> 
> Reported-by: Anton Blanchard <anton@ozlabs.org>
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

OK, but I don't want to simply remove the limit, since that can easily
cause a memory shortage.
Can't we make it configurable? I don't mean Kconfig, but something like
tracefs/options/kretprobe_maxactive, or a kprobes debugfs knob.

Hmm, maybe debugfs/kprobes/kretprobe_maxactive will be better since
it can limit both trace_kprobe and kprobes itself.

Let me fix that.

Thank you,

> ---
>  kernel/trace/trace_kprobe.c                               | 8 --------
>  kernel/trace/trace_probe.h                                | 1 -
>  .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc          | 1 -
>  .../selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc | 3 ---
>  4 files changed, 13 deletions(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 0475e2a6d0825e..b3e214980eed3d 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -21,7 +21,6 @@
>  #include "trace_probe_tmpl.h"
>  
>  #define KPROBE_EVENT_SYSTEM "kprobes"
> -#define KRETPROBE_MAXACTIVE_MAX 4096
>  
>  /* Kprobe early definition from command line */
>  static char kprobe_boot_events_buf[COMMAND_LINE_SIZE] __initdata;
> @@ -786,13 +785,6 @@ static int __trace_kprobe_create(int argc, const char *argv[])
>  			trace_probe_log_err(1, BAD_MAXACT);
>  			goto parse_error;
>  		}
> -		/* kretprobes instances are iterated over via a list. The
> -		 * maximum should stay reasonable.
> -		 */
> -		if (maxactive > KRETPROBE_MAXACTIVE_MAX) {
> -			trace_probe_log_err(1, MAXACT_TOO_BIG);
> -			goto parse_error;
> -		}
>  	}
>  
>  	/* try to parse an address. if that fails, try to read the
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 227d518e5ba521..e331017dc086ed 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -389,7 +389,6 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
>  	C(BAD_UPROBE_OFFS,	"Invalid uprobe offset"),		\
>  	C(MAXACT_NO_KPROBE,	"Maxactive is not for kprobe"),		\
>  	C(BAD_MAXACT,		"Invalid maxactive number"),		\
> -	C(MAXACT_TOO_BIG,	"Maxactive is too big"),		\
>  	C(BAD_PROBE_ADDR,	"Invalid probed address or symbol"),	\
>  	C(BAD_RETPROBE,		"Retprobe address must be an function entry"), \
>  	C(BAD_ADDR_SUFFIX,	"Invalid probed address suffix"), \
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
> index fa928b431555ca..be3360a258bae8 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
> @@ -10,7 +10,6 @@ check_error() { # command-with-error-pos-by-^
>  if grep -q 'r\[maxactive\]' README; then
>  check_error 'p^100 vfs_read'		# MAXACT_NO_KPROBE
>  check_error 'r^1a111 vfs_read'		# BAD_MAXACT
> -check_error 'r^100000 vfs_read'		# MAXACT_TOO_BIG
>  fi
>  
>  check_error 'p ^non_exist_func'		# BAD_PROBE_ADDR (enoent)
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
> index 4f0b268c12332a..f57c95bfc5ed5a 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
> @@ -6,9 +6,6 @@
>  # Test if we successfully reject unknown messages
>  if echo 'a:myprobeaccept inet_csk_accept' > kprobe_events; then false; else true; fi
>  
> -# Test if we successfully reject too big maxactive
> -if echo 'r1000000:myprobeaccept inet_csk_accept' > kprobe_events; then false; else true; fi
> -
>  # Test if we successfully reject unparsable numbers for maxactive
>  if echo 'r10fuzz:myprobeaccept inet_csk_accept' > kprobe_events; then false; else true; fi
>  
> -- 
> 2.31.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-15  9:35   ` Masami Hiramatsu
@ 2021-06-15 17:41     ` Naveen N. Rao
  2021-06-16  0:46       ` Masami Hiramatsu
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-15 17:41 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra, Steven Rostedt

Masami Hiramatsu wrote:
> On Mon, 14 Jun 2021 23:33:29 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> We currently limit maxactive for a kretprobe to 4096 when registering
>> the same through tracefs. The comment indicates that this is done so as
>> to keep list traversal reasonable. However, we don't ever iterate over
>> all kretprobe_instance structures. The core kprobes infrastructure also
>> imposes no such limitation.
>> 
>> Remove the limit from the tracefs interface. This limit is easy to hit
>> on large cpu machines when tracing functions that can sleep.
>> 
>> Reported-by: Anton Blanchard <anton@ozlabs.org>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> 
> OK, but I don't like to just remove the limit (since it can cause
> memory shortage easily.)
> Can't we make it configurable? I don't mean Kconfig, but 
> tracefs/options/kretprobe_maxactive, or kprobes's debugfs knob.
> 
> Hmm, maybe debugfs/kprobes/kretprobe_maxactive will be better since
> it can limit both trace_kprobe and kprobes itself.

I don't think it is good to put a new tunable in debugfs -- we don't 
have any kprobes tunable there, so this adds a dependency on debugfs 
which shouldn't be necessary.

/proc/sys/debug/ may be a better fit since we already have the
kprobes-optimization flag there to disable optprobes, though I'm not
sure if a new sysctl file is agreeable.


But, I'm not too sure this really is a problem. Maxactive is a user 
_opt-in_ feature which needs to be explicitly added to an event 
definition. In that sense, isn't this already a tunable?


- Naveen



* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-15 17:41     ` Naveen N. Rao
@ 2021-06-16  0:46       ` Masami Hiramatsu
  2021-06-16  1:03         ` Steven Rostedt
  2021-06-17 16:19         ` Naveen N. Rao
  0 siblings, 2 replies; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-16  0:46 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra, Steven Rostedt

On Tue, 15 Jun 2021 23:11:27 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Masami Hiramatsu wrote:
> > On Mon, 14 Jun 2021 23:33:29 +0530
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> > 
> >> We currently limit maxactive for a kretprobe to 4096 when registering
> >> the same through tracefs. The comment indicates that this is done so as
> >> to keep list traversal reasonable. However, we don't ever iterate over
> >> all kretprobe_instance structures. The core kprobes infrastructure also
> >> imposes no such limitation.
> >> 
> >> Remove the limit from the tracefs interface. This limit is easy to hit
> >> on large cpu machines when tracing functions that can sleep.
> >> 
> >> Reported-by: Anton Blanchard <anton@ozlabs.org>
> >> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> > 
> > OK, but I don't like to just remove the limit (since it can cause
> > memory shortage easily.)
> > Can't we make it configurable? I don't mean Kconfig, but 
> > tracefs/options/kretprobe_maxactive, or kprobes's debugfs knob.
> > 
> > Hmm, maybe debugfs/kprobes/kretprobe_maxactive will be better since
> > it can limit both trace_kprobe and kprobes itself.
> 
> I don't think it is good to put a new tunable in debugfs -- we don't 
> have any kprobes tunable there, so this adds a dependency on debugfs 
> which shouldn't be necessary.
> 
> /proc/sys/debug/ may be a better fit since we have the 
> kprobes-optimization flag to disable optprobes there, though I'm not 
> sure if a new sysfs file is agreeable.

Indeed.

> But, I'm not too sure this really is a problem. Maxactive is a user 
> _opt-in_ feature which needs to be explicitly added to an event 
> definition. In that sense, isn't this already a tunable?

Let me explain the background of the limitation.

There is currently no limit on maxactive in the kprobes kernel module API,
because the kernel module developer must take care of the maximum memory
usage (and they can).

But the tracefs user may NOT have enough information about what
happens if they pass something like 10M for maxactive (it will consume
around 500MB kernel memory for one kretprobe).
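
As a rough check of these figures: 500MB for 10M instances works out to
about 50 bytes per instance. The sketch below only applies that ratio. The
50-byte value is inferred from the numbers in this mail (reading 10M as
10,000,000 and MB as 10^6 bytes); it is not the real size of struct
kretprobe_instance, which depends on the architecture and kernel version.

```c
#include <stdint.h>

/* Assumption: ~50 bytes per instance, derived from "10M -> ~500MB". */
#define ASSUMED_BYTES_PER_INSTANCE 50ULL

/* Approximate memory preallocated for one kretprobe's instance pool. */
static uint64_t kretprobe_pool_bytes(uint64_t maxactive)
{
	return maxactive * ASSUMED_BYTES_PER_INSTANCE;
}
```

Under the same assumption, the 4096 limit corresponds to roughly 200KB per
kretprobe, and a 32k limit to roughly 1.6MB.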

To avoid such trouble, I set the 4096 limit for the maxactive
parameter. Of course 4096 may not be enough for some use cases. I'm happy
to raise it (e.g. to 32k; wouldn't that be enough?), but removing the limit
entirely may easily cause OOM trouble.

Thank you,

> 
> 
> - Naveen
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-16  0:46       ` Masami Hiramatsu
@ 2021-06-16  1:03         ` Steven Rostedt
  2021-06-16  2:27           ` Masami Hiramatsu
  2021-06-17 16:19         ` Naveen N. Rao
  1 sibling, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2021-06-16  1:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Naveen N. Rao, Anton Blanchard, linux-kernel, Peter Zijlstra

On Wed, 16 Jun 2021 09:46:22 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> To avoid such trouble, I had set the 4096 limitation for the maxactive
> parameter. Of course 4096 may not enough for some use-cases. I'm welcome
> to expand it (e.g. 32k, isn't it enough?), but removing the limitation
> may cause OOM trouble easily.

What if you just made the max 10 * the number of possible CPUs, or 4096,
whichever is greater? Why would a user need more?
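
A minimal sketch of that rule, for illustration. The function name is
hypothetical, and the CPU count is passed in as a plain parameter here; in
the kernel it would presumably come from num_possible_cpus().

```c
/* Cap suggested above: 10 per possible CPU, but never below 4096. */
static unsigned int maxactive_cap(unsigned int possible_cpus)
{
	unsigned int scaled = 10 * possible_cpus;

	return scaled > 4096 ? scaled : 4096;
}
```

With this rule the cap only rises above 4096 on machines with 410 or more
possible CPUs.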

I'd still like to get a wrapper around function graph tracing so that
kretprobes could use it. I think that would get rid of the requirement
of maxactive, because isn't that just used to have a way to know the
original return value?

-- Steve


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-16  1:03         ` Steven Rostedt
@ 2021-06-16  2:27           ` Masami Hiramatsu
  2021-06-16 15:10             ` Masami Hiramatsu
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-16  2:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Naveen N. Rao, Anton Blanchard, linux-kernel, Peter Zijlstra

On Tue, 15 Jun 2021 21:03:51 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 16 Jun 2021 09:46:22 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > To avoid such trouble, I had set the 4096 limitation for the maxactive
> > parameter. Of course 4096 may not enough for some use-cases. I'm welcome
> > to expand it (e.g. 32k, isn't it enough?), but removing the limitation
> > may cause OOM trouble easily.
> 
> What if you just made the max as 10 * number of possible cpus, or 4096,
> which ever is greater? Why would a user need more?

It could be. But actually, that is not the correct number, because the
number of instances depends on the number of processes and the possibility
of recursion. Thus a huge system that runs more than 64k processes
may need more than that.

> I'd still like to get a wrapper around function graph tracing so that
> kretprobes could use it. I think that would get rid of the requirement
> of maxactive, because isn't that just used to have a way to know the
> original return value?

Hmm, yes, on some architectures it can be done. But on others we still need
the current implementation as a generic solution.
What I need is not a full wrapper around the function graph tracer, but just
to share the per-task (software) shadow stack.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-16  2:27           ` Masami Hiramatsu
@ 2021-06-16 15:10             ` Masami Hiramatsu
  2021-06-17 16:34               ` Naveen N. Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-16 15:10 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Naveen N. Rao, Anton Blanchard, linux-kernel,
	Peter Zijlstra

On Wed, 16 Jun 2021 11:27:11 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Tue, 15 Jun 2021 21:03:51 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 16 Jun 2021 09:46:22 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > To avoid such trouble, I had set the 4096 limitation for the maxactive
> > > parameter. Of course 4096 may not enough for some use-cases. I'm welcome
> > > to expand it (e.g. 32k, isn't it enough?), but removing the limitation
> > > may cause OOM trouble easily.
> > 
> > What if you just made the max as 10 * number of possible cpus, or 4096,
> > which ever is greater? Why would a user need more?
> 
> It could be. But actually, that is not correct number because the
> number of instances depends on the number of processes and the possiblity
> of recursive. Thus the huge system which runs more than 64k processes,
> may need more than that.
> 
> > I'd still like to get a wrapper around function graph tracing so that
> > kretprobes could use it. I think that would get rid of the requirement
> > of maxactive, because isn't that just used to have a way to know the
> > original return value?
> 
> Hmm, yes, on some arch, it can be done. But on other arch we still need
> current implementation for generic solution.
> What I need is not fully wrapped by the function graph, but just share
> the per-task (software) shadow stack.

BTW, I have 2 ideas to fix this besides the wrapper.

1. Use the func-graph tracer API directly from the dynamic event instead of
  kretprobes. This will be enabled only if the arch supports the fgraph
  tracer and it is enabled. maxactive will be ignored in that case,
  and a tracefs user may not need anything except the return value
  (BTW, is it possible to access the stack? In some cases, the return
  value can be passed via the stack)

2. Move the kretprobe instance pool from kretprobe to struct task.
  This pool will allocate one page per task, and will be shared among all
  kretprobes. It will be allocated when the 1st kretprobe
  is registered. maxactive will be kept for someone who wants to
  use per-instance data. But since the dynamic event doesn't use it,
  it will be removed from tracefs and perf.
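
To get a feel for idea 2, the sketch below compares rough footprints of a
per-task pool against the existing per-kretprobe pools. Everything numeric
is an assumption for illustration: the page size is a parameter (4096 is
common, but some architectures use larger pages), and the per-instance size
would be something like the ~50 bytes implied by the 10M/500MB estimate
earlier in this thread. The function names are hypothetical.

```c
#include <stdint.h>

/* Idea 2: one page per task, shared among all kretprobes. */
static uint64_t per_task_pool_bytes(uint64_t ntasks, uint64_t page_size)
{
	return ntasks * page_size;
}

/* Existing scheme: each kretprobe preallocates maxactive instances. */
static uint64_t per_probe_pool_bytes(uint64_t nprobes, uint64_t maxactive,
				     uint64_t instance_bytes)
{
	return nprobes * maxactive * instance_bytes;
}
```

The per-task cost scales with the number of tasks and is independent of how
many kretprobes are registered, while the existing cost scales with the
number of kretprobes and their maxactive values.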

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-16  0:46       ` Masami Hiramatsu
  2021-06-16  1:03         ` Steven Rostedt
@ 2021-06-17 16:19         ` Naveen N. Rao
  2021-06-18  6:17           ` Masami Hiramatsu
  1 sibling, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-17 16:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra, Steven Rostedt

Masami Hiramatsu wrote:
> On Tue, 15 Jun 2021 23:11:27 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> Masami Hiramatsu wrote:
>> > On Mon, 14 Jun 2021 23:33:29 +0530
>> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>> > 
>> >> We currently limit maxactive for a kretprobe to 4096 when registering
>> >> the same through tracefs. The comment indicates that this is done so as
>> >> to keep list traversal reasonable. However, we don't ever iterate over
>> >> all kretprobe_instance structures. The core kprobes infrastructure also
>> >> imposes no such limitation.
>> >> 
>> >> Remove the limit from the tracefs interface. This limit is easy to hit
>> >> on large cpu machines when tracing functions that can sleep.
>> >> 
>> >> Reported-by: Anton Blanchard <anton@ozlabs.org>
>> >> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> > 
>> > OK, but I don't like to just remove the limit (since it can cause
>> > memory shortage easily.)
>> > Can't we make it configurable? I don't mean Kconfig, but 
>> > tracefs/options/kretprobe_maxactive, or kprobes's debugfs knob.
>> > 
>> > Hmm, maybe debugfs/kprobes/kretprobe_maxactive will be better since
>> > it can limit both trace_kprobe and kprobes itself.
>> 
>> I don't think it is good to put a new tunable in debugfs -- we don't 
>> have any kprobes tunable there, so this adds a dependency on debugfs 
>> which shouldn't be necessary.
>> 
>> /proc/sys/debug/ may be a better fit since we have the 
>> kprobes-optimization flag to disable optprobes there, though I'm not 
>> sure if a new sysfs file is agreeable.
> 
> Indeed.
> 
>> But, I'm not too sure this really is a problem. Maxactive is a user 
>> _opt-in_ feature which needs to be explicitly added to an event 
>> definition. In that sense, isn't this already a tunable?
> 
> Let me explain the background of the limiation.

Thanks for the background on this.

> 
> Maxactive is currently no limit for the kprobe kernel module API,
> because the kernel module developer must take care of the max memory
> usage (and they can).
> 
> But the tracefs user may NOT have enough information about what
> happens if they pass something like 10M for maxactive (it will consume
> around 500MB kernel memory for one kretprobe).

Ok, thinking more about this...

Right now, the only way for a user to notice that kretprobe maxactive is 
an issue is by looking at kprobe_profile.  This is not even possible if 
using a bcc tool, which uses perf_event_open().  It took the reporting 
team some effort to even identify that the reason why they were getting 
weird results when tracing was due to the default value used for 
kretprobe maxactive; and then that 4096 was the hard limit through 
tracefs.

So, IMO, anyone using any existing bcc tool, or a pre-canned perf script 
will not even be able to identify this as a problem to begin with... at 
least, not without some effort.

To address this, as a first step, we should probably consider parsing 
kprobe_profile and printing a warning with 'perf' if we detect a 
non-zero miss count for a probe -- both a regular probe, as well as a 
retprobe.

If we do this, the nice thing with kprobe_profile is that the probe miss 
count is available, and can serve as a good way to decide what a more 
reasonable maxactive value should be. This should help prevent users 
from trying with arbitrary maxactive values.
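
A minimal userspace sketch of that check, for illustration only. It assumes
each kprobe_profile line is "name hits misses", matching the
"  %-44s %15lu %15lu" format in probes_profile_seq_show(). The struct and
function names are hypothetical and this is not perf code.

```c
#include <stdio.h>

struct probe_stats {
	char name[128];
	unsigned long nhit;
	unsigned long nmissed;
};

/* Parse one "name hits misses" line; returns 1 on success, 0 otherwise. */
static int parse_profile_line(const char *line, struct probe_stats *out)
{
	return sscanf(line, "%127s %lu %lu",
		      out->name, &out->nhit, &out->nmissed) == 3;
}

/* Warn about every probe with a non-zero miss count; returns how many. */
static int warn_on_missed_probes(FILE *profile, FILE *err)
{
	char line[256];
	struct probe_stats st;
	int flagged = 0;

	while (fgets(line, sizeof(line), profile)) {
		if (!parse_profile_line(line, &st) || st.nmissed == 0)
			continue;
		fprintf(err, "warning: probe %s missed %lu events (hit %lu)\n",
			st.name, st.nmissed, st.nhit);
		flagged++;
	}
	return flagged;
}
```

A tool would open the kprobe_profile file under its tracefs mount point
(the exact path depends on where tracefs is mounted) and pass the stream to
warn_on_missed_probes().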

For perf_event_open(), perhaps we can introduce an ioctl to query the 
probe miss count.

> 
> To avoid such trouble, I had set the 4096 limitation for the maxactive
> parameter. Of course 4096 may not enough for some use-cases. I'm welcome
> to expand it (e.g. 32k, isn't it enough?), but removing the limitation
> may cause OOM trouble easily.

Do you have suggestions for how we can determine a better limit? As you 
point out in the other email, there could very well be 64k or more 
processes on a large machine. Since the primary concern is memory usage, 
we probably need to decide this based on total memory. But, memory usage 
will vary depending on system load...

Perhaps we can start by making the maxactive limit a tunable with a
default value of 4096, with the understanding that users will be careful 
when bumping up this value. Hopefully, scripts won't simply start 
writing into this file ;)

If we can feed back the probe miss count, tools should be able to guide 
users on what would be a reasonable maxactive value to use.


Thanks,
Naveen



* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-16 15:10             ` Masami Hiramatsu
@ 2021-06-17 16:34               ` Naveen N. Rao
  2021-06-17 17:07                 ` Steven Rostedt
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-17 16:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra, Steven Rostedt

Masami Hiramatsu wrote:
> On Wed, 16 Jun 2021 11:27:11 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
>> On Tue, 15 Jun 2021 21:03:51 -0400
>> Steven Rostedt <rostedt@goodmis.org> wrote:
>> 
>> > On Wed, 16 Jun 2021 09:46:22 +0900
>> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
>> > 
>> > > To avoid such trouble, I had set the 4096 limitation for the maxactive
>> > > parameter. Of course 4096 may not enough for some use-cases. I'm welcome
>> > > to expand it (e.g. 32k, isn't it enough?), but removing the limitation
>> > > may cause OOM trouble easily.
>> > 
>> > What if you just made the max as 10 * number of possible cpus, or 4096,
>> > which ever is greater? Why would a user need more?
>> 
>> It could be. But actually, that is not correct number because the
>> number of instances depends on the number of processes and the possiblity
>> of recursive. Thus the huge system which runs more than 64k processes,
>> may need more than that.
>> 
>> > I'd still like to get a wrapper around function graph tracing so that
>> > kretprobes could use it. I think that would get rid of the requirement
>> > of maxactive, because isn't that just used to have a way to know the
>> > original return value?
>> 
>> Hmm, yes, on some arch, it can be done. But on other arch we still need
>> current implementation for generic solution.
>> What I need is not fully wrapped by the function graph, but just share
>> the per-task (software) shadow stack.
> 
> BTW, I have 2 ideas to fix this except for wrapper.
> 
> 1. Use func-graph tracer API directly from dynamic event instead of
>   kretprobes. This will be enabled only if the arch supports fgraph
>   tracer and enable it. maxactive will be ignored if this is enabled,
>   and tracefs user may not need except for the return value 
>   (BTW, is that possible to access the stack? In some case, return
>   value can be passed via stack)
> 
> 2. Move the kretprobe instance pool from kretprobe to struct task.
>   This pool will allocates one page per task, and shared among all
>   kretprobes. This pool will be allocated when the 1st kretprobe
>   is registered. maxactive will be kept for someone who wants to
>   use per-instance data. But since dynamic event doesn't use it,
>   it will be removed from tracefs and perf.

Won't this result in _more_ memory usage compared to what we have now?

Thanks,
Naveen



* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-17 16:34               ` Naveen N. Rao
@ 2021-06-17 17:07                 ` Steven Rostedt
  2021-06-18  4:26                   ` Masami Hiramatsu
  0 siblings, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2021-06-17 17:07 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Masami Hiramatsu, Anton Blanchard, linux-kernel, Peter Zijlstra

On Thu, 17 Jun 2021 22:04:34 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> > 2. Move the kretprobe instance pool from kretprobe to struct task.
> >   This pool will allocates one page per task, and shared among all
> >   kretprobes. This pool will be allocated when the 1st kretprobe
> >   is registered. maxactive will be kept for someone who wants to
> >   use per-instance data. But since dynamic event doesn't use it,
> >   it will be removed from tracefs and perf.  
> 
> Won't this result in _more_ memory usage compared to what we have now?

Maybe or maybe not. At least with this approach (or the function graph
one), you will allocate enough for the environment involved. If there are
thousands of tasks, then yes, it will allocate more memory. But if you are
running thousands of tasks, you should have a lot of memory in the machine.

If you are only running a few tasks, it will be less than the current
approach.

-- Steve


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-17 17:07                 ` Steven Rostedt
@ 2021-06-18  4:26                   ` Masami Hiramatsu
  2021-06-18  8:41                     ` Naveen N. Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  4:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Naveen N. Rao, Masami Hiramatsu, Anton Blanchard, linux-kernel,
	Peter Zijlstra

On Thu, 17 Jun 2021 13:07:13 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 17 Jun 2021 22:04:34 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
> > > 2. Move the kretprobe instance pool from kretprobe to struct task.
> > >   This pool will allocates one page per task, and shared among all
> > >   kretprobes. This pool will be allocated when the 1st kretprobe
> > >   is registered. maxactive will be kept for someone who wants to
> > >   use per-instance data. But since dynamic event doesn't use it,
> > >   it will be removed from tracefs and perf.  
> > 
> > Won't this result in _more_ memory usage compared to what we have now?
> 
> Maybe or maybe not. At least with this approach (or the function graph
> one), you will allocate enough for the environment involved. If there's
> thousands of tasks, then yes, it will allocate more memory. But if you are
> running thousands of tasks, you should have a lot of memory in the machine.
> 
> If you are only running a few tasks, it will be less than the current
> approach.

Right, this depends on how many tasks you are running on your machine.
Anyway, since you may not be sure how much maxactive is enough, you will
set maxactive high, and then it can consume more memory than that. Of
course you can optimize by trial and error. But that does not cover all
cases, because the number of tasks can increase while tracing. You might
need to reconfigure it after checking the nmissed count again.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-17 16:19         ` Naveen N. Rao
@ 2021-06-18  6:17           ` Masami Hiramatsu
  2021-06-18 13:19             ` Naveen N. Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2021-06-18  6:17 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra, Steven Rostedt

On Thu, 17 Jun 2021 21:49:36 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Masami Hiramatsu wrote:
> > On Tue, 15 Jun 2021 23:11:27 +0530
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> > 
> >> Masami Hiramatsu wrote:
> >> > On Mon, 14 Jun 2021 23:33:29 +0530
> >> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> >> > 
> >> >> We currently limit maxactive for a kretprobe to 4096 when registering
> >> >> the same through tracefs. The comment indicates that this is done so as
> >> >> to keep list traversal reasonable. However, we don't ever iterate over
> >> >> all kretprobe_instance structures. The core kprobes infrastructure also
> >> >> imposes no such limitation.
> >> >> 
> >> >> Remove the limit from the tracefs interface. This limit is easy to hit
> >> >> on large cpu machines when tracing functions that can sleep.
> >> >> 
> >> >> Reported-by: Anton Blanchard <anton@ozlabs.org>
> >> >> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> >> > 
> >> > OK, but I don't like to just remove the limit (since it can cause
> >> > memory shortage easily.)
> >> > Can't we make it configurable? I don't mean Kconfig, but 
> >> > tracefs/options/kretprobe_maxactive, or kprobes's debugfs knob.
> >> > 
> >> > Hmm, maybe debugfs/kprobes/kretprobe_maxactive will be better since
> >> > it can limit both trace_kprobe and kprobes itself.
> >> 
> >> I don't think it is good to put a new tunable in debugfs -- we don't 
> >> have any kprobes tunable there, so this adds a dependency on debugfs 
> >> which shouldn't be necessary.
> >> 
> >> /proc/sys/debug/ may be a better fit since we have the 
> >> kprobes-optimization flag to disable optprobes there, though I'm not 
> >> sure if a new sysfs file is agreeable.
> > 
> > Indeed.
> > 
> >> But, I'm not too sure this really is a problem. Maxactive is a user 
> >> _opt-in_ feature which needs to be explicitly added to an event 
> >> definition. In that sense, isn't this already a tunable?
> > 
> > Let me explain the background of the limiation.
> 
> Thanks for the background on this.
> 
> > 
> > Maxactive is currently no limit for the kprobe kernel module API,
> > because the kernel module developer must take care of the max memory
> > usage (and they can).
> > 
> > But the tracefs user may NOT have enough information about what
> > happens if they pass something like 10M for maxactive (it will consume
> > around 500MB kernel memory for one kretprobe).
> 
> Ok, thinking more about this...
> 
> Right now, the only way for a user to notice that kretprobe maxactive is 
> an issue is by looking at kprobe_profile.  This is not even possible if 
> using a bcc tool, which uses perf_event_open().  It took the reporting 
> team some effort to even identify that the reason why they were getting 
> weird results when tracing was due to the default value used for 
> kretprobe maxactive; and then that 4096 was the hard limit through 
> tracefs.
> 
> So, IMO, anyone using any existing bcc tool, or a pre-canned perf script 
> will not even be able to identify this as a problem to begin with... at 
> least, not without some effort.

Yeah, in that case the nmissed counter must be exposed via tracefs or
debugfs. Maybe eBPF tools can also warn about it (by checking the nmissed
count).


> To address this, as a first step, we should probably consider parsing 
> kprobe_profile and printing a warning with 'perf' if we detect a 
> non-zero miss count for a probe -- both a regular probe, as well as a 
> retprobe.

Yeah, it is doable. Note that perf-probe only sets up the event;
perf-trace or other commands are what actually use it.


> If we do this, the nice thing with kprobe_profile is that the probe miss 
> count is available, and can serve as a good way to decide what a more 
> reasonable maxactive value should be. This should help prevent users 
> from trying with arbitrary maxactive values.

Such a feedback loop is an interesting idea.
Note that the nmissed count is an accumulated value, not the maximum
number of instances that will be needed.
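
As a rough illustration of that feedback loop, here is a minimal Python
sketch that parses kprobe_profile-style text (event name, hit count, miss
count per line) and flags events with a non-zero miss count. Because the
miss count is cumulative, it also diffs two snapshots. The function names
and the sample data are made up for illustration; on a live system the
text would be read from the tracefs kprobe_profile file.

```python
def parse_kprobe_profile(text):
    """Return {event: (nhit, nmissed)} from kprobe_profile-style text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, nhit, nmissed = fields
        stats[name] = (int(nhit), int(nmissed))
    return stats


def events_with_misses(stats):
    """Names of events whose cumulative miss count is non-zero."""
    return sorted(name for name, (_, nmissed) in stats.items() if nmissed > 0)


def new_misses(before, after):
    """Misses accumulated between two snapshots, per event."""
    return {name: after[name][1] - before.get(name, (0, 0))[1] for name in after}


# Hypothetical sample data, not taken from a real system.
SNAPSHOT_1 = """\
  myprobe                  120               0
  myretprobe              4096             317
"""
SNAPSHOT_2 = """\
  myprobe                  150               0
  myretprobe              5000             400
"""

first = parse_kprobe_profile(SNAPSHOT_1)
second = parse_kprobe_profile(SNAPSHOT_2)
for name in events_with_misses(second):
    print(f"warning: {name} has missed {second[name][1]} hits in total")
print(new_misses(first, second))
```

The per-interval delta only says how many hits were lost in that window,
not how many instances were needed at peak, so it is a hint for the next
maxactive to try rather than the answer.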

> For perf_event_open(), perhaps we can introduce an ioctl to query the 
> probe miss count.

Or, maybe we can expand maxactive at runtime, e.g. add a shortage
counter to the kretprobe and run a monitor kernel thread (or kworker).
If the shortage counter has been incremented, the monitor allocates new
instances (2x the counter) and gives them to the kretprobe, then resets
the shortage counter. This adaptive maxactive may miss some hits in the
beginning, but it eventually finds the optimal maxactive value
automatically.
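
To make the adaptive scheme concrete, here is a small userspace simulation
in Python. It is only a sketch: it assumes "2x counter" means the monitor
adds twice the observed shortage to the pool, and it models demand as one
number per monitoring round. None of these names exist in the kernel.

```python
def adapt_pool(pool_size, demand_per_round):
    """Simulate a monitor that grows the pool by 2x the shortage each round."""
    history = []
    for demand in demand_per_round:
        # Hits that found no free instance during this round.
        shortage = max(0, demand - pool_size)
        history.append((pool_size, shortage))
        if shortage:
            pool_size += 2 * shortage  # monitor hands over 2x the counter
        # The shortage counter is recomputed every round, i.e. it is reset.
    return pool_size, history


final_size, history = adapt_pool(pool_size=8, demand_per_round=[100, 100, 100, 100])
for size, shortage in history:
    print(f"pool={size:4d} missed={shortage}")
print("final pool size:", final_size)
```

With steady demand the misses stop after the first round, which matches
the "mis-hit in the beginning" behaviour described above. The pool never
shrinks in this sketch, so it can overshoot the real requirement.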


> > To avoid such trouble, I had set the 4096 limitation for the maxactive
> > parameter. Of course 4096 may not enough for some use-cases. I'm welcome
> > to expand it (e.g. 32k, isn't it enough?), but removing the limitation
> > may cause OOM trouble easily.
> 
> Do you have suggestions for how we can determine a better limit? As you 
> point out in the other email, there could very well be 64k or more 
> processes on a large machine. Since the primary concern is memory usage, 
> we probably need to decide this based on total memory. But, memory usage 
> will vary depending on system load...

This is a very good question. IMHO, it might be better to calculate the
total maxactive from the system memory size. For example, if 1% of system
memory can be used for kretprobes, a 16GB system will allow using 160MB
for kretprobes. At around 50 bytes per instance (10M instances consume
around 500MB), that means about "3M" is the max total maxactive, which
multiple kretprobes can share. Doesn't that sound like enough? Of course
this will need to show the current usage of the kretprobe instance objects
via tracefs or debugfs. But this total cap seems reasonable to me to
avoid OOM trouble.
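
The arithmetic behind such a cap can be sketched as follows. The
per-instance size of 50 bytes is an assumption derived from the "10M
instances consume around 500MB" estimate earlier in this thread, and the
function name is made up for illustration.

```python
BYTES_PER_INSTANCE = 50  # assumption: ~500MB / 10M instances, per this thread


def kretprobe_instance_cap(total_mem_bytes, percent=1,
                           per_instance=BYTES_PER_INSTANCE):
    """Return (memory budget in bytes, max total instances) for a given cap."""
    budget = total_mem_bytes * percent // 100
    return budget, budget // per_instance


budget, cap = kretprobe_instance_cap(16 * 1024**3)  # 16 GiB system
print(f"budget: {budget / 1024**2:.0f} MiB, total maxactive cap: {cap}")
```

Under these assumptions the cap lands in the low millions of instances in
total, to be shared across all registered kretprobes.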

> Perhaps we can start by making maxactive limit be a tunable with a 
> default value of 4096, with the understanding that users will be careful 
> when bumping up this value. Hopefully, scripts won't simply start 
> writing into this file ;)

Yeah, that's what I suggested at first, because the best maxactive will
depend on the max number of *processes* and on the probed function.

If the probed function will NOT be preempted or sleep, maxactive will be
the number of *processor cores*. Or, if it can be preempted or sleep, it
will be the max number of *processes*. If the probed function can be
called recursively (Note: this is a rare case), maxactive has to be
multiplied accordingly.
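
That rule of thumb can be written down as a tiny helper. This is only a
sketch of the reasoning above; the function and parameter names are
hypothetical, and the caller still has to guess the maximum task count and
recursion depth.

```python
def estimate_maxactive(num_cpus, max_tasks, can_sleep, recursion_depth=1):
    """Rule-of-thumb maxactive for one kretprobe.

    can_sleep: True if the probed function can be preempted or can sleep.
    recursion_depth: how many nested invocations a single task may have.
    """
    base = max_tasks if can_sleep else num_cpus
    return base * max(1, recursion_depth)


# Non-sleeping function on a 64-CPU machine.
print(estimate_maxactive(num_cpus=64, max_tasks=65536, can_sleep=False))
# Sleepable function on the same machine with up to 64k tasks.
print(estimate_maxactive(num_cpus=64, max_tasks=65536, can_sleep=True))
```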

It is hard to estimate the max number of processes, since it depends
on the system. Small embedded systems don't run thousands of processes,
but big servers will run more than ten thousand processes.
Thus making it tunable will be a good idea.

Thank you,

> 
> If we can feed back the probe miss count, tools should be able to guide 
> users on what would be a reasonable maxactive value to use.
> 
> 
> Thanks,
> Naveen
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-18  4:26                   ` Masami Hiramatsu
@ 2021-06-18  8:41                     ` Naveen N. Rao
  0 siblings, 0 replies; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-18  8:41 UTC (permalink / raw)
  To: Masami Hiramatsu, Steven Rostedt
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra

Masami Hiramatsu wrote:
> On Thu, 17 Jun 2021 13:07:13 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> On Thu, 17 Jun 2021 22:04:34 +0530
>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>> 
>> > > 2. Move the kretprobe instance pool from kretprobe to struct task.
>> > >   This pool will allocates one page per task, and shared among all
>> > >   kretprobes. This pool will be allocated when the 1st kretprobe
>> > >   is registered. maxactive will be kept for someone who wants to
>> > >   use per-instance data. But since dynamic event doesn't use it,
>> > >   it will be removed from tracefs and perf.  
>> > 
>> > Won't this result in _more_ memory usage compared to what we have now?
>> 
>> Maybe or maybe not. At least with this approach (or the function graph
>> one), you will allocate enough for the environment involved. If there's
>> thousands of tasks, then yes, it will allocate more memory. But if you are
>> running thousands of tasks, you should have a lot of memory in the machine.
>> 
>> If you are only running a few tasks, it will be less than the current
>> approach.
> 
> Right, this depends on how many tasks you are running on your machine.
> Anyway, since you may not sure how much maxactive is enough, you will
> set maxactive high, then it can consume more than that. Of course you
> can optimize by trial and error. But that does not guarantee all cases,
> because the number of tasks can be increased while tracing. You might
> need to re-configure it by checking the nmissed count again.

Yes. If we go down this route, we should limit the per-task allocation 
to a more reasonable 4k -- powerpc uses 64k pages.
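
A quick back-of-the-envelope comparison shows why the pool size matters.
The sketch below assumes one fixed-size pool per task and, for the current
scheme, roughly 50 bytes per kretprobe instance (the figure implied by the
10M -> 500MB estimate elsewhere in this thread). The helper names are made
up for illustration.

```python
def per_task_pool_bytes(num_tasks, pool_bytes):
    """Memory used if every task gets one pool of pool_bytes."""
    return num_tasks * pool_bytes


def per_probe_pool_bytes(num_kretprobes, maxactive, per_instance=50):
    """Memory used today: each kretprobe preallocates maxactive instances."""
    return num_kretprobes * maxactive * per_instance


MIB = 1024**2
tasks = 64 * 1024  # a large machine with 64k tasks

print("per-task, 4k pool :", per_task_pool_bytes(tasks, 4 * 1024) // MIB, "MiB")
print("per-task, 64k pool:", per_task_pool_bytes(tasks, 64 * 1024) // MIB, "MiB")
print("per-probe, 10 probes x maxactive=4096:",
      per_probe_pool_bytes(10, 4096) // MIB, "MiB")
```

With these inputs, a page-sized pool on a 64k-page system costs 16 times
more than a 4k pool for the same number of tasks.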

Thanks,
Naveen



* Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive
  2021-06-18  6:17           ` Masami Hiramatsu
@ 2021-06-18 13:19             ` Naveen N. Rao
  0 siblings, 0 replies; 17+ messages in thread
From: Naveen N. Rao @ 2021-06-18 13:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Anton Blanchard, linux-kernel, Peter Zijlstra, Steven Rostedt

Masami Hiramatsu wrote:
> 
>> To address this, as a first step, we should probably consider parsing 
>> kprobe_profile and printing a warning with 'perf' if we detect a 
>> non-zero miss count for a probe -- both a regular probe, as well as a 
>> retprobe.
> 
> Yeah, it is doable. Note that perf-probe only set up the event and
> perf-trace or other commands will use it.
> 
> 
>> If we do this, the nice thing with kprobe_profile is that the probe miss 
>> count is available, and can serve as a good way to decide what a more 
>> reasonable maxactive value should be. This should help prevent users 
>> from trying with arbitrary maxactive values.
> 
> Such feedback loop is an interesting idea.
> Note that nmissed count is an accumulate value, not the max number of
> the instance which will be needed.

Yes, we will have to factor in the duration during which the event was 
active. This will still be an approximation, but serves as a good 
starting point. It may need a few tries to get this right, but more
importantly, the user knows instantly that there are missed probes.

> 
>> For perf_event_open(), perhaps we can introduce an ioctl to query the 
>> probe miss count.
> 
> Or, maybe we can expand the maxactive in runtime. e.g. add a shortage
> counter on the kretprobe, and run a monitor kernel thread (or kworker).
> If the shortage counter is incremented, the monitor allocates instances
> (2x counter) and give it to the kretprobe. And it resets the shortage
> counter. This adaptive maxactive may cause mis-hit in the beginning,
> but finally find the optimal maxactive value automatically.

I like this idea and I have been thinking along these lines too. If we 
start with a better default (rather than just num_possible_cpus() used 
today), I suspect we may be able to get this to work well enough to not 
have to miss any probes. Specifying 'maxactive' can still serve as a 
workaround to allocate a larger initial set of kretprobe_instances in 
case this doesn't work.
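
For reference, maxactive is given in the tracefs event definition as the
number right after the leading 'r'. The helper below is a hypothetical
sketch that only builds such a definition string; it does not write to
kprobe_events. The default limit of 4096 mirrors the tracefs limit
discussed in this thread, and do_sys_open is just an example symbol.

```python
def kretprobe_event(symbol, event, group=None, maxactive=None,
                    fetchargs=("$retval",), limit=4096):
    """Build an 'r[MAXACTIVE]:[GRP/]EVENT SYM [FETCHARGS]' definition."""
    if maxactive is not None:
        if maxactive <= 0:
            raise ValueError("maxactive must be positive")
        if limit is not None and maxactive > limit:
            raise ValueError(f"maxactive {maxactive} exceeds limit {limit}")
    head = "r" + (str(maxactive) if maxactive is not None else "")
    name = f"{group}/{event}" if group else event
    return " ".join([f"{head}:{name}", symbol, *fetchargs])


print(kretprobe_event("do_sys_open", "myretprobe", maxactive=4096))
print(kretprobe_event("do_sys_open", "myretprobe"))
```

Passing limit=None models a kernel where the tracefs limit has been
removed, as proposed in this patch.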

> 
> 
>> > To avoid such trouble, I had set the 4096 limitation for the maxactive
>> > parameter. Of course 4096 may not enough for some use-cases. I'm 
>> > welcome
>> > to expand it (e.g. 32k, isn't it enough?), but removing the limitation
>> > may cause OOM trouble easily.
>> 
>> Do you have suggestions for how we can determine a better limit? As you 
>> point out in the other email, there could very well be 64k or more 
>> processes on a large machine. Since the primary concern is memory usage, 
>> we probably need to decide this based on total memory. But, memory usage 
>> will vary depending on system load...
> 
> This is very good question. IMHO, it might better to calculate the total
> maxactive from the system memory size. For example, 1% of system memory
> can be used for the kretprobes, 16GB system will allow using 160MB for
> kretprobes, which means about "3M" is the max number of maxactive, or
> multiple kretprobes can share it. Doesn't it sound enough? Of course
> this will need to show the current usage of the kretprobe instance objects
> via tracefs or debugfs. But this total cap seems reasonable for me to
> avoid OOM trouble.
> 
>> Perhaps we can start by making maxactive limit be a tunable with a 
>> default value of 4096, with the understanding that users will be careful 
>> when bumping up this value. Hopefully, scripts won't simply start 
>> writing into this file ;)
> 
> Yeah, that's what I suggested at first, because the best maxactive will
> depend on the max number of the *processes* and the probed function.
> 
> If the probed function will NOT be preempted or slept, maxactive will be
> the number of *processor cores*. Or, if it can be preempted or slept, it
> will be the max number of *processes*. If the probed function can
> recursively called (Note: this is rare case), the maxactive has to
> be multiplied.
> 
> It is hard to estimate the max number of processes, since it depends
> on the system. Small embedded systems don't run thousands of processes,
> but big servers will run more than ten thousands of processes.
> Thus make it tunable will be a good idea.

Agree.


Thanks,
Naveen



end of thread, other threads:[~2021-06-18 13:19 UTC | newest]

Thread overview: 17+ messages
2021-06-14 18:03 [PATCH 0/2] trace/kprobe: Two fixes for kretprobes Naveen N. Rao
2021-06-14 18:03 ` [PATCH 1/2] trace/kprobe: Fix count of missed kretprobes in kprobe_profile Naveen N. Rao
2021-06-15  5:47   ` Masami Hiramatsu
2021-06-14 18:03 ` [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive Naveen N. Rao
2021-06-15  9:35   ` Masami Hiramatsu
2021-06-15 17:41     ` Naveen N. Rao
2021-06-16  0:46       ` Masami Hiramatsu
2021-06-16  1:03         ` Steven Rostedt
2021-06-16  2:27           ` Masami Hiramatsu
2021-06-16 15:10             ` Masami Hiramatsu
2021-06-17 16:34               ` Naveen N. Rao
2021-06-17 17:07                 ` Steven Rostedt
2021-06-18  4:26                   ` Masami Hiramatsu
2021-06-18  8:41                     ` Naveen N. Rao
2021-06-17 16:19         ` Naveen N. Rao
2021-06-18  6:17           ` Masami Hiramatsu
2021-06-18 13:19             ` Naveen N. Rao
