All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] tracing: Have filters and histograms use a call table instead of pointers
@ 2022-08-23 21:46 Steven Rostedt
  2022-08-23 21:46 ` [PATCH 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-08-23 21:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi

While looking at the histogram and filter code, I realized that it's filled
with function pointers. With retpolines causing a big slowdown, I thought
that was problematic. Thus, I decided to see what would happen if I changed
the function pointers into enums, and instead called a single function
that did a switch on those enums and called the necessary functions
directly. The results were pretty clear.

The first patch was to update the trace event benchmark event to include
a integer value "delta" of the delta that it took to complete
(it currently only shows the delta as part of a string). By doing
so, I could benchmark the histogram and filter logic with it.

Before this series, the histogram with a single filter (to ignore the
first event, which has a delta of zero), had:

# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#

{ delta:        129 } hitcount:       2213
{ delta:        130 } hitcount:     285965
{ delta:        131 } hitcount:    1146545
{ delta:        132 } hitcount:    5185432
{ delta:        133 } hitcount:   19896215
{ delta:        134 } hitcount:   53118616
{ delta:        135 } hitcount:   83816709
{ delta:        136 } hitcount:   68329562
{ delta:        137 } hitcount:   41859349
{ delta:        138 } hitcount:   46257797
{ delta:        139 } hitcount:   54400831
{ delta:        140 } hitcount:   72875007
{ delta:        141 } hitcount:   76193272
{ delta:        142 } hitcount:   49504263
{ delta:        143 } hitcount:   38821072
{ delta:        144 } hitcount:   47702679
{ delta:        145 } hitcount:   41357297
{ delta:        146 } hitcount:   22058238
{ delta:        147 } hitcount:    9720002
{ delta:        148 } hitcount:    3193542
{ delta:        149 } hitcount:     927030
{ delta:        150 } hitcount:     850772
{ delta:        151 } hitcount:    1477380
{ delta:        152 } hitcount:    2687977
{ delta:        153 } hitcount:    2865985
{ delta:        154 } hitcount:    1977492
{ delta:        155 } hitcount:    2475607
{ delta:        156 } hitcount:    3403612
{ delta:        157 } hitcount:    2264011
{ delta:        158 } hitcount:    1096214
{ delta:        159 } hitcount:     504653
{ delta:        160 } hitcount:     218869
{ delta:        161 } hitcount:     103246
[..]

Where the bulk was around 142ns, and the fastest time was 129ns.

After this series:

# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#

{ delta:        103 } hitcount:         60
{ delta:        104 } hitcount:      16966
{ delta:        105 } hitcount:     396625
{ delta:        106 } hitcount:    3223400
{ delta:        107 } hitcount:   12053754
{ delta:        108 } hitcount:   20241711
{ delta:        109 } hitcount:   14850200
{ delta:        110 } hitcount:    4946599
{ delta:        111 } hitcount:    3479315
{ delta:        112 } hitcount:   18698299
{ delta:        113 } hitcount:   62388733
{ delta:        114 } hitcount:   95803834
{ delta:        115 } hitcount:   58278130
{ delta:        116 } hitcount:   15364800
{ delta:        117 } hitcount:    5586866
{ delta:        118 } hitcount:    2346880
{ delta:        119 } hitcount:    1131091
{ delta:        120 } hitcount:     620896
{ delta:        121 } hitcount:     236652
{ delta:        122 } hitcount:     105957
{ delta:        123 } hitcount:     119107
{ delta:        124 } hitcount:      54494
{ delta:        125 } hitcount:      63856
{ delta:        126 } hitcount:      64454
{ delta:        127 } hitcount:      34818
{ delta:        128 } hitcount:      41446
{ delta:        129 } hitcount:      51242
{ delta:        130 } hitcount:      28361
{ delta:        131 } hitcount:      23926
{ delta:        132 } hitcount:      22253
{ delta:        133 } hitcount:      16994
{ delta:        134 } hitcount:      14970
{ delta:        135 } hitcount:      13464
{ delta:        136 } hitcount:      11452
{ delta:        137 } hitcount:      12212
{ delta:        138 } hitcount:      12280
{ delta:        139 } hitcount:       9127
{ delta:        140 } hitcount:       9553

Where the bulk was around 114ns and the fast time was 103ns.

That's almost a 20% speedup!!!


Steven Rostedt (Google) (4):
      tracing: Add numeric delta time to the trace event benchmark
      tracing/hist: Call hist functions directly via a switch statement
      tracing: Move struct filter_pred into trace_events_filter.c
      tracing/filter: Call filter predicate functions directly via a switch statement

----
 kernel/trace/trace.h               |  13 --
 kernel/trace/trace_benchmark.c     |   2 +-
 kernel/trace/trace_benchmark.h     |   8 +-
 kernel/trace/trace_events_filter.c | 239 +++++++++++++++++++++++++-----------
 kernel/trace/trace_events_hist.c   | 244 +++++++++++++++++++++++++------------
 5 files changed, 342 insertions(+), 164 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] tracing: Add numeric delta time to the trace event benchmark
  2022-08-23 21:46 [PATCH 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
@ 2022-08-23 21:46 ` Steven Rostedt
  2022-08-23 21:46 ` [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-08-23 21:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

In order to testing filtering and histograms via the trace event
benchmark, record the delta time of the last event as a numeric value
(currently, it just saves it within the string) so that filters and
histograms can use it.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_benchmark.c | 2 +-
 kernel/trace/trace_benchmark.h | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index 801c2a7f7605..54d5fa35c90a 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -51,7 +51,7 @@ static void trace_do_benchmark(void)
 
 	local_irq_disable();
 	start = trace_clock_local();
-	trace_benchmark_event(bm_str);
+	trace_benchmark_event(bm_str, bm_last);
 	stop = trace_clock_local();
 	local_irq_enable();
 
diff --git a/kernel/trace/trace_benchmark.h b/kernel/trace/trace_benchmark.h
index 79e6fbe5b365..c3e91060dc94 100644
--- a/kernel/trace/trace_benchmark.h
+++ b/kernel/trace/trace_benchmark.h
@@ -14,19 +14,21 @@ extern void trace_benchmark_unreg(void);
 
 TRACE_EVENT_FN(benchmark_event,
 
-	TP_PROTO(const char *str),
+	TP_PROTO(const char *str, u64 delta),
 
-	TP_ARGS(str),
+	TP_ARGS(str, delta),
 
 	TP_STRUCT__entry(
 		__array(	char,	str,	BENCHMARK_EVENT_STRLEN	)
+		__field(	u64,	delta)
 	),
 
 	TP_fast_assign(
 		memcpy(__entry->str, str, BENCHMARK_EVENT_STRLEN);
+		__entry->delta = delta;
 	),
 
-	TP_printk("%s", __entry->str),
+	TP_printk("%s delta=%llu", __entry->str, __entry->delta),
 
 	trace_benchmark_reg, trace_benchmark_unreg
 );
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement
  2022-08-23 21:46 [PATCH 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
  2022-08-23 21:46 ` [PATCH 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
@ 2022-08-23 21:46 ` Steven Rostedt
  2022-09-05 13:48   ` Masami Hiramatsu
  2022-08-23 21:46 ` [PATCH 3/4] tracing: Move struct filter_pred into trace_events_filter.c Steven Rostedt
  2022-08-23 21:46 ` [PATCH 4/4] tracing/filter: Call filter predicate functions directly via a switch statement Steven Rostedt
  3 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2022-08-23 21:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Due to retpolines, indirect calls are much more expensive than direct
calls. The histograms have a select set of functions it uses for the
histograms, instead of using function pointers to call them, create a
hist_fn_call() function that uses a switch statement to call the histogram
functions directly. This gives a 13% speedup to the histogram logic.

Using the histogram benchmark:

Before:

 # event histogram
 #
 # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
 #

{ delta:        129 } hitcount:       2213
{ delta:        130 } hitcount:     285965
{ delta:        131 } hitcount:    1146545
{ delta:        132 } hitcount:    5185432
{ delta:        133 } hitcount:   19896215
{ delta:        134 } hitcount:   53118616
{ delta:        135 } hitcount:   83816709
{ delta:        136 } hitcount:   68329562
{ delta:        137 } hitcount:   41859349
{ delta:        138 } hitcount:   46257797
{ delta:        139 } hitcount:   54400831
{ delta:        140 } hitcount:   72875007
{ delta:        141 } hitcount:   76193272
{ delta:        142 } hitcount:   49504263
{ delta:        143 } hitcount:   38821072
{ delta:        144 } hitcount:   47702679
{ delta:        145 } hitcount:   41357297
{ delta:        146 } hitcount:   22058238
{ delta:        147 } hitcount:    9720002
{ delta:        148 } hitcount:    3193542
{ delta:        149 } hitcount:     927030
{ delta:        150 } hitcount:     850772
{ delta:        151 } hitcount:    1477380
{ delta:        152 } hitcount:    2687977
{ delta:        153 } hitcount:    2865985
{ delta:        154 } hitcount:    1977492
{ delta:        155 } hitcount:    2475607
{ delta:        156 } hitcount:    3403612

After:

 # event histogram
 #
 # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
 #

{ delta:        113 } hitcount:        272
{ delta:        114 } hitcount:        840
{ delta:        118 } hitcount:        344
{ delta:        119 } hitcount:      25428
{ delta:        120 } hitcount:     350590
{ delta:        121 } hitcount:    1892484
{ delta:        122 } hitcount:    6205004
{ delta:        123 } hitcount:   11583521
{ delta:        124 } hitcount:   37590979
{ delta:        125 } hitcount:  108308504
{ delta:        126 } hitcount:  131672461
{ delta:        127 } hitcount:   88700598
{ delta:        128 } hitcount:   65939870
{ delta:        129 } hitcount:   45055004
{ delta:        130 } hitcount:   33174464
{ delta:        131 } hitcount:   31813493
{ delta:        132 } hitcount:   29011676
{ delta:        133 } hitcount:   22798782
{ delta:        134 } hitcount:   22072486
{ delta:        135 } hitcount:   17034113
{ delta:        136 } hitcount:    8982490
{ delta:        137 } hitcount:    2865908
{ delta:        138 } hitcount:     980382
{ delta:        139 } hitcount:    1651944
{ delta:        140 } hitcount:    4112073
{ delta:        141 } hitcount:    3963269
{ delta:        142 } hitcount:    1712508
{ delta:        143 } hitcount:     575941
{ delta:        144 } hitcount:     351427
{ delta:        145 } hitcount:     218077
{ delta:        146 } hitcount:     167297
{ delta:        147 } hitcount:     146198
{ delta:        148 } hitcount:     116122
{ delta:        149 } hitcount:      58993
{ delta:        150 } hitcount:      40228

The delta above is in nanoseconds. It brings the fastest time down from
129ns to 113ns, and the peak from 141ns to 126ns.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_hist.c | 244 +++++++++++++++++++++----------
 1 file changed, 168 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index fdf784620c28..92fbd72b6408 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -104,6 +104,38 @@ enum field_op_id {
 	FIELD_OP_MULT,
 };
 
+enum hist_field_fn {
+	HIST_FIELD_FN_NOP,
+	HIST_FIELD_FN_VAR_REF,
+	HIST_FIELD_FN_COUNTER,
+	HIST_FIELD_FN_CONST,
+	HIST_FIELD_FN_LOG2,
+	HIST_FIELD_FN_BUCKET,
+	HIST_FIELD_FN_TIMESTAMP,
+	HIST_FIELD_FN_CPU,
+	HIST_FIELD_FN_STRING,
+	HIST_FIELD_FN_DYNSTRING,
+	HIST_FIELD_FN_RELDYNSTRING,
+	HIST_FIELD_FN_PSTRING,
+	HIST_FIELD_FN_S64,
+	HIST_FIELD_FN_U64,
+	HIST_FIELD_FN_S32,
+	HIST_FIELD_FN_U32,
+	HIST_FIELD_FN_S16,
+	HIST_FIELD_FN_U16,
+	HIST_FIELD_FN_S8,
+	HIST_FIELD_FN_U8,
+	HIST_FIELD_FN_UMINUS,
+	HIST_FIELD_FN_MINUS,
+	HIST_FIELD_FN_PLUS,
+	HIST_FIELD_FN_DIV,
+	HIST_FIELD_FN_MULT,
+	HIST_FIELD_FN_DIV_POWER2,
+	HIST_FIELD_FN_DIV_NOT_POWER2,
+	HIST_FIELD_FN_DIV_MULT_SHIFT,
+	HIST_FIELD_FN_EXECNAME,
+};
+
 /*
  * A hist_var (histogram variable) contains variable information for
  * hist_fields having the HIST_FIELD_FL_VAR or HIST_FIELD_FL_VAR_REF
@@ -123,15 +155,15 @@ struct hist_var {
 struct hist_field {
 	struct ftrace_event_field	*field;
 	unsigned long			flags;
-	hist_field_fn_t			fn;
-	unsigned int			ref;
-	unsigned int			size;
-	unsigned int			offset;
-	unsigned int                    is_signed;
 	unsigned long			buckets;
 	const char			*type;
 	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
 	struct hist_trigger_data	*hist_data;
+	enum hist_field_fn		fn_num;
+	unsigned int			ref;
+	unsigned int			size;
+	unsigned int			offset;
+	unsigned int                    is_signed;
 
 	/*
 	 * Variable fields contain variable-specific info in var.
@@ -166,14 +198,11 @@ struct hist_field {
 	u64				div_multiplier;
 };
 
-static u64 hist_field_none(struct hist_field *field,
-			   struct tracing_map_elt *elt,
-			   struct trace_buffer *buffer,
-			   struct ring_buffer_event *rbe,
-			   void *event)
-{
-	return 0;
-}
+static u64 hist_fn_call(struct hist_field *hist_field,
+			struct tracing_map_elt *elt,
+			struct trace_buffer *buffer,
+			struct ring_buffer_event *rbe,
+			void *event);
 
 static u64 hist_field_const(struct hist_field *field,
 			   struct tracing_map_elt *elt,
@@ -250,7 +279,7 @@ static u64 hist_field_log2(struct hist_field *hist_field,
 {
 	struct hist_field *operand = hist_field->operands[0];
 
-	u64 val = operand->fn(operand, elt, buffer, rbe, event);
+	u64 val = hist_fn_call(operand, elt, buffer, rbe, event);
 
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
@@ -264,7 +293,7 @@ static u64 hist_field_bucket(struct hist_field *hist_field,
 	struct hist_field *operand = hist_field->operands[0];
 	unsigned long buckets = hist_field->buckets;
 
-	u64 val = operand->fn(operand, elt, buffer, rbe, event);
+	u64 val = hist_fn_call(operand, elt, buffer, rbe, event);
 
 	if (WARN_ON_ONCE(!buckets))
 		return val;
@@ -285,8 +314,8 @@ static u64 hist_field_plus(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
-	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
 
 	return val1 + val2;
 }
@@ -300,8 +329,8 @@ static u64 hist_field_minus(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
-	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
 
 	return val1 - val2;
 }
@@ -315,8 +344,8 @@ static u64 hist_field_div(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
-	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
 
 	/* Return -1 for the undefined case */
 	if (!val2)
@@ -338,7 +367,7 @@ static u64 div_by_power_of_two(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
 
 	return val1 >> __ffs64(operand2->constant);
 }
@@ -352,7 +381,7 @@ static u64 div_by_not_power_of_two(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
 
 	return div64_u64(val1, operand2->constant);
 }
@@ -366,7 +395,7 @@ static u64 div_by_mult_and_shift(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
 
 	/*
 	 * If the divisor is a constant, do a multiplication and shift instead.
@@ -400,8 +429,8 @@ static u64 hist_field_mult(struct hist_field *hist_field,
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
-	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
 
 	return val1 * val2;
 }
@@ -414,7 +443,7 @@ static u64 hist_field_unary_minus(struct hist_field *hist_field,
 {
 	struct hist_field *operand = hist_field->operands[0];
 
-	s64 sval = (s64)operand->fn(operand, elt, buffer, rbe, event);
+	s64 sval = (s64)hist_fn_call(operand, elt, buffer, rbe, event);
 	u64 val = (u64)-sval;
 
 	return val;
@@ -657,19 +686,19 @@ struct snapshot_context {
  * Returns the specific division function to use if the divisor
  * is constant. This avoids extra branches when the trigger is hit.
  */
-static hist_field_fn_t hist_field_get_div_fn(struct hist_field *divisor)
+static enum hist_field_fn hist_field_get_div_fn(struct hist_field *divisor)
 {
 	u64 div = divisor->constant;
 
 	if (!(div & (div - 1)))
-		return div_by_power_of_two;
+		return HIST_FIELD_FN_DIV_POWER2;
 
 	/* If the divisor is too large, do a regular division */
 	if (div > (1 << HIST_DIV_SHIFT))
-		return div_by_not_power_of_two;
+		return HIST_FIELD_FN_DIV_NOT_POWER2;
 
 	divisor->div_multiplier = div64_u64((u64)(1 << HIST_DIV_SHIFT), div);
-	return div_by_mult_and_shift;
+	return HIST_FIELD_FN_DIV_MULT_SHIFT;
 }
 
 static void track_data_free(struct track_data *track_data)
@@ -1334,38 +1363,32 @@ static const char *hist_field_name(struct hist_field *field,
 	return field_name;
 }
 
-static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
+static enum hist_field_fn select_value_fn(int field_size, int field_is_signed)
 {
-	hist_field_fn_t fn = NULL;
-
 	switch (field_size) {
 	case 8:
 		if (field_is_signed)
-			fn = hist_field_s64;
+			return HIST_FIELD_FN_S64;
 		else
-			fn = hist_field_u64;
-		break;
+			return HIST_FIELD_FN_U64;
 	case 4:
 		if (field_is_signed)
-			fn = hist_field_s32;
+			return HIST_FIELD_FN_S32;
 		else
-			fn = hist_field_u32;
-		break;
+			return HIST_FIELD_FN_U32;
 	case 2:
 		if (field_is_signed)
-			fn = hist_field_s16;
+			return HIST_FIELD_FN_S16;
 		else
-			fn = hist_field_u16;
-		break;
+			return HIST_FIELD_FN_U16;
 	case 1:
 		if (field_is_signed)
-			fn = hist_field_s8;
+			return HIST_FIELD_FN_S8;
 		else
-			fn = hist_field_u8;
-		break;
+			return HIST_FIELD_FN_U8;
 	}
 
-	return fn;
+	return HIST_FIELD_FN_NOP;
 }
 
 static int parse_map_size(char *str)
@@ -1922,19 +1945,19 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 		goto out; /* caller will populate */
 
 	if (flags & HIST_FIELD_FL_VAR_REF) {
-		hist_field->fn = hist_field_var_ref;
+		hist_field->fn_num = HIST_FIELD_FN_VAR_REF;
 		goto out;
 	}
 
 	if (flags & HIST_FIELD_FL_HITCOUNT) {
-		hist_field->fn = hist_field_counter;
+		hist_field->fn_num = HIST_FIELD_FN_COUNTER;
 		hist_field->size = sizeof(u64);
 		hist_field->type = "u64";
 		goto out;
 	}
 
 	if (flags & HIST_FIELD_FL_CONST) {
-		hist_field->fn = hist_field_const;
+		hist_field->fn_num = HIST_FIELD_FN_CONST;
 		hist_field->size = sizeof(u64);
 		hist_field->type = kstrdup("u64", GFP_KERNEL);
 		if (!hist_field->type)
@@ -1943,14 +1966,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 	}
 
 	if (flags & HIST_FIELD_FL_STACKTRACE) {
-		hist_field->fn = hist_field_none;
+		hist_field->fn_num = HIST_FIELD_FN_NOP;
 		goto out;
 	}
 
 	if (flags & (HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET)) {
 		unsigned long fl = flags & ~(HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET);
-		hist_field->fn = flags & HIST_FIELD_FL_LOG2 ? hist_field_log2 :
-			hist_field_bucket;
+		hist_field->fn_num = flags & HIST_FIELD_FL_LOG2 ? HIST_FIELD_FN_LOG2 :
+			HIST_FIELD_FN_BUCKET;
 		hist_field->operands[0] = create_hist_field(hist_data, field, fl, NULL);
 		hist_field->size = hist_field->operands[0]->size;
 		hist_field->type = kstrdup_const(hist_field->operands[0]->type, GFP_KERNEL);
@@ -1960,14 +1983,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 	}
 
 	if (flags & HIST_FIELD_FL_TIMESTAMP) {
-		hist_field->fn = hist_field_timestamp;
+		hist_field->fn_num = HIST_FIELD_FN_TIMESTAMP;
 		hist_field->size = sizeof(u64);
 		hist_field->type = "u64";
 		goto out;
 	}
 
 	if (flags & HIST_FIELD_FL_CPU) {
-		hist_field->fn = hist_field_cpu;
+		hist_field->fn_num = HIST_FIELD_FN_CPU;
 		hist_field->size = sizeof(int);
 		hist_field->type = "unsigned int";
 		goto out;
@@ -1987,14 +2010,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 			goto free;
 
 		if (field->filter_type == FILTER_STATIC_STRING) {
-			hist_field->fn = hist_field_string;
+			hist_field->fn_num = HIST_FIELD_FN_STRING;
 			hist_field->size = field->size;
 		} else if (field->filter_type == FILTER_DYN_STRING) {
-			hist_field->fn = hist_field_dynstring;
+			hist_field->fn_num = HIST_FIELD_FN_DYNSTRING;
 		} else if (field->filter_type == FILTER_RDYN_STRING)
-			hist_field->fn = hist_field_reldynstring;
+			hist_field->fn_num = HIST_FIELD_FN_RELDYNSTRING;
 		else
-			hist_field->fn = hist_field_pstring;
+			hist_field->fn_num = HIST_FIELD_FN_PSTRING;
 	} else {
 		hist_field->size = field->size;
 		hist_field->is_signed = field->is_signed;
@@ -2002,9 +2025,9 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 		if (!hist_field->type)
 			goto free;
 
-		hist_field->fn = select_value_fn(field->size,
-						 field->is_signed);
-		if (!hist_field->fn) {
+		hist_field->fn_num = select_value_fn(field->size,
+						     field->is_signed);
+		if (hist_field->fn_num == HIST_FIELD_FN_NOP) {
 			destroy_hist_field(hist_field, 0);
 			return NULL;
 		}
@@ -2340,7 +2363,7 @@ static struct hist_field *create_alias(struct hist_trigger_data *hist_data,
 	if (!alias)
 		return NULL;
 
-	alias->fn = var_ref->fn;
+	alias->fn_num = var_ref->fn_num;
 	alias->operands[0] = var_ref;
 
 	if (init_var_ref(alias, var_ref, var_ref->system, var_ref->event_name)) {
@@ -2523,7 +2546,7 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
 
 	expr->flags |= operand1->flags &
 		(HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS);
-	expr->fn = hist_field_unary_minus;
+	expr->fn_num = HIST_FIELD_FN_UMINUS;
 	expr->operands[0] = operand1;
 	expr->size = operand1->size;
 	expr->is_signed = operand1->is_signed;
@@ -2595,7 +2618,7 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 	unsigned long operand_flags, operand2_flags;
 	int field_op, ret = -EINVAL;
 	char *sep, *operand1_str;
-	hist_field_fn_t op_fn;
+	enum hist_field_fn op_fn;
 	bool combine_consts;
 
 	if (*n_subexprs > 3) {
@@ -2654,16 +2677,16 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 
 	switch (field_op) {
 	case FIELD_OP_MINUS:
-		op_fn = hist_field_minus;
+		op_fn = HIST_FIELD_FN_MINUS;
 		break;
 	case FIELD_OP_PLUS:
-		op_fn = hist_field_plus;
+		op_fn = HIST_FIELD_FN_PLUS;
 		break;
 	case FIELD_OP_DIV:
-		op_fn = hist_field_div;
+		op_fn = HIST_FIELD_FN_DIV;
 		break;
 	case FIELD_OP_MULT:
-		op_fn = hist_field_mult;
+		op_fn = HIST_FIELD_FN_MULT;
 		break;
 	default:
 		ret = -EINVAL;
@@ -2725,7 +2748,8 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 		if (var2)
 			expr->operands[1] = var2;
 
-		expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
+		expr->fn_num = op_fn;
+		expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL);
 
 		expr->operands[0] = NULL;
 		expr->operands[1] = NULL;
@@ -2739,7 +2763,7 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 
 		expr->name = expr_str(expr, 0);
 	} else {
-		expr->fn = op_fn;
+		expr->fn_num = op_fn;
 
 		/* The operand sizes should be the same, so just pick one */
 		expr->size = operand1->size;
@@ -3065,7 +3089,7 @@ static inline void __update_field_vars(struct tracing_map_elt *elt,
 		struct hist_field *var = field_var->var;
 		struct hist_field *val = field_var->val;
 
-		var_val = val->fn(val, elt, buffer, rbe, rec);
+		var_val = hist_fn_call(val, elt, buffer, rbe, rec);
 		var_idx = var->var.idx;
 
 		if (val->flags & HIST_FIELD_FL_STRING) {
@@ -4186,6 +4210,74 @@ static u64 hist_field_execname(struct hist_field *hist_field,
 	return (u64)(unsigned long)(elt_data->comm);
 }
 
+static u64 hist_fn_call(struct hist_field *hist_field,
+			struct tracing_map_elt *elt,
+			struct trace_buffer *buffer,
+			struct ring_buffer_event *rbe,
+			void *event)
+{
+	switch (hist_field->fn_num) {
+	case HIST_FIELD_FN_VAR_REF:
+		return hist_field_var_ref(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_COUNTER:
+		return hist_field_counter(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_CONST:
+		return hist_field_const(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_LOG2:
+		return hist_field_log2(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_BUCKET:
+		return hist_field_bucket(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_TIMESTAMP:
+		return hist_field_timestamp(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_CPU:
+		return hist_field_cpu(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_STRING:
+		return hist_field_string(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_DYNSTRING:
+		return hist_field_dynstring(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_RELDYNSTRING:
+		return hist_field_reldynstring(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_PSTRING:
+		return hist_field_pstring(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_S64:
+		return hist_field_s64(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_U64:
+		return hist_field_u64(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_S32:
+		return hist_field_s32(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_U32:
+		return hist_field_u32(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_S16:
+		return hist_field_s16(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_U16:
+		return hist_field_u16(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_S8:
+		return hist_field_s8(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_U8:
+		return hist_field_u8(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_UMINUS:
+		return hist_field_unary_minus(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_MINUS:
+		return hist_field_minus(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_PLUS:
+		return hist_field_plus(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_DIV:
+		return hist_field_div(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_MULT:
+		return hist_field_mult(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_DIV_POWER2:
+		return div_by_power_of_two(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_DIV_NOT_POWER2:
+		return div_by_not_power_of_two(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_DIV_MULT_SHIFT:
+		return div_by_mult_and_shift(hist_field, elt, buffer, rbe, event);
+	case HIST_FIELD_FN_EXECNAME:
+		return hist_field_execname(hist_field, elt, buffer, rbe, event);
+	default:
+		return 0;
+	}
+}
+
 /* Convert a var that points to common_pid.execname to a string */
 static void update_var_execname(struct hist_field *hist_field)
 {
@@ -4197,7 +4289,7 @@ static void update_var_execname(struct hist_field *hist_field)
 	kfree_const(hist_field->type);
 	hist_field->type = "char[]";
 
-	hist_field->fn = hist_field_execname;
+	hist_field->fn_num = HIST_FIELD_FN_EXECNAME;
 }
 
 static int create_var_field(struct hist_trigger_data *hist_data,
@@ -4956,7 +5048,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 
 	for_each_hist_val_field(i, hist_data) {
 		hist_field = hist_data->fields[i];
-		hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec);
+		hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec);
 		if (hist_field->flags & HIST_FIELD_FL_VAR) {
 			var_idx = hist_field->var.idx;
 
@@ -4987,7 +5079,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 	for_each_hist_key_field(i, hist_data) {
 		hist_field = hist_data->fields[i];
 		if (hist_field->flags & HIST_FIELD_FL_VAR) {
-			hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec);
+			hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec);
 			var_idx = hist_field->var.idx;
 			tracing_map_set_var(elt, var_idx, hist_val);
 		}
@@ -5062,7 +5154,7 @@ static void event_hist_trigger(struct event_trigger_data *data,
 					 HIST_STACKTRACE_SKIP);
 			key = entries;
 		} else {
-			field_contents = key_field->fn(key_field, elt, buffer, rbe, rec);
+			field_contents = hist_fn_call(key_field, elt, buffer, rbe, rec);
 			if (key_field->flags & HIST_FIELD_FL_STRING) {
 				key = (void *)(unsigned long)field_contents;
 				use_compound_key = true;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/4] tracing: Move struct filter_pred into trace_events_filter.c
  2022-08-23 21:46 [PATCH 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
  2022-08-23 21:46 ` [PATCH 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
  2022-08-23 21:46 ` [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
@ 2022-08-23 21:46 ` Steven Rostedt
  2022-08-23 21:46 ` [PATCH 4/4] tracing/filter: Call filter predicate functions directly via a switch statement Steven Rostedt
  3 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-08-23 21:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The structure filter_pred and the typedef of the function used are only
referenced by trace_events_filter.c. There's no reason to have it in an
external header file. Move them into the only file they are used in.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace.h               | 13 -------------
 kernel/trace/trace_events_filter.c | 13 +++++++++++++
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 900e75d96c84..54ee5711c729 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1435,8 +1435,6 @@ event_trigger_unlock_commit(struct trace_event_file *file,
 struct filter_pred;
 struct regex;
 
-typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);
-
 typedef int (*regex_match_func)(char *str, struct regex *r, int len);
 
 enum regex_type {
@@ -1455,17 +1453,6 @@ struct regex {
 	regex_match_func	match;
 };
 
-struct filter_pred {
-	filter_pred_fn_t 	fn;
-	u64 			val;
-	struct regex		regex;
-	unsigned short		*ops;
-	struct ftrace_event_field *field;
-	int 			offset;
-	int			not;
-	int 			op;
-};
-
 static inline bool is_string_field(struct ftrace_event_field *field)
 {
 	return field->filter_type == FILTER_DYN_STRING ||
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 4b1057ab9d96..c49c689ce4ad 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -43,6 +43,19 @@ enum filter_op_ids { OPS };
 
 static const char * ops[] = { OPS };
 
+typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);
+
+struct filter_pred {
+	filter_pred_fn_t 	fn;
+	u64 			val;
+	struct regex		regex;
+	unsigned short		*ops;
+	struct ftrace_event_field *field;
+	int 			offset;
+	int			not;
+	int 			op;
+};
+
 /*
  * pred functions are OP_LE, OP_LT, OP_GE, OP_GT, and OP_BAND
  * pred_funcs_##type below must match the order of them above.
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/4] tracing/filter: Call filter predicate functions directly via a switch statement
  2022-08-23 21:46 [PATCH 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
                   ` (2 preceding siblings ...)
  2022-08-23 21:46 ` [PATCH 3/4] tracing: Move struct filter_pred into trace_events_filter.c Steven Rostedt
@ 2022-08-23 21:46 ` Steven Rostedt
  3 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-08-23 21:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Due to retpolines, indirect calls are much more expensive than direct
calls. The filters have a select set of functions it uses for the
predicates. Instead of using function pointers to call them, create a
filter_pred_fn_call() function that uses a switch statement to call the
predicate functions directly. This gives almost a 10% speedup to the
filter logic.

Using the histogram benchmark:

Before:

 # event histogram
 #
 # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
 #

{ delta:        113 } hitcount:        272
{ delta:        114 } hitcount:        840
{ delta:        118 } hitcount:        344
{ delta:        119 } hitcount:      25428
{ delta:        120 } hitcount:     350590
{ delta:        121 } hitcount:    1892484
{ delta:        122 } hitcount:    6205004
{ delta:        123 } hitcount:   11583521
{ delta:        124 } hitcount:   37590979
{ delta:        125 } hitcount:  108308504
{ delta:        126 } hitcount:  131672461
{ delta:        127 } hitcount:   88700598
{ delta:        128 } hitcount:   65939870
{ delta:        129 } hitcount:   45055004
{ delta:        130 } hitcount:   33174464
{ delta:        131 } hitcount:   31813493
{ delta:        132 } hitcount:   29011676
{ delta:        133 } hitcount:   22798782
{ delta:        134 } hitcount:   22072486
{ delta:        135 } hitcount:   17034113
{ delta:        136 } hitcount:    8982490
{ delta:        137 } hitcount:    2865908
{ delta:        138 } hitcount:     980382
{ delta:        139 } hitcount:    1651944
{ delta:        140 } hitcount:    4112073
{ delta:        141 } hitcount:    3963269
{ delta:        142 } hitcount:    1712508
{ delta:        143 } hitcount:     575941

After:

 # event histogram
 #
 # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
 #

{ delta:        103 } hitcount:         60
{ delta:        104 } hitcount:      16966
{ delta:        105 } hitcount:     396625
{ delta:        106 } hitcount:    3223400
{ delta:        107 } hitcount:   12053754
{ delta:        108 } hitcount:   20241711
{ delta:        109 } hitcount:   14850200
{ delta:        110 } hitcount:    4946599
{ delta:        111 } hitcount:    3479315
{ delta:        112 } hitcount:   18698299
{ delta:        113 } hitcount:   62388733
{ delta:        114 } hitcount:   95803834
{ delta:        115 } hitcount:   58278130
{ delta:        116 } hitcount:   15364800
{ delta:        117 } hitcount:    5586866
{ delta:        118 } hitcount:    2346880
{ delta:        119 } hitcount:    1131091
{ delta:        120 } hitcount:     620896
{ delta:        121 } hitcount:     236652
{ delta:        122 } hitcount:     105957
{ delta:        123 } hitcount:     119107
{ delta:        124 } hitcount:      54494
{ delta:        125 } hitcount:      63856
{ delta:        126 } hitcount:      64454
{ delta:        127 } hitcount:      34818
{ delta:        128 } hitcount:      41446
{ delta:        129 } hitcount:      51242
{ delta:        130 } hitcount:      28361
{ delta:        131 } hitcount:      23926

The peak before was 126ns per event, after the peak is 114ns, and the
fastest time went from 113ns to 103ns.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_filter.c | 230 ++++++++++++++++++++---------
 1 file changed, 157 insertions(+), 73 deletions(-)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index c49c689ce4ad..96acc2b71ac7 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -43,10 +43,33 @@ enum filter_op_ids { OPS };
 
 static const char * ops[] = { OPS };
 
-typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);
+enum filter_pred_fn {
+	FILTER_PRED_FN_NOP,
+	FILTER_PRED_FN_64,
+	FILTER_PRED_FN_S64,
+	FILTER_PRED_FN_U64,
+	FILTER_PRED_FN_32,
+	FILTER_PRED_FN_S32,
+	FILTER_PRED_FN_U32,
+	FILTER_PRED_FN_16,
+	FILTER_PRED_FN_S16,
+	FILTER_PRED_FN_U16,
+	FILTER_PRED_FN_8,
+	FILTER_PRED_FN_S8,
+	FILTER_PRED_FN_U8,
+	FILTER_PRED_FN_COMM,
+	FILTER_PRED_FN_STRING,
+	FILTER_PRED_FN_STRLOC,
+	FILTER_PRED_FN_STRRELLOC,
+	FILTER_PRED_FN_PCHAR_USER,
+	FILTER_PRED_FN_PCHAR,
+	FILTER_PRED_FN_CPU,
+	FILTER_PRED_FN_,
+	FILTER_PRED_TEST_VISITED,
+};
 
 struct filter_pred {
-	filter_pred_fn_t 	fn;
+	enum filter_pred_fn 	fn_num;
 	u64 			val;
 	struct regex		regex;
 	unsigned short		*ops;
@@ -603,44 +626,48 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
 	return ERR_PTR(ret);
 }
 
+enum pred_cmp_types {
+	PRED_CMP_TYPE_NOP,
+	PRED_CMP_TYPE_LT,
+	PRED_CMP_TYPE_LE,
+	PRED_CMP_TYPE_GT,
+	PRED_CMP_TYPE_GE,
+	PRED_CMP_TYPE_BAND,
+};
+
 #define DEFINE_COMPARISON_PRED(type)					\
-static int filter_pred_LT_##type(struct filter_pred *pred, void *event)	\
-{									\
-	type *addr = (type *)(event + pred->offset);			\
-	type val = (type)pred->val;					\
-	return *addr < val;						\
-}									\
-static int filter_pred_LE_##type(struct filter_pred *pred, void *event)	\
+static int filter_pred_##type(struct filter_pred *pred, void *event)	\
 {									\
-	type *addr = (type *)(event + pred->offset);			\
-	type val = (type)pred->val;					\
-	return *addr <= val;						\
-}									\
-static int filter_pred_GT_##type(struct filter_pred *pred, void *event)	\
-{									\
-	type *addr = (type *)(event + pred->offset);			\
-	type val = (type)pred->val;					\
-	return *addr > val;					\
-}									\
-static int filter_pred_GE_##type(struct filter_pred *pred, void *event)	\
-{									\
-	type *addr = (type *)(event + pred->offset);			\
-	type val = (type)pred->val;					\
-	return *addr >= val;						\
-}									\
-static int filter_pred_BAND_##type(struct filter_pred *pred, void *event) \
-{									\
-	type *addr = (type *)(event + pred->offset);			\
-	type val = (type)pred->val;					\
-	return !!(*addr & val);						\
-}									\
-static const filter_pred_fn_t pred_funcs_##type[] = {			\
-	filter_pred_LE_##type,						\
-	filter_pred_LT_##type,						\
-	filter_pred_GE_##type,						\
-	filter_pred_GT_##type,						\
-	filter_pred_BAND_##type,					\
-};
+	switch (pred->op) {						\
+	case OP_LT: {							\
+		type *addr = (type *)(event + pred->offset);		\
+		type val = (type)pred->val;				\
+		return *addr < val;					\
+	}								\
+	case OP_LE: {					\
+		type *addr = (type *)(event + pred->offset);		\
+		type val = (type)pred->val;				\
+		return *addr <= val;					\
+	}								\
+	case OP_GT: {					\
+		type *addr = (type *)(event + pred->offset);		\
+		type val = (type)pred->val;				\
+		return *addr > val;					\
+	}								\
+	case OP_GE: {					\
+		type *addr = (type *)(event + pred->offset);		\
+		type val = (type)pred->val;				\
+		return *addr >= val;					\
+	}								\
+	case OP_BAND: {					\
+		type *addr = (type *)(event + pred->offset);		\
+		type val = (type)pred->val;				\
+		return !!(*addr & val);					\
+	}								\
+	default:							\
+		return 0;						\
+	}								\
+}
 
 #define DEFINE_EQUALITY_PRED(size)					\
 static int filter_pred_##size(struct filter_pred *pred, void *event)	\
@@ -849,11 +876,6 @@ static int filter_pred_comm(struct filter_pred *pred, void *event)
 	return cmp ^ pred->not;
 }
 
-static int filter_pred_none(struct filter_pred *pred, void *event)
-{
-	return 0;
-}
-
 /*
  * regex_match_foo - Basic regex callbacks
  *
@@ -999,6 +1021,19 @@ static void filter_build_regex(struct filter_pred *pred)
 	}
 }
 
+
+#ifdef CONFIG_FTRACE_STARTUP_TEST
+static int test_pred_visited_fn(struct filter_pred *pred, void *event);
+#else
+static int test_pred_visited_fn(struct filter_pred *pred, void *event)
+{
+	return 0;
+}
+#endif
+
+
+static int filter_pred_fn_call(struct filter_pred *pred, void *event);
+
 /* return 1 if event matches, 0 otherwise (discard) */
 int filter_match_preds(struct event_filter *filter, void *rec)
 {
@@ -1016,7 +1051,7 @@ int filter_match_preds(struct event_filter *filter, void *rec)
 
 	for (i = 0; prog[i].pred; i++) {
 		struct filter_pred *pred = prog[i].pred;
-		int match = pred->fn(pred, rec);
+		int match = filter_pred_fn_call(pred, rec);
 		if (match == prog[i].when_to_branch)
 			i = prog[i].target;
 	}
@@ -1202,10 +1237,10 @@ int filter_assign_type(const char *type)
 	return FILTER_OTHER;
 }
 
-static filter_pred_fn_t select_comparison_fn(enum filter_op_ids op,
-					    int field_size, int field_is_signed)
+static enum filter_pred_fn select_comparison_fn(enum filter_op_ids op,
+						int field_size, int field_is_signed)
 {
-	filter_pred_fn_t fn = NULL;
+	enum filter_pred_fn fn = FILTER_PRED_FN_NOP;
 	int pred_func_index = -1;
 
 	switch (op) {
@@ -1214,50 +1249,99 @@ static filter_pred_fn_t select_comparison_fn(enum filter_op_ids op,
 		break;
 	default:
 		if (WARN_ON_ONCE(op < PRED_FUNC_START))
-			return NULL;
+			return fn;
 		pred_func_index = op - PRED_FUNC_START;
 		if (WARN_ON_ONCE(pred_func_index > PRED_FUNC_MAX))
-			return NULL;
+			return fn;
 	}
 
 	switch (field_size) {
 	case 8:
 		if (pred_func_index < 0)
-			fn = filter_pred_64;
+			fn = FILTER_PRED_FN_64;
 		else if (field_is_signed)
-			fn = pred_funcs_s64[pred_func_index];
+			fn = FILTER_PRED_FN_S64;
 		else
-			fn = pred_funcs_u64[pred_func_index];
+			fn = FILTER_PRED_FN_U64;
 		break;
 	case 4:
 		if (pred_func_index < 0)
-			fn = filter_pred_32;
+			fn = FILTER_PRED_FN_32;
 		else if (field_is_signed)
-			fn = pred_funcs_s32[pred_func_index];
+			fn = FILTER_PRED_FN_S32;
 		else
-			fn = pred_funcs_u32[pred_func_index];
+			fn = FILTER_PRED_FN_U32;
 		break;
 	case 2:
 		if (pred_func_index < 0)
-			fn = filter_pred_16;
+			fn = FILTER_PRED_FN_16;
 		else if (field_is_signed)
-			fn = pred_funcs_s16[pred_func_index];
+			fn = FILTER_PRED_FN_S16;
 		else
-			fn = pred_funcs_u16[pred_func_index];
+			fn = FILTER_PRED_FN_U16;
 		break;
 	case 1:
 		if (pred_func_index < 0)
-			fn = filter_pred_8;
+			fn = FILTER_PRED_FN_8;
 		else if (field_is_signed)
-			fn = pred_funcs_s8[pred_func_index];
+			fn = FILTER_PRED_FN_S8;
 		else
-			fn = pred_funcs_u8[pred_func_index];
+			fn = FILTER_PRED_FN_U8;
 		break;
 	}
 
 	return fn;
 }
 
+
+static int filter_pred_fn_call(struct filter_pred *pred, void *event)
+{
+	switch (pred->fn_num) {
+	case FILTER_PRED_FN_64:
+		return filter_pred_64(pred, event);
+	case FILTER_PRED_FN_S64:
+		return filter_pred_s64(pred, event);
+	case FILTER_PRED_FN_U64:
+		return filter_pred_u64(pred, event);
+	case FILTER_PRED_FN_32:
+		return filter_pred_32(pred, event);
+	case FILTER_PRED_FN_S32:
+		return filter_pred_s32(pred, event);
+	case FILTER_PRED_FN_U32:
+		return filter_pred_u32(pred, event);
+	case FILTER_PRED_FN_16:
+		return filter_pred_16(pred, event);
+	case FILTER_PRED_FN_S16:
+		return filter_pred_s16(pred, event);
+	case FILTER_PRED_FN_U16:
+		return filter_pred_u16(pred, event);
+	case FILTER_PRED_FN_8:
+		return filter_pred_8(pred, event);
+	case FILTER_PRED_FN_S8:
+		return filter_pred_s8(pred, event);
+	case FILTER_PRED_FN_U8:
+		return filter_pred_u8(pred, event);
+	case FILTER_PRED_FN_COMM:
+		return filter_pred_comm(pred, event);
+	case FILTER_PRED_FN_STRING:
+		return filter_pred_string(pred, event);
+	case FILTER_PRED_FN_STRLOC:
+		return filter_pred_strloc(pred, event);
+	case FILTER_PRED_FN_STRRELLOC:
+		return filter_pred_strrelloc(pred, event);
+	case FILTER_PRED_FN_PCHAR_USER:
+		return filter_pred_pchar_user(pred, event);
+	case FILTER_PRED_FN_PCHAR:
+		return filter_pred_pchar(pred, event);
+	case FILTER_PRED_FN_CPU:
+		return filter_pred_cpu(pred, event);
+	case FILTER_PRED_TEST_VISITED:
+		return test_pred_visited_fn(pred, event);
+	default:
+		return 0;
+	}
+}
+
 /* Called when a predicate is encountered by predicate_parse() */
 static int parse_pred(const char *str, void *data,
 		      int pos, struct filter_parse_error *pe,
@@ -1351,7 +1435,7 @@ static int parse_pred(const char *str, void *data,
 			parse_error(pe, FILT_ERR_IP_FIELD_ONLY, pos + i);
 			goto err_free;
 		}
-		pred->fn = filter_pred_none;
+		pred->fn_num = FILTER_PRED_FN_NOP;
 
 		/*
 		 * Quotes are not required, but if they exist then we need
@@ -1429,16 +1513,16 @@ static int parse_pred(const char *str, void *data,
 		filter_build_regex(pred);
 
 		if (field->filter_type == FILTER_COMM) {
-			pred->fn = filter_pred_comm;
+			pred->fn_num = FILTER_PRED_FN_COMM;
 
 		} else if (field->filter_type == FILTER_STATIC_STRING) {
-			pred->fn = filter_pred_string;
+			pred->fn_num = FILTER_PRED_FN_STRING;
 			pred->regex.field_len = field->size;
 
 		} else if (field->filter_type == FILTER_DYN_STRING) {
-			pred->fn = filter_pred_strloc;
+			pred->fn_num = FILTER_PRED_FN_STRLOC;
 		} else if (field->filter_type == FILTER_RDYN_STRING)
-			pred->fn = filter_pred_strrelloc;
+			pred->fn_num = FILTER_PRED_FN_STRRELLOC;
 		else {
 
 			if (!ustring_per_cpu) {
@@ -1449,9 +1533,9 @@ static int parse_pred(const char *str, void *data,
 			}
 
 			if (ustring)
-				pred->fn = filter_pred_pchar_user;
+				pred->fn_num = FILTER_PRED_FN_PCHAR_USER;
 			else
-				pred->fn = filter_pred_pchar;
+				pred->fn_num = FILTER_PRED_FN_PCHAR;
 		}
 		/* go past the last quote */
 		i++;
@@ -1499,10 +1583,10 @@ static int parse_pred(const char *str, void *data,
 		pred->val = val;
 
 		if (field->filter_type == FILTER_CPU)
-			pred->fn = filter_pred_cpu;
+			pred->fn_num = FILTER_PRED_FN_CPU;
 		else {
-			pred->fn = select_comparison_fn(pred->op, field->size,
-							field->is_signed);
+			pred->fn_num = select_comparison_fn(pred->op, field->size,
+							    field->is_signed);
 			if (pred->op == OP_NE)
 				pred->not = 1;
 		}
@@ -2309,7 +2393,7 @@ static void update_pred_fn(struct event_filter *filter, char *fields)
 		struct filter_pred *pred = prog[i].pred;
 		struct ftrace_event_field *field = pred->field;
 
-		WARN_ON_ONCE(!pred->fn);
+		WARN_ON_ONCE(pred->fn_num == FILTER_PRED_FN_NOP);
 
 		if (!field) {
 			WARN_ONCE(1, "all leafs should have field defined %d", i);
@@ -2319,7 +2403,7 @@ static void update_pred_fn(struct event_filter *filter, char *fields)
 		if (!strchr(fields, *field->name))
 			continue;
 
-		pred->fn = test_pred_visited_fn;
+		pred->fn_num = FILTER_PRED_TEST_VISITED;
 	}
 }
 
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement
  2022-08-23 21:46 ` [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
@ 2022-09-05 13:48   ` Masami Hiramatsu
  2022-09-05 15:29     ` Steven Rostedt
  2022-09-06 21:38     ` Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Masami Hiramatsu @ 2022-09-05 13:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi

Hi Steve,

I found this crashed the kernel if I ran 

 # ftracetest test.d/trigger/trigger-hist-expressions.tc

More specifically, here is the minimum reproduce command.

 # echo "hist:keys=common_pid:x=16-8-4" > events/sched/sched_process_fork/trigger 
[   75.874402] BUG: kernel NULL pointer dereference, address: 0000000000000038
[   75.875378] #PF: supervisor read access in kernel mode

I investigated it and found you missed to set correct fn_num for
combined constant field;

> @@ -2725,7 +2748,8 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
>  		if (var2)
>  			expr->operands[1] = var2;
>  
> -		expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
> +		expr->fn_num = op_fn;
> +		expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL);
>  
>  		expr->operands[0] = NULL;
>  		expr->operands[1] = NULL;

Here, we need this;

		expr->fn_num = HIST_FIELD_FN_CONST;

Without this fix, the binary op_fn is kept on this const field and it
causes the problem because the const field doesn't have operands.

Thank you,


On Tue, 23 Aug 2022 17:46:08 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> Due to retpolines, indirect calls are much more expensive than direct
> calls. The histograms have a select set of functions it uses for the
> histograms, instead of using function pointers to call them, create a
> hist_fn_call() function that uses a switch statement to call the histogram
> functions directly. This gives a 13% speedup to the histogram logic.
> 
> Using the histogram benchmark:
> 
> Before:
> 
>  # event histogram
>  #
>  # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
>  #
> 
> { delta:        129 } hitcount:       2213
> { delta:        130 } hitcount:     285965
> { delta:        131 } hitcount:    1146545
> { delta:        132 } hitcount:    5185432
> { delta:        133 } hitcount:   19896215
> { delta:        134 } hitcount:   53118616
> { delta:        135 } hitcount:   83816709
> { delta:        136 } hitcount:   68329562
> { delta:        137 } hitcount:   41859349
> { delta:        138 } hitcount:   46257797
> { delta:        139 } hitcount:   54400831
> { delta:        140 } hitcount:   72875007
> { delta:        141 } hitcount:   76193272
> { delta:        142 } hitcount:   49504263
> { delta:        143 } hitcount:   38821072
> { delta:        144 } hitcount:   47702679
> { delta:        145 } hitcount:   41357297
> { delta:        146 } hitcount:   22058238
> { delta:        147 } hitcount:    9720002
> { delta:        148 } hitcount:    3193542
> { delta:        149 } hitcount:     927030
> { delta:        150 } hitcount:     850772
> { delta:        151 } hitcount:    1477380
> { delta:        152 } hitcount:    2687977
> { delta:        153 } hitcount:    2865985
> { delta:        154 } hitcount:    1977492
> { delta:        155 } hitcount:    2475607
> { delta:        156 } hitcount:    3403612
> 
> After:
> 
>  # event histogram
>  #
>  # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
>  #
> 
> { delta:        113 } hitcount:        272
> { delta:        114 } hitcount:        840
> { delta:        118 } hitcount:        344
> { delta:        119 } hitcount:      25428
> { delta:        120 } hitcount:     350590
> { delta:        121 } hitcount:    1892484
> { delta:        122 } hitcount:    6205004
> { delta:        123 } hitcount:   11583521
> { delta:        124 } hitcount:   37590979
> { delta:        125 } hitcount:  108308504
> { delta:        126 } hitcount:  131672461
> { delta:        127 } hitcount:   88700598
> { delta:        128 } hitcount:   65939870
> { delta:        129 } hitcount:   45055004
> { delta:        130 } hitcount:   33174464
> { delta:        131 } hitcount:   31813493
> { delta:        132 } hitcount:   29011676
> { delta:        133 } hitcount:   22798782
> { delta:        134 } hitcount:   22072486
> { delta:        135 } hitcount:   17034113
> { delta:        136 } hitcount:    8982490
> { delta:        137 } hitcount:    2865908
> { delta:        138 } hitcount:     980382
> { delta:        139 } hitcount:    1651944
> { delta:        140 } hitcount:    4112073
> { delta:        141 } hitcount:    3963269
> { delta:        142 } hitcount:    1712508
> { delta:        143 } hitcount:     575941
> { delta:        144 } hitcount:     351427
> { delta:        145 } hitcount:     218077
> { delta:        146 } hitcount:     167297
> { delta:        147 } hitcount:     146198
> { delta:        148 } hitcount:     116122
> { delta:        149 } hitcount:      58993
> { delta:        150 } hitcount:      40228
> 
> The delta above is in nanoseconds. It brings the fastest time down from
> 129ns to 113ns, and the peak from 141ns to 126ns.
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_events_hist.c | 244 +++++++++++++++++++++----------
>  1 file changed, 168 insertions(+), 76 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index fdf784620c28..92fbd72b6408 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -104,6 +104,38 @@ enum field_op_id {
>  	FIELD_OP_MULT,
>  };
>  
> +enum hist_field_fn {
> +	HIST_FIELD_FN_NOP,
> +	HIST_FIELD_FN_VAR_REF,
> +	HIST_FIELD_FN_COUNTER,
> +	HIST_FIELD_FN_CONST,
> +	HIST_FIELD_FN_LOG2,
> +	HIST_FIELD_FN_BUCKET,
> +	HIST_FIELD_FN_TIMESTAMP,
> +	HIST_FIELD_FN_CPU,
> +	HIST_FIELD_FN_STRING,
> +	HIST_FIELD_FN_DYNSTRING,
> +	HIST_FIELD_FN_RELDYNSTRING,
> +	HIST_FIELD_FN_PSTRING,
> +	HIST_FIELD_FN_S64,
> +	HIST_FIELD_FN_U64,
> +	HIST_FIELD_FN_S32,
> +	HIST_FIELD_FN_U32,
> +	HIST_FIELD_FN_S16,
> +	HIST_FIELD_FN_U16,
> +	HIST_FIELD_FN_S8,
> +	HIST_FIELD_FN_U8,
> +	HIST_FIELD_FN_UMINUS,
> +	HIST_FIELD_FN_MINUS,
> +	HIST_FIELD_FN_PLUS,
> +	HIST_FIELD_FN_DIV,
> +	HIST_FIELD_FN_MULT,
> +	HIST_FIELD_FN_DIV_POWER2,
> +	HIST_FIELD_FN_DIV_NOT_POWER2,
> +	HIST_FIELD_FN_DIV_MULT_SHIFT,
> +	HIST_FIELD_FN_EXECNAME,
> +};
> +
>  /*
>   * A hist_var (histogram variable) contains variable information for
>   * hist_fields having the HIST_FIELD_FL_VAR or HIST_FIELD_FL_VAR_REF
> @@ -123,15 +155,15 @@ struct hist_var {
>  struct hist_field {
>  	struct ftrace_event_field	*field;
>  	unsigned long			flags;
> -	hist_field_fn_t			fn;
> -	unsigned int			ref;
> -	unsigned int			size;
> -	unsigned int			offset;
> -	unsigned int                    is_signed;
>  	unsigned long			buckets;
>  	const char			*type;
>  	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
>  	struct hist_trigger_data	*hist_data;
> +	enum hist_field_fn		fn_num;
> +	unsigned int			ref;
> +	unsigned int			size;
> +	unsigned int			offset;
> +	unsigned int                    is_signed;
>  
>  	/*
>  	 * Variable fields contain variable-specific info in var.
> @@ -166,14 +198,11 @@ struct hist_field {
>  	u64				div_multiplier;
>  };
>  
> -static u64 hist_field_none(struct hist_field *field,
> -			   struct tracing_map_elt *elt,
> -			   struct trace_buffer *buffer,
> -			   struct ring_buffer_event *rbe,
> -			   void *event)
> -{
> -	return 0;
> -}
> +static u64 hist_fn_call(struct hist_field *hist_field,
> +			struct tracing_map_elt *elt,
> +			struct trace_buffer *buffer,
> +			struct ring_buffer_event *rbe,
> +			void *event);
>  
>  static u64 hist_field_const(struct hist_field *field,
>  			   struct tracing_map_elt *elt,
> @@ -250,7 +279,7 @@ static u64 hist_field_log2(struct hist_field *hist_field,
>  {
>  	struct hist_field *operand = hist_field->operands[0];
>  
> -	u64 val = operand->fn(operand, elt, buffer, rbe, event);
> +	u64 val = hist_fn_call(operand, elt, buffer, rbe, event);
>  
>  	return (u64) ilog2(roundup_pow_of_two(val));
>  }
> @@ -264,7 +293,7 @@ static u64 hist_field_bucket(struct hist_field *hist_field,
>  	struct hist_field *operand = hist_field->operands[0];
>  	unsigned long buckets = hist_field->buckets;
>  
> -	u64 val = operand->fn(operand, elt, buffer, rbe, event);
> +	u64 val = hist_fn_call(operand, elt, buffer, rbe, event);
>  
>  	if (WARN_ON_ONCE(!buckets))
>  		return val;
> @@ -285,8 +314,8 @@ static u64 hist_field_plus(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> -	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
> +	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
>  
>  	return val1 + val2;
>  }
> @@ -300,8 +329,8 @@ static u64 hist_field_minus(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> -	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
> +	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
>  
>  	return val1 - val2;
>  }
> @@ -315,8 +344,8 @@ static u64 hist_field_div(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> -	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
> +	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
>  
>  	/* Return -1 for the undefined case */
>  	if (!val2)
> @@ -338,7 +367,7 @@ static u64 div_by_power_of_two(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
>  
>  	return val1 >> __ffs64(operand2->constant);
>  }
> @@ -352,7 +381,7 @@ static u64 div_by_not_power_of_two(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
>  
>  	return div64_u64(val1, operand2->constant);
>  }
> @@ -366,7 +395,7 @@ static u64 div_by_mult_and_shift(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
>  
>  	/*
>  	 * If the divisor is a constant, do a multiplication and shift instead.
> @@ -400,8 +429,8 @@ static u64 hist_field_mult(struct hist_field *hist_field,
>  	struct hist_field *operand1 = hist_field->operands[0];
>  	struct hist_field *operand2 = hist_field->operands[1];
>  
> -	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
> -	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
> +	u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
> +	u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
>  
>  	return val1 * val2;
>  }
> @@ -414,7 +443,7 @@ static u64 hist_field_unary_minus(struct hist_field *hist_field,
>  {
>  	struct hist_field *operand = hist_field->operands[0];
>  
> -	s64 sval = (s64)operand->fn(operand, elt, buffer, rbe, event);
> +	s64 sval = (s64)hist_fn_call(operand, elt, buffer, rbe, event);
>  	u64 val = (u64)-sval;
>  
>  	return val;
> @@ -657,19 +686,19 @@ struct snapshot_context {
>   * Returns the specific division function to use if the divisor
>   * is constant. This avoids extra branches when the trigger is hit.
>   */
> -static hist_field_fn_t hist_field_get_div_fn(struct hist_field *divisor)
> +static enum hist_field_fn hist_field_get_div_fn(struct hist_field *divisor)
>  {
>  	u64 div = divisor->constant;
>  
>  	if (!(div & (div - 1)))
> -		return div_by_power_of_two;
> +		return HIST_FIELD_FN_DIV_POWER2;
>  
>  	/* If the divisor is too large, do a regular division */
>  	if (div > (1 << HIST_DIV_SHIFT))
> -		return div_by_not_power_of_two;
> +		return HIST_FIELD_FN_DIV_NOT_POWER2;
>  
>  	divisor->div_multiplier = div64_u64((u64)(1 << HIST_DIV_SHIFT), div);
> -	return div_by_mult_and_shift;
> +	return HIST_FIELD_FN_DIV_MULT_SHIFT;
>  }
>  
>  static void track_data_free(struct track_data *track_data)
> @@ -1334,38 +1363,32 @@ static const char *hist_field_name(struct hist_field *field,
>  	return field_name;
>  }
>  
> -static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
> +static enum hist_field_fn select_value_fn(int field_size, int field_is_signed)
>  {
> -	hist_field_fn_t fn = NULL;
> -
>  	switch (field_size) {
>  	case 8:
>  		if (field_is_signed)
> -			fn = hist_field_s64;
> +			return HIST_FIELD_FN_S64;
>  		else
> -			fn = hist_field_u64;
> -		break;
> +			return HIST_FIELD_FN_U64;
>  	case 4:
>  		if (field_is_signed)
> -			fn = hist_field_s32;
> +			return HIST_FIELD_FN_S32;
>  		else
> -			fn = hist_field_u32;
> -		break;
> +			return HIST_FIELD_FN_U32;
>  	case 2:
>  		if (field_is_signed)
> -			fn = hist_field_s16;
> +			return HIST_FIELD_FN_S16;
>  		else
> -			fn = hist_field_u16;
> -		break;
> +			return HIST_FIELD_FN_U16;
>  	case 1:
>  		if (field_is_signed)
> -			fn = hist_field_s8;
> +			return HIST_FIELD_FN_S8;
>  		else
> -			fn = hist_field_u8;
> -		break;
> +			return HIST_FIELD_FN_U8;
>  	}
>  
> -	return fn;
> +	return HIST_FIELD_FN_NOP;
>  }
>  
>  static int parse_map_size(char *str)
> @@ -1922,19 +1945,19 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  		goto out; /* caller will populate */
>  
>  	if (flags & HIST_FIELD_FL_VAR_REF) {
> -		hist_field->fn = hist_field_var_ref;
> +		hist_field->fn_num = HIST_FIELD_FN_VAR_REF;
>  		goto out;
>  	}
>  
>  	if (flags & HIST_FIELD_FL_HITCOUNT) {
> -		hist_field->fn = hist_field_counter;
> +		hist_field->fn_num = HIST_FIELD_FN_COUNTER;
>  		hist_field->size = sizeof(u64);
>  		hist_field->type = "u64";
>  		goto out;
>  	}
>  
>  	if (flags & HIST_FIELD_FL_CONST) {
> -		hist_field->fn = hist_field_const;
> +		hist_field->fn_num = HIST_FIELD_FN_CONST;
>  		hist_field->size = sizeof(u64);
>  		hist_field->type = kstrdup("u64", GFP_KERNEL);
>  		if (!hist_field->type)
> @@ -1943,14 +1966,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  	}
>  
>  	if (flags & HIST_FIELD_FL_STACKTRACE) {
> -		hist_field->fn = hist_field_none;
> +		hist_field->fn_num = HIST_FIELD_FN_NOP;
>  		goto out;
>  	}
>  
>  	if (flags & (HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET)) {
>  		unsigned long fl = flags & ~(HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET);
> -		hist_field->fn = flags & HIST_FIELD_FL_LOG2 ? hist_field_log2 :
> -			hist_field_bucket;
> +		hist_field->fn_num = flags & HIST_FIELD_FL_LOG2 ? HIST_FIELD_FN_LOG2 :
> +			HIST_FIELD_FN_BUCKET;
>  		hist_field->operands[0] = create_hist_field(hist_data, field, fl, NULL);
>  		hist_field->size = hist_field->operands[0]->size;
>  		hist_field->type = kstrdup_const(hist_field->operands[0]->type, GFP_KERNEL);
> @@ -1960,14 +1983,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  	}
>  
>  	if (flags & HIST_FIELD_FL_TIMESTAMP) {
> -		hist_field->fn = hist_field_timestamp;
> +		hist_field->fn_num = HIST_FIELD_FN_TIMESTAMP;
>  		hist_field->size = sizeof(u64);
>  		hist_field->type = "u64";
>  		goto out;
>  	}
>  
>  	if (flags & HIST_FIELD_FL_CPU) {
> -		hist_field->fn = hist_field_cpu;
> +		hist_field->fn_num = HIST_FIELD_FN_CPU;
>  		hist_field->size = sizeof(int);
>  		hist_field->type = "unsigned int";
>  		goto out;
> @@ -1987,14 +2010,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  			goto free;
>  
>  		if (field->filter_type == FILTER_STATIC_STRING) {
> -			hist_field->fn = hist_field_string;
> +			hist_field->fn_num = HIST_FIELD_FN_STRING;
>  			hist_field->size = field->size;
>  		} else if (field->filter_type == FILTER_DYN_STRING) {
> -			hist_field->fn = hist_field_dynstring;
> +			hist_field->fn_num = HIST_FIELD_FN_DYNSTRING;
>  		} else if (field->filter_type == FILTER_RDYN_STRING)
> -			hist_field->fn = hist_field_reldynstring;
> +			hist_field->fn_num = HIST_FIELD_FN_RELDYNSTRING;
>  		else
> -			hist_field->fn = hist_field_pstring;
> +			hist_field->fn_num = HIST_FIELD_FN_PSTRING;
>  	} else {
>  		hist_field->size = field->size;
>  		hist_field->is_signed = field->is_signed;
> @@ -2002,9 +2025,9 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  		if (!hist_field->type)
>  			goto free;
>  
> -		hist_field->fn = select_value_fn(field->size,
> -						 field->is_signed);
> -		if (!hist_field->fn) {
> +		hist_field->fn_num = select_value_fn(field->size,
> +						     field->is_signed);
> +		if (hist_field->fn_num == HIST_FIELD_FN_NOP) {
>  			destroy_hist_field(hist_field, 0);
>  			return NULL;
>  		}
> @@ -2340,7 +2363,7 @@ static struct hist_field *create_alias(struct hist_trigger_data *hist_data,
>  	if (!alias)
>  		return NULL;
>  
> -	alias->fn = var_ref->fn;
> +	alias->fn_num = var_ref->fn_num;
>  	alias->operands[0] = var_ref;
>  
>  	if (init_var_ref(alias, var_ref, var_ref->system, var_ref->event_name)) {
> @@ -2523,7 +2546,7 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
>  
>  	expr->flags |= operand1->flags &
>  		(HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS);
> -	expr->fn = hist_field_unary_minus;
> +	expr->fn_num = HIST_FIELD_FN_UMINUS;
>  	expr->operands[0] = operand1;
>  	expr->size = operand1->size;
>  	expr->is_signed = operand1->is_signed;
> @@ -2595,7 +2618,7 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
>  	unsigned long operand_flags, operand2_flags;
>  	int field_op, ret = -EINVAL;
>  	char *sep, *operand1_str;
> -	hist_field_fn_t op_fn;
> +	enum hist_field_fn op_fn;
>  	bool combine_consts;
>  
>  	if (*n_subexprs > 3) {
> @@ -2654,16 +2677,16 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
>  
>  	switch (field_op) {
>  	case FIELD_OP_MINUS:
> -		op_fn = hist_field_minus;
> +		op_fn = HIST_FIELD_FN_MINUS;
>  		break;
>  	case FIELD_OP_PLUS:
> -		op_fn = hist_field_plus;
> +		op_fn = HIST_FIELD_FN_PLUS;
>  		break;
>  	case FIELD_OP_DIV:
> -		op_fn = hist_field_div;
> +		op_fn = HIST_FIELD_FN_DIV;
>  		break;
>  	case FIELD_OP_MULT:
> -		op_fn = hist_field_mult;
> +		op_fn = HIST_FIELD_FN_MULT;
>  		break;
>  	default:
>  		ret = -EINVAL;
> @@ -2725,7 +2748,8 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
>  		if (var2)
>  			expr->operands[1] = var2;
>  
> -		expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
> +		expr->fn_num = op_fn;
> +		expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL);
>  
>  		expr->operands[0] = NULL;
>  		expr->operands[1] = NULL;
> @@ -2739,7 +2763,7 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
>  
>  		expr->name = expr_str(expr, 0);
>  	} else {
> -		expr->fn = op_fn;
> +		expr->fn_num = op_fn;
>  
>  		/* The operand sizes should be the same, so just pick one */
>  		expr->size = operand1->size;
> @@ -3065,7 +3089,7 @@ static inline void __update_field_vars(struct tracing_map_elt *elt,
>  		struct hist_field *var = field_var->var;
>  		struct hist_field *val = field_var->val;
>  
> -		var_val = val->fn(val, elt, buffer, rbe, rec);
> +		var_val = hist_fn_call(val, elt, buffer, rbe, rec);
>  		var_idx = var->var.idx;
>  
>  		if (val->flags & HIST_FIELD_FL_STRING) {
> @@ -4186,6 +4210,74 @@ static u64 hist_field_execname(struct hist_field *hist_field,
>  	return (u64)(unsigned long)(elt_data->comm);
>  }
>  
> +static u64 hist_fn_call(struct hist_field *hist_field,
> +			struct tracing_map_elt *elt,
> +			struct trace_buffer *buffer,
> +			struct ring_buffer_event *rbe,
> +			void *event)
> +{
> +	switch (hist_field->fn_num) {
> +	case HIST_FIELD_FN_VAR_REF:
> +		return hist_field_var_ref(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_COUNTER:
> +		return hist_field_counter(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_CONST:
> +		return hist_field_const(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_LOG2:
> +		return hist_field_log2(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_BUCKET:
> +		return hist_field_bucket(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_TIMESTAMP:
> +		return hist_field_timestamp(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_CPU:
> +		return hist_field_cpu(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_STRING:
> +		return hist_field_string(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_DYNSTRING:
> +		return hist_field_dynstring(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_RELDYNSTRING:
> +		return hist_field_reldynstring(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_PSTRING:
> +		return hist_field_pstring(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_S64:
> +		return hist_field_s64(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_U64:
> +		return hist_field_u64(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_S32:
> +		return hist_field_s32(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_U32:
> +		return hist_field_u32(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_S16:
> +		return hist_field_s16(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_U16:
> +		return hist_field_u16(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_S8:
> +		return hist_field_s8(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_U8:
> +		return hist_field_u8(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_UMINUS:
> +		return hist_field_unary_minus(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_MINUS:
> +		return hist_field_minus(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_PLUS:
> +		return hist_field_plus(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_DIV:
> +		return hist_field_div(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_MULT:
> +		return hist_field_mult(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_DIV_POWER2:
> +		return div_by_power_of_two(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_DIV_NOT_POWER2:
> +		return div_by_not_power_of_two(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_DIV_MULT_SHIFT:
> +		return div_by_mult_and_shift(hist_field, elt, buffer, rbe, event);
> +	case HIST_FIELD_FN_EXECNAME:
> +		return hist_field_execname(hist_field, elt, buffer, rbe, event);
> +	default:
> +		return 0;
> +	}
> +}
> +
>  /* Convert a var that points to common_pid.execname to a string */
>  static void update_var_execname(struct hist_field *hist_field)
>  {
> @@ -4197,7 +4289,7 @@ static void update_var_execname(struct hist_field *hist_field)
>  	kfree_const(hist_field->type);
>  	hist_field->type = "char[]";
>  
> -	hist_field->fn = hist_field_execname;
> +	hist_field->fn_num = HIST_FIELD_FN_EXECNAME;
>  }
>  
>  static int create_var_field(struct hist_trigger_data *hist_data,
> @@ -4956,7 +5048,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
>  
>  	for_each_hist_val_field(i, hist_data) {
>  		hist_field = hist_data->fields[i];
> -		hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec);
> +		hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec);
>  		if (hist_field->flags & HIST_FIELD_FL_VAR) {
>  			var_idx = hist_field->var.idx;
>  
> @@ -4987,7 +5079,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
>  	for_each_hist_key_field(i, hist_data) {
>  		hist_field = hist_data->fields[i];
>  		if (hist_field->flags & HIST_FIELD_FL_VAR) {
> -			hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec);
> +			hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec);
>  			var_idx = hist_field->var.idx;
>  			tracing_map_set_var(elt, var_idx, hist_val);
>  		}
> @@ -5062,7 +5154,7 @@ static void event_hist_trigger(struct event_trigger_data *data,
>  					 HIST_STACKTRACE_SKIP);
>  			key = entries;
>  		} else {
> -			field_contents = key_field->fn(key_field, elt, buffer, rbe, rec);
> +			field_contents = hist_fn_call(key_field, elt, buffer, rbe, rec);
>  			if (key_field->flags & HIST_FIELD_FL_STRING) {
>  				key = (void *)(unsigned long)field_contents;
>  				use_compound_key = true;
> -- 
> 2.35.1


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement
  2022-09-05 13:48   ` Masami Hiramatsu
@ 2022-09-05 15:29     ` Steven Rostedt
  2022-09-06 21:38     ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-09-05 15:29 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Tom Zanussi

On Mon, 5 Sep 2022 22:48:49 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Hi Steve,
> 
> I found this crashed the kernel if I ran 
> 
>  # ftracetest test.d/trigger/trigger-hist-expressions.tc
> 
> More specifically, here is the minimum reproduce command.
> 
>  # echo "hist:keys=common_pid:x=16-8-4" > events/sched/sched_process_fork/trigger 
> [   75.874402] BUG: kernel NULL pointer dereference, address: 0000000000000038
> [   75.875378] #PF: supervisor read access in kernel mode
> 
> I investigated it and found you missed to set correct fn_num for
> combined constant field;
> 
> > @@ -2725,7 +2748,8 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
> >  		if (var2)
> >  			expr->operands[1] = var2;
> >  
> > -		expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
> > +		expr->fn_num = op_fn;
> > +		expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL);
> >  
> >  		expr->operands[0] = NULL;
> >  		expr->operands[1] = NULL;  
> 
> Here, we need this;
> 
> 		expr->fn_num = HIST_FIELD_FN_CONST;
> 
> Without this fix, the binary op_fn is kept on this const field and it
> causes the problem because the const field doesn't have operands.
> 

Thanks Masami,

Today is a US holiday, and I'm only cleaning out some email (and working on
my LPC presentation for tomorrow's speaker's training), but not doing any
other work. I'll look at this tomorrow.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement
  2022-09-05 13:48   ` Masami Hiramatsu
  2022-09-05 15:29     ` Steven Rostedt
@ 2022-09-06 21:38     ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-09-06 21:38 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Tom Zanussi

On Mon, 5 Sep 2022 22:48:49 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > @@ -2725,7 +2748,8 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
> >  		if (var2)
> >  			expr->operands[1] = var2;
> >  
> > -		expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
> > +		expr->fn_num = op_fn;
> > +		expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL);
> >  
> >  		expr->operands[0] = NULL;
> >  		expr->operands[1] = NULL;  
> 
> Here, we need this;
> 
> 		expr->fn_num = HIST_FIELD_FN_CONST;
> 
> Without this fix, the binary op_fn is kept on this const field and it
> causes the problem because the const field doesn't have operands.


Yep, I added it just below the function.

Thanks, I'll send a v2 (after testing the rest of the patches).

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-06 21:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-23 21:46 [PATCH 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
2022-08-23 21:46 ` [PATCH 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
2022-08-23 21:46 ` [PATCH 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
2022-09-05 13:48   ` Masami Hiramatsu
2022-09-05 15:29     ` Steven Rostedt
2022-09-06 21:38     ` Steven Rostedt
2022-08-23 21:46 ` [PATCH 3/4] tracing: Move struct filter_pred into trace_events_filter.c Steven Rostedt
2022-08-23 21:46 ` [PATCH 4/4] tracing/filter: Call filter predicate functions directly via a switch statement Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.