linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 v2] tracing: Simplify and document the trace event filtering temp buffer code
@ 2021-06-09 22:43 Steven Rostedt
  2021-06-09 22:43 ` [PATCH 1/2 v2] tracing: Simplify the max length test when using the filtering temp buffer Steven Rostedt
  2021-06-09 22:43 ` [PATCH 2/2 v2] tracing: Add better comments for the filtering temp buffer use case Steven Rostedt
  0 siblings, 2 replies; 3+ messages in thread
From: Steven Rostedt @ 2021-06-09 22:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton


When filtering trace events, a temp buffer is used because the extra copy
from the temp buffer into the ring buffer is still faster than the direct
write into the ring buffer followed by a discard if the filter does not
match.

But the data that can be stored in the temp buffer is a PAGE_SIZE minus the
ring buffer event header. The calculation of that header size is complex,
but using the helper macro "struct_size()" can simplify it.

Also, add more documentation about what is going on.

Link: https://lore.kernel.org/stable/CAHk-=whKbJkuVmzb0hD3N6q7veprUrSpiBHRxVY=AffWZPtxmg@mail.gmail.com/

Changes since v1:
  Switch "unlikely" to "likely"
  Test max_len <= instead of just <

Steven Rostedt (VMware) (2):
      tracing: Simplify the max length test when using the filtering temp buffer
      tracing: Add better comments for the filtering temp buffer use case

----
 kernel/trace/trace.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2 v2] tracing: Simplify the max length test when using the filtering temp buffer
  2021-06-09 22:43 [PATCH 0/2 v2] tracing: Simplify and document the trace event filtering temp buffer code Steven Rostedt
@ 2021-06-09 22:43 ` Steven Rostedt
  2021-06-09 22:43 ` [PATCH 2/2 v2] tracing: Add better comments for the filtering temp buffer use case Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2021-06-09 22:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

When filtering trace events, a temp buffer is used because the extra copy
from the temp buffer into the ring buffer is still faster than the direct
write into the ring buffer followed by a discard if the filter does not
match.

But the data that can be stored in the temp buffer is a PAGE_SIZE minus the
ring buffer event header. The calculation of that header size is complex,
but using the helper macro "struct_size()" can simplify it.

Link: https://lore.kernel.org/stable/CAHk-=whKbJkuVmzb0hD3N6q7veprUrSpiBHRxVY=AffWZPtxmg@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
Changes since v1:
  Switch "unlikely" to "likely"
  Test max_len <= instead of just <

 kernel/trace/trace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f6c4d9cf0377..3366922076e6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2737,8 +2737,10 @@ trace_event_buffer_lock_reserve(struct trace_buffer **current_rb,
 	    (trace_file->flags & (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
 	    (entry = this_cpu_read(trace_buffered_event))) {
 		/* Try to use the per cpu buffer first */
+		int max_len = PAGE_SIZE - struct_size(entry, array, 1);
+
 		val = this_cpu_inc_return(trace_buffered_event_cnt);
-		if ((len < (PAGE_SIZE - sizeof(*entry) - sizeof(entry->array[0]))) && val == 1) {
+		if (val == 1 && likely(len <= max_len)) {
 			trace_event_setup(entry, type, trace_ctx);
 			entry->array[0] = len;
 			return entry;
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2 v2] tracing: Add better comments for the filtering temp buffer use case
  2021-06-09 22:43 [PATCH 0/2 v2] tracing: Simplify and document the trace event filtering temp buffer code Steven Rostedt
  2021-06-09 22:43 ` [PATCH 1/2 v2] tracing: Simplify the max length test when using the filtering temp buffer Steven Rostedt
@ 2021-06-09 22:43 ` Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2021-06-09 22:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

When filtering is enabled, the event is copied into a temp buffer instead
of being written into the ring buffer directly, because the discarding of
events from the ring buffer is very expensive, and doing the extra copy is
much faster than having to discard most of the time.

As that logic is subtle, add comments to explain in more detail to what is
going on and how it works.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 3366922076e6..ba502cd9df2f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2736,10 +2736,44 @@ trace_event_buffer_lock_reserve(struct trace_buffer **current_rb,
 	if (!tr->no_filter_buffering_ref &&
 	    (trace_file->flags & (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
 	    (entry = this_cpu_read(trace_buffered_event))) {
-		/* Try to use the per cpu buffer first */
+		/*
+		 * Filtering is on, so try to use the per cpu buffer first.
+		 * This buffer will simulate a ring_buffer_event,
+		 * where the type_len is zero and the array[0] will
+		 * hold the full length.
+		 * (see include/linux/ring-buffer.h for details on
+		 *  how the ring_buffer_event is structured).
+		 *
+		 * Using a temp buffer during filtering and copying it
+		 * on a matched filter is quicker than writing directly
+		 * into the ring buffer and then discarding it when
+		 * it doesn't match. That is because the discard
+		 * requires several atomic operations to get right.
+		 * Copying on match and doing nothing on a failed match
+		 * is still quicker than no copy on match, but having
+		 * to discard out of the ring buffer on a failed match.
+		 */
 		int max_len = PAGE_SIZE - struct_size(entry, array, 1);
 
 		val = this_cpu_inc_return(trace_buffered_event_cnt);
+
+		/*
+		 * Preemption is disabled, but interrupts and NMIs
+		 * can still come in now. If that happens after
+		 * the above increment, then it will have to go
+		 * back to the old method of allocating the event
+		 * on the ring buffer, and if the filter fails, it
+		 * will have to call ring_buffer_discard_commit()
+		 * to remove it.
+		 *
+		 * Need to also check the unlikely case that the
+		 * length is bigger than the temp buffer size.
+		 * If that happens, then the reserve is pretty much
+		 * guaranteed to fail, as the ring buffer currently
+		 * only allows events less than a page. But that may
+		 * change in the future, so let the ring buffer reserve
+		 * handle the failure in that case.
+		 */
 		if (val == 1 && likely(len <= max_len)) {
 			trace_event_setup(entry, type, trace_ctx);
 			entry->array[0] = len;
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-06-09 22:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 22:43 [PATCH 0/2 v2] tracing: Simplify and document the trace event filtering temp buffer code Steven Rostedt
2021-06-09 22:43 ` [PATCH 1/2 v2] tracing: Simplify the max length test when using the filtering temp buffer Steven Rostedt
2021-06-09 22:43 ` [PATCH 2/2 v2] tracing: Add better comments for the filtering temp buffer use case Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).