linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vincent Donnefort <vdonnefort@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	suleiman@google.com, Thomas Gleixner <tglx@linutronix.de>,
	Vineeth Pillai <vineeth@bitbyteword.org>,
	Youssef Esmat <youssefesmat@google.com>,
	Beau Belgrave <beaub@linux.microsoft.com>,
	Alexander Graf <graf@amazon.com>, Baoquan He <bhe@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	David Howells <dhowells@redhat.com>
Subject: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
Date: Tue, 05 Mar 2024 20:59:10 -0500	[thread overview]
Message-ID: <20240306015910.766510873@goodmis.org> (raw)


This is a way to map a ring buffer instance across reboots.
The requirement is that you have a memory region that is not erased.
I tested this on a Debian VM running on qemu on a Debian server,
and even tested it on a baremetal box running Fedora. I was
surprised that it worked on the baremetal box, but it does so
surprisingly consistently.

The idea is that you can reserve a memory region and save it in two
special variables:

  trace_buffer_start and trace_buffer_size

If these are set by fs_initcall() then a "boot_mapped" instance
is created. The memory that was reserved is used by the ring buffer
of this instance. It acts like a memory mapped instance so it has
some limitations. It does not allow snapshots nor does it allow
tracers which use a snapshot buffer (like irqsoff and wakeup tracers).

On boot up, when setting up the ring buffer, it looks at the current
content and does a vigorous test to see if the content is valid.
It even walks the events in all the sub-buffers to make sure the
ring buffer meta data is correct. If it determines that the content
is valid, it will reconstruct the ring buffer to use the content
it has found.

If the buffer is valid, on the next boot, the boot_mapped instance
will contain the data from the previous boot. You can cat the
trace or trace_pipe file, or even run trace-cmd extract on it to
make a trace.dat file that holds the date. This is much better than
dealing with a ftrace_dump_on_opps (I wish I had this a decade ago!)

There are still some limitations of this buffer. One is that it assumes
that the kernel you are booting back into is the same one that crashed.
At least the trace_events (like sched_switch and friends) all have the
same ids. This would be true with the same kernel as the ids are determined
at link time.

Module events could possible be a problem as the ids may not match.

One idea is to just print the raw fields and not process the print formats
for this instance, as the print formats may do some crazy things with
data that does not match.

Another limitation is any print format that has "%pS" will likely not work.
That's because the pointer in the old ring buffer is for an address that
may be different than the function points to now. I was thinking of
adding a file in the boot_mapped instance that holds the delta of the
old mapping to the new mapping, so that trace-cmd and perf could
calculate the current kallsyms from the old pointers.

Finally, this is still a proof of concept. How to create this memory
mapping isn't decided yet. In this patch set I simply hacked into
kexec crash code and hard coded an address that worked for one of my
machines (for the other machine I had to play around to find another
address). Perhaps we could add a kernel command line parameter that
lets people decided, or an option where it could possibly look at
the ACPI (for intel) tables to come up with an address on its own.

Anyway, I plan on using this for debugging, as it already is pretty
featureful but there's much more that can be done.

Basically, all you need to do is:

  echo 1 > /sys/kernel/tracing/instances/boot_mapped/events/enable

Do what ever you want and the system crashes (and boots to the same
kernel). Then:

  cat /sys/kernel/tracing/instances/boot_mapped/trace

and it will have the trace.

I'm sure there's still some gotchas here, which is why this is currently
still just a POC.

Enjoy...

Steven Rostedt (Google) (8):
      ring-buffer: Allow mapped field to be set without mapping
      ring-buffer: Add ring_buffer_alloc_range()
      tracing: Create "boot_mapped" instance for memory mapped buffer
      HACK: Hard code in mapped tracing buffer address
      ring-buffer: Add ring_buffer_meta data
      ring-buffer: Add output of ring buffer meta page
      ring-buffer: Add test if range of boot buffer is valid
      ring-buffer: Validate boot range memory events

----
 arch/x86/kernel/setup.c     |  20 ++
 include/linux/ring_buffer.h |  17 +
 include/linux/trace.h       |   7 +
 kernel/trace/ring_buffer.c  | 826 ++++++++++++++++++++++++++++++++++++++------
 kernel/trace/trace.c        |  95 ++++-
 kernel/trace/trace.h        |   5 +
 6 files changed, 856 insertions(+), 114 deletions(-)

             reply	other threads:[~2024-03-06  1:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-06  1:59 Steven Rostedt [this message]
2024-03-06  1:59 ` [PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping Steven Rostedt
2024-03-06  1:59 ` [PATCH 2/8] ring-buffer: Add ring_buffer_alloc_range() Steven Rostedt
2024-03-06  1:59 ` [PATCH 3/8] tracing: Create "boot_mapped" instance for memory mapped buffer Steven Rostedt
2024-03-06  1:59 ` [PATCH 4/8] HACK: Hard code in mapped tracing buffer address Steven Rostedt
2024-03-06  1:59 ` [PATCH 5/8] ring-buffer: Add ring_buffer_meta data Steven Rostedt
2024-03-08 10:42   ` kernel test robot
2024-03-08 10:54   ` kernel test robot
2024-03-06  1:59 ` [PATCH 6/8] ring-buffer: Add output of ring buffer meta page Steven Rostedt
2024-03-06  1:59 ` [PATCH 7/8] ring-buffer: Add test if range of boot buffer is valid Steven Rostedt
2024-03-06  1:59 ` [PATCH 8/8] ring-buffer: Validate boot range memory events Steven Rostedt
2024-03-08 15:39   ` kernel test robot
2024-03-06  2:01 ` [POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash Steven Rostedt
2024-03-06  2:01   ` Steven Rostedt
2024-03-09 18:27 ` Kees Cook
2024-03-09 18:51   ` Steven Rostedt
2024-03-09 20:40     ` Kees Cook
2024-03-20  0:44       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240306015910.766510873@goodmis.org \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=beaub@linux.microsoft.com \
    --cc=bhe@redhat.com \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=graf@amazon.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=vdonnefort@google.com \
    --cc=vineeth@bitbyteword.org \
    --cc=youssefesmat@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).