linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/7] Trace events to pstore
@ 2020-09-02 20:00 Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 1/7] tracing: Add support to allocate pages from persistent memory Nachammai Karuppiah
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

Hi,

This patch series adds support to store trace events in pstore.

Storing trace entries in persistent RAM would help in understanding what 
happened just before the system went down. The trace events that led to the 
crash can be retrieved from the pstore after a warm reboot. This will help 
debug what happened before machine’s last breath. This has to be done in a 
scalable way so that tracing a live system does not impact the performance 
of the system.

This requires a new backend - ramtrace that allocates pages from 
persistent storage for the tracing utility. This feature can be enabled
using TRACE_EVENTS_TO_PSTORE.
In this feature, the new backend is used only as a page allocator and 
once the  users chooses to use pstore to record trace entries, the ring 
buffer pages are freed and allocated in pstore. Once this switch is done,
ring_buffer continues to operate just as before without much overhead.
Since the ring buffer uses the persistent RAM buffer directly to record 
trace entries, all tracers would also persist across reboot.

To test this feature, I used a simple module that would call panic during 
a write operation to file in tracefs directory. Before writing to the file, 
the ring buffer is moved to persistent RAM buffer through command line 
as shown below,

$echo 1 > /sys/kernel/tracing/options/persist 

Writing to the file,
$echo 1 > /sys/kernel/tracing/crash/panic_on_write

The above write operation results in system crash. After reboot, once the
pstore is mounted, the trace entries from previous boot are available in file,
/sys/fs/pstore/trace-ramtrace-0

Looking through this file, gives us the stack trace that led to the crash. 

           <...>-1     [001] ....    49.083909: __vfs_write <-vfs_write                         
           <...>-1     [001] ....    49.083933: panic <-panic_on_write                   
           <...>-1     [001] d...    49.084195: printk <-panic                                 
           <...>-1     [001] d...    49.084201: vprintk_func <-printk                          
           <...>-1     [001] d...    49.084207: vprintk_default <-printk                          
           <...>-1     [001] d...    49.084211: vprintk_emit <-printk                          
           <...>-1     [001] d...    49.084216: __printk_safe_enter <-vprintk_emit         
           <...>-1     [001] d...    49.084219: _raw_spin_lock <-vprintk_emit       
           <...>-1     [001] d...    49.084223: vprintk_store <-vprintk_emit                    

Patchwise oneline description is given below:

Patch 1 adds support to allocate ring buffer pages from persistent RAM buffer.

Patch 2 introduces a new backend, ramtrace.

Patch 3 adds methods to read previous boot pages from pstore.

Patch 4 adds the functionality to allocate page-sized memory from pstore.

Patch 5 adds the seq_operation methods to iterate through trace entries.

Patch 6 modifies ring_buffer to allocate from ramtrace when pstore is used.

Patch 7 adds ramtrace DT node as child-node of /reserved-memory. 

Nachammai Karuppiah (7):
  tracing: Add support to allocate pages from persistent memory
  pstore: Support a new backend, ramtrace
  pstore: Read and iterate through trace entries in PSTORE
  pstore: Allocate and free page-sized memory in persistent RAM buffer
  tracing: Add support to iterate through pages retrieved from pstore
  tracing: Use ramtrace alloc and free methods while using persistent
    RAM
  dt-bindings: ramtrace: Add ramtrace DT node

 .../bindings/reserved-memory/ramtrace.txt          |  13 +
 drivers/of/platform.c                              |   1 +
 fs/pstore/Makefile                                 |   2 +
 fs/pstore/inode.c                                  |  46 +-
 fs/pstore/platform.c                               |   1 +
 fs/pstore/ramtrace.c                               | 821 +++++++++++++++++++++
 include/linux/pstore.h                             |   3 +
 include/linux/ramtrace.h                           |  28 +
 include/linux/ring_buffer.h                        |  19 +
 include/linux/trace.h                              |  13 +
 kernel/trace/Kconfig                               |  10 +
 kernel/trace/ring_buffer.c                         | 663 ++++++++++++++++-
 kernel/trace/trace.c                               | 312 +++++++-
 kernel/trace/trace.h                               |   5 +-
 14 files changed, 1924 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/reserved-memory/ramtrace.txt
 create mode 100644 fs/pstore/ramtrace.c
 create mode 100644 include/linux/ramtrace.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/7] tracing: Add support to allocate pages from persistent memory
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 2/7] pstore: Support a new backend, ramtrace Nachammai Karuppiah
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

Add support in ring buffer to allocate pages from persistent RAM
buffer. This feature supports switching to persistent memory and vice-versa.
A new option 'persist' has been added and once this is enabled, the pages in
ring buffer are freed up and new pages are allocated from persistent
memory.

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 kernel/trace/Kconfig       |  10 ++
 kernel/trace/ring_buffer.c | 257 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/trace.c       |  12 ++-
 kernel/trace/trace.h       |   3 +-
 4 files changed, 279 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a4020c0..f72a9df 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -739,6 +739,16 @@ config GCOV_PROFILE_FTRACE
 	  Note that on a kernel compiled with this config, ftrace will
 	  run significantly slower.
 
+config TRACE_EVENTS_TO_PSTORE
+	bool "Enable users to store trace records in persistent storage"
+	default n
+	help
+	  This option enables users to store trace records in a
+	  persistent RAM buffer so that they can be retrieved after
+	  system reboot.
+
+	  If unsure, say N.
+
 config FTRACE_SELFTEST
 	bool
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f15471c..60b587a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -25,7 +25,7 @@
 #include <linux/list.h>
 #include <linux/cpu.h>
 #include <linux/oom.h>
-
+#include <linux/ramtrace.h>
 #include <asm/local.h>
 
 static void update_pages_handler(struct work_struct *work);
@@ -479,6 +479,9 @@ struct ring_buffer_per_cpu {
 	struct completion		update_done;
 
 	struct rb_irq_work		irq_work;
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	bool				use_pstore;
+#endif
 };
 
 struct trace_buffer {
@@ -513,6 +516,15 @@ struct ring_buffer_iter {
 	int				missed_events;
 };
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+/* This semaphore is being used to ensure that buffer_data_page memory
+ * is not switched to persistent storage or vice versa while a reader page
+ * is swapped out. All consuming reads need to be finished before memory
+ * switch happens.
+ */
+DECLARE_RWSEM(trace_read_sem);
+#endif
+
 /**
  * ring_buffer_nr_pages - get the number of buffer pages in the ring buffer
  * @buffer: The ring_buffer to get the number of pages from
@@ -1705,6 +1717,247 @@ static void update_pages_handler(struct work_struct *work)
 	complete(&cpu_buffer->update_done);
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+static void free_buffer_data_page(struct buffer_data_page *page, int cpu,
+				  bool persist)
+{
+	if (persist)
+		ramtrace_free_page(page, cpu);
+	else
+		free_page((unsigned long)page);
+
+}
+
+static int rb_allocate_persistent_pages(struct buffer_data_page **pages,
+					int nr_pages, int cpu)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; i++) {
+		void *address = ramtrace_alloc_page(cpu);
+
+		if (!address)
+			goto free_pages;
+		pages[i] = address;
+	}
+	return 0;
+
+free_pages:
+	for (i = 0; i < nr_pages; i++)
+		ramtrace_free_page(pages[i], cpu);
+
+	return -ENOMEM;
+}
+
+static int
+rb_allocate_buffer_data_pages(struct buffer_data_page **pages, int nr_pages,
+			      int cpu)
+{
+	bool user_thread = current->mm != NULL;
+	gfp_t mflags;
+	long i;
+
+	/*
+	 * Check if the available memory is there first.
+	 * Note, si_mem_available() only gives us a rough estimate of available
+	 * memory. It may not be accurate. But we don't care, we just want
+	 * to prevent doing any allocation when it is obvious that it is
+	 * not going to succeed.
+	 */
+	i = si_mem_available();
+	if (i < nr_pages)
+		return -ENOMEM;
+
+	/*
+	 * __GFP_RETRY_MAYFAIL flag makes sure that the allocation fails
+	 * gracefully without invoking oom-killer and the system is not
+	 * destabilized.
+	 */
+	mflags = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
+
+	/*
+	 * If a user thread allocates too much, and si_mem_available()
+	 * reports there's enough memory, even though there is not.
+	 * Make sure the OOM killer kills this thread. This can happen
+	 * even with RETRY_MAYFAIL because another task may be doing
+	 * an allocation after this task has taken all memory.
+	 * This is the task the OOM killer needs to take out during this
+	 * loop, even if it was triggered by an allocation somewhere else.
+	 */
+	if (user_thread)
+		set_current_oom_origin();
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page;
+
+		page = alloc_pages_node(cpu_to_node(cpu), mflags, 0);
+		if (!page)
+			goto free_pages;
+		pages[i] = page_address(page);
+		rb_init_page(pages[i]);
+
+		if (user_thread && fatal_signal_pending(current))
+			goto free_pages;
+	}
+
+	if (user_thread)
+		clear_current_oom_origin();
+	return 0;
+free_pages:
+	for (i = 0; i < nr_pages; i++)
+		free_page((unsigned long)pages[i]);
+
+	return -ENOMEM;
+}
+
+static void rb_switch_memory(struct trace_buffer *buffer, bool persist)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct list_head *head;
+	struct buffer_page *bpage;
+	struct buffer_data_page ***new_pages;
+	unsigned long flags;
+	int cpu, nr_pages;
+
+	new_pages = kmalloc_array(buffer->cpus, sizeof(void *), GFP_KERNEL);
+
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+		nr_pages = cpu_buffer->nr_pages;
+		/* Include the reader page */
+		new_pages[cpu] = kmalloc_array(nr_pages + 1, sizeof(void *), GFP_KERNEL);
+		if (persist) {
+			if (rb_allocate_persistent_pages(new_pages[cpu],
+						nr_pages + 1, cpu) < 0)
+			goto out;
+		} else {
+			if (rb_allocate_buffer_data_pages(new_pages[cpu],
+					nr_pages + 1, cpu) < 0)
+				goto out;
+		}
+	}
+
+	for_each_buffer_cpu(buffer, cpu) {
+		int i = 0;
+
+		cpu_buffer = buffer->buffers[cpu];
+		nr_pages = cpu_buffer->nr_pages;
+		/* Acquire the reader lock to ensure reading is disabled.*/
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+		if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
+			goto out;
+		/* Prevent another thread from grabbing free_page. */
+		arch_spin_lock(&cpu_buffer->lock);
+
+		free_buffer_data_page(cpu_buffer->reader_page->page,
+				      cpu, cpu_buffer->use_pstore);
+		cpu_buffer->reader_page->page = new_pages[cpu][i++];
+		rb_head_page_deactivate(cpu_buffer);
+
+		head = cpu_buffer->pages;
+		if (head) {
+			list_for_each_entry(bpage, head, list) {
+				free_buffer_data_page(bpage->page, cpu,
+						      cpu_buffer->use_pstore);
+				bpage->page = new_pages[cpu][i++];
+				rb_init_page(bpage->page);
+			}
+			bpage = list_entry(head, struct buffer_page, list);
+			free_buffer_data_page(bpage->page, cpu,
+					      cpu_buffer->use_pstore);
+			bpage->page = new_pages[cpu][nr_pages];
+			rb_init_page(bpage->page);
+		}
+		kfree(new_pages[cpu]);
+
+		if (cpu_buffer->free_page) {
+			free_buffer_data_page(cpu_buffer->free_page, cpu,
+					      cpu_buffer->use_pstore);
+			cpu_buffer->free_page = 0;
+		}
+
+		cpu_buffer->use_pstore = persist;
+
+		rb_reset_cpu(cpu_buffer);
+		arch_spin_unlock(&cpu_buffer->lock);
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+	}
+
+	kfree(new_pages);
+	return;
+out:
+	for_each_buffer_cpu(buffer, cpu) {
+		int i = 0;
+
+		cpu_buffer = buffer->buffers[cpu];
+		for (i = 0; i < cpu_buffer->nr_pages + 1; i++) {
+			if (new_pages[cpu][i])
+				free_buffer_data_page(new_pages[cpu][i], cpu,
+						      persist);
+		}
+		kfree(new_pages[cpu]);
+	}
+	kfree(new_pages);
+}
+
+void pstore_tracing_off(void);
+
+/**
+ * ring_buffer_switch_memory - If boolean argument 'persist' is true, switch
+ * to persistent memory and if false, switch to non persistent memory.
+ */
+int
+ring_buffer_switch_memory(struct trace_buffer *buffer, const char *tracer_name,
+			  int clock_id, bool persist)
+{
+	int cpu;
+	int online_cpu = 0;
+	int nr_pages_total = 0;
+
+	if (RB_WARN_ON(buffer, !down_write_trylock(&trace_read_sem)))
+		return -EBUSY;
+
+	if (persist) {
+		/* Quit if there is no reserved ramtrace region available */
+		if (!is_ramtrace_available())
+			return -ENOMEM;
+
+		/* Disable pstore_trace buffers which are used for reading
+		 * previous boot data pages.
+		 */
+		pstore_tracing_off();
+
+		/* Estimate the number of pages needed. */
+		for_each_buffer_cpu(buffer, cpu) {
+			online_cpu++;
+			/* count the reader page as well */
+			nr_pages_total += buffer->buffers[cpu]->nr_pages + 1;
+		}
+		/* Initialize ramtrace pages */
+		if (init_ramtrace_pages(online_cpu, nr_pages_total, tracer_name, clock_id))
+			return -ENOMEM;
+	}
+
+
+	ring_buffer_record_disable(buffer);
+
+	/* Make sure all pending commits have finished */
+	synchronize_rcu();
+
+	/* prevent another thread from changing buffer sizes */
+	mutex_lock(&buffer->mutex);
+
+	rb_switch_memory(buffer, persist);
+
+	mutex_unlock(&buffer->mutex);
+
+	ring_buffer_record_enable(buffer);
+	up_write(&trace_read_sem);
+	return 0;
+
+}
+#endif
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -4716,6 +4969,7 @@ void *ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
 
  out:
 	rb_init_page(bpage);
+	down_read(&trace_read_sem);
 
 	return bpage;
 }
@@ -4753,6 +5007,7 @@ void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data
 
  out:
 	free_page((unsigned long)bpage);
+	up_read(&trace_read_sem);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bb62269..2b3d8e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -48,6 +48,7 @@
 #include <linux/fsnotify.h>
 #include <linux/irq_work.h>
 #include <linux/workqueue.h>
+#include <linux/ramtrace.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -265,7 +266,8 @@ unsigned long long ns2usecs(u64 nsec)
 
 /* trace_flags that are default zero for instances */
 #define ZEROED_TRACE_FLAGS \
-	(TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK)
+	(TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK |			\
+	 TRACE_ITER_PERSIST)
 
 /*
  * The global_trace is the descriptor that holds the top-level tracing
@@ -4851,6 +4853,14 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
 		trace_printk_control(enabled);
 	}
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	if (mask == TRACE_ITER_PERSIST) {
+		ring_buffer_switch_memory(tr->array_buffer.buffer,
+					  tr->current_trace->name,
+					  tr->clock_id, enabled);
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 13db400..2a4ab72 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1336,7 +1336,8 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 		FUNCTION_FLAGS					\
 		FGRAPH_FLAGS					\
 		STACK_FLAGS					\
-		BRANCH_FLAGS
+		BRANCH_FLAGS					\
+		C(PERSIST,		"persist"),
 
 /*
  * By defining C, we can make TRACE_FLAGS a list of bit names
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/7] pstore: Support a new backend, ramtrace
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 1/7] tracing: Add support to allocate pages from persistent memory Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 3/7] pstore: Read and iterate through trace entries in PSTORE Nachammai Karuppiah
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

ramtrace provides persistent RAM storage for trace, so they can be
recovered after a reboot. This is a child-node of "/reserved-memory",
and is named "ramtrace" after the backend.

ramtrace supports allocation and deallocation of page-sized memory
This functionality is used by ring buffer in trace when the user switches to
persistent storage. The ring buffer writes directly to the persistent RAM and doesn't use
the write API provided by pstore. For these reasons, a new backend is
needed.

Required properties:

- compatible: must be "ramtrace"

- reg: region of memory that is preserved between reboots

Also, can be enabled on kernel command line as,
ramtrace.mem_address=0x100000000 ramtrace.mem_size=0x7ffffff

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 drivers/of/platform.c    |   1 +
 fs/pstore/Makefile       |   2 +
 fs/pstore/ramtrace.c     | 346 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pstore.h   |   3 +
 include/linux/ramtrace.h |  15 ++
 5 files changed, 367 insertions(+)
 create mode 100644 fs/pstore/ramtrace.c
 create mode 100644 include/linux/ramtrace.h

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 071f04d..1a11f54 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -511,6 +511,7 @@ static const struct of_device_id reserved_mem_matches[] = {
 	{ .compatible = "qcom,rmtfs-mem" },
 	{ .compatible = "qcom,cmd-db" },
 	{ .compatible = "ramoops" },
+	{ .compatile = "ramtrace" },
 	{}
 };
 
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index c270467..4cee2f0 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -18,3 +18,5 @@ obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
 
 pstore_blk-objs += blk.o
 obj-$(CONFIG_PSTORE_BLK)	+= pstore_blk.o
+
+obj-$(CONFIG_TRACE_EVENTS_TO_PSTORE) += ramtrace.o
diff --git a/fs/pstore/ramtrace.c b/fs/pstore/ramtrace.c
new file mode 100644
index 0000000..57f59e0
--- /dev/null
+++ b/fs/pstore/ramtrace.c
@@ -0,0 +1,346 @@
+#include <linux/ring_buffer.h>
+#include <linux/err.h>
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/seq_file.h>
+#include <linux/ramtrace.h>
+#include <generated/utsrelease.h>
+#include <linux/of.h>
+#include <linux/vmalloc.h>
+
+static unsigned long long mem_address;
+module_param_hw(mem_address, ullong, other, 0400);
+MODULE_PARM_DESC(mem_address,
+		"start of reserved RAM used to store trace data");
+
+static ulong mem_size;
+module_param(mem_size, ulong, 0400);
+MODULE_PARM_DESC(mem_size,
+		"size of reserved RAM used to store trace data");
+
+
+struct ramtrace_context {
+	phys_addr_t phys_addr;	/* Physical address of the persistent memory */
+	unsigned long size;     /* size of the persistent memory */
+	void *vaddr;	/* Virtual address of the first page i.e metadata page */
+	int *clock_id;  /* Pointer to clock id in metadata page */
+	char *tracer_name; /* Pointer to tracer name in metadata page. */
+	spinlock_t lock;
+	struct ramtrace_pagelist *freelist;	/* Linked list of free pages */
+	void **bitmap_pages;	/* Array of bitmap pages per CPU */
+	struct tr_persistent_info *persist_info;
+	struct pstore_info pstore;
+	void *base_address;     /* First page available for allocation */
+	int num_bitmap_per_cpu, cpu;
+	int pages_available;
+	int read_buffer_status;
+};
+
+static int ramtrace_pstore_open(struct pstore_info *psi);
+static ssize_t ramtrace_pstore_read(struct pstore_record *record);
+static int ramtrace_pstore_erase(struct pstore_record *record);
+
+static int ramtrace_pstore_write(struct pstore_record *record)
+{
+	return 0;
+}
+
+static void free_persist_info(void);
+
+static struct ramtrace_context trace_ctx = {
+	.size = 0,
+	.pstore = {
+		.owner	= THIS_MODULE,
+		.name	= "ramtrace",
+		.open	= ramtrace_pstore_open,
+		.read	= ramtrace_pstore_read,
+		.write	= ramtrace_pstore_write,
+		.erase	= ramtrace_pstore_erase,
+	},
+	.read_buffer_status = -1,
+};
+
+static int ramtrace_pstore_open(struct pstore_info *psi)
+{
+	/*
+	 * If there is any data to be read from previous boot, turn
+	 * read_buffer_status to 0, to indicate that data is available to be
+	 * read
+	 */
+	if (trace_ctx.persist_info)
+		trace_ctx.read_buffer_status = 0;
+	return 0;
+}
+
+static ssize_t ramtrace_pstore_read(struct pstore_record *record)
+{
+	if (trace_ctx.read_buffer_status)
+		return 0;
+
+	trace_ctx.read_buffer_status = 1;
+
+	record->time.tv_sec = 0;
+	record->time.tv_nsec = 0;
+	record->compressed = false;
+
+	/*
+	 * Set size as non-zero. This is a place holder value since pstore
+	 * doesn't accept zero-sized buffer. The actual buffer size is unknown.
+	 */
+	record->size = PAGE_SIZE;
+	record->id = 0;
+	record->type = PSTORE_TYPE_TRACE;
+	/*
+	 * Since the buffer used by trace isn't contigous, do not provide
+	 * pstore with a buffer. Instead, the data field in pstore_record
+	 * contains a pointer to pstore_trace_seq_ops structure which provides
+	 * the required interface to iterate through the ramtrace pages.
+	 */
+	record->buf = NULL;
+
+	return record->size;
+}
+
+static int ramtrace_pstore_erase(struct pstore_record *record)
+{
+	pstore_tracing_erase();
+	free_persist_info();
+}
+
+static struct platform_device *dummy;
+
+static int ramtrace_parse_dt(struct platform_device *pdev,
+			    struct ramtrace_platform_data *pdata)
+{
+	struct resource *res;
+
+
+	dev_dbg(&pdev->dev, "using Device Tree\n");
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(&pdev->dev,
+			"failed to locate DT /reserved-memory resource\n");
+		return -EINVAL;
+	}
+
+	pdata->mem_size = resource_size(res);
+	pdata->mem_address = res->start;
+
+	return 0;
+}
+
+static int ramtrace_init_mem(struct ramtrace_context *ctx)
+{
+
+	struct page **pages;
+	unsigned int page_count;
+	pgprot_t prot;
+	unsigned int i;
+	struct ramtrace_pagelist *freelist;
+
+	page_count = DIV_ROUND_UP(ctx->size, PAGE_SIZE);
+
+	prot = pgprot_noncached(PAGE_KERNEL);
+
+	pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL);
+	if (!pages) {
+		pr_err("%s: Failed to allocate array for %u pages\n",
+		       __func__, page_count);
+		return 0;
+	}
+	freelist = kzalloc(sizeof(struct ramtrace_pagelist), GFP_KERNEL);
+	INIT_LIST_HEAD(&freelist->list);
+	trace_ctx.freelist = freelist;
+	for (i = 0; i < page_count; i++) {
+		phys_addr_t addr = ctx->phys_addr + i * PAGE_SIZE;
+
+		pages[i] = pfn_to_page(addr >> PAGE_SHIFT);
+	}
+
+	ctx->vaddr = vmap(pages, page_count, VM_MAP, prot);
+
+	/* Initialize the freelist - free page pool.
+	 * Note - This doesn't initialize the page.
+	 */
+	for (i = 0; i < page_count; i++) {
+		struct ramtrace_pagelist *freelist_node;
+		void *addr = ctx->vaddr + i * PAGE_SIZE;
+
+		freelist_node = kmalloc(sizeof(*freelist_node), GFP_KERNEL);
+		freelist_node->page = addr;
+		list_add_tail(&freelist_node->list, &freelist->list);
+	}
+	spin_lock_init(&ctx->lock);
+
+	/* Read the data from previous boot, if any */
+	ramtrace_read_pages();
+	kfree(pages);
+	return 1;
+}
+
+static int ramtrace_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct ramtrace_platform_data *pdata = dev->platform_data;
+	struct ramtrace_platform_data pdata_local;
+	struct ramtrace_context *cxt = &trace_ctx;
+	int err = -EINVAL;
+
+	/*
+	 * Only a single ramtrace area allowed at a time, so fail extra
+	 * probes.
+	 */
+	if (cxt->size) {
+		pr_err("already initialized\n");
+		goto fail_out;
+	}
+
+	if (dev_of_node(dev) && !pdata) {
+		pdata = &pdata_local;
+		memset(pdata, 0, sizeof(*pdata));
+
+		err = ramtrace_parse_dt(pdev, pdata);
+		if (err < 0)
+			goto fail_out;
+	}
+
+	/* Make sure we didn't get bogus platform data pointer. */
+	if (!pdata) {
+		pr_err("NULL platform data\n");
+		goto fail_out;
+	}
+
+	if (!pdata->mem_size) {
+		pr_err("The memory size must be non-zero\n");
+		goto fail_out;
+	}
+
+	cxt->size = pdata->mem_size;
+	cxt->phys_addr = pdata->mem_address;
+
+	err = ramtrace_init_mem(cxt);
+
+	/*
+	 * Update the module parameter variables as well so they are visible
+	 * through /sys/module/ramtrace/parameters/
+	 */
+	mem_size = pdata->mem_size;
+	mem_address = pdata->mem_address;
+
+	pr_info("using 0x%lx@0x%llx\n",
+		cxt->size, (unsigned long long)cxt->phys_addr);
+
+	/* Initialize struct pstore_info and register with pstore */
+	cxt->pstore.flags = PSTORE_FLAGS_TRACE;
+	cxt->pstore.data = &pstore_trace_seq_ops;
+	err = pstore_register(&cxt->pstore);
+	if (err) {
+		pr_err("registering with pstore failed\n");
+		goto fail_out;
+	}
+
+	return 0;
+
+fail_out:
+	return err;
+
+}
+
+static int ramtrace_remove(struct platform_device *pdev)
+{
+	struct ramtrace_context *cxt = &trace_ctx;
+	struct ramtrace_pagelist *freelist = cxt->freelist;
+
+	pstore_unregister(&cxt->pstore);
+
+	cxt->pstore.bufsize = 0;
+
+	pstore_tracing_erase();
+	free_persist_info();
+
+	if (!list_empty(&freelist->list)) {
+		struct ramtrace_pagelist *node, *tmp;
+
+		list_for_each_entry_safe(node, tmp, &freelist->list, list) {
+			list_del(&node->list);
+			kfree(node);
+		}
+	}
+	cxt->size = 0;
+	return 0;
+}
+
+static const struct of_device_id dt_match[] = {
+	{ .compatible = "ramtrace" },
+	{}
+};
+
+static struct platform_driver ramtrace_driver = {
+	.probe		= ramtrace_probe,
+	.remove		= ramtrace_remove,
+	.driver		= {
+		.name		= "ramtrace",
+		.of_match_table	= dt_match,
+	},
+};
+
+static inline void ramtrace_unregister_dummy(void)
+{
+	platform_device_unregister(dummy);
+	dummy = NULL;
+}
+
+static void __init ramtrace_register_dummy(void)
+{
+	struct ramtrace_platform_data pdata;
+
+	/*
+	 * Prepare a dummy platform data structure to carry the module
+	 * parameters. If mem_size isn't set, then there are no module
+	 * parameters, and we can skip this.
+	 */
+	if (!mem_size)
+		return;
+
+	pr_info("using module parameters\n");
+
+	memset(&pdata, 0, sizeof(pdata));
+	pdata.mem_size = mem_size;
+	pdata.mem_address = mem_address;
+
+
+	dummy = platform_device_register_data(NULL, "ramtrace", -1,
+			&pdata, sizeof(pdata));
+	if (IS_ERR(dummy)) {
+		pr_info("could not create platform device: %ld\n",
+			PTR_ERR(dummy));
+		dummy = NULL;
+		ramtrace_unregister_dummy();
+	}
+}
+
+static int __init ramtrace_init(void)
+{
+	int ret;
+
+	ramtrace_register_dummy();
+	ret = platform_driver_register(&ramtrace_driver);
+	if (ret != 0)
+		ramtrace_unregister_dummy();
+
+	return ret;
+}
+postcore_initcall(ramtrace_init);
+
+static void __exit ramtrace_exit(void)
+{
+	platform_driver_unregister(&ramtrace_driver);
+	ramtrace_unregister_dummy();
+}
+module_exit(ramtrace_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nachammai Karuppiah nachukannan@gmail.com");
+MODULE_DESCRIPTION("RAM trace buffer manager/driver");
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index eb93a54..20bae10 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -39,6 +39,8 @@ enum pstore_type_id {
 	PSTORE_TYPE_PMSG	= 7,
 	PSTORE_TYPE_PPC_OPAL	= 8,
 
+	PSTORE_TYPE_TRACE	= 9,
+
 	/* End of the list */
 	PSTORE_TYPE_MAX
 };
@@ -202,6 +204,7 @@ struct pstore_info {
 #define PSTORE_FLAGS_CONSOLE	BIT(1)
 #define PSTORE_FLAGS_FTRACE	BIT(2)
 #define PSTORE_FLAGS_PMSG	BIT(3)
+#define PSTORE_FLAGS_TRACE	BIT(4)
 
 extern int pstore_register(struct pstore_info *);
 extern void pstore_unregister(struct pstore_info *);
diff --git a/include/linux/ramtrace.h b/include/linux/ramtrace.h
new file mode 100644
index 0000000..faf459f
--- /dev/null
+++ b/include/linux/ramtrace.h
@@ -0,0 +1,15 @@
+#include <linux/list.h>
+#include <linux/trace.h>
+#include <linux/trace_events.h>
+#include <linux/pstore.h>
+
+/*
+ * Ramoops platform data
+ * @mem_size	memory size for ramtrace
+ * @mem_address	physical memory address to contain ramtrace
+ */
+
+struct ramtrace_platform_data {
+	unsigned long	mem_size;
+	phys_addr_t	mem_address;
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/7] pstore: Read and iterate through trace entries in PSTORE
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 1/7] tracing: Add support to allocate pages from persistent memory Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 2/7] pstore: Support a new backend, ramtrace Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 4/7] pstore: Allocate and free page-sized memory in persistent RAM buffer Nachammai Karuppiah
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

Incase of trace records that are being read from pstore, the numerical
*pos is not enough to hold state from session to session. A trace
iterator is better suited for this. So for PSTORE_TYPE_TRACE record
types, the seq_file->private field holds pointer to trace_iterator and not
pstore_private. For this reason, PSTORE_TYPE_TRACE requires a different
file_operations implementation, pstore_trace_file_operations.

The method, ramtrace_read_pages initiates the retrieval of pages from
pstore.
The first page in the pstore is the metadata page. Below is the layout
of the metadata page.
  +------------------------------------------+
  | Kernel Version                           |
  +------------------------------------------+
  | tracer_name                              |
  +------------------------------------------+
  | Number of CPU’s Buffers = N              |
  +------------------------------------------+
  | trace_clock_name                         |
  +------------------------------------------+
  | Number of bitmap pages per cpu	     |
  +------------------------------------------+
The metadata page is then followed by bitmap pages which is a bitmap of
pages that are allocated per CPU. From the bitmaps, the list of
pages per cpu is computed and ordered by timestamp.

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 fs/pstore/inode.c    |  46 +++++++++++++++--
 fs/pstore/platform.c |   1 +
 fs/pstore/ramtrace.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+), 5 deletions(-)

diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index c331efe..173269a 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -147,13 +147,20 @@ static int pstore_file_open(struct inode *inode, struct file *file)
 	if (ps->record->type == PSTORE_TYPE_FTRACE)
 		sops = &pstore_ftrace_seq_ops;
 
+	if (ps->record->type == PSTORE_TYPE_TRACE)
+		sops = ps->record->psi->data;
+
 	err = seq_open(file, sops);
 	if (err < 0)
 		return err;
-
-	sf = file->private_data;
-	sf->private = ps;
-
+	/* Incase of PSTORE_TYPE_TRACE, the private field in seq_file
+	 * would be initialized later during seq_read and freed in
+	 * seq_release.
+	 */
+	if (ps->record->type != PSTORE_TYPE_TRACE) {
+		sf = file->private_data;
+		sf->private = ps;
+	}
 	return 0;
 }
 
@@ -173,6 +180,30 @@ static const struct file_operations pstore_file_operations = {
 	.release	= seq_release,
 };
 
+static ssize_t pstore_trace_file_read(struct file *file, char __user *userbuf,
+						size_t count, loff_t *ppos)
+{
+	return seq_read(file, userbuf, count, ppos);
+}
+
+extern int pstore_tracing_release(void *v);
+
+static int pstore_trace_file_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *s = file->private_data;
+	void *v = s->private;
+
+	pstore_tracing_release(v);
+	return seq_release(inode, file);
+}
+
+static const struct file_operations pstore_trace_file_operations = {
+	.open		= pstore_file_open,
+	.read		= pstore_trace_file_read,
+	.llseek		= pstore_file_llseek,
+	.release	= pstore_trace_file_release,
+};
+
 /*
  * When a file is unlinked from our file system we call the
  * platform driver to erase the record from persistent store.
@@ -369,7 +400,6 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record)
 	if (!inode)
 		goto fail;
 	inode->i_mode = S_IFREG | 0444;
-	inode->i_fop = &pstore_file_operations;
 	scnprintf(name, sizeof(name), "%s-%s-%llu%s",
 			pstore_type_to_name(record->type),
 			record->psi->name, record->id,
@@ -386,6 +416,12 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record)
 	private->dentry = dentry;
 	private->record = record;
 	inode->i_size = private->total_size = size;
+
+	if (record->type == PSTORE_TYPE_TRACE)
+		inode->i_fop = &pstore_trace_file_operations;
+	else
+		inode->i_fop = &pstore_file_operations;
+
 	inode->i_private = private;
 
 	if (record->time.tv_sec)
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 36714df..e3d5b43 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -58,6 +58,7 @@ static const char * const pstore_type_names[] = {
 	"powerpc-common",
 	"pmsg",
 	"powerpc-opal",
+	"trace",
 };
 
 static int pstore_new_entry;
diff --git a/fs/pstore/ramtrace.c b/fs/pstore/ramtrace.c
index 57f59e0..ca48a76 100644
--- a/fs/pstore/ramtrace.c
+++ b/fs/pstore/ramtrace.c
@@ -61,6 +61,18 @@ static struct ramtrace_context trace_ctx = {
 	.read_buffer_status = -1,
 };
 
+/*
+ * pstore_trace_seq_ops: This consists of the callback functions
+ * required by pstore to iterate through the trace entries
+ */
+
+static struct seq_operations pstore_trace_seq_ops = {
+	.start	= pstore_trace_start,
+	.next	= pstore_trace_next,
+	.stop	= pstore_trace_stop,
+	.show	= pstore_trace_show,
+};
+
 static int ramtrace_pstore_open(struct pstore_info *psi)
 {
 	/*
@@ -131,6 +143,133 @@ static int ramtrace_parse_dt(struct platform_device *pdev,
 	return 0;
 }
 
+static int ramtrace_read_int(int **buffer)
+{
+	int data = **buffer;
+
+	(*buffer)++;
+	return data;
+}
+
+static char *ramtrace_read_string(char **buffer)
+{
+	int len = strlen(*buffer) + 1;
+
+	if (len > 1) {
+		char *s = kmalloc(len, GFP_KERNEL);
+
+		strncpy(s, *buffer, len);
+		*buffer = (*buffer) + len;
+		return s;
+	}
+	return NULL;
+
+}
+
+static struct list_head*
+ramtrace_read_bitmap_per_cpu(int n_bitmap, unsigned long long *bitmap,
+			     void *first_page)
+{
+	int j, k;
+	struct list_head *pages = kmalloc(sizeof(struct list_head), GFP_KERNEL);
+
+	INIT_LIST_HEAD(pages);
+	for (k = 0; k <  n_bitmap; k++) {
+		bitmap = (void *)(bitmap) + PAGE_SIZE * k;
+		for (j = 0; j < PAGE_SIZE/sizeof(long long); j++) {
+			struct ramtrace_pagelist *list_page;
+			unsigned long long k = bitmap[j];
+			int count = 0;
+			int nth = j * sizeof(k) * 8;
+
+			while (k) {
+				if (k & 1) {
+					list_page = kzalloc(sizeof(*list_page),
+							    GFP_KERNEL);
+					list_page->page = (first_page +
+						     (nth + count) * PAGE_SIZE);
+					list_add_tail(&list_page->list, pages);
+				}
+				count++;
+				k = k >> 1;
+			}
+
+		}
+	}
+	if (list_empty(pages))
+		return NULL;
+	return pages;
+}
+
+/* ramtrace_read_bitmap: Read bitmap pages.
+ *
+ * Read bitmap pages from previous boot and order the
+ * buffer_data_page as per the timestamp.
+ */
+static void
+ramtrace_read_bitmap(int n_cpu, int n_bitmap, struct list_head **per_cpu)
+{
+	int i;
+	void *first_bitmap = trace_ctx.vaddr + PAGE_SIZE;
+	void *base_addr = first_bitmap + (n_cpu * n_bitmap * PAGE_SIZE);
+	struct list_head *per_cpu_list;
+
+	for (i = 0; i < n_cpu; i++) {
+		void *bitmap_addr = first_bitmap + i * n_bitmap * PAGE_SIZE;
+
+		per_cpu_list = ramtrace_read_bitmap_per_cpu(n_bitmap,
+							bitmap_addr, base_addr);
+		if (per_cpu_list) {
+			ring_buffer_order_pages(per_cpu_list);
+			if (list_empty(per_cpu_list))
+				per_cpu[i] = NULL;
+			else
+				per_cpu[i] = per_cpu_list;
+		} else
+			per_cpu[i] = NULL;
+
+	}
+
+}
+
+static void ramtrace_read_pages(void)
+{
+	void *metapage = trace_ctx.vaddr;
+	const char current_kernel_version[] = UTS_RELEASE;
+
+
+	int n_cpu = ramtrace_read_int((int **)&metapage);
+	int trace_clock = ramtrace_read_int((int **)&metapage);
+	int n_bitmap = ramtrace_read_int((int **)&metapage);
+	char *kernel_version = ramtrace_read_string((char **)&metapage);
+	char *tracer = ramtrace_read_string((char **)&metapage);
+	struct tr_persistent_info *persist = NULL;
+
+	if (kernel_version && tracer) {
+		struct list_head **per_cpu_list;
+
+		/* If we have booted with a different version of OS, then
+		 * do not try to read from persistent store
+		 */
+		if (strcmp(kernel_version, current_kernel_version)) {
+			pr_err("Booted with a different version of OS. "
+				"The trace in PSTORE pertains to kernel "
+				"version %s.\n", kernel_version);
+			goto out;
+		}
+		per_cpu_list = kmalloc_array(n_cpu,
+				sizeof(struct list_head *), GFP_KERNEL);
+		ramtrace_read_bitmap(n_cpu, n_bitmap, per_cpu_list);
+		persist = kzalloc(sizeof(*persist), GFP_KERNEL);
+
+		persist->tracer_name = tracer;
+		persist->trace_clock = trace_clock;
+		persist->nr_cpus = n_cpu;
+		persist->data_pages = per_cpu_list;
+	}
+out:
+	trace_ctx.persist_info = persist;
+}
 static int ramtrace_init_mem(struct ramtrace_context *ctx)
 {
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 4/7] pstore: Allocate and free page-sized memory in persistent RAM buffer
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
                   ` (2 preceding siblings ...)
  2020-09-02 20:00 ` [RFC PATCH 3/7] pstore: Read and iterate through trace entries in PSTORE Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 5/7] tracing: Add support to iterate through pages retrieved from pstore Nachammai Karuppiah
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

ramtrace backend acts as a page allocator and manages
the persistent RAM buffer.

ramtrace supports allocation and deallocation of page-sized memory
through methods, ramtrace_alloc_page and ramtrace_free_page.
This functionality is required by ring buffer in trace when the user
switches to persistent storage.

Just prior to allocating pages for recording trace entries, ramtrace
backend frees up the list used to maintain pages pertaining to previous
boot. After this, reading previous boot trace entries from
/sys/fs/pstore is disabled.

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 fs/pstore/ramtrace.c     | 336 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ramtrace.h |  13 ++
 2 files changed, 349 insertions(+)

diff --git a/fs/pstore/ramtrace.c b/fs/pstore/ramtrace.c
index ca48a76..de6d09e8 100644
--- a/fs/pstore/ramtrace.c
+++ b/fs/pstore/ramtrace.c
@@ -19,6 +19,17 @@ module_param(mem_size, ulong, 0400);
 MODULE_PARM_DESC(mem_size,
 		"size of reserved RAM used to store trace data");
 
+struct ramtrace_pagelist {
+	struct list_head list;
+	void *page;
+};
+
+struct tr_persistent_info {
+	char                    *tracer_name;
+	int		       trace_clock;
+	unsigned int            nr_cpus;
+	struct list_head        **data_pages;
+};
 
 struct ramtrace_context {
 	phys_addr_t phys_addr;	/* Physical address of the persistent memory */
@@ -37,6 +48,50 @@ struct ramtrace_context {
 	int read_buffer_status;
 };
 
+/*
+ * The first page in the ramtrace area is the metadata page, followed by
+ * bitmap pages and then the buffer_data_page allocated by trace.
+ * Each bitmap page can represent upto page_size * 8 number of pages.
+ * The number of bitmaps needed per cpu is determined by the size of the
+ * pstore memory. Each CPU is allocated sufficient bitmap pages to represent
+ * the entire memory region.
+ * The figure below illustrates how the ramtrace memory area is organized.
+ *
+ * +------------------------------------------+
+ * | metadata                                 |
+ * +------------------------------------------+
+ * | CPU 1 Bitmap 1 to buffer pages           |
+ * +------------------------------------------+
+ * | CPU 1 Bitmap 2 to buffer pages           |
+ * +------------------------------------------+
+ * | . . .                                    |
+ * +------------------------------------------+
+ * | CPU 1 Bitmap K to buffer pages           |
+ * +------------------------------------------+
+ * | CPU 2 Bitmap 1 to buffer pages           |
+ * +------------------------------------------+
+ * | CPU 2 Bitmap 2 to buffer pages           |
+ * +------------------------------------------+
+ * | . . .                                    |
+ * +------------------------------------------+
+ * | CPU 2 Bitmap K to buffer pages           |
+ * +------------------------------------------+
+ * | . . .      . . .                         |
+ * +------------------------------------------+
+ * | CPU N Bitmap K to buffer pages           |
+ * +------------------------------------------+
+ * | buffer_data_page 1 belonging to any CPU  |
+ * +------------------------------------------+
+ * | buffer_data_page 2 belonging to any CPU  |
+ * +------------------------------------------+
+ * | . . .                                    |
+ * +------------------------------------------+
+ * | buffer_data_page (K x 4096)              |
+ * | belonging to any CPU                     |
+ * +------------------------------------------+
+ */
+
+static void ramtrace_read_pages(void);
 static int ramtrace_pstore_open(struct pstore_info *psi);
 static ssize_t ramtrace_pstore_read(struct pstore_record *record);
 static int ramtrace_pstore_erase(struct pstore_record *record);
@@ -122,6 +177,227 @@ static int ramtrace_pstore_erase(struct pstore_record *record)
 
 static struct platform_device *dummy;
 
+bool is_ramtrace_available(void)
+{
+	return (trace_ctx.size > 0) ? 1 : 0;
+}
+
+int ramtrace_available_mem(void)
+{
+	return trace_ctx.pages_available;
+}
+
+/**
+ * ramtrace_init_bitmap: Initialize bitmap pages.
+ *
+ * This method allocates and initializes bitmap pages.
+ */
+static void ramtrace_init_bitmap(unsigned int npages)
+{
+	int i;
+	unsigned long flags;
+	struct ramtrace_pagelist *freelist = trace_ctx.freelist;
+
+	trace_ctx.bitmap_pages = kmalloc_array(npages, sizeof(void *),
+					       GFP_KERNEL);
+	spin_lock_irqsave(&trace_ctx.lock, flags);
+	for (i = 0; i < npages; i++) {
+		struct ramtrace_pagelist *freelist_node;
+		void *page;
+
+		freelist_node = list_next_entry(freelist, list);
+		page = freelist_node->page;
+		memset(page, 0, PAGE_SIZE);
+		trace_ctx.bitmap_pages[i] = page;
+		list_del(&freelist_node->list);
+		kfree(freelist_node);
+	}
+	spin_unlock_irqrestore(&trace_ctx.lock, flags);
+	trace_ctx.base_address = trace_ctx.bitmap_pages[npages - 1] + PAGE_SIZE;
+}
+
+
+static void ramtrace_write_int(int **buffer, int n)
+{
+	**buffer = n;
+	(*buffer)++;
+}
+
+void ramtrace_set_clock_id(int clock_id)
+{
+	*(trace_ctx.clock_id) = clock_id;
+}
+
+void ramtrace_set_tracer_name(const char *tracer_name)
+{
+	sprintf(trace_ctx.tracer_name, "%s", tracer_name);
+}
+
+/*
+ * init_ramtrace_pages: Initialize metadata page, bitmap and trace context.
+ *
+ * Below is the layout of the metadata page.
+ * +------------------------------------------+
+ * | Kernel Version                           |
+ * +------------------------------------------+
+ * | tracer_name                              |
+ * +------------------------------------------+
+ * | Number of CPU’s Buffers = N              |
+ * +------------------------------------------+
+ * | trace_clock_name                         |
+ * +------------------------------------------+
+ * | pages per cpu			      |
+ * +------------------------------------------+
+ */
+
+int
+init_ramtrace_pages(int cpu, unsigned long npages, const char *tracer_name,
+		    int clock_id)
+{
+	const char kernel_version[] = UTS_RELEASE;
+	struct ramtrace_pagelist *freelist_node;
+	void *metapage;
+	unsigned long flags;
+	int n_bitmap = 0;
+	int ramtrace_pages;
+
+
+	ramtrace_pages = (trace_ctx.size >> PAGE_SHIFT) - 1;
+
+	/* Calculate number of bitmap pages required for npages */
+	n_bitmap = ramtrace_pages / ((PAGE_SIZE << 3) + cpu);
+
+	if (ramtrace_pages % (PAGE_SIZE << 3) > cpu)
+		n_bitmap++;
+	if (ramtrace_pages - n_bitmap < npages)
+		return 1;
+
+	spin_lock_irqsave(&trace_ctx.lock, flags);
+	freelist_node = list_next_entry(trace_ctx.freelist, list);
+	metapage = freelist_node->page;
+	list_del(&freelist_node->list);
+	spin_unlock_irqrestore(&trace_ctx.lock, flags);
+
+	pstore_tracing_erase();
+	free_persist_info();
+
+	/* Initialize metadata page */
+	ramtrace_write_int((int **)&metapage, cpu);
+	trace_ctx.clock_id = (int *)metapage;
+	ramtrace_write_int((int **)&metapage, clock_id);
+	ramtrace_write_int((int **)&metapage, n_bitmap);
+	sprintf(metapage, "%s", kernel_version);
+	metapage += strlen(kernel_version) + 1;
+	trace_ctx.tracer_name = (char *)metapage;
+	sprintf(metapage, "%s", tracer_name);
+
+	kfree(freelist_node);
+	trace_ctx.cpu = cpu;
+	trace_ctx.num_bitmap_per_cpu = n_bitmap;
+	trace_ctx.pages_available = ramtrace_pages - n_bitmap;
+	ramtrace_init_bitmap(cpu * n_bitmap);
+	return 0;
+}
+
+static void ramtrace_set_bit(char *bitmap, int index)
+{
+	bitmap[index >> 3] |= (1 << index % 8);
+}
+
+static bool ramtrace_is_allocated(char *bitmap, int index)
+{
+	return bitmap[index >> 3] & (1 << index % 8);
+}
+
+static void ramtrace_reset_bit(char *bitmap, int index)
+{
+	bitmap[index >> 3] &= ~(1 << index % 8);
+}
+
+
+void *ramtrace_alloc_page(int cpu)
+{
+	void *address = NULL;
+	struct ramtrace_pagelist *freelist = trace_ctx.freelist;
+
+	if (!list_empty(&freelist->list)) {
+		struct ramtrace_pagelist *freelist_node;
+		char *bitmap_page;
+		unsigned long page_num;
+		unsigned long flags;
+		int index, bitmap_page_index;
+
+		/* Acquire lock and obtain a page from freelist */
+		spin_lock_irqsave(&trace_ctx.lock, flags);
+		freelist_node = list_next_entry(freelist, list);
+		list_del(&freelist_node->list);
+		trace_ctx.pages_available--;
+		spin_unlock_irqrestore(&trace_ctx.lock, flags);
+
+		address = freelist_node->page;
+		memset(address, 0, PAGE_SIZE);
+
+		/* Determine the bitmap index for the allocated page */
+		page_num = (address - trace_ctx.base_address) >> PAGE_SHIFT;
+
+		/* Every bitmap page represents PAGE_SIZE * 8 or
+		 * 1 << (PAGE_SHIFT + 3) pages. Determine the nth bitmap for
+		 * this cpu assosciated with the allocated page address.
+		 */
+		bitmap_page_index = page_num >> (PAGE_SHIFT + 3);
+		bitmap_page = trace_ctx.bitmap_pages[trace_ctx.num_bitmap_per_cpu * cpu + bitmap_page_index];
+		/* Determine the index */
+		index = page_num - (bitmap_page_index << (PAGE_SHIFT + 3));
+
+		ramtrace_set_bit(bitmap_page, index);
+
+	}
+	return address;
+
+}
+
+void ramtrace_free_page(void *page_address, int cpu)
+{
+	void *bitmap;
+	int index;
+
+	/*
+	 * Determine the page number by calculating the offset from the base
+	 * address and divide it by page size.
+	 * Each bitmap can hold page_size * 8 indices. In case we have more
+	 * than one bitmap per cpu, divide page_num by (page size * 8).
+	 */
+	unsigned long page_num = (page_address - trace_ctx.base_address) >> PAGE_SHIFT;
+	int bitmap_page_index = page_num >> (PAGE_SHIFT + 3);
+
+	if (page_address == NULL)
+		return;
+	bitmap = (char *)(trace_ctx.bitmap_pages[trace_ctx.num_bitmap_per_cpu * cpu + bitmap_page_index]);
+	/*
+	 * When a single bitmap per cpu is used, page_num gives the index
+	 * in the bitmap. In case of multiple bitmaps per cpu,
+	 * page_num - bitmap_page_index * page_size * 8 gives the index.
+	 * Note: When page_num is less than (page_size * 8), bitmap_page_index
+	 * is zero.
+	 * */
+	index = page_num - (bitmap_page_index << (PAGE_SHIFT + 3));
+	if (ramtrace_is_allocated(bitmap, index)) {
+		struct ramtrace_pagelist *freelist_node =
+			kmalloc(sizeof(struct ramtrace_pagelist), GFP_KERNEL);
+		unsigned long flags;
+
+		freelist_node->page = page_address;
+		spin_lock_irqsave(&trace_ctx.lock, flags);
+		list_add_tail(&freelist_node->list, &(trace_ctx.freelist->list));
+		trace_ctx.pages_available++;
+		spin_unlock_irqrestore(&trace_ctx.lock, flags);
+
+		ramtrace_reset_bit(bitmap, index);
+	}
+
+}
+
+
 static int ramtrace_parse_dt(struct platform_device *pdev,
 			    struct ramtrace_platform_data *pdata)
 {
@@ -232,6 +508,32 @@ ramtrace_read_bitmap(int n_cpu, int n_bitmap, struct list_head **per_cpu)
 
 }
 
+struct list_head *ramtrace_get_read_buffer(int n_cpu)
+{
+	if (n_cpu >= (trace_ctx.persist_info)->nr_cpus)
+		return NULL;
+
+	return (trace_ctx.persist_info)->data_pages[n_cpu];
+}
+
+int ramtrace_get_prev_boot_nr_cpus(void)
+{
+	return (trace_ctx.persist_info)->nr_cpus;
+}
+
+int ramtrace_get_prev_boot_clock_id(void)
+{
+	return (trace_ctx.persist_info)->trace_clock;
+}
+
+char *ramtrace_get_prev_boot_tracer_name(void)
+{
+	return (trace_ctx.persist_info)->tracer_name;
+}
+
+
+
+
 static void ramtrace_read_pages(void)
 {
 	void *metapage = trace_ctx.vaddr;
@@ -270,6 +572,40 @@ static void ramtrace_read_pages(void)
 out:
 	trace_ctx.persist_info = persist;
 }
+
+/**
+ * free_persist_info - free the list pertaining to previous boot.
+ *
+ * Free the list and array that was allocated to manage previous boot data.
+ * Note: There is no need to free the ramtrace pages memory area.
+ */
+static void free_persist_info(void)
+{
+	struct tr_persistent_info *persist;
+	int i;
+
+	persist = trace_ctx.persist_info;
+
+	if (persist) {
+		for (i = 0; i < persist->nr_cpus; i++) {
+			struct ramtrace_pagelist *node, *tmp;
+			struct list_head *page_list = persist->data_pages[i];
+
+			if (page_list == NULL)
+				continue;
+			list_for_each_entry_safe(node, tmp, page_list, list) {
+				list_del(&node->list);
+				kfree(node);
+			}
+			kfree(page_list);
+		}
+		kfree(persist->data_pages);
+		kfree(persist->tracer_name);
+		kfree(persist);
+	}
+	trace_ctx.persist_info = NULL;
+}
+
 static int ramtrace_init_mem(struct ramtrace_context *ctx)
 {
 
diff --git a/include/linux/ramtrace.h b/include/linux/ramtrace.h
index faf459f..8f9936c 100644
--- a/include/linux/ramtrace.h
+++ b/include/linux/ramtrace.h
@@ -13,3 +13,16 @@ struct ramtrace_platform_data {
 	unsigned long	mem_size;
 	phys_addr_t	mem_address;
 };
+
+void *ramtrace_alloc_page(int cpu);
+void ramtrace_free_page(void *address, int cpu);
+void ramtrace_dump(void);
+int init_ramtrace_pages(int cpu, unsigned long npages,
+			const char *tracer_name, int clock_id);
+bool is_ramtrace_available(void);
+struct list_head *ramtrace_get_read_buffer(int cpu);
+char *ramtrace_get_prev_boot_tracer_name(void);
+int ramtrace_get_prev_boot_clock_id(void);
+int ramtrace_get_prev_boot_nr_cpus(void);
+int ramtrace_available_mem(void);
+void ramtrace_set_tracer_name(const char *tracer_name);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 5/7] tracing: Add support to iterate through pages retrieved from pstore
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
                   ` (3 preceding siblings ...)
  2020-09-02 20:00 ` [RFC PATCH 4/7] pstore: Allocate and free page-sized memory in persistent RAM buffer Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 6/7] tracing: Use ramtrace alloc and free methods while using persistent RAM Nachammai Karuppiah
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

Add a new trace_array, pstore_trace. This descriptor holds the
top-level buffers used for managing the pages retrieved from
persistent RAM. Since pstore_trace uses the pages that pertain to
previous boot, there is no write that happens to these buffers. The
reads are non-consuming and hence we do not have to serialize the
readers.

The buffers in pstore_trace are disabled once the user switches live
tracing to use persistent RAM buffer.

During the first seq_start method call to read the previous boot
trace entries, the top-level buffers of pstore_trace are set up.
The pages retrieved from pstore are used to construct
cpu_buffer->pages for pstore_trace.

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 include/linux/ring_buffer.h |  19 +++
 include/linux/trace.h       |  13 ++
 kernel/trace/ring_buffer.c  | 284 +++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c        | 300 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/trace.h        |   2 +
 5 files changed, 616 insertions(+), 2 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index c76b2f3..ece71c9 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -18,6 +18,13 @@ struct ring_buffer_event {
 	u32		array[];
 };
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+struct data_page {
+	struct list_head        list;
+	struct buffer_data_page *page;
+};
+#endif
+
 /**
  * enum ring_buffer_type - internal ring buffer types
  *
@@ -210,4 +217,16 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
 #define trace_rb_cpu_prepare	NULL
 #endif
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+struct trace_buffer *reconstruct_ring_buffer(void);
+
+void ring_buffer_order_pages(struct list_head *pages);
+int ring_buffer_switch_memory(struct trace_buffer *buffer,
+			      const char *tracer_name, int clock_id,
+			      bool persist);
+void ring_buffer_set_tracer_name(struct trace_buffer *buffer,
+				 const char *tracer_name);
+void ring_buffer_free_pstore_trace(struct trace_buffer *buffer);
+#endif
+
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/linux/trace.h b/include/linux/trace.h
index 7fd86d3..8f37b70 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -32,6 +32,19 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 void trace_array_put(struct trace_array *tr);
 struct trace_array *trace_array_get_by_name(const char *name);
 int trace_array_destroy(struct trace_array *tr);
+
+
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+struct trace_iterator;
+
+void *pstore_trace_start(struct seq_file *m, loff_t *pos);
+void *pstore_trace_next(struct seq_file *m, void *v, loff_t *pos);
+int pstore_trace_show(struct seq_file *m, void *v);
+void pstore_trace_stop(struct seq_file *m, void *v);
+int pstore_tracing_release(struct trace_iterator *iter);
+void pstore_tracing_erase(void);
+#endif
+
 #endif	/* CONFIG_TRACING */
 
 #endif	/* _LINUX_TRACE_H */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 60b587a..34e50c1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1296,6 +1296,92 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	return 0;
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+static int rb_reconstruct_pages(struct ring_buffer_per_cpu *cpu_buffer,
+				struct list_head *dpages, int cpu)
+{
+	struct buffer_page *bpage, *tmp;
+	struct data_page *dpage;
+	LIST_HEAD(pages);
+
+	list_for_each_entry(dpage, dpages, list) {
+		bpage = kzalloc(ALIGN(sizeof(*bpage), cache_line_size()),
+				GFP_KERNEL);
+		if (!bpage)
+			goto free_pages;
+
+		list_add_tail(&bpage->list, &pages);
+		bpage->page = dpage->page;
+	}
+
+	if (!list_empty(&pages)) {
+		cpu_buffer->pages = pages.next;
+		list_del(&pages);
+	} else
+		cpu_buffer->pages = NULL;
+
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, &pages, list) {
+		list_del(&bpage->list);
+		kfree(bpage);
+	}
+	return -ENOMEM;
+}
+
+static struct ring_buffer_per_cpu *
+__reconstruct_cpu_buffer(struct trace_buffer *rb, struct list_head *dpages,
+			 void *page, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct buffer_page *bpage;
+	struct data_page *dpage;
+
+	cpu_buffer = kzalloc(ALIGN(sizeof(*cpu_buffer), cache_line_size()),
+				GFP_KERNEL);
+	if (!cpu_buffer)
+		return NULL;
+
+	cpu_buffer->buffer = rb;
+	raw_spin_lock_init(&cpu_buffer->reader_lock);
+	lockdep_set_class(&cpu_buffer->reader_lock, rb->reader_lock_key);
+	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+	bpage = kzalloc(ALIGN(sizeof(*bpage), cache_line_size()),
+			GFP_KERNEL);
+	if (!bpage)
+		goto fail_free_buffer;
+
+	bpage->page = page;
+
+	rb_check_bpage(cpu_buffer, bpage);
+	cpu_buffer->reader_page = bpage;
+	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
+
+	if (rb_reconstruct_pages(cpu_buffer, dpages, cpu) < 0)
+		goto fail_free_reader;
+	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+
+	cpu_buffer->head_page = list_entry(cpu_buffer->pages,
+					struct buffer_page, list);
+	cpu_buffer->commit_page = list_entry(cpu_buffer->pages->prev,
+					struct buffer_page, list);
+
+	rb_head_page_activate(cpu_buffer);
+
+	return cpu_buffer;
+
+fail_free_reader:
+	free_buffer_page(cpu_buffer->reader_page);
+
+fail_free_buffer:
+	kfree(cpu_buffer);
+	return NULL;
+}
+#endif /* CONFIG_TRACE_EVENTS_TO_PSTORE */
+
 static struct ring_buffer_per_cpu *
 rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 {
@@ -1378,6 +1464,81 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 	kfree(cpu_buffer);
 }
 
+
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+/**
+ * reconstruct_ring_buffer - reconstruct ring_buffer for pstore trace
+ */
+struct trace_buffer *reconstruct_ring_buffer(void)
+{
+	struct trace_buffer *buffer;
+	static struct lock_class_key __key;
+	void *page;
+	int bsize;
+	int i, cpu;
+
+	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
+		     GFP_KERNEL);
+	if (!buffer)
+		return NULL;
+
+	if (!zalloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
+		goto release_buffer;
+	buffer->cpus = ramtrace_get_prev_boot_nr_cpus();
+
+	buffer->reader_lock_key = &__key;
+
+	bsize = sizeof(void *) * buffer->cpus;
+	buffer->buffers = kzalloc(ALIGN(bsize, cache_line_size()),
+				GFP_KERNEL);
+	if (!buffer->buffers)
+		goto release_cpumask_var;
+
+	/*
+	 * Allocate an empty reader page. This page doesn't contain any data
+	 * and is set as the reader page. The same reader page is used for all
+	 * CPU. Since this is an empty page and guaranteed to be empty always,
+	 * all CPUs can use the same page. The pages retrieved from PSTORE are
+	 * used to populate cpu_buffer->pages list.
+	 */
+	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
+	if (!page)
+		goto release_buffers;
+	page = page_address(page);
+	rb_init_page(page);
+	for (i = 0; i < buffer->cpus; i++) {
+		struct list_head *dpages = ramtrace_get_read_buffer(i);
+
+		if (dpages) {
+			buffer->buffers[i] = __reconstruct_cpu_buffer(buffer,
+								dpages, page, i);
+			if (!buffer->buffers[i])
+				goto release_reader_page;
+			cpumask_set_cpu(i, buffer->cpumask);
+		}
+
+	}
+	if (cpumask_empty(buffer->cpumask))
+		goto release_reader_page;
+
+	return buffer;
+
+release_reader_page:
+	free_page((unsigned long)page);
+release_buffers:
+	for_each_buffer_cpu(buffer, cpu) {
+		if (buffer->buffers[cpu])
+			rb_free_cpu_buffer(buffer->buffers[cpu]);
+	}
+	kfree(buffer->buffers);
+release_cpumask_var:
+	free_cpumask_var(buffer->cpumask);
+release_buffer:
+	kfree(buffer);
+	return NULL;
+}
+#endif /* CONFIG_TRACE_EVENTS_TO_PSTORE */
+
 /**
  * __ring_buffer_alloc - allocate a new ring_buffer
  * @size: the size in bytes per cpu that is needed.
@@ -1478,12 +1639,75 @@ ring_buffer_free(struct trace_buffer *buffer)
 }
 EXPORT_SYMBOL_GPL(ring_buffer_free);
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+static void
+rb_free_pstore_trace_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct list_head *head = cpu_buffer->pages;
+	struct buffer_page *bpage, *tmp;
+
+	kfree(cpu_buffer->reader_page);
+
+	if (head) {
+		rb_head_page_deactivate(cpu_buffer);
+		list_for_each_entry_safe(bpage, tmp, head, list) {
+			list_del_init(&bpage->list);
+			kfree(bpage);
+		}
+
+		bpage = list_entry(head, struct buffer_page, list);
+		kfree(bpage);
+	}
+	kfree(cpu_buffer);
+}
+
+/**
+ * ring_buffer_free_pstore_trace - free pstore_trace buffers.
+ *
+ * Free top-level buffers and buffer_page pertaining to previous boot trace
+ * provided by pstore_trace descriptor.
+ */
+void ring_buffer_free_pstore_trace(struct trace_buffer *buffer)
+{
+	int cpu;
+	void *page = NULL;
+
+	for_each_buffer_cpu(buffer, cpu) {
+		if (!page) {
+			page = buffer->buffers[cpu]->reader_page->page;
+	printk("reader page %px\n", page);
+	free_page((unsigned long)page);
+			}
+		rb_free_pstore_trace_cpu_buffer(buffer->buffers[cpu]);
+	}
+	kfree(buffer->buffers);
+	free_cpumask_var(buffer->cpumask);
+
+	kfree(buffer);
+}
+#endif
+
 void ring_buffer_set_clock(struct trace_buffer *buffer,
 			   u64 (*clock)(void))
 {
 	buffer->clock = clock;
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+void ring_buffer_set_tracer_name(struct trace_buffer *buffer,
+				 const char *tracer_name)
+{
+	int cpu;
+
+	get_online_cpus();
+	for_each_buffer_cpu(buffer, cpu)
+		if (buffer->buffers[cpu]->use_pstore && cpu_online(cpu)) {
+			ramtrace_set_tracer_name(tracer_name);
+			break;
+		}
+}
+#endif
+
 void ring_buffer_set_time_stamp_abs(struct trace_buffer *buffer, bool abs)
 {
 	buffer->time_stamp_abs = abs;
@@ -5251,6 +5475,66 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 	return 0;
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+void ring_buffer_order_pages(struct list_head *pages)
+{
+	struct data_page *temp, *data_page, *min_page;
+	u64 min_ts = 0;
+	u64 prev_ts;
+	int count = 0;
+
+	min_page = NULL;
+
+	/* Find the oldest page and move the list head before it.
+	 * While starting from the oldest page, the list should mostly be in
+	 * order except for few out of order pages as long as the buffer had
+	 * not been repeatedly expanded and shrunk.
+	 */
+	list_for_each_entry_safe(data_page, temp, pages, list) {
+		u64 ts = data_page->page->time_stamp;
+
+		if (ts == 0) {
+			list_del(&data_page->list);
+			kfree(data_page);
+		} else {
+			count++;
+			if (ts < min_ts || min_ts == 0) {
+				min_ts = ts;
+				min_page = data_page;
+			}
+		}
+	}
+
+	if (min_ts) {
+		/* move the list head before the oldest page */
+		list_move_tail(pages, &min_page->list);
+		prev_ts = min_ts;
+		data_page = min_page;
+		list_for_each_entry(data_page, pages, list) {
+			u64 ts = data_page->page->time_stamp;
+
+			if (ts >= prev_ts)
+				prev_ts = ts;
+			else {
+				struct data_page *node, *swap_page;
+
+				/* Move out of order page to the right place */
+				list_for_each_entry(node, pages, list) {
+					if (node->page->time_stamp > ts) {
+						swap_page = data_page;
+						data_page = list_entry(data_page->list.prev, struct data_page, list);
+						list_del(&swap_page->list);
+						list_add_tail(&swap_page->list, &node->list);
+						break;
+					}
+				}
+			}
+		}
+	}
+
+}
+#endif /* CONFIG_TRACE_EVENTS_TO_PSTORE */
+
 #ifdef CONFIG_RING_BUFFER_STARTUP_TEST
 /*
  * This is a basic integrity check of the ring buffer.
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2b3d8e9..16e50ba8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -277,6 +277,16 @@ static struct trace_array global_trace = {
 	.trace_flags = TRACE_DEFAULT_FLAGS,
 };
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+/*
+ * The pstore_trace is the descriptor that holds the top-level tracing
+ * buffers for the pages retrieved from persistent storage.
+ */
+static struct trace_array pstore_trace = {
+	.trace_flags = TRACE_DEFAULT_FLAGS,
+};
+#endif
+
 LIST_HEAD(ftrace_trace_arrays);
 
 int trace_array_get(struct trace_array *this_tr)
@@ -650,6 +660,26 @@ int tracing_is_enabled(void)
 	return !global_trace.buffer_disabled;
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+/**
+ * pstore_tracing_is_enabled - Show if pstore_trace has been disabled
+ *
+ * This is similar to tracing_is_enabled() but checks pstore_trace instead.
+ * pstore_trace holds the tracing buffers for the pages pertaining to previous
+ * boot retrieved from pstore.
+ */
+int pstore_tracing_is_enabled(void)
+{
+	/*
+	 * For quick access (irqsoff uses this in fast path), just
+	 * return the mirror variable of the state of the ring buffer.
+	 * It's a little racy, but we don't really care.
+	 */
+	smp_rmb();
+	return !pstore_trace.buffer_disabled;
+}
+#endif
+
 /*
  * trace_buf_size is the size in bytes that is allocated
  * for a buffer. Note, the number of bytes is always rounded
@@ -1299,6 +1329,21 @@ void tracing_off(void)
 }
 EXPORT_SYMBOL_GPL(tracing_off);
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+
+/**
+ * pstore_tracing_off - turn off tracing buffers.
+ *
+ * Incase of pstore_trace, turning off tracing buffers stops readers from
+ * retrieving any more data. This is needed once the global_trace tries to
+ * use pstore memory.
+ */
+void pstore_tracing_off(void)
+{
+	tracer_tracing_off(&pstore_trace);
+}
+#endif
+
 void disable_trace_on_warning(void)
 {
 	if (__disable_trace_on_warning) {
@@ -5826,7 +5871,7 @@ static void tracing_set_nop(struct trace_array *tr)
 {
 	if (tr->current_trace == &nop_trace)
 		return;
-	
+
 	tr->current_trace->enabled--;
 
 	if (tr->current_trace->reset)
@@ -5945,6 +5990,9 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
 	tr->current_trace = t;
 	tr->current_trace->enabled++;
 	trace_branch_enable(tr);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	ring_buffer_set_tracer_name(tr->array_buffer.buffer, tr->current_trace->name);
+#endif
  out:
 	mutex_unlock(&trace_types_lock);
 
@@ -7056,9 +7104,257 @@ static int snapshot_raw_open(struct inode *inode, struct file *filp)
 
 	return ret;
 }
-
 #endif /* CONFIG_TRACER_SNAPSHOT */
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+/*
+ * pstore_trace_set_up - set up pstore_trace descriptor.
+ *
+ * This method is called during the seq_start method call to setup the
+ * pstore_trace for the very first read operation. The pages from pstore are
+ * read and ring buffer is constructed.
+ */
+static struct trace_array *pstore_trace_set_up(void)
+{
+	struct trace_array *p_tr = &pstore_trace;
+	struct tracer *t;
+	char *tracer_name;
+
+	/*
+	 * Create the top level buffers during first seq_start call.
+	 * Use the previously created one in the subsequent calls.
+	 */
+	if (p_tr->array_buffer.buffer)
+		return p_tr;
+
+	tracer_name = ramtrace_get_prev_boot_tracer_name();
+	mutex_lock(&trace_types_lock);
+	for (t = trace_types; t; t = t->next) {
+		if (strcmp(t->name, tracer_name) == 0)
+			break;
+	}
+	mutex_unlock(&trace_types_lock);
+	if (!t)
+		goto release_tr_info;
+	p_tr->current_trace = t;
+
+	p_tr->clock_id = ramtrace_get_prev_boot_clock_id();
+
+	p_tr->array_buffer.tr = p_tr;
+
+	p_tr->array_buffer.buffer = reconstruct_ring_buffer();
+	if (!p_tr->array_buffer.buffer)
+		goto release_tr_info;
+
+	raw_spin_lock_init(&p_tr->start_lock);
+
+	INIT_LIST_HEAD(&p_tr->systems);
+	INIT_LIST_HEAD(&p_tr->events);
+	INIT_LIST_HEAD(&p_tr->hist_vars);
+	INIT_LIST_HEAD(&p_tr->err_log);
+
+	ftrace_init_trace_array(p_tr);
+	list_add(&p_tr->list, &ftrace_trace_arrays);
+
+	return p_tr;
+
+release_tr_info:
+	return NULL;
+}
+
+static struct trace_iterator *pstore_iter_setup(void)
+{
+	struct trace_array *p_tr;
+	struct trace_iterator *iter;
+	int cpu, cpus;
+
+	p_tr = pstore_trace_set_up();
+	if (!p_tr)
+		return NULL;
+
+	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter)
+		goto out;
+
+	iter->buffer_iter = kcalloc(nr_cpu_ids, sizeof(*iter->buffer_iter),
+			GFP_KERNEL);
+	if (!iter->buffer_iter)
+		goto release;
+
+	memset(iter->buffer_iter, 0, nr_cpu_ids * sizeof(*iter->buffer_iter));
+	iter->trace = p_tr->current_trace;
+	iter->trace->use_max_tr = false;
+
+	if (!zalloc_cpumask_var(&iter->started, GFP_KERNEL))
+		goto fail;
+
+	iter->tr = p_tr;
+	iter->array_buffer = &p_tr->array_buffer;
+
+	iter->snapshot = true;
+
+	iter->pos = -1;
+	iter->cpu_file = RING_BUFFER_ALL_CPUS;
+	mutex_init(&iter->mutex);
+
+	if (iter->trace && iter->trace->open)
+		iter->trace->open(iter);
+
+	if (trace_clocks[p_tr->clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
+	cpus = ramtrace_get_prev_boot_nr_cpus();
+	for (cpu = 0; cpu < cpus; cpu++) {
+		iter->buffer_iter[cpu] =
+		  ring_buffer_read_prepare(iter->array_buffer->buffer, cpu,
+					   GFP_KERNEL);
+	}
+	ring_buffer_read_prepare_sync();
+	for (cpu = 0; cpu < cpus; cpu++)
+		ring_buffer_read_start(iter->buffer_iter[cpu]);
+
+	return iter;
+
+fail:
+	kfree(iter->buffer_iter);
+release:
+	kfree(iter);
+out:
+	return ERR_PTR(-ENOMEM);
+}
+
+void *pstore_trace_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct trace_iterator *iter = m->private;
+	int i = (int) *pos;
+	void *ent;
+
+	WARN_ON_ONCE(iter->leftover);
+
+	(*pos)++;
+
+	/* can't go backwards */
+	if (iter->idx > i)
+		return NULL;
+
+	if (iter->idx < 0)
+		ent = trace_find_next_entry_inc(iter);
+	else
+		ent = iter;
+
+	while (ent && iter->idx < i)
+		ent = trace_find_next_entry_inc(iter);
+
+	iter->pos = *pos;
+
+	if (ent == NULL)
+		return NULL;
+	return iter;
+}
+/*
+ * Below are the seq_operation methods used to read the previous boot
+ * data pages from pstore. In this case, there is no producer and no
+ * consuming read. So we do not have to serialize readers.
+ */
+void *pstore_trace_start(struct seq_file *m, loff_t *pos)
+{
+	struct trace_iterator *iter = m->private;
+	void *p = NULL;
+	loff_t l = 0;
+
+	/*
+	 * pstore_trace is disabled once the user starts utilizing the
+	 * ramtrace pstore region to write the trace records.
+	 */
+	if (!pstore_tracing_is_enabled())
+		return NULL;
+	if (iter == NULL) {
+		iter = pstore_iter_setup();
+		if (!iter)
+			return NULL;
+		m->private = iter;
+	}
+
+
+	if (*pos != iter->pos) {
+		iter->ent = NULL;
+		iter->cpu = 0;
+		iter->idx = -1;
+		iter->leftover = 0;
+		for (p = iter; p && l < *pos; p = pstore_trace_next(m, p, &l))
+			;
+	} else {
+		if (!iter->leftover) {
+			l = *pos - 1;
+			p = pstore_trace_next(m, iter, &l);
+		} else
+			p = iter;
+	}
+
+	return p;
+}
+
+int pstore_trace_show(struct seq_file *m, void *v)
+{
+	struct trace_iterator *iter = v;
+	int ret;
+
+	if (iter->ent == NULL) {
+		if (iter->tr) {
+			seq_printf(m, "# tracer: %s\n", iter->trace->name);
+			seq_puts(m, "#\n");
+		}
+	} else if (iter->leftover) {
+		ret = trace_print_seq(m, &iter->seq);
+		iter->leftover = ret;
+
+	} else {
+		print_trace_line(iter);
+		ret = trace_print_seq(m, &iter->seq);
+		iter->leftover = ret;
+	}
+	return 0;
+}
+
+void pstore_trace_stop(struct seq_file *m, void *v)
+{
+}
+
+int pstore_tracing_release(struct trace_iterator *iter)
+{
+	int cpu;
+
+	if (!iter)
+		return 0;
+	mutex_lock(&trace_types_lock);
+	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+		if (iter->buffer_iter[cpu])
+			ring_buffer_read_finish(iter->buffer_iter[cpu]);
+
+	if (iter->trace && iter->trace->close)
+		iter->trace->close(iter);
+
+	mutex_unlock(&trace_types_lock);
+	mutex_destroy(&iter->mutex);
+	free_cpumask_var(iter->started);
+	kfree(iter->buffer_iter);
+	kfree(iter);
+
+	return 0;
+}
+
+void pstore_tracing_erase(void)
+{
+	struct trace_array *trace = &pstore_trace;
+
+	if (!trace->array_buffer.buffer)
+		return;
+	ring_buffer_free_pstore_trace(trace->array_buffer.buffer);
+	trace->array_buffer.buffer = NULL;
+
+}
+#endif /* CONFIG_TRACE_EVENTS_TO_PSTORE */
+
 
 static const struct file_operations tracing_thresh_fops = {
 	.open		= tracing_open_generic,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2a4ab72..66670f8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -2078,4 +2078,6 @@ static __always_inline void trace_iterator_reset(struct trace_iterator *iter)
 	iter->pos = -1;
 }
 
+void pstore_tracing_off(void);
+
 #endif /* _LINUX_KERNEL_TRACE_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 6/7] tracing: Use ramtrace alloc and free methods while using persistent RAM
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
                   ` (4 preceding siblings ...)
  2020-09-02 20:00 ` [RFC PATCH 5/7] tracing: Add support to iterate through pages retrieved from pstore Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 20:00 ` [RFC PATCH 7/7] dt-bindings: ramtrace: Add ramtrace DT node Nachammai Karuppiah
  2020-09-02 21:47 ` [RFC PATCH 0/7] Trace events to pstore Joel Fernandes
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

If persistent RAM is being used to record trace entries, allocate and
free pages using ramtrace_alloc_page and ramtrace_free_page.

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 kernel/trace/ring_buffer.c | 122 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 119 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 34e50c1..c99719e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -353,6 +353,18 @@ static void free_buffer_page(struct buffer_page *bpage)
 	kfree(bpage);
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+static void
+free_buffer_page_cpu(struct buffer_page *bpage, int cpu, bool use_pstore)
+{
+	if (use_pstore) {
+		ramtrace_free_page(bpage->page, cpu);
+		kfree(bpage);
+	} else
+		free_buffer_page(bpage);
+}
+#endif
+
 /*
  * We need to fit the time_stamp delta into 27 bits.
  */
@@ -1200,7 +1212,12 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu,
+				bool use_pstore)
+#else
 static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
+#endif
 {
 	struct buffer_page *bpage, *tmp;
 	bool user_thread = current->mm != NULL;
@@ -1214,6 +1231,11 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 	 * to prevent doing any allocation when it is obvious that it is
 	 * not going to succeed.
 	 */
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	if (use_pstore)
+		i = ramtrace_available_mem();
+	else
+#endif
 	i = si_mem_available();
 	if (i < nr_pages)
 		return -ENOMEM;
@@ -1246,10 +1268,22 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 
 		list_add(&bpage->list, pages);
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+		if (use_pstore) {
+			void *address = ramtrace_alloc_page(cpu);
+
+			if (!address)
+				goto free_pages;
+			bpage->page = address;
+		} else {
+#endif
 		page = alloc_pages_node(cpu_to_node(cpu), mflags, 0);
 		if (!page)
 			goto free_pages;
 		bpage->page = page_address(page);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+		}
+#endif
 		rb_init_page(bpage->page);
 
 		if (user_thread && fatal_signal_pending(current))
@@ -1263,7 +1297,11 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 free_pages:
 	list_for_each_entry_safe(bpage, tmp, pages, list) {
 		list_del_init(&bpage->list);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+		free_buffer_page_cpu(bpage, cpu, use_pstore);
+#else
 		free_buffer_page(bpage);
+#endif
 	}
 	if (user_thread)
 		clear_current_oom_origin();
@@ -1278,7 +1316,12 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 	WARN_ON(!nr_pages);
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu,
+				cpu_buffer->use_pstore))
+#else
 	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+#endif
 		return -ENOMEM;
 
 	/*
@@ -1414,10 +1457,23 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	rb_check_bpage(cpu_buffer, bpage);
 
 	cpu_buffer->reader_page = bpage;
+
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	if (cpu_buffer->use_pstore) {
+		void *address = ramtrace_alloc_page(cpu);
+
+		if (!address)
+			goto fail_free_reader;
+		bpage->page = address;
+	} else {
+#endif
 	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
 	if (!page)
 		goto fail_free_reader;
 	bpage->page = page_address(page);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	}
+#endif
 	rb_init_page(bpage->page);
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
@@ -1436,7 +1492,12 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	return cpu_buffer;
 
  fail_free_reader:
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	free_buffer_page_cpu(cpu_buffer->reader_page, cpu,
+			     cpu_buffer->use_pstore);
+#else
 	free_buffer_page(cpu_buffer->reader_page);
+#endif
 
  fail_free_buffer:
 	kfree(cpu_buffer);
@@ -1447,18 +1508,32 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct list_head *head = cpu_buffer->pages;
 	struct buffer_page *bpage, *tmp;
-
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	free_buffer_page_cpu(cpu_buffer->reader_page, cpu_buffer->cpu,
+			     cpu_buffer->use_pstore);
+#else
 	free_buffer_page(cpu_buffer->reader_page);
+#endif
 
 	rb_head_page_deactivate(cpu_buffer);
 
 	if (head) {
 		list_for_each_entry_safe(bpage, tmp, head, list) {
 			list_del_init(&bpage->list);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+			free_buffer_page_cpu(bpage, cpu_buffer->cpu,
+					     cpu_buffer->use_pstore);
+#else
 			free_buffer_page(bpage);
+#endif
 		}
 		bpage = list_entry(head, struct buffer_page, list);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+		free_buffer_page_cpu(bpage, cpu_buffer->cpu,
+				     cpu_buffer->use_pstore);
+#else
 		free_buffer_page(bpage);
+#endif
 	}
 
 	kfree(cpu_buffer);
@@ -1832,7 +1907,12 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
 		 * We have already removed references to this list item, just
 		 * free up the buffer_page and its page
 		 */
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+		free_buffer_page_cpu(to_remove_page, cpu_buffer->cpu,
+				     cpu_buffer->use_pstore);
+#else
 		free_buffer_page(to_remove_page);
+#endif
 		nr_removed--;
 
 	} while (to_remove_page != last_page);
@@ -1913,7 +1993,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					 list) {
 			list_del_init(&bpage->list);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+			free_buffer_page_cpu(bpage, cpu_buffer->cpu,
+					     cpu_buffer->use_pstore);
+#else
 			free_buffer_page(bpage);
+#endif
 		}
 	}
 	return success;
@@ -2252,8 +2337,14 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 			 * allocated without receiving ENOMEM
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu,
+						cpu_buffer->use_pstore)) {
+#else
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
 						&cpu_buffer->new_pages, cpu)) {
+#endif
 				/* not enough memory for new pages */
 				err = -ENOMEM;
 				goto out_err;
@@ -2319,7 +2410,12 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+					    &cpu_buffer->new_pages, cpu_id,
+					    cpu_buffer->use_pstore)) {
+#else
 					    &cpu_buffer->new_pages, cpu_id)) {
+#endif
 			err = -ENOMEM;
 			goto out_err;
 		}
@@ -2379,7 +2475,12 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+			free_buffer_page_cpu(bpage, cpu,
+					     cpu_buffer->use_pstore);
+#else
 			free_buffer_page(bpage);
+#endif
 		}
 	}
  out_err_unlock:
@@ -5184,13 +5285,22 @@ void *ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
 	if (bpage)
 		goto out;
 
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	if (cpu_buffer->use_pstore) {
+		bpage = (struct buffer_data_page *)ramtrace_alloc_page(cpu);
+		if (!bpage)
+			return ERR_PTR(-ENOMEM);
+	} else {
+#endif
 	page = alloc_pages_node(cpu_to_node(cpu),
 				GFP_KERNEL | __GFP_NORETRY, 0);
 	if (!page)
 		return ERR_PTR(-ENOMEM);
 
 	bpage = page_address(page);
-
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	}
+#endif
  out:
 	rb_init_page(bpage);
 	down_read(&trace_read_sem);
@@ -5229,7 +5339,13 @@ void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data
 	arch_spin_unlock(&cpu_buffer->lock);
 	local_irq_restore(flags);
 
- out:
+out:
+#ifdef CONFIG_TRACE_EVENTS_TO_PSTORE
+	if (cpu_buffer->use_pstore) {
+		ramtrace_free_page(bpage, cpu);
+		return;
+	}
+#endif
 	free_page((unsigned long)bpage);
 	up_read(&trace_read_sem);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 7/7] dt-bindings: ramtrace: Add ramtrace DT node
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
                   ` (5 preceding siblings ...)
  2020-09-02 20:00 ` [RFC PATCH 6/7] tracing: Use ramtrace alloc and free methods while using persistent RAM Nachammai Karuppiah
@ 2020-09-02 20:00 ` Nachammai Karuppiah
  2020-09-02 21:47 ` [RFC PATCH 0/7] Trace events to pstore Joel Fernandes
  7 siblings, 0 replies; 18+ messages in thread
From: Nachammai Karuppiah @ 2020-09-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck
  Cc: joel, linux-kernel, Nachammai Karuppiah

Add ramtrace as child node of /reserved-memory.

Signed-off-by: Nachammai Karuppiah <nachukannan@gmail.com>
---
 .../devicetree/bindings/reserved-memory/ramtrace.txt        | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/reserved-memory/ramtrace.txt

diff --git a/Documentation/devicetree/bindings/reserved-memory/ramtrace.txt b/Documentation/devicetree/bindings/reserved-memory/ramtrace.txt
new file mode 100644
index 0000000..0a8515c
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-memory/ramtrace.txt
@@ -0,0 +1,13 @@
+Ramtrace trace events logger
+============================
+
+ramtrace provides persistent RAM storage for ring buffer in which the trace
+events are recorded, so they can be recovered after a reboot. This is a
+child-node of "/reserved-memory", and is named "ramtrace" after the backend,
+rather than "pstore" which is the subsystem.
+
+Required properties:
+
+- compatible: must be "ramtrace"
+
+- reg: region of memory that is preserved between reboots
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
                   ` (6 preceding siblings ...)
  2020-09-02 20:00 ` [RFC PATCH 7/7] dt-bindings: ramtrace: Add ramtrace DT node Nachammai Karuppiah
@ 2020-09-02 21:47 ` Joel Fernandes
  2020-09-02 21:54   ` Joel Fernandes
                     ` (2 more replies)
  7 siblings, 3 replies; 18+ messages in thread
From: Joel Fernandes @ 2020-09-02 21:47 UTC (permalink / raw)
  To: Nachammai Karuppiah
  Cc: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, computersforpeace

On Wed, Sep 2, 2020 at 4:01 PM Nachammai Karuppiah
<nachukannan@gmail.com> wrote:
>
> Hi,
>
> This patch series adds support to store trace events in pstore.
>
> Storing trace entries in persistent RAM would help in understanding what
> happened just before the system went down. The trace events that led to the
> crash can be retrieved from the pstore after a warm reboot. This will help
> debug what happened before machine’s last breath. This has to be done in a
> scalable way so that tracing a live system does not impact the performance
> of the system.

Just to add, Nachammai was my intern in the recent outreachy program
and we designed together a way for trace events to be written to
pstore backed memory directory instead of regular memory. The basic
idea is to allocate frace's ring buffer on pstore memory and have it
right there. Then recover it on reboot. Nachammai wrote the code with
some guidance :) . I talked to Steve as well in the past about the
basic of idea of this. Steve is on vacation this week though.

This is similar to what +Sai Prakash Ranjan was trying to do sometime
ago: https://lkml.org/lkml/2018/9/8/221 . But that approach involved
higher overhead due to synchronization of writing to the otherwise
lockless ring buffer.

+Brian Norris has also expressed interest for this feature.

thanks,

 - Joel

>
> This requires a new backend - ramtrace that allocates pages from
> persistent storage for the tracing utility. This feature can be enabled
> using TRACE_EVENTS_TO_PSTORE.
> In this feature, the new backend is used only as a page allocator and
> once the  users chooses to use pstore to record trace entries, the ring
> buffer pages are freed and allocated in pstore. Once this switch is done,
> ring_buffer continues to operate just as before without much overhead.
> Since the ring buffer uses the persistent RAM buffer directly to record
> trace entries, all tracers would also persist across reboot.
>
> To test this feature, I used a simple module that would call panic during
> a write operation to file in tracefs directory. Before writing to the file,
> the ring buffer is moved to persistent RAM buffer through command line
> as shown below,
>
> $echo 1 > /sys/kernel/tracing/options/persist
>
> Writing to the file,
> $echo 1 > /sys/kernel/tracing/crash/panic_on_write
>
> The above write operation results in system crash. After reboot, once the
> pstore is mounted, the trace entries from previous boot are available in file,
> /sys/fs/pstore/trace-ramtrace-0
>
> Looking through this file, gives us the stack trace that led to the crash.
>
>            <...>-1     [001] ....    49.083909: __vfs_write <-vfs_write
>            <...>-1     [001] ....    49.083933: panic <-panic_on_write
>            <...>-1     [001] d...    49.084195: printk <-panic
>            <...>-1     [001] d...    49.084201: vprintk_func <-printk
>            <...>-1     [001] d...    49.084207: vprintk_default <-printk
>            <...>-1     [001] d...    49.084211: vprintk_emit <-printk
>            <...>-1     [001] d...    49.084216: __printk_safe_enter <-vprintk_emit
>            <...>-1     [001] d...    49.084219: _raw_spin_lock <-vprintk_emit
>            <...>-1     [001] d...    49.084223: vprintk_store <-vprintk_emit
>
> Patchwise oneline description is given below:
>
> Patch 1 adds support to allocate ring buffer pages from persistent RAM buffer.
>
> Patch 2 introduces a new backend, ramtrace.
>
> Patch 3 adds methods to read previous boot pages from pstore.
>
> Patch 4 adds the functionality to allocate page-sized memory from pstore.
>
> Patch 5 adds the seq_operation methods to iterate through trace entries.
>
> Patch 6 modifies ring_buffer to allocate from ramtrace when pstore is used.
>
> Patch 7 adds ramtrace DT node as child-node of /reserved-memory.
>
> Nachammai Karuppiah (7):
>   tracing: Add support to allocate pages from persistent memory
>   pstore: Support a new backend, ramtrace
>   pstore: Read and iterate through trace entries in PSTORE
>   pstore: Allocate and free page-sized memory in persistent RAM buffer
>   tracing: Add support to iterate through pages retrieved from pstore
>   tracing: Use ramtrace alloc and free methods while using persistent
>     RAM
>   dt-bindings: ramtrace: Add ramtrace DT node
>
>  .../bindings/reserved-memory/ramtrace.txt          |  13 +
>  drivers/of/platform.c                              |   1 +
>  fs/pstore/Makefile                                 |   2 +
>  fs/pstore/inode.c                                  |  46 +-
>  fs/pstore/platform.c                               |   1 +
>  fs/pstore/ramtrace.c                               | 821 +++++++++++++++++++++
>  include/linux/pstore.h                             |   3 +
>  include/linux/ramtrace.h                           |  28 +
>  include/linux/ring_buffer.h                        |  19 +
>  include/linux/trace.h                              |  13 +
>  kernel/trace/Kconfig                               |  10 +
>  kernel/trace/ring_buffer.c                         | 663 ++++++++++++++++-
>  kernel/trace/trace.c                               | 312 +++++++-
>  kernel/trace/trace.h                               |   5 +-
>  14 files changed, 1924 insertions(+), 13 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/reserved-memory/ramtrace.txt
>  create mode 100644 fs/pstore/ramtrace.c
>  create mode 100644 include/linux/ramtrace.h
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2020-09-02 21:47 ` [RFC PATCH 0/7] Trace events to pstore Joel Fernandes
@ 2020-09-02 21:54   ` Joel Fernandes
  2020-09-03  5:36   ` Sai Prakash Ranjan
  2020-09-03 18:09   ` Rob Herring
  2 siblings, 0 replies; 18+ messages in thread
From: Joel Fernandes @ 2020-09-02 21:54 UTC (permalink / raw)
  To: Nachammai Karuppiah
  Cc: Steven Rostedt, Ingo Molnar, Rob Herring, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, computersforpeace

On Wed, Sep 2, 2020 at 5:47 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Wed, Sep 2, 2020 at 4:01 PM Nachammai Karuppiah
> <nachukannan@gmail.com> wrote:
> >
> > Hi,
> >
> > This patch series adds support to store trace events in pstore.
> >

Been a long day...

> > Storing trace entries in persistent RAM would help in understanding what
> > happened just before the system went down. The trace events that led to the
> > crash can be retrieved from the pstore after a warm reboot. This will help
> > debug what happened before machine’s last breath. This has to be done in a
> > scalable way so that tracing a live system does not impact the performance
> > of the system.
>
> Just to add, Nachammai was my intern in the recent outreachy program
> and we designed together a way for trace events to be written to
> pstore backed memory directory instead of regular memory. The basic

s/directory/directly/

> idea is to allocate frace's ring buffer on pstore memory and have it
> right there. Then recover it on reboot. Nachammai wrote the code with

s/right/write/

 - Joel


> some guidance :) . I talked to Steve as well in the past about the
> basic of idea of this. Steve is on vacation this week though.
>
> This is similar to what +Sai Prakash Ranjan was trying to do sometime
> ago: https://lkml.org/lkml/2018/9/8/221 . But that approach involved
> higher overhead due to synchronization of writing to the otherwise
> lockless ring buffer.
>
> +Brian Norris has also expressed interest for this feature.
>
> thanks,
>
>  - Joel
>
> >
> > This requires a new backend - ramtrace that allocates pages from
> > persistent storage for the tracing utility. This feature can be enabled
> > using TRACE_EVENTS_TO_PSTORE.
> > In this feature, the new backend is used only as a page allocator and
> > once the  users chooses to use pstore to record trace entries, the ring
> > buffer pages are freed and allocated in pstore. Once this switch is done,
> > ring_buffer continues to operate just as before without much overhead.
> > Since the ring buffer uses the persistent RAM buffer directly to record
> > trace entries, all tracers would also persist across reboot.
> >
> > To test this feature, I used a simple module that would call panic during
> > a write operation to file in tracefs directory. Before writing to the file,
> > the ring buffer is moved to persistent RAM buffer through command line
> > as shown below,
> >
> > $echo 1 > /sys/kernel/tracing/options/persist
> >
> > Writing to the file,
> > $echo 1 > /sys/kernel/tracing/crash/panic_on_write
> >
> > The above write operation results in system crash. After reboot, once the
> > pstore is mounted, the trace entries from previous boot are available in file,
> > /sys/fs/pstore/trace-ramtrace-0
> >
> > Looking through this file, gives us the stack trace that led to the crash.
> >
> >            <...>-1     [001] ....    49.083909: __vfs_write <-vfs_write
> >            <...>-1     [001] ....    49.083933: panic <-panic_on_write
> >            <...>-1     [001] d...    49.084195: printk <-panic
> >            <...>-1     [001] d...    49.084201: vprintk_func <-printk
> >            <...>-1     [001] d...    49.084207: vprintk_default <-printk
> >            <...>-1     [001] d...    49.084211: vprintk_emit <-printk
> >            <...>-1     [001] d...    49.084216: __printk_safe_enter <-vprintk_emit
> >            <...>-1     [001] d...    49.084219: _raw_spin_lock <-vprintk_emit
> >            <...>-1     [001] d...    49.084223: vprintk_store <-vprintk_emit
> >
> > Patchwise oneline description is given below:
> >
> > Patch 1 adds support to allocate ring buffer pages from persistent RAM buffer.
> >
> > Patch 2 introduces a new backend, ramtrace.
> >
> > Patch 3 adds methods to read previous boot pages from pstore.
> >
> > Patch 4 adds the functionality to allocate page-sized memory from pstore.
> >
> > Patch 5 adds the seq_operation methods to iterate through trace entries.
> >
> > Patch 6 modifies ring_buffer to allocate from ramtrace when pstore is used.
> >
> > Patch 7 adds ramtrace DT node as child-node of /reserved-memory.
> >
> > Nachammai Karuppiah (7):
> >   tracing: Add support to allocate pages from persistent memory
> >   pstore: Support a new backend, ramtrace
> >   pstore: Read and iterate through trace entries in PSTORE
> >   pstore: Allocate and free page-sized memory in persistent RAM buffer
> >   tracing: Add support to iterate through pages retrieved from pstore
> >   tracing: Use ramtrace alloc and free methods while using persistent
> >     RAM
> >   dt-bindings: ramtrace: Add ramtrace DT node
> >
> >  .../bindings/reserved-memory/ramtrace.txt          |  13 +
> >  drivers/of/platform.c                              |   1 +
> >  fs/pstore/Makefile                                 |   2 +
> >  fs/pstore/inode.c                                  |  46 +-
> >  fs/pstore/platform.c                               |   1 +
> >  fs/pstore/ramtrace.c                               | 821 +++++++++++++++++++++
> >  include/linux/pstore.h                             |   3 +
> >  include/linux/ramtrace.h                           |  28 +
> >  include/linux/ring_buffer.h                        |  19 +
> >  include/linux/trace.h                              |  13 +
> >  kernel/trace/Kconfig                               |  10 +
> >  kernel/trace/ring_buffer.c                         | 663 ++++++++++++++++-
> >  kernel/trace/trace.c                               | 312 +++++++-
> >  kernel/trace/trace.h                               |   5 +-
> >  14 files changed, 1924 insertions(+), 13 deletions(-)
> >  create mode 100644 Documentation/devicetree/bindings/reserved-memory/ramtrace.txt
> >  create mode 100644 fs/pstore/ramtrace.c
> >  create mode 100644 include/linux/ramtrace.h
> >
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2020-09-02 21:47 ` [RFC PATCH 0/7] Trace events to pstore Joel Fernandes
  2020-09-02 21:54   ` Joel Fernandes
@ 2020-09-03  5:36   ` Sai Prakash Ranjan
  2020-09-03 18:09   ` Rob Herring
  2 siblings, 0 replies; 18+ messages in thread
From: Sai Prakash Ranjan @ 2020-09-03  5:36 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nachammai Karuppiah, Steven Rostedt, Ingo Molnar, Rob Herring,
	Frank Rowand, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	LKML, computersforpeace

On 2020-09-03 03:17, Joel Fernandes wrote:
> On Wed, Sep 2, 2020 at 4:01 PM Nachammai Karuppiah
> <nachukannan@gmail.com> wrote:
>> 
>> Hi,
>> 
>> This patch series adds support to store trace events in pstore.
>> 
>> Storing trace entries in persistent RAM would help in understanding 
>> what
>> happened just before the system went down. The trace events that led 
>> to the
>> crash can be retrieved from the pstore after a warm reboot. This will 
>> help
>> debug what happened before machine’s last breath. This has to be done 
>> in a
>> scalable way so that tracing a live system does not impact the 
>> performance
>> of the system.
> 
> Just to add, Nachammai was my intern in the recent outreachy program
> and we designed together a way for trace events to be written to
> pstore backed memory directory instead of regular memory. The basic
> idea is to allocate frace's ring buffer on pstore memory and have it
> right there. Then recover it on reboot. Nachammai wrote the code with
> some guidance :) . I talked to Steve as well in the past about the
> basic of idea of this. Steve is on vacation this week though.
> 
> This is similar to what +Sai Prakash Ranjan was trying to do sometime
> ago: https://lkml.org/lkml/2018/9/8/221 . But that approach involved
> higher overhead due to synchronization of writing to the otherwise
> lockless ring buffer.
> 
> +Brian Norris has also expressed interest for this feature.
> 

Great work Nachammai and Joel, I have few boards with warm reboot 
support and will test
this series in coming days.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2020-09-02 21:47 ` [RFC PATCH 0/7] Trace events to pstore Joel Fernandes
  2020-09-02 21:54   ` Joel Fernandes
  2020-09-03  5:36   ` Sai Prakash Ranjan
@ 2020-09-03 18:09   ` Rob Herring
  2020-09-11  1:25     ` Joel Fernandes
  2 siblings, 1 reply; 18+ messages in thread
From: Rob Herring @ 2020-09-03 18:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nachammai Karuppiah, Steven Rostedt, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

On Wed, Sep 2, 2020 at 3:47 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Wed, Sep 2, 2020 at 4:01 PM Nachammai Karuppiah
> <nachukannan@gmail.com> wrote:
> >
> > Hi,
> >
> > This patch series adds support to store trace events in pstore.
> >
> > Storing trace entries in persistent RAM would help in understanding what
> > happened just before the system went down. The trace events that led to the
> > crash can be retrieved from the pstore after a warm reboot. This will help
> > debug what happened before machine’s last breath. This has to be done in a
> > scalable way so that tracing a live system does not impact the performance
> > of the system.
>
> Just to add, Nachammai was my intern in the recent outreachy program
> and we designed together a way for trace events to be written to
> pstore backed memory directory instead of regular memory. The basic
> idea is to allocate frace's ring buffer on pstore memory and have it
> right there. Then recover it on reboot. Nachammai wrote the code with
> some guidance :) . I talked to Steve as well in the past about the
> basic of idea of this. Steve is on vacation this week though.

ramoops is already the RAM backend for pstore and ramoops already has
an ftrace region defined. What am I missing?

From a DT standpoint, we already have a reserved persistent RAM
binding too. There's already too much kernel specifics on how it is
used, we don't need more of that in DT. We're not going to add another
separate region (actually, you can have as many regions defined as you
want. They will just all be 'ramoops' compatible).

Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2020-09-03 18:09   ` Rob Herring
@ 2020-09-11  1:25     ` Joel Fernandes
  2022-06-30 19:48       ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2020-09-11  1:25 UTC (permalink / raw)
  To: Rob Herring
  Cc: Nachammai Karuppiah, Steven Rostedt, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

Hi Rob,
(Back from holidays, digging through the email pile). Reply below:

On Thu, Sep 3, 2020 at 2:09 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Wed, Sep 2, 2020 at 3:47 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Wed, Sep 2, 2020 at 4:01 PM Nachammai Karuppiah
> > <nachukannan@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > This patch series adds support to store trace events in pstore.
> > >
> > > Storing trace entries in persistent RAM would help in understanding what
> > > happened just before the system went down. The trace events that led to the
> > > crash can be retrieved from the pstore after a warm reboot. This will help
> > > debug what happened before machine’s last breath. This has to be done in a
> > > scalable way so that tracing a live system does not impact the performance
> > > of the system.
> >
> > Just to add, Nachammai was my intern in the recent outreachy program
> > and we designed together a way for trace events to be written to
> > pstore backed memory directory instead of regular memory. The basic
> > idea is to allocate frace's ring buffer on pstore memory and have it
> > right there. Then recover it on reboot. Nachammai wrote the code with
> > some guidance :) . I talked to Steve as well in the past about the
> > basic of idea of this. Steve is on vacation this week though.
>
> ramoops is already the RAM backend for pstore and ramoops already has
> an ftrace region defined. What am I missing?

ramoops is too slow for tracing. Honestly, the ftrace functionality in
ramoops should be removed in favor of Nachammai's patches (she did it
for events but function tracing could be trivially added). No one uses
the current ftrace in pstore because it is darned slow. ramoops sits
in between the writing of the ftrace record and the memory being
written to adding more overhead in the process, while also writing
ftrace records in a non-ftrace format. So ramoop's API and
infrastructure fundamentally does not meet the requirements of high
speed persistent tracing.  The idea of this work is to keep the trace
events enabled for a long period time (possibly even in production)
and low overhead until the problem like machine crashing happens.

> From a DT standpoint, we already have a reserved persistent RAM
> binding too. There's already too much kernel specifics on how it is
> used, we don't need more of that in DT. We're not going to add another
> separate region (actually, you can have as many regions defined as you
> want. They will just all be 'ramoops' compatible).

I agree with the sentiment here on DT. Maybe the DT can be generalized
to provide a ram region to which either ramoops or ramtrace can
attach.

 - Joel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2020-09-11  1:25     ` Joel Fernandes
@ 2022-06-30 19:48       ` Steven Rostedt
  2022-07-01 16:37         ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2022-06-30 19:48 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Rob Herring, Nachammai Karuppiah, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

On Thu, 10 Sep 2020 21:25:11 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> Hi Rob,
> (Back from holidays, digging through the email pile). Reply below:

What ever happen to this?

Sorry, I was expecting more replies, and when there was nothing, it got
lost in my inbox.


> 
> On Thu, Sep 3, 2020 at 2:09 PM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > On Wed, Sep 2, 2020 at 3:47 PM Joel Fernandes <joel@joelfernandes.org> wrote:  
> > >
> > > On Wed, Sep 2, 2020 at 4:01 PM Nachammai Karuppiah
> > > <nachukannan@gmail.com> wrote:  
> > > >
> > > > Hi,
> > > >
> > > > This patch series adds support to store trace events in pstore.
> > > >
> > > > Storing trace entries in persistent RAM would help in understanding what
> > > > happened just before the system went down. The trace events that led to the
> > > > crash can be retrieved from the pstore after a warm reboot. This will help
> > > > debug what happened before machine’s last breath. This has to be done in a
> > > > scalable way so that tracing a live system does not impact the performance
> > > > of the system.  
> > >
> > > Just to add, Nachammai was my intern in the recent outreachy program
> > > and we designed together a way for trace events to be written to
> > > pstore backed memory directory instead of regular memory. The basic
> > > idea is to allocate frace's ring buffer on pstore memory and have it
> > > right there. Then recover it on reboot. Nachammai wrote the code with
> > > some guidance :) . I talked to Steve as well in the past about the
> > > basic of idea of this. Steve is on vacation this week though.  
> >
> > ramoops is already the RAM backend for pstore and ramoops already has
> > an ftrace region defined. What am I missing?  
> 
> ramoops is too slow for tracing. Honestly, the ftrace functionality in
> ramoops should be removed in favor of Nachammai's patches (she did it
> for events but function tracing could be trivially added). No one uses
> the current ftrace in pstore because it is darned slow. ramoops sits
> in between the writing of the ftrace record and the memory being
> written to adding more overhead in the process, while also writing
> ftrace records in a non-ftrace format. So ramoop's API and
> infrastructure fundamentally does not meet the requirements of high
> speed persistent tracing.  The idea of this work is to keep the trace
> events enabled for a long period time (possibly even in production)
> and low overhead until the problem like machine crashing happens.
> 
> > From a DT standpoint, we already have a reserved persistent RAM
> > binding too. There's already too much kernel specifics on how it is
> > used, we don't need more of that in DT. We're not going to add another
> > separate region (actually, you can have as many regions defined as you
> > want. They will just all be 'ramoops' compatible).  
> 
> I agree with the sentiment here on DT. Maybe the DT can be generalized
> to provide a ram region to which either ramoops or ramtrace can
> attach.

Right,

Perhaps just remove patch 7, but still have the ramoops work move forward?

-- Steve

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2022-06-30 19:48       ` Steven Rostedt
@ 2022-07-01 16:37         ` Joel Fernandes
  2022-07-01 16:46           ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2022-07-01 16:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rob Herring, Nachammai Karuppiah, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

On Thu, Jun 30, 2022 at 3:48 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 10 Sep 2020 21:25:11 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > Hi Rob,
> > (Back from holidays, digging through the email pile). Reply below:
>
> What ever happen to this?
>
> Sorry, I was expecting more replies, and when there was nothing, it got
> lost in my inbox.
>
[...]
> > > From a DT standpoint, we already have a reserved persistent RAM
> > > binding too. There's already too much kernel specifics on how it is
> > > used, we don't need more of that in DT. We're not going to add another
> > > separate region (actually, you can have as many regions defined as you
> > > want. They will just all be 'ramoops' compatible).
> >
> > I agree with the sentiment here on DT. Maybe the DT can be generalized
> > to provide a ram region to which either ramoops or ramtrace can
> > attach.
>
> Right,
>
> Perhaps just remove patch 7, but still have the ramoops work move forward?

This was an internship project submission which stalled after the
internship ended, I imagine Nachammai has moved on to doing other
things since.

I am curious how this came on your radar after 2 years, did someone
tell you to prioritize improving performance of ftrace on pstore? I
could probably make time to work on it more if someone has a usecase
for this or something.

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2022-07-01 16:37         ` Joel Fernandes
@ 2022-07-01 16:46           ` Steven Rostedt
  2022-07-01 16:53             ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2022-07-01 16:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Rob Herring, Nachammai Karuppiah, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

On Fri, 1 Jul 2022 12:37:35 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> I am curious how this came on your radar after 2 years, did someone
> tell you to prioritize improving performance of ftrace on pstore? I
> could probably make time to work on it more if someone has a usecase
> for this or something.

I'm looking into ways to extract the ftrace ring buffer from crashes, and
it was brought up that pstore was used before.

-- Steve

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2022-07-01 16:46           ` Steven Rostedt
@ 2022-07-01 16:53             ` Joel Fernandes
  2022-07-01 17:57               ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2022-07-01 16:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rob Herring, Nachammai Karuppiah, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

On Fri, Jul 1, 2022 at 12:46 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 1 Jul 2022 12:37:35 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > I am curious how this came on your radar after 2 years, did someone
> > tell you to prioritize improving performance of ftrace on pstore? I
> > could probably make time to work on it more if someone has a usecase
> > for this or something.
>
> I'm looking into ways to extract the ftrace ring buffer from crashes, and
> it was brought up that pstore was used before.

Interesting. In the case of pstore, you know exactly where the pages
are for ftrace. How would you know that for the buddy system where
pages are in the wild wild west? I guess you would need to track where
ftrace pages where allocated, within the crash dump/report.

 - Joel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Trace events to pstore
  2022-07-01 16:53             ` Joel Fernandes
@ 2022-07-01 17:57               ` Steven Rostedt
  0 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2022-07-01 17:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Rob Herring, Nachammai Karuppiah, Ingo Molnar, Frank Rowand,
	Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, LKML,
	Sai Prakash Ranjan, Brian Norris

On Fri, 1 Jul 2022 12:53:17 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> Interesting. In the case of pstore, you know exactly where the pages
> are for ftrace. How would you know that for the buddy system where
> pages are in the wild wild west? I guess you would need to track where
> ftrace pages where allocated, within the crash dump/report.

kexec/kdump already does that (of course it requires the DWARF symbols of
the kernel to be accessible by the kdump kernel).

But if we write the raw ftrace data into persistent memory that can survive
a reboot, then we can extract that raw data and convert it back to text
offline.

Thus, I would like to remove the converting to text and compressing into
pstore, and possibly look at a solution that simply writes the raw data
into pstore.

-- Steve

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-07-01 17:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-02 20:00 [RFC PATCH 0/7] Trace events to pstore Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 1/7] tracing: Add support to allocate pages from persistent memory Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 2/7] pstore: Support a new backend, ramtrace Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 3/7] pstore: Read and iterate through trace entries in PSTORE Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 4/7] pstore: Allocate and free page-sized memory in persistent RAM buffer Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 5/7] tracing: Add support to iterate through pages retrieved from pstore Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 6/7] tracing: Use ramtrace alloc and free methods while using persistent RAM Nachammai Karuppiah
2020-09-02 20:00 ` [RFC PATCH 7/7] dt-bindings: ramtrace: Add ramtrace DT node Nachammai Karuppiah
2020-09-02 21:47 ` [RFC PATCH 0/7] Trace events to pstore Joel Fernandes
2020-09-02 21:54   ` Joel Fernandes
2020-09-03  5:36   ` Sai Prakash Ranjan
2020-09-03 18:09   ` Rob Herring
2020-09-11  1:25     ` Joel Fernandes
2022-06-30 19:48       ` Steven Rostedt
2022-07-01 16:37         ` Joel Fernandes
2022-07-01 16:46           ` Steven Rostedt
2022-07-01 16:53             ` Joel Fernandes
2022-07-01 17:57               ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).