[PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device

* [PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device
@ 2020-05-08  6:39 ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

Hi!

This is a v4 of WeiXiong Liao's series. I spent time porting this on top
of the latest pstore (mainly to support max_reason), and I started making
various other changes, mostly just bikeshed stuff.

Changes since v3:
	fixing up various typos, alternate phrases, and language. For
	example:
	        recorder -> frontend
	        Pstore -> pstore

	filenames:
	        rename pstore_*.c -> *.c and adjust Makefile
		(I decided fs/pstore/pstore_zone.c repeated "pstore" one too many times. ;)
		pstore-block.rst -> pstore-blk.rst

	conversion of dump_oops -> max_reason

	refactor/rename get_reason_str() and move to kernel/printk/printk.c

	psz* -> pstore_zone* renamings:
	        psblk_usr_info() -> pstore_blk_usr_info()
	        psz_zone -> pstore_zone
	        pszinfo -> pstore_zone_info

	register_pstore_zone():
	        registration reporting via pr_cont(), with max_reason
	        remove needless get/put_module()

	public API renamings: VERB_NOUN()
	        psz_*register() -> *register_pstore_zone()

v3: https://lore.kernel.org/lkml/1585126506-18635-1-git-send-email-liaoweixiong@allwinnertech.com/
v2: https://lore.kernel.org/lkml/1581078355-19647-1-git-send-email-liaoweixiong@allwinnertech.com/
v1: https://lore.kernel.org/lkml/1579482233-2672-1-git-send-email-liaoweixiong@allwinnertech.com/

So far, I've identified the following stuff left to do:
        - settle on various function/struct renamings
        - review locking
        - implement ramoops-like probe feature for pstore/blk
	- spend time seeing how ramoops might use pstore/zone

But I wanted to get this update published just to show what I've done
so far in my bikeshed review. :)

Thanks!

-Kees


Kees Cook (1):
  printk: Introduce kmsg_dump_reason_str()

WeiXiong Liao (11):
  pstore/zone: Introduce common layer to manage storage zones
  pstore/blk: Introduce backend for block devices
  pstore/blk: Provide way to choose pstore frontend support
  pstore/blk: Add support for pmsg frontend
  pstore/blk: Add console frontend support
  pstore/blk: Add ftrace frontend support
  Documentation: Add details for pstore/blk
  pstore/zone: Provide way to skip "broken" zone for MTD devices
  pstore/blk: Provide way to query pstore configuration
  pstore/blk: Support non-block storage devices
  mtd: Support kmsg dumper based on pstore/blk

 Documentation/admin-guide/pstore-blk.rst |  243 ++++
 MAINTAINERS                              |    1 +
 drivers/mtd/Kconfig                      |   10 +
 drivers/mtd/Makefile                     |    1 +
 drivers/mtd/mtdpstore.c                  |  564 ++++++++
 fs/pstore/Kconfig                        |  109 ++
 fs/pstore/Makefile                       |    6 +
 fs/pstore/blk.c                          |  481 +++++++
 fs/pstore/platform.c                     |   22 +-
 fs/pstore/zone.c                         | 1498 ++++++++++++++++++++++
 include/linux/kmsg_dump.h                |    7 +
 include/linux/pstore_blk.h               |   77 ++
 include/linux/pstore_zone.h              |   60 +
 kernel/printk/printk.c                   |   21 +
 14 files changed, 3079 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/admin-guide/pstore-blk.rst
 create mode 100644 drivers/mtd/mtdpstore.c
 create mode 100644 fs/pstore/blk.c
 create mode 100644 fs/pstore/zone.c
 create mode 100644 include/linux/pstore_blk.h
 create mode 100644 include/linux/pstore_zone.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 01/12] printk: Introduce kmsg_dump_reason_str()
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

The pstore subsystem already had a private version of this function.
With the coming addition of the pstore/zone driver, this needs to be
shared. As it really should live with printk, move it there instead.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
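Not part of the patch: a minimal userspace sketch of the mapping, for
anyone who wants to poke at it outside the kernel. The enum dump_reason
type, its numeric values, reason_str() and format_banner() are
hypothetical stand-ins; only the case names and the returned strings are
taken from the diff below. format_banner() imitates the
"%s: Total %d times" line that patch 02 builds with kasprintf().

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for enum kmsg_dump_reason. The names follow the
 * case labels in the patch; the numeric values are assumptions.
 */
enum dump_reason {
	DUMP_UNDEF,
	DUMP_PANIC,
	DUMP_OOPS,
	DUMP_EMERG,
	DUMP_RESTART,
	DUMP_HALT,
	DUMP_POWEROFF,
};

/* Same switch shape as kmsg_dump_reason_str() in the patch. */
static const char *reason_str(enum dump_reason reason)
{
	switch (reason) {
	case DUMP_PANIC:
		return "Panic";
	case DUMP_OOPS:
		return "Oops";
	case DUMP_EMERG:
		return "Emergency";
	case DUMP_RESTART:
		return "Restart";
	case DUMP_HALT:
		return "Halt";
	case DUMP_POWEROFF:
		return "Poweroff";
	default:
		return "Unknown";
	}
}

/*
 * Format a record banner from a reason and an occurrence count.
 * Returns what snprintf() returns: the length the full banner needs.
 */
static int format_banner(char *buf, size_t len, enum dump_reason reason,
			 unsigned int count)
{
	return snprintf(buf, len, "%s: Total %u times\n",
			reason_str(reason), count);
}
```

Any value without its own case falls through to "Unknown", so a caller
never gets a NULL string back.
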
 include/linux/kmsg_dump.h |  7 +++++++
 kernel/printk/printk.c    | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index cfc042066be7..b3ddb0b2ee40 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -72,6 +72,8 @@ void kmsg_dump_rewind(struct kmsg_dumper *dumper);
 int kmsg_dump_register(struct kmsg_dumper *dumper);
 
 int kmsg_dump_unregister(struct kmsg_dumper *dumper);
+
+const char *kmsg_dump_reason_str(enum kmsg_dump_reason reason);
 #else
 static inline void kmsg_dump(enum kmsg_dump_reason reason)
 {
@@ -113,6 +115,11 @@ static inline int kmsg_dump_unregister(struct kmsg_dumper *dumper)
 {
 	return -EINVAL;
 }
+
+static inline const char *kmsg_dump_reason_str(enum kmsg_dump_reason reason)
+{
+	return "Disabled";
+}
 #endif
 
 #endif /* _LINUX_KMSG_DUMP_H */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1aab69a8a2bf..67a284830d74 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3144,6 +3144,27 @@ EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
 static bool always_kmsg_dump;
 module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
 
+const char *kmsg_dump_reason_str(enum kmsg_dump_reason reason)
+{
+	switch (reason) {
+	case KMSG_DUMP_PANIC:
+		return "Panic";
+	case KMSG_DUMP_OOPS:
+		return "Oops";
+	case KMSG_DUMP_EMERG:
+		return "Emergency";
+	case KMSG_DUMP_RESTART:
+		return "Restart";
+	case KMSG_DUMP_HALT:
+		return "Halt";
+	case KMSG_DUMP_POWEROFF:
+		return "Poweroff";
+	default:
+		return "Unknown";
+	}
+}
+EXPORT_SYMBOL_GPL(kmsg_dump_reason_str);
+
 /**
  * kmsg_dump - dump kernel log to kernel message dumpers.
  * @reason: the reason (oops, panic etc) for dumping
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 02/12] pstore/zone: Introduce common layer to manage storage zones
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Implement a common set of APIs needed to support pstore storage zones,
based on how ramoops is designed. This will be used by pstore/blk with
the intention of migrating pstore/ram in the future.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-2-git-send-email-liaoweixiong@allwinnertech.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
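Not part of the patch: a rough userspace sketch of the write path in
psz_zone_write(), reduced to the bounds clamp and the dirty fallback.
struct sketch_zone, sketch_writeop, sketch_ok_writer() and
sketch_zone_write() are hypothetical names, the 16-byte buffer is
arbitrary, and the flush modes are collapsed into a single "flush
everything" step, so treat it as an illustration rather than the real
logic.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified zone: payload only, no on-storage header. */
struct sketch_zone {
	unsigned char data[16];
	size_t datalen;
	bool dirty;
};

/* Hypothetical backend hook: returns bytes written, or < 0 on error. */
typedef long (*sketch_writeop)(const unsigned char *buf, size_t len,
			       size_t pos);

/* Example backend that always accepts the whole write. */
static long sketch_ok_writer(const unsigned char *buf, size_t len, size_t pos)
{
	(void)buf;
	(void)pos;
	return (long)len;
}

/*
 * Copy into the in-memory buffer first, then try to flush. If no backend
 * is available or the flush comes up short, mark the zone dirty so that
 * a later pass can retry.
 */
static int sketch_zone_write(struct sketch_zone *z, sketch_writeop op,
			     const void *buf, size_t len, size_t off)
{
	size_t room, wlen;

	if (off > sizeof(z->data))
		return -EINVAL;

	/* Clamp the copy to the space left after @off. */
	room = sizeof(z->data) - off;
	wlen = len < room ? len : room;
	if (buf && wlen) {
		memcpy(z->data + off, buf, wlen);
		z->datalen = off + wlen;
	}

	if (!op || op(z->data, sizeof(z->data), 0) != (long)sizeof(z->data)) {
		z->dirty = true;
		return -EBUSY;
	}
	return 0;
}
```

As in the patch, a successful write does not clear the dirty flag; that
is left to whoever flushes dirty zones afterwards.
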
 fs/pstore/Kconfig           |   7 +
 fs/pstore/Makefile          |   3 +
 fs/pstore/zone.c            | 973 ++++++++++++++++++++++++++++++++++++
 include/linux/pstore_zone.h |  44 ++
 4 files changed, 1027 insertions(+)
 create mode 100644 fs/pstore/zone.c
 create mode 100644 include/linux/pstore_zone.h
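
Not part of the patch: a small sketch of how psz_recover_oops_meta()
picks the next zone to write after a reboot. Zones are written in a
ring, so the slot after the newest valid header is either the oldest
record or an unused one. struct sketch_hdr, SKETCH_MAGIC and
next_write_index() are hypothetical names; the magic value is copied
from OOPS_HEADER_MAGIC in the diff, and only the timestamp comparison is
modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAGIC 0x4dfc3ae5u /* same value as OOPS_HEADER_MAGIC */

/* Hypothetical cut-down header: just what the scan below needs. */
struct sketch_hdr {
	uint32_t magic;
	int64_t tv_sec;
};

/*
 * Scan every zone header and return the index to write next: one past
 * the newest valid record, wrapping around at @cnt. Zones with a bad
 * magic are skipped. With no valid zone at all the answer is 0.
 */
static unsigned int next_write_index(const struct sketch_hdr *hdrs,
				     unsigned int cnt)
{
	int64_t newest = 0;
	unsigned int next = 0;
	unsigned int i;

	for (i = 0; i < cnt; i++) {
		if (hdrs[i].magic != SKETCH_MAGIC)
			continue;
		/* ">=" means a later zone wins a timestamp tie. */
		if (hdrs[i].tv_sec >= newest) {
			newest = hdrs[i].tv_sec;
			next = (i + 1) % cnt;
		}
	}
	return next;
}
```

With four zones stamped 100, 200, (invalid), 50, the newest record is in
slot 1, so the next write lands in slot 2.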

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 8f0369aad22a..98d2457bdd9f 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -153,3 +153,10 @@ config PSTORE_RAM
 	  "ramoops.ko".
 
 	  For more information, see Documentation/admin-guide/ramoops.rst.
+
+config PSTORE_ZONE
+	tristate
+	depends on PSTORE
+	help
+	  The common layer for pstore/blk (and pstore/ram in the future)
+	  to manage storage in zones.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 967b5891f325..58a967cbe4af 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
 
 ramoops-objs += ram.o ram_core.o
 obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
+
+pstore_zone-objs += zone.o
+obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
new file mode 100644
index 000000000000..6c25c443c8e2
--- /dev/null
+++ b/fs/pstore/zone.c
@@ -0,0 +1,973 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define MODNAME "pstore-zone"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/pstore.h>
+#include <linux/mount.h>
+#include <linux/printk.h>
+#include <linux/fs.h>
+#include <linux/pstore_zone.h>
+#include <linux/kdev_t.h>
+#include <linux/device.h>
+#include <linux/namei.h>
+#include <linux/fcntl.h>
+#include <linux/uio.h>
+#include <linux/writeback.h>
+
+/**
+ * struct psz_buffer - header of zone to flush to storage
+ *
+ * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
+ * @datalen: length of data in @data
+ * @data: zone data.
+ */
+struct psz_buffer {
+#define PSZ_SIG (0x43474244) /* DBGC */
+	uint32_t sig;
+	atomic_t datalen;
+	uint8_t data[];
+};
+
+/**
+ * struct psz_oops_header - sub header of oops zones to flush to storage
+ *
+ * @magic: magic num for oops header
+ * @time: oops/panic trigger time
+ * @compressed: whether the data is compressed
+ * @counter: oops/panic counter
+ * @reason: identify oops or panic
+ * @data: pointer to log data
+ *
+ * It's a sub-header of oops zone, trailing after &psz_buffer.
+ */
+struct psz_oops_header {
+#define OOPS_HEADER_MAGIC 0x4dfc3ae5 /* Just a random number */
+	uint32_t magic;
+	struct timespec64 time;
+	bool compressed;
+	uint32_t counter;
+	enum kmsg_dump_reason reason;
+	uint8_t data[];
+};
+
+/**
+ * struct pstore_zone - zone information
+ *
+ * @off: zone offset of storage
+ * @type: front-end type for this zone
+ * @name: front-end name for this zone
+ * @buffer: pointer to data buffer managed by this zone
+ * @oldbuf: pointer to old data buffer.
+ * @buffer_size: bytes in @buffer->data
+ * @should_recover: whether this zone should recover from storage
+ * @dirty: whether the data in @buffer is dirty
+ *
+ * zone structure in memory.
+ */
+struct pstore_zone {
+	loff_t off;
+	const char *name;
+	enum pstore_type_id type;
+
+	struct psz_buffer *buffer;
+	struct psz_buffer *oldbuf;
+	size_t buffer_size;
+	bool should_recover;
+	atomic_t dirty;
+};
+
+/**
+ * struct psz_context - all about running state of pstore/zone
+ *
+ * @opszs: oops/panic storage zones
+ * @oops_max_cnt: max count of @opszs
+ * @oops_read_cnt: counter to read oops zone
+ * @oops_write_cnt: counter to write
+ * @oops_counter: counter to oops
+ * @panic_counter: counter to panic
+ * @recovered: whether recovering data from storage has finished
+ * @on_panic: whether a panic is in progress
+ * @pstore_zone_info_lock: lock to @pstore_zone_info
+ * @pstore_zone_info: information from back-end
+ * @pstore: structure for pstore
+ */
+struct psz_context {
+	struct pstore_zone **opszs;
+	unsigned int oops_max_cnt;
+	unsigned int oops_read_cnt;
+	unsigned int oops_write_cnt;
+	/*
+	 * These counters are restored during recovery. They record the number
+	 * of oopses/panics since the storage was burned, not since boot.
+	 */
+	unsigned int oops_counter;
+	unsigned int panic_counter;
+	atomic_t recovered;
+	atomic_t on_panic;
+
+	/*
+	 * pstore_zone_info_lock just protects "pstore_zone_info" during calls to
+	 * register_pstore_zone/unregister_pstore_zone
+	 */
+	struct mutex pstore_zone_info_lock;
+	struct pstore_zone_info *pstore_zone_info;
+	struct pstore_info pstore;
+};
+static struct psz_context psz_cxt;
+
+/**
+ * enum psz_flush_mode - flush mode for psz_zone_write()
+ *
+ * @FLUSH_NONE: do not flush to storage but update data on memory
+ * @FLUSH_PART: just flush part of data including meta data to storage
+ * @FLUSH_META: just flush meta data of zone to storage
+ * @FLUSH_ALL: flush all of zone
+ */
+enum psz_flush_mode {
+	FLUSH_NONE = 0,
+	FLUSH_PART,
+	FLUSH_META,
+	FLUSH_ALL,
+};
+
+static inline int buffer_datalen(struct pstore_zone *zone)
+{
+	return atomic_read(&zone->buffer->datalen);
+}
+
+static inline bool is_on_panic(void)
+{
+	struct psz_context *cxt = &psz_cxt;
+
+	return atomic_read(&cxt->on_panic);
+}
+
+static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
+		size_t len, unsigned long off)
+{
+	if (!buf || !zone->buffer)
+		return -EINVAL;
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	len = min_t(size_t, len, zone->buffer_size - off);
+	memcpy(buf, zone->buffer->data + off, len);
+	return len;
+}
+
+static int psz_zone_write(struct pstore_zone *zone,
+		enum psz_flush_mode flush_mode, const char *buf,
+		size_t len, unsigned long off)
+{
+	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
+	ssize_t wcnt = 0;
+	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
+	size_t wlen;
+
+	if (off > zone->buffer_size)
+		return -EINVAL;
+
+	wlen = min_t(size_t, len, zone->buffer_size - off);
+	if (buf && wlen) {
+		memcpy(zone->buffer->data + off, buf, wlen);
+		atomic_set(&zone->buffer->datalen, wlen + off);
+	}
+
+	/* avoid damaging old records */
+	if (!is_on_panic() && !atomic_read(&psz_cxt.recovered))
+		goto dirty;
+
+	writeop = is_on_panic() ? info->panic_write : info->write;
+	if (!writeop)
+		goto dirty;
+
+	switch (flush_mode) {
+	case FLUSH_NONE:
+		if (unlikely(buf && wlen))
+			goto dirty;
+		return 0;
+	case FLUSH_PART:
+		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
+				zone->off + sizeof(*zone->buffer) + off);
+		if (wcnt != wlen)
+			goto dirty;
+		fallthrough;
+	case FLUSH_META:
+		wlen = sizeof(struct psz_buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto dirty;
+		break;
+	case FLUSH_ALL:
+		wlen = zone->buffer_size + sizeof(*zone->buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto dirty;
+		break;
+	}
+
+	return 0;
+dirty:
+	atomic_set(&zone->dirty, true);
+	return -EBUSY;
+}
+
+static int psz_flush_dirty_zone(struct pstore_zone *zone)
+{
+	int ret;
+
+	if (!zone)
+		return -EINVAL;
+
+	if (!atomic_read(&zone->dirty))
+		return 0;
+
+	if (!atomic_read(&psz_cxt.recovered))
+		return -EBUSY;
+
+	ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
+	if (!ret)
+		atomic_set(&zone->dirty, false);
+	return ret;
+}
+
+static int psz_flush_dirty_zones(struct pstore_zone **zones, unsigned int cnt)
+{
+	int i, ret;
+	struct pstore_zone *zone;
+
+	if (!zones)
+		return -EINVAL;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (!zone)
+			return -EINVAL;
+		ret = psz_flush_dirty_zone(zone);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
+{
+	const char *data = (const char *)old->buffer->data;
+	int ret;
+
+	ret = psz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
+	if (ret) {
+		atomic_set(&new->buffer->datalen, 0);
+		atomic_set(&new->dirty, false);
+		return ret;
+	}
+	atomic_set(&old->buffer->datalen, 0);
+	return 0;
+}
+
+static int psz_recover_oops_data(struct psz_context *cxt)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	struct pstore_zone *zone = NULL;
+	struct psz_buffer *buf;
+	unsigned long i;
+	ssize_t rcnt;
+
+	if (!info->read)
+		return -EINVAL;
+
+	for (i = 0; i < cxt->oops_max_cnt; i++) {
+		zone = cxt->opszs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+		if (atomic_read(&zone->dirty)) {
+			unsigned int wcnt = cxt->oops_write_cnt;
+			struct pstore_zone *new = cxt->opszs[wcnt];
+			int ret;
+
+			ret = psz_move_zone(zone, new);
+			if (ret) {
+				pr_err("move zone from %lu to %d failed\n",
+						i, wcnt);
+				return ret;
+			}
+			cxt->oops_write_cnt = (wcnt + 1) % cxt->oops_max_cnt;
+		}
+		if (!zone->should_recover)
+			continue;
+		buf = zone->buffer;
+		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
+				zone->off);
+		if (rcnt != zone->buffer_size + sizeof(*buf))
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+	}
+	return 0;
+}
+
+static int psz_recover_oops_meta(struct psz_context *cxt)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	struct pstore_zone *zone;
+	size_t rcnt, len;
+	struct psz_buffer *buf;
+	struct psz_oops_header *hdr;
+	struct timespec64 time = {0};
+	unsigned long i;
+	/*
+	 * Recovery may run during a panic, when we cannot allocate memory
+	 * with kmalloc(), so use a local array instead.
+	 */
+	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
+
+	if (!info->read)
+		return -EINVAL;
+
+	len = sizeof(*buf) + sizeof(*hdr);
+	buf = (struct psz_buffer *)buffer_header;
+	for (i = 0; i < cxt->oops_max_cnt; i++) {
+		zone = cxt->opszs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+
+		rcnt = info->read((char *)buf, len, zone->off);
+		if (rcnt != len) {
+			pr_err("read %s with id %lu failed\n", zone->name, i);
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+		}
+
+		if (buf->sig != zone->buffer->sig) {
+			pr_debug("no valid data in oops zone %lu\n", i);
+			continue;
+		}
+
+		if (zone->buffer_size < atomic_read(&buf->datalen)) {
+			pr_info("found overtop zone: %s: id %lu, off %lld, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		hdr = (struct psz_oops_header *)buf->data;
+		if (hdr->magic != OOPS_HEADER_MAGIC) {
+			pr_info("found invalid zone: %s: id %lu, off %lld, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		/*
+		 * we get the newest zone, and the next one must be the oldest
+		 * or unused zone, because we do write one by one like a circle.
+		 */
+		if (hdr->time.tv_sec >= time.tv_sec) {
+			time.tv_sec = hdr->time.tv_sec;
+			cxt->oops_write_cnt = (i + 1) % cxt->oops_max_cnt;
+		}
+
+		if (hdr->reason == KMSG_DUMP_OOPS)
+			cxt->oops_counter =
+				max(cxt->oops_counter, hdr->counter);
+		else
+			cxt->panic_counter =
+				max(cxt->panic_counter, hdr->counter);
+
+		if (!atomic_read(&buf->datalen)) {
+			pr_debug("found erased zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
+					zone->name, i, zone->off,
+					zone->buffer_size,
+					atomic_read(&buf->datalen));
+			continue;
+		}
+
+		if (!is_on_panic())
+			zone->should_recover = true;
+		pr_debug("found nice zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
+				zone->name, i, zone->off,
+				zone->buffer_size, atomic_read(&buf->datalen));
+	}
+
+	return 0;
+}
+
+static int psz_recover_oops(struct psz_context *cxt)
+{
+	int ret;
+
+	if (!cxt->opszs)
+		return 0;
+
+	ret = psz_recover_oops_meta(cxt);
+	if (ret)
+		goto recover_fail;
+
+	ret = psz_recover_oops_data(cxt);
+	if (ret)
+		goto recover_fail;
+
+	return 0;
+recover_fail:
+	pr_debug("recover oops failed\n");
+	return ret;
+}
+
+/**
+ * psz_recovery() - recover data from storage
+ * @cxt: the context of pstore/zone
+ *
+ * recovery means reading data back from storage after rebooting
+ *
+ * Return: 0 on success, others on failure.
+ */
+static inline int psz_recovery(struct psz_context *cxt)
+{
+	int ret = -EBUSY;
+
+	if (atomic_read(&cxt->recovered))
+		return 0;
+
+	ret = psz_recover_oops(cxt);
+	if (ret)
+		goto recover_fail;
+
+	pr_debug("recover end!\n");
+	atomic_set(&cxt->recovered, 1);
+	return 0;
+
+recover_fail:
+	pr_err("recover failed\n");
+	return ret;
+}
+
+static int psz_pstore_open(struct pstore_info *psi)
+{
+	struct psz_context *cxt = psi->data;
+
+	cxt->oops_read_cnt = 0;
+	return 0;
+}
+
+static inline bool psz_ok(struct pstore_zone *zone)
+{
+	if (zone && zone->buffer && buffer_datalen(zone))
+		return true;
+	return false;
+}
+
+static inline int psz_oops_erase(struct psz_context *cxt,
+		struct pstore_zone *zone, struct pstore_record *record)
+{
+	struct psz_buffer *buffer = zone->buffer;
+	struct psz_oops_header *hdr =
+		(struct psz_oops_header *)buffer->data;
+
+	if (unlikely(!psz_ok(zone)))
+		return 0;
+	/* this zone is already updated, no need to erase */
+	if (record->count != hdr->counter)
+		return 0;
+
+	atomic_set(&zone->buffer->datalen, 0);
+	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+}
+
+static int psz_pstore_erase(struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		if (record->id >= cxt->oops_max_cnt)
+			return -EINVAL;
+		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
+	default:
+		return -EINVAL;
+	}
+}
+
+static void psz_write_kmsg_hdr(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+	struct psz_buffer *buffer = zone->buffer;
+	struct psz_oops_header *hdr =
+		(struct psz_oops_header *)buffer->data;
+
+	hdr->magic = OOPS_HEADER_MAGIC;
+	hdr->compressed = record->compressed;
+	hdr->time.tv_sec = record->time.tv_sec;
+	hdr->time.tv_nsec = record->time.tv_nsec;
+	hdr->reason = record->reason;
+	if (hdr->reason == KMSG_DUMP_OOPS)
+		hdr->counter = ++cxt->oops_counter;
+	else
+		hdr->counter = ++cxt->panic_counter;
+}
+
+static inline int notrace psz_oops_write_record(struct psz_context *cxt,
+		struct pstore_record *record)
+{
+	size_t size, hlen;
+	struct pstore_zone *zone;
+	unsigned int zonenum;
+
+	zonenum = cxt->oops_write_cnt;
+	zone = cxt->opszs[zonenum];
+	if (unlikely(!zone))
+		return -ENOSPC;
+	cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
+
+	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+	psz_write_kmsg_hdr(zone, record);
+	hlen = sizeof(struct psz_oops_header);
+	size = min_t(size_t, record->size, zone->buffer_size - hlen);
+	return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+}
+
+static int notrace psz_oops_write(struct psz_context *cxt,
+		struct pstore_record *record)
+{
+	int ret;
+
+	/*
+	 * Explicitly only take the first part of any new crash.
+	 * If our buffer is larger than kmsg_bytes, this can never happen,
+	 * and if our buffer is smaller than kmsg_bytes, we don't want the
+	 * report split across multiple records.
+	 */
+	if (record->part != 1)
+		return -ENOSPC;
+
+	if (!cxt->opszs)
+		return -ENOSPC;
+
+	ret = psz_oops_write_record(cxt, record);
+	if (!ret) {
+		pr_debug("try to flush other dirty oops zones\n");
+		psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+	}
+
+	/* always return 0 since the record was already handled in the buffer */
+	return 0;
+}
+
+static int notrace psz_pstore_write(struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+
+	if (record->type == PSTORE_TYPE_DMESG &&
+			record->reason == KMSG_DUMP_PANIC)
+		atomic_set(&cxt->on_panic, 1);
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		return psz_oops_write(cxt, record);
+	default:
+		return -EINVAL;
+	}
+}
+
+static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
+{
+	struct pstore_zone *zone = NULL;
+
+	while (cxt->oops_read_cnt < cxt->oops_max_cnt) {
+		zone = cxt->opszs[cxt->oops_read_cnt++];
+		if (psz_ok(zone))
+			return zone;
+	}
+
+	return NULL;
+}
+
+static int psz_read_oops_hdr(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	struct psz_buffer *buffer = zone->buffer;
+	struct psz_oops_header *hdr =
+		(struct psz_oops_header *)buffer->data;
+
+	if (hdr->magic != OOPS_HEADER_MAGIC)
+		return -EINVAL;
+	record->compressed = hdr->compressed;
+	record->time.tv_sec = hdr->time.tv_sec;
+	record->time.tv_nsec = hdr->time.tv_nsec;
+	record->reason = hdr->reason;
+	record->count = hdr->counter;
+	return 0;
+}
+
+static ssize_t psz_oops_read(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	ssize_t size, hlen = 0;
+
+	size = buffer_datalen(zone);
+	/* Clear and skip this oops record if it has no valid header */
+	if (psz_read_oops_hdr(zone, record)) {
+		atomic_set(&zone->buffer->datalen, 0);
+		atomic_set(&zone->dirty, 0);
+		return -ENOMSG;
+	}
+	size -= sizeof(struct psz_oops_header);
+
+	if (!record->compressed) {
+		char *buf = kasprintf(GFP_KERNEL, "%s: Total %d times\n",
+				      kmsg_dump_reason_str(record->reason),
+				      record->count);
+		hlen = strlen(buf);
+		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
+		if (!record->buf) {
+			kfree(buf);
+			return -ENOMEM;
+		}
+	} else {
+		record->buf = kmalloc(size, GFP_KERNEL);
+		if (!record->buf)
+			return -ENOMEM;
+	}
+
+	size = psz_zone_read(zone, record->buf + hlen, size,
+			sizeof(struct psz_oops_header));
+	if (unlikely(size < 0)) {
+		kfree(record->buf);
+		return -ENOMSG;
+	}
+
+	return size + hlen;
+}
+
+static ssize_t psz_pstore_read(struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+	ssize_t (*readop)(struct pstore_zone *zone,
+			struct pstore_record *record);
+	struct pstore_zone *zone;
+	ssize_t ret;
+
+	/* before read, we must recover from storage */
+	ret = psz_recovery(cxt);
+	if (ret)
+		return ret;
+
+next_zone:
+	zone = psz_read_next_zone(cxt);
+	if (!zone)
+		return 0;
+
+	record->type = zone->type;
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		readop = psz_oops_read;
+		record->id = cxt->oops_read_cnt - 1;
+		break;
+	default:
+		goto next_zone;
+	}
+
+	ret = readop(zone, record);
+	if (ret == -ENOMSG)
+		goto next_zone;
+	return ret;
+}
+
+static struct psz_context psz_cxt = {
+	.pstore_zone_info_lock = __MUTEX_INITIALIZER(psz_cxt.pstore_zone_info_lock),
+	.recovered = ATOMIC_INIT(0),
+	.on_panic = ATOMIC_INIT(0),
+	.pstore = {
+		.owner = THIS_MODULE,
+		.name = MODNAME,
+		.open = psz_pstore_open,
+		.read = psz_pstore_read,
+		.write = psz_pstore_write,
+		.erase = psz_pstore_erase,
+	},
+};
+
+static struct pstore_zone *psz_init_zone(enum pstore_type_id type,
+		loff_t *off, size_t size)
+{
+	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
+	struct pstore_zone *zone;
+	const char *name = pstore_type_to_name(type);
+
+	if (!size)
+		return NULL;
+
+	if (*off + size > info->total_size) {
+		pr_err("no room for %s (0x%zx@0x%llx over 0x%lx)\n",
+			name, size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	zone = kzalloc(sizeof(struct pstore_zone), GFP_KERNEL);
+	if (!zone)
+		return ERR_PTR(-ENOMEM);
+
+	zone->buffer = kmalloc(size, GFP_KERNEL);
+	if (!zone->buffer) {
+		kfree(zone);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zone->buffer, 0xFF, size);
+	zone->off = *off;
+	zone->name = name;
+	zone->type = type;
+	zone->buffer_size = size - sizeof(struct psz_buffer);
+	zone->buffer->sig = type ^ PSZ_SIG;
+	atomic_set(&zone->dirty, 0);
+	atomic_set(&zone->buffer->datalen, 0);
+
+	*off += size;
+
+	pr_debug("pszone %s: off 0x%llx, %zu header, %zu data\n", zone->name,
+			zone->off, sizeof(*zone->buffer), zone->buffer_size);
+	return zone;
+}
+
+static struct pstore_zone **psz_init_zones(enum pstore_type_id type,
+	loff_t *off, size_t total_size, ssize_t record_size,
+	unsigned int *cnt)
+{
+	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
+	struct pstore_zone **zones, *zone;
+	const char *name = pstore_type_to_name(type);
+	int c, i;
+
+	if (!total_size || !record_size)
+		return NULL;
+
+	if (*off + total_size > info->total_size) {
+		pr_err("no room for zones %s (0x%zx@0x%llx over 0x%lx)\n",
+			name, total_size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	c = total_size / record_size;
+	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
+	if (!zones) {
+		pr_err("allocate for zones %s failed\n", name);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zones, 0, c * sizeof(*zones));
+
+	for (i = 0; i < c; i++) {
+		zone = psz_init_zone(type, off, record_size);
+		if (!zone || IS_ERR(zone)) {
+			pr_err("initialize zones %s failed\n", name);
+			while (--i >= 0) {
+				kfree(zones[i]->buffer);
+				kfree(zones[i]);
+			}
+			kfree(zones);
+			return (void *)zone;
+		}
+		zones[i] = zone;
+	}
+
+	*cnt = c;
+	return zones;
+}
+
+static void psz_free_zone(struct pstore_zone **pszone)
+{
+	struct pstore_zone *zone = *pszone;
+
+	if (!zone)
+		return;
+
+	kfree(zone->buffer);
+	kfree(zone);
+	*pszone = NULL;
+}
+
+static void psz_free_zones(struct pstore_zone ***pszones, unsigned int *cnt)
+{
+	struct pstore_zone **zones = *pszones;
+
+	if (!zones)
+		return;
+
+	while (*cnt > 0) {
+		(*cnt)--;
+		psz_free_zone(&zones[*cnt]);
+	}
+	kfree(zones);
+	*pszones = NULL;
+}
+
+static void psz_free_all_zones(struct psz_context *cxt)
+{
+	if (cxt->opszs)
+		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
+}
+
+static int psz_alloc_zones(struct psz_context *cxt)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	loff_t off = 0;
+	int err;
+	size_t size;
+
+	size = info->total_size;
+	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+			info->kmsg_size, &cxt->oops_max_cnt);
+	if (IS_ERR(cxt->opszs)) {
+		err = PTR_ERR(cxt->opszs);
+		goto fail_out;
+	}
+
+	return 0;
+fail_out:
+	return err;
+}
+
+/**
+ * register_pstore_zone() - register to pstore/zone
+ *
+ * @info: back-end driver information. See &struct pstore_zone_info.
+ *
+ * Only one back-end can be registered at a time.
+ *
+ * Return: 0 on success, others on failure.
+ */
+int register_pstore_zone(struct pstore_zone_info *info)
+{
+	int err = -EINVAL;
+	struct psz_context *cxt = &psz_cxt;
+
+	if (!info->total_size) {
+		pr_warn("the total size must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->kmsg_size) {
+		pr_warn("at least one of the records must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->name || !info->name[0])
+		return -EINVAL;
+
+	if (info->total_size < 4096) {
+		pr_err("total size must be at least 4096 bytes\n");
+		return -EINVAL;
+	}
+
+#define check_size(name, size) {					\
+		if (info->name > 0 && info->name < (size)) {		\
+			pr_err(#name " must be over %d\n", (size));	\
+			return -EINVAL;					\
+		}							\
+		if (info->name & (size - 1)) {				\
+			pr_err(#name " must be a multiple of %d\n",	\
+					(size));			\
+			return -EINVAL;					\
+		}							\
+	}
+
+	check_size(total_size, 4096);
+	check_size(kmsg_size, SECTOR_SIZE);
+
+#undef check_size
+
+	/*
+	 * Both @read and @write must be provided.
+	 * Without @read, mounting pstore may fail.
+	 * Without @write, pstore cannot remove record files.
+	 */
+	if (!info->read || !info->write) {
+		pr_err("no valid general read/write interface\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&cxt->pstore_zone_info_lock);
+	if (cxt->pstore_zone_info) {
+		pr_warn("'%s' already loaded: ignoring '%s'\n",
+				cxt->pstore_zone_info->name, info->name);
+		mutex_unlock(&cxt->pstore_zone_info_lock);
+		return -EBUSY;
+	}
+	cxt->pstore_zone_info = info;
+	mutex_unlock(&cxt->pstore_zone_info_lock);
+
+	pr_debug("register %s with properties:\n", info->name);
+	pr_debug("\ttotal size : %lu Bytes\n", info->total_size);
+	pr_debug("\toops size : %lu Bytes\n", info->kmsg_size);
+
+	err = psz_alloc_zones(cxt);
+	if (err) {
+		pr_err("alloc zones failed\n");
+		goto fail_out;
+	}
+
+	if (info->kmsg_size) {
+		cxt->pstore.bufsize = cxt->opszs[0]->buffer_size -
+			sizeof(struct psz_oops_header);
+		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
+		if (!cxt->pstore.buf) {
+			err = -ENOMEM;
+			goto free_all_zones;
+		}
+	}
+	cxt->pstore.data = cxt;
+
+	pr_info("registered %s as backend for", info->name);
+	cxt->pstore.max_reason = info->max_reason;
+	if (info->kmsg_size) {
+		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
+		pr_cont(" kmsg(%s",
+			kmsg_dump_reason_str(cxt->pstore.max_reason));
+		if (cxt->pstore_zone_info->panic_write)
+			pr_cont(",panic_write");
+		pr_cont(")");
+	}
+	pr_cont("\n");
+
+	err = pstore_register(&cxt->pstore);
+	if (err) {
+		pr_err("registering with pstore failed\n");
+		goto free_pstore_buf;
+	}
+
+	return 0;
+
+free_pstore_buf:
+	kfree(cxt->pstore.buf);
+free_all_zones:
+	psz_free_all_zones(cxt);
+fail_out:
+	mutex_lock(&psz_cxt.pstore_zone_info_lock);
+	psz_cxt.pstore_zone_info = NULL;
+	mutex_unlock(&psz_cxt.pstore_zone_info_lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(register_pstore_zone);
+
+/**
+ * unregister_pstore_zone() - unregister from pstore/zone
+ *
+ * @info: back-end driver information. See &struct pstore_zone_info.
+ */
+void unregister_pstore_zone(struct pstore_zone_info *info)
+{
+	struct psz_context *cxt = &psz_cxt;
+
+	pstore_unregister(&cxt->pstore);
+	kfree(cxt->pstore.buf);
+	cxt->pstore.bufsize = 0;
+
+	mutex_lock(&cxt->pstore_zone_info_lock);
+	cxt->pstore_zone_info = NULL;
+	mutex_unlock(&cxt->pstore_zone_info_lock);
+
+	psz_free_all_zones(cxt);
+}
+EXPORT_SYMBOL_GPL(unregister_pstore_zone);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("Storage Manager for pstore/blk");
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
new file mode 100644
index 000000000000..a6a79ff1351b
--- /dev/null
+++ b/include/linux/pstore_zone.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_ZONE_H_
+#define __PSTORE_ZONE_H_
+
+#include <linux/types.h>
+
+typedef ssize_t (*psz_read_op)(char *, size_t, loff_t);
+typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
+/**
+ * struct pstore_zone_info - pstore/zone back-end driver structure
+ *
+ * @owner:	Module which is responsible for this back-end driver.
+ * @name:	Name of the back-end driver.
+ * @total_size: The total size in bytes pstore/zone can use. It must be at
+ *		least 4096 and a multiple of 4096.
+ * @kmsg_size:	The size of oops/panic zone. Zero means disabled; otherwise,
+ *		it must be a multiple of SECTOR_SIZE (512 bytes).
+ * @max_reason: Maximum kmsg dump reason to store.
+ * @read:	The general read operation. Both of the function parameters
+ *		@size and @offset are relative to the storage.
+ *		On success, the number of bytes read should be returned; any
+ *		other value means error.
+ * @write:	The same as @read.
+ * @panic_write:The write operation used only in the panic case. It is optional
+ *		if you do not care about panic logs. The parameters and return
+ *		value are the same as @read.
+ */
+struct pstore_zone_info {
+	struct module *owner;
+	const char *name;
+
+	unsigned long total_size;
+	unsigned long kmsg_size;
+	int max_reason;
+	psz_read_op read;
+	psz_write_op write;
+	psz_write_op panic_write;
+};
+
+extern int register_pstore_zone(struct pstore_zone_info *info);
+extern void unregister_pstore_zone(struct pstore_zone_info *info);
+
+#endif
-- 
2.20.1
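
For anyone trying the header above from a back-end, here is a minimal
userspace sketch of read/write callbacks with the psz_read_op and
psz_write_op shape. It is not part of the patch: fake_storage,
fake_range_ok(), fake_read(), fake_write() and struct fake_zone_info are
made-up names, off_t stands in for loff_t, and the struct is a trimmed
copy of struct pstore_zone_info (no @owner, no @max_reason). It only
illustrates the return convention described in the kernel-doc: bytes
transferred on success, a negative errno otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define FAKE_TOTAL_SIZE	4096UL	/* smallest total_size the patch accepts */
#define FAKE_KMSG_SIZE	512UL	/* one SECTOR_SIZE-sized record */

/* Userspace stand-ins for the typedefs in pstore_zone.h. */
typedef ssize_t (*psz_read_op)(char *, size_t, off_t);
typedef ssize_t (*psz_write_op)(const char *, size_t, off_t);

/* Trimmed, hypothetical copy of struct pstore_zone_info. */
struct fake_zone_info {
	const char *name;
	unsigned long total_size;
	unsigned long kmsg_size;
	psz_read_op read;
	psz_write_op write;
	psz_write_op panic_write;
};

static char fake_storage[FAKE_TOTAL_SIZE];

/* Return 0 if [off, off + size) lies inside fake_storage. */
static int fake_range_ok(size_t size, off_t off)
{
	if (off < 0 || (size_t)off > sizeof(fake_storage))
		return -EINVAL;
	if (size > sizeof(fake_storage) - (size_t)off)
		return -EINVAL;
	return 0;
}

static ssize_t fake_read(char *buf, size_t size, off_t off)
{
	assert(buf != NULL);
	if (fake_range_ok(size, off))
		return -EINVAL;
	memcpy(buf, fake_storage + off, size);
	return (ssize_t)size;
}

static ssize_t fake_write(const char *buf, size_t size, off_t off)
{
	assert(buf != NULL);
	if (fake_range_ok(size, off))
		return -EINVAL;
	memcpy(fake_storage + off, buf, size);
	return (ssize_t)size;
}

/* @panic_write is left NULL: the header documents it as optional. */
struct fake_zone_info fake_info = {
	.name = "fake-ram",
	.total_size = FAKE_TOTAL_SIZE,
	.kmsg_size = FAKE_KMSG_SIZE,
	.read = fake_read,
	.write = fake_write,
	.panic_write = NULL,
};
```

A real back-end would pass its own struct pstore_zone_info to
register_pstore_zone() instead; the sketch stops short of that because
it has to build outside the kernel.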



* [PATCH v4 02/12] pstore/zone: Introduce common layer to manage storage zones
@ 2020-05-08  6:39   ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Petr Mladek, Tony Luck, Kees Cook, linux-doc, Anton Vorontsov,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, linux-mtd,
	Colin Cross

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Implement a common set of APIs needed to support pstore storage zones,
based on how ramoops is designed. This will be used by pstore/blk with
the intention of migrating pstore/ram in the future.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-2-git-send-email-liaoweixiong@allwinnertech.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
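Not part of the commit: a small userspace sketch of the zone arithmetic
used in this patch, for reviewers. It assumes, as psz_alloc_zones() does
here, that all of total_size is split into kmsg_size-sized DMESG zones
packed back to back from offset 0, and that the write index wraps as in
psz_oops_write_record(). The helper names zone_count(), zone_offset()
and next_write_zone() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of zones: mirrors "c = total_size / record_size". */
static unsigned int zone_count(size_t total_size, size_t record_size)
{
	assert(record_size != 0);
	return (unsigned int)(total_size / record_size);
}

/* Byte offset of zone @idx: each zone advances the offset by record_size. */
static size_t zone_offset(unsigned int idx, size_t record_size)
{
	return (size_t)idx * record_size;
}

/* Next zone to write: mirrors "(zonenum + 1) % cxt->oops_max_cnt". */
static unsigned int next_write_zone(unsigned int cur, unsigned int max_cnt)
{
	assert(max_cnt != 0);
	return (cur + 1) % max_cnt;
}
```

With total_size = 4096 and kmsg_size = 512 this gives 8 zones, zone 3
starts at byte 1536, and the zone written after index 7 is index 0.
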
 fs/pstore/Kconfig           |   7 +
 fs/pstore/Makefile          |   3 +
 fs/pstore/zone.c            | 973 ++++++++++++++++++++++++++++++++++++
 include/linux/pstore_zone.h |  44 ++
 4 files changed, 1027 insertions(+)
 create mode 100644 fs/pstore/zone.c
 create mode 100644 include/linux/pstore_zone.h
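
Also not part of the commit: the check_size() macro in
register_pstore_zone() can be read as the plain function below.
psz_check_size_sketch() is a hypothetical userspace name. The mask test
only works as a multiple-of test when the alignment is a power of two,
which holds for the two values used in this patch provided SECTOR_SIZE
is 512, as the pstore_zone.h comment says.

```c
#include <assert.h>
#include <errno.h>

/*
 * Return 0 if @value is acceptable for alignment @align, -EINVAL otherwise.
 * Zero passes both tests, matching the macro.
 */
static int psz_check_size_sketch(unsigned long value, unsigned long align)
{
	/* The mask trick below is only valid for power-of-two alignments. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Non-zero but smaller than one unit. */
	if (value > 0 && value < align)
		return -EINVAL;
	/* Not a multiple of the unit. */
	if (value & (align - 1))
		return -EINVAL;
	return 0;
}
```

Zero slipping through here is why register_pstore_zone() tests
total_size and kmsg_size for zero separately before the macro runs.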

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 8f0369aad22a..98d2457bdd9f 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -153,3 +153,10 @@ config PSTORE_RAM
 	  "ramoops.ko".
 
 	  For more information, see Documentation/admin-guide/ramoops.rst.
+
+config PSTORE_ZONE
+	tristate
+	depends on PSTORE
+	help
+	  The common layer for pstore/blk (and pstore/ram in the future)
+	  to manage storage in zones.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 967b5891f325..58a967cbe4af 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
 
 ramoops-objs += ram.o ram_core.o
 obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
+
+pstore_zone-objs += zone.o
+obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
new file mode 100644
index 000000000000..6c25c443c8e2
--- /dev/null
+++ b/fs/pstore/zone.c
@@ -0,0 +1,973 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define MODNAME "pstore-zone"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/pstore.h>
+#include <linux/mount.h>
+#include <linux/printk.h>
+#include <linux/fs.h>
+#include <linux/pstore_zone.h>
+#include <linux/kdev_t.h>
+#include <linux/device.h>
+#include <linux/namei.h>
+#include <linux/fcntl.h>
+#include <linux/uio.h>
+#include <linux/writeback.h>
+
+/**
+ * struct psz_buffer - header of zone to flush to storage
+ *
+ * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
+ * @datalen: length of data in @data
+ * @data: zone data.
+ */
+struct psz_buffer {
+#define PSZ_SIG (0x43474244) /* DBGC */
+	uint32_t sig;
+	atomic_t datalen;
+	uint8_t data[];
+};
+
+/**
+ * struct psz_oops_header - sub header of oops zones to flush to storage
+ *
+ * @magic: magic num for oops header
+ * @time: oops/panic trigger time
+ * @compressed: whether the data is compressed
+ * @counter: oops/panic counter
+ * @reason: identify oops or panic
+ * @data: pointer to log data
+ *
+ * It's a sub-header of oops zone, trailing after &psz_buffer.
+ */
+struct psz_oops_header {
+#define OOPS_HEADER_MAGIC 0x4dfc3ae5 /* Just a random number */
+	uint32_t magic;
+	struct timespec64 time;
+	bool compressed;
+	uint32_t counter;
+	enum kmsg_dump_reason reason;
+	uint8_t data[];
+};
+
+/**
+ * struct pstore_zone - zone information
+ *
+ * @off: zone offset of storage
+ * @type: front-end type for this zone
+ * @name: front-end name for this zone
+ * @buffer: pointer to data buffer managed by this zone
+ * @oldbuf: pointer to old data buffer.
+ * @buffer_size: bytes in @buffer->data
+ * @should_recover: whether this zone should recover from storage
+ * @dirty: whether the data in @buffer is dirty
+ *
+ * zone structure in memory.
+ */
+struct pstore_zone {
+	loff_t off;
+	const char *name;
+	enum pstore_type_id type;
+
+	struct psz_buffer *buffer;
+	struct psz_buffer *oldbuf;
+	size_t buffer_size;
+	bool should_recover;
+	atomic_t dirty;
+};
+
+/**
+ * struct psz_context - all about running state of pstore/zone
+ *
+ * @opszs: oops/panic storage zones
+ * @oops_max_cnt: max count of @opszs
+ * @oops_read_cnt: index of the next oops zone to read
+ * @oops_write_cnt: index of the next oops zone to write
+ * @oops_counter: number of oopses recorded
+ * @panic_counter: number of panics recorded
+ * @recovered: whether data has been recovered from storage
+ * @on_panic: whether a panic has occurred
+ * @pstore_zone_info_lock: lock protecting @pstore_zone_info
+ * @pstore_zone_info: information from back-end
+ * @pstore: structure for pstore
+ */
+struct psz_context {
+	struct pstore_zone **opszs;
+	unsigned int oops_max_cnt;
+	unsigned int oops_read_cnt;
+	unsigned int oops_write_cnt;
+	/*
+	 * These counters are restored during recovery. They count
+	 * oopses/panics since burning, not since the last boot.
+	 */
+	unsigned int oops_counter;
+	unsigned int panic_counter;
+	atomic_t recovered;
+	atomic_t on_panic;
+
+	/*
+	 * pstore_zone_info_lock just protects "pstore_zone_info" during calls to
+	 * register_pstore_zone/unregister_pstore_zone
+	 */
+	struct mutex pstore_zone_info_lock;
+	struct pstore_zone_info *pstore_zone_info;
+	struct pstore_info pstore;
+};
+static struct psz_context psz_cxt;
+
+/**
+ * enum psz_flush_mode - flush mode for psz_zone_write()
+ *
+ * @FLUSH_NONE: do not flush to storage but update data on memory
+ * @FLUSH_PART: just flush part of data including meta data to storage
+ * @FLUSH_META: just flush meta data of zone to storage
+ * @FLUSH_ALL: flush all of zone
+ */
+enum psz_flush_mode {
+	FLUSH_NONE = 0,
+	FLUSH_PART,
+	FLUSH_META,
+	FLUSH_ALL,
+};
+
+static inline int buffer_datalen(struct pstore_zone *zone)
+{
+	return atomic_read(&zone->buffer->datalen);
+}
+
+static inline bool is_on_panic(void)
+{
+	struct psz_context *cxt = &psz_cxt;
+
+	return atomic_read(&cxt->on_panic);
+}
+
+static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
+		size_t len, unsigned long off)
+{
+	if (!buf || !zone->buffer)
+		return -EINVAL;
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	len = min_t(size_t, len, zone->buffer_size - off);
+	memcpy(buf, zone->buffer->data + off, len);
+	return len;
+}
+
+static int psz_zone_write(struct pstore_zone *zone,
+		enum psz_flush_mode flush_mode, const char *buf,
+		size_t len, unsigned long off)
+{
+	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
+	ssize_t wcnt = 0;
+	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
+	size_t wlen;
+
+	if (off > zone->buffer_size)
+		return -EINVAL;
+
+	wlen = min_t(size_t, len, zone->buffer_size - off);
+	if (buf && wlen) {
+		memcpy(zone->buffer->data + off, buf, wlen);
+		atomic_set(&zone->buffer->datalen, wlen + off);
+	}
+
+	/* avoid damaging old records */
+	if (!is_on_panic() && !atomic_read(&psz_cxt.recovered))
+		goto dirty;
+
+	writeop = is_on_panic() ? info->panic_write : info->write;
+	if (!writeop)
+		goto dirty;
+
+	switch (flush_mode) {
+	case FLUSH_NONE:
+		if (unlikely(buf && wlen))
+			goto dirty;
+		return 0;
+	case FLUSH_PART:
+		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
+				zone->off + sizeof(*zone->buffer) + off);
+		if (wcnt != wlen)
+			goto dirty;
+		fallthrough;
+	case FLUSH_META:
+		wlen = sizeof(struct psz_buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto dirty;
+		break;
+	case FLUSH_ALL:
+		wlen = zone->buffer_size + sizeof(*zone->buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto dirty;
+		break;
+	}
+
+	return 0;
+dirty:
+	atomic_set(&zone->dirty, true);
+	return -EBUSY;
+}
+
+static int psz_flush_dirty_zone(struct pstore_zone *zone)
+{
+	int ret;
+
+	if (!zone)
+		return -EINVAL;
+
+	if (!atomic_read(&zone->dirty))
+		return 0;
+
+	if (!atomic_read(&psz_cxt.recovered))
+		return -EBUSY;
+
+	ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
+	if (!ret)
+		atomic_set(&zone->dirty, false);
+	return ret;
+}
+
+static int psz_flush_dirty_zones(struct pstore_zone **zones, unsigned int cnt)
+{
+	int i, ret;
+	struct pstore_zone *zone;
+
+	if (!zones)
+		return -EINVAL;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (!zone)
+			return -EINVAL;
+		ret = psz_flush_dirty_zone(zone);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
+{
+	const char *data = (const char *)old->buffer->data;
+	int ret;
+
+	ret = psz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
+	if (ret) {
+		atomic_set(&new->buffer->datalen, 0);
+		atomic_set(&new->dirty, false);
+		return ret;
+	}
+	atomic_set(&old->buffer->datalen, 0);
+	return 0;
+}
+
+static int psz_recover_oops_data(struct psz_context *cxt)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	struct pstore_zone *zone = NULL;
+	struct psz_buffer *buf;
+	unsigned long i;
+	ssize_t rcnt;
+
+	if (!info->read)
+		return -EINVAL;
+
+	for (i = 0; i < cxt->oops_max_cnt; i++) {
+		zone = cxt->opszs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+		if (atomic_read(&zone->dirty)) {
+			unsigned int wcnt = cxt->oops_write_cnt;
+			struct pstore_zone *new = cxt->opszs[wcnt];
+			int ret;
+
+			ret = psz_move_zone(zone, new);
+			if (ret) {
+				pr_err("move zone from %lu to %d failed\n",
+						i, wcnt);
+				return ret;
+			}
+			cxt->oops_write_cnt = (wcnt + 1) % cxt->oops_max_cnt;
+		}
+		if (!zone->should_recover)
+			continue;
+		buf = zone->buffer;
+		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
+				zone->off);
+		if (rcnt != zone->buffer_size + sizeof(*buf))
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+	}
+	return 0;
+}
+
+static int psz_recover_oops_meta(struct psz_context *cxt)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	struct pstore_zone *zone;
+	size_t rcnt, len;
+	struct psz_buffer *buf;
+	struct psz_oops_header *hdr;
+	struct timespec64 time = {0};
+	unsigned long i;
+	/*
+	 * Recovery may run during a panic, when we cannot allocate memory
+	 * with kmalloc(), so use a local array instead.
+	 */
+	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
+
+	if (!info->read)
+		return -EINVAL;
+
+	len = sizeof(*buf) + sizeof(*hdr);
+	buf = (struct psz_buffer *)buffer_header;
+	for (i = 0; i < cxt->oops_max_cnt; i++) {
+		zone = cxt->opszs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+
+		rcnt = info->read((char *)buf, len, zone->off);
+		if (rcnt != len) {
+			pr_err("read %s with id %lu failed\n", zone->name, i);
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+		}
+
+		if (buf->sig != zone->buffer->sig) {
+			pr_debug("no valid data in oops zone %lu\n", i);
+			continue;
+		}
+
+		if (zone->buffer_size < atomic_read(&buf->datalen)) {
+			pr_info("found overtop zone: %s: id %lu, off %lld, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		hdr = (struct psz_oops_header *)buf->data;
+		if (hdr->magic != OOPS_HEADER_MAGIC) {
+			pr_info("found invalid zone: %s: id %lu, off %lld, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		/*
+		 * This is the newest zone seen so far, so the next one must
+		 * be the oldest or an unused one: zones are written in a ring.
+		 */
+		if (hdr->time.tv_sec >= time.tv_sec) {
+			time.tv_sec = hdr->time.tv_sec;
+			cxt->oops_write_cnt = (i + 1) % cxt->oops_max_cnt;
+		}
+
+		if (hdr->reason == KMSG_DUMP_OOPS)
+			cxt->oops_counter =
+				max(cxt->oops_counter, hdr->counter);
+		else
+			cxt->panic_counter =
+				max(cxt->panic_counter, hdr->counter);
+
+		if (!atomic_read(&buf->datalen)) {
+			pr_debug("found erased zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
+					zone->name, i, zone->off,
+					zone->buffer_size,
+					atomic_read(&buf->datalen));
+			continue;
+		}
+
+		if (!is_on_panic())
+			zone->should_recover = true;
+		pr_debug("found nice zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
+				zone->name, i, zone->off,
+				zone->buffer_size, atomic_read(&buf->datalen));
+	}
+
+	return 0;
+}
+
+static int psz_recover_oops(struct psz_context *cxt)
+{
+	int ret;
+
+	if (!cxt->opszs)
+		return 0;
+
+	ret = psz_recover_oops_meta(cxt);
+	if (ret)
+		goto recover_fail;
+
+	ret = psz_recover_oops_data(cxt);
+	if (ret)
+		goto recover_fail;
+
+	return 0;
+recover_fail:
+	pr_debug("recover oops failed\n");
+	return ret;
+}
+
+/**
+ * psz_recovery() - recover data from storage
+ * @cxt: the context of pstore/zone
+ *
+ * recovery means reading data back from storage after rebooting
+ *
+ * Return: 0 on success, others on failure.
+ */
+static inline int psz_recovery(struct psz_context *cxt)
+{
+	int ret = -EBUSY;
+
+	if (atomic_read(&cxt->recovered))
+		return 0;
+
+	ret = psz_recover_oops(cxt);
+	if (ret)
+		goto recover_fail;
+
+	pr_debug("recover end!\n");
+	atomic_set(&cxt->recovered, 1);
+	return 0;
+
+recover_fail:
+	pr_err("recover failed\n");
+	return ret;
+}
+
+static int psz_pstore_open(struct pstore_info *psi)
+{
+	struct psz_context *cxt = psi->data;
+
+	cxt->oops_read_cnt = 0;
+	return 0;
+}
+
+static inline bool psz_ok(struct pstore_zone *zone)
+{
+	if (zone && zone->buffer && buffer_datalen(zone))
+		return true;
+	return false;
+}
+
+static inline int psz_oops_erase(struct psz_context *cxt,
+		struct pstore_zone *zone, struct pstore_record *record)
+{
+	struct psz_buffer *buffer = zone->buffer;
+	struct psz_oops_header *hdr =
+		(struct psz_oops_header *)buffer->data;
+
+	if (unlikely(!psz_ok(zone)))
+		return 0;
+	/* this zone is already updated, no need to erase */
+	if (record->count != hdr->counter)
+		return 0;
+
+	atomic_set(&zone->buffer->datalen, 0);
+	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+}
+
+static int psz_pstore_erase(struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		if (record->id >= cxt->oops_max_cnt)
+			return -EINVAL;
+		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
+	default:
+		return -EINVAL;
+	}
+}
+
+static void psz_write_kmsg_hdr(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+	struct psz_buffer *buffer = zone->buffer;
+	struct psz_oops_header *hdr =
+		(struct psz_oops_header *)buffer->data;
+
+	hdr->magic = OOPS_HEADER_MAGIC;
+	hdr->compressed = record->compressed;
+	hdr->time.tv_sec = record->time.tv_sec;
+	hdr->time.tv_nsec = record->time.tv_nsec;
+	hdr->reason = record->reason;
+	if (hdr->reason == KMSG_DUMP_OOPS)
+		hdr->counter = ++cxt->oops_counter;
+	else
+		hdr->counter = ++cxt->panic_counter;
+}
+
+static inline int notrace psz_oops_write_record(struct psz_context *cxt,
+		struct pstore_record *record)
+{
+	size_t size, hlen;
+	struct pstore_zone *zone;
+	unsigned int zonenum;
+
+	zonenum = cxt->oops_write_cnt;
+	zone = cxt->opszs[zonenum];
+	if (unlikely(!zone))
+		return -ENOSPC;
+	cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
+
+	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+	psz_write_kmsg_hdr(zone, record);
+	hlen = sizeof(struct psz_oops_header);
+	size = min_t(size_t, record->size, zone->buffer_size - hlen);
+	return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+}
+
+static int notrace psz_oops_write(struct psz_context *cxt,
+		struct pstore_record *record)
+{
+	int ret;
+
+	/*
+	 * Explicitly only take the first part of any new crash.
+	 * If our buffer is larger than kmsg_bytes, this can never happen,
+	 * and if our buffer is smaller than kmsg_bytes, we don't want the
+	 * report split across multiple records.
+	 */
+	if (record->part != 1)
+		return -ENOSPC;
+
+	if (!cxt->opszs)
+		return -ENOSPC;
+
+	ret = psz_oops_write_record(cxt, record);
+	if (!ret) {
+		pr_debug("try to flush other dirty oops zones\n");
+		psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+	}
+
+	/* always return 0: the record is already stored in the buffer */
+	return 0;
+}
+
+static int notrace psz_pstore_write(struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+
+	if (record->type == PSTORE_TYPE_DMESG &&
+			record->reason == KMSG_DUMP_PANIC)
+		atomic_set(&cxt->on_panic, 1);
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		return psz_oops_write(cxt, record);
+	default:
+		return -EINVAL;
+	}
+}
+
+static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
+{
+	struct pstore_zone *zone = NULL;
+
+	while (cxt->oops_read_cnt < cxt->oops_max_cnt) {
+		zone = cxt->opszs[cxt->oops_read_cnt++];
+		if (psz_ok(zone))
+			return zone;
+	}
+
+	return NULL;
+}
+
+static int psz_read_oops_hdr(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	struct psz_buffer *buffer = zone->buffer;
+	struct psz_oops_header *hdr =
+		(struct psz_oops_header *)buffer->data;
+
+	if (hdr->magic != OOPS_HEADER_MAGIC)
+		return -EINVAL;
+	record->compressed = hdr->compressed;
+	record->time.tv_sec = hdr->time.tv_sec;
+	record->time.tv_nsec = hdr->time.tv_nsec;
+	record->reason = hdr->reason;
+	record->count = hdr->counter;
+	return 0;
+}
+
+static ssize_t psz_oops_read(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	ssize_t size, hlen = 0;
+
+	size = buffer_datalen(zone);
+	/* Clear and skip this oops record if it has no valid header */
+	if (psz_read_oops_hdr(zone, record)) {
+		atomic_set(&zone->buffer->datalen, 0);
+		atomic_set(&zone->dirty, 0);
+		return -ENOMSG;
+	}
+	size -= sizeof(struct psz_oops_header);
+
+	if (!record->compressed) {
+		char *buf = kasprintf(GFP_KERNEL, "%s: Total %d times\n",
+				      kmsg_dump_reason_str(record->reason),
+				      record->count);
+		hlen = strlen(buf);
+		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
+		if (!record->buf) {
+			kfree(buf);
+			return -ENOMEM;
+		}
+	} else {
+		record->buf = kmalloc(size, GFP_KERNEL);
+		if (!record->buf)
+			return -ENOMEM;
+	}
+
+	size = psz_zone_read(zone, record->buf + hlen, size,
+			sizeof(struct psz_oops_header));
+	if (unlikely(size < 0)) {
+		kfree(record->buf);
+		return -ENOMSG;
+	}
+
+	return size + hlen;
+}
+
+static ssize_t psz_pstore_read(struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+	ssize_t (*readop)(struct pstore_zone *zone,
+			struct pstore_record *record);
+	struct pstore_zone *zone;
+	ssize_t ret;
+
+	/* before read, we must recover from storage */
+	ret = psz_recovery(cxt);
+	if (ret)
+		return ret;
+
+next_zone:
+	zone = psz_read_next_zone(cxt);
+	if (!zone)
+		return 0;
+
+	record->type = zone->type;
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		readop = psz_oops_read;
+		record->id = cxt->oops_read_cnt - 1;
+		break;
+	default:
+		goto next_zone;
+	}
+
+	ret = readop(zone, record);
+	if (ret == -ENOMSG)
+		goto next_zone;
+	return ret;
+}
+
+static struct psz_context psz_cxt = {
+	.pstore_zone_info_lock = __MUTEX_INITIALIZER(psz_cxt.pstore_zone_info_lock),
+	.recovered = ATOMIC_INIT(0),
+	.on_panic = ATOMIC_INIT(0),
+	.pstore = {
+		.owner = THIS_MODULE,
+		.name = MODNAME,
+		.open = psz_pstore_open,
+		.read = psz_pstore_read,
+		.write = psz_pstore_write,
+		.erase = psz_pstore_erase,
+	},
+};
+
+static struct pstore_zone *psz_init_zone(enum pstore_type_id type,
+		loff_t *off, size_t size)
+{
+	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
+	struct pstore_zone *zone;
+	const char *name = pstore_type_to_name(type);
+
+	if (!size)
+		return NULL;
+
+	if (*off + size > info->total_size) {
+		pr_err("no room for %s (0x%zx@0x%llx over 0x%lx)\n",
+			name, size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	zone = kzalloc(sizeof(struct pstore_zone), GFP_KERNEL);
+	if (!zone)
+		return ERR_PTR(-ENOMEM);
+
+	zone->buffer = kmalloc(size, GFP_KERNEL);
+	if (!zone->buffer) {
+		kfree(zone);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zone->buffer, 0xFF, size);
+	zone->off = *off;
+	zone->name = name;
+	zone->type = type;
+	zone->buffer_size = size - sizeof(struct psz_buffer);
+	zone->buffer->sig = type ^ PSZ_SIG;
+	atomic_set(&zone->dirty, 0);
+	atomic_set(&zone->buffer->datalen, 0);
+
+	*off += size;
+
+	pr_debug("pszone %s: off 0x%llx, %zu header, %zu data\n", zone->name,
+			zone->off, sizeof(*zone->buffer), zone->buffer_size);
+	return zone;
+}
+
+static struct pstore_zone **psz_init_zones(enum pstore_type_id type,
+	loff_t *off, size_t total_size, ssize_t record_size,
+	unsigned int *cnt)
+{
+	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
+	struct pstore_zone **zones, *zone;
+	const char *name = pstore_type_to_name(type);
+	int c, i;
+
+	if (!total_size || !record_size)
+		return NULL;
+
+	if (*off + total_size > info->total_size) {
+		pr_err("no room for zones %s (0x%zx@0x%llx over 0x%lx)\n",
+			name, total_size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	c = total_size / record_size;
+	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
+	if (!zones) {
+		pr_err("allocate for zones %s failed\n", name);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zones, 0, c * sizeof(*zones));
+
+	for (i = 0; i < c; i++) {
+		zone = psz_init_zone(type, off, record_size);
+		if (!zone || IS_ERR(zone)) {
+			pr_err("initialize zones %s failed\n", name);
+			while (--i >= 0) {
+				kfree(zones[i]->buffer);
+				kfree(zones[i]);
+			}
+			kfree(zones);
+			return (void *)zone;
+		}
+		zones[i] = zone;
+	}
+
+	*cnt = c;
+	return zones;
+}
+
+static void psz_free_zone(struct pstore_zone **pszone)
+{
+	struct pstore_zone *zone = *pszone;
+
+	if (!zone)
+		return;
+
+	kfree(zone->buffer);
+	kfree(zone);
+	*pszone = NULL;
+}
+
+static void psz_free_zones(struct pstore_zone ***pszones, unsigned int *cnt)
+{
+	struct pstore_zone **zones = *pszones;
+
+	if (!zones)
+		return;
+
+	while (*cnt > 0) {
+		(*cnt)--;
+		psz_free_zone(&zones[*cnt]);
+	}
+	kfree(zones);
+	*pszones = NULL;
+}
+
+static void psz_free_all_zones(struct psz_context *cxt)
+{
+	if (cxt->opszs)
+		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
+}
+
+static int psz_alloc_zones(struct psz_context *cxt)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	loff_t off = 0;
+	int err;
+	size_t size;
+
+	size = info->total_size;
+	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+			info->kmsg_size, &cxt->oops_max_cnt);
+	if (IS_ERR(cxt->opszs)) {
+		err = PTR_ERR(cxt->opszs);
+		goto fail_out;
+	}
+
+	return 0;
+fail_out:
+	return err;
+}
+
+/**
+ * register_pstore_zone() - register to pstore/zone
+ *
+ * @info: back-end driver information. See &struct pstore_zone_info.
+ *
+ * Only one back-end can be registered at a time.
+ *
+ * Return: 0 on success, others on failure.
+ */
+int register_pstore_zone(struct pstore_zone_info *info)
+{
+	int err = -EINVAL;
+	struct psz_context *cxt = &psz_cxt;
+
+	if (!info->total_size) {
+		pr_warn("the total size must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->kmsg_size) {
+		pr_warn("at least one record size must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->name || !info->name[0])
+		return -EINVAL;
+
+	if (info->total_size < 4096) {
+		pr_err("total size must be at least 4096 bytes\n");
+		return -EINVAL;
+	}
+
+#define check_size(name, size) {					\
+		if (info->name > 0 && info->name < (size)) {		\
+			pr_err(#name " must be over %d\n", (size));	\
+			return -EINVAL;					\
+		}							\
+		if (info->name & ((size) - 1)) {			\
+			pr_err(#name " must be a multiple of %d\n",	\
+					(size));			\
+			return -EINVAL;					\
+		}							\
+	}
+
+	check_size(total_size, 4096);
+	check_size(kmsg_size, SECTOR_SIZE);
+
+#undef check_size
+
+	/*
+	 * Both @read and @write must be provided.
+	 * Without @read, mounting pstore may fail.
+	 * Without @write, pstore cannot remove record files.
+	 */
+	if (!info->read || !info->write) {
+		pr_err("no valid general read/write interface\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&cxt->pstore_zone_info_lock);
+	if (cxt->pstore_zone_info) {
+		pr_warn("'%s' already loaded: ignoring '%s'\n",
+				cxt->pstore_zone_info->name, info->name);
+		mutex_unlock(&cxt->pstore_zone_info_lock);
+		return -EBUSY;
+	}
+	cxt->pstore_zone_info = info;
+	mutex_unlock(&cxt->pstore_zone_info_lock);
+
+	pr_debug("register %s with properties:\n", info->name);
+	pr_debug("\ttotal size : %lu Bytes\n", info->total_size);
+	pr_debug("\toops size : %lu Bytes\n", info->kmsg_size);
+
+	err = psz_alloc_zones(cxt);
+	if (err) {
+		pr_err("alloc zones failed\n");
+		goto fail_out;
+	}
+
+	if (info->kmsg_size) {
+		cxt->pstore.bufsize = cxt->opszs[0]->buffer_size -
+			sizeof(struct psz_oops_header);
+		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
+		if (!cxt->pstore.buf) {
+			err = -ENOMEM;
+			goto free_all_zones;
+		}
+	}
+	cxt->pstore.data = cxt;
+
+	pr_info("registered %s as backend for", info->name);
+	cxt->pstore.max_reason = info->max_reason;
+	if (info->kmsg_size) {
+		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
+		pr_cont(" kmsg(%s",
+			kmsg_dump_reason_str(cxt->pstore.max_reason));
+		if (cxt->pstore_zone_info->panic_write)
+			pr_cont(",panic_write");
+		pr_cont(")");
+	}
+	pr_cont("\n");
+
+	err = pstore_register(&cxt->pstore);
+	if (err) {
+		pr_err("registering with pstore failed\n");
+		goto free_pstore_buf;
+	}
+
+	return 0;
+
+free_pstore_buf:
+	kfree(cxt->pstore.buf);
+free_all_zones:
+	psz_free_all_zones(cxt);
+fail_out:
+	mutex_lock(&psz_cxt.pstore_zone_info_lock);
+	psz_cxt.pstore_zone_info = NULL;
+	mutex_unlock(&psz_cxt.pstore_zone_info_lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(register_pstore_zone);
+
+/**
+ * unregister_pstore_zone() - unregister from pstore/zone
+ *
+ * @info: back-end driver information. See struct pstore_zone_info.
+ */
+void unregister_pstore_zone(struct pstore_zone_info *info)
+{
+	struct psz_context *cxt = &psz_cxt;
+
+	pstore_unregister(&cxt->pstore);
+	kfree(cxt->pstore.buf);
+	cxt->pstore.bufsize = 0;
+
+	mutex_lock(&cxt->pstore_zone_info_lock);
+	cxt->pstore_zone_info = NULL;
+	mutex_unlock(&cxt->pstore_zone_info_lock);
+
+	psz_free_all_zones(cxt);
+}
+EXPORT_SYMBOL_GPL(unregister_pstore_zone);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("Storage Manager for pstore/blk");
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
new file mode 100644
index 000000000000..a6a79ff1351b
--- /dev/null
+++ b/include/linux/pstore_zone.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_ZONE_H_
+#define __PSTORE_ZONE_H_
+
+#include <linux/types.h>
+
+typedef ssize_t (*psz_read_op)(char *, size_t, loff_t);
+typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
+/**
+ * struct pstore_zone_info - pstore/zone back-end driver structure
+ *
+ * @owner:	Module which is responsible for this back-end driver.
+ * @name:	Name of the back-end driver.
+ * @total_size: The total size in bytes pstore/zone can use. It must be at
+ *		least 4096 and a multiple of 4096.
+ * @kmsg_size:	The size of the oops/panic zone. Zero means disabled; otherwise
+ *		it must be a multiple of SECTOR_SIZE (512 bytes).
+ * @max_reason: Maximum kmsg dump reason to store.
+ * @read:	The general read operation. The function parameters @size and
+ *		@offset are both relative to the storage.
+ *		On success, the number of bytes read is returned; other
+ *		values mean error.
+ * @write:	The same as @read, but for writing.
+ * @panic_write:The write operation used only in the panic case. It is
+ *		optional if you do not care about the panic log. The parameters
+ *		and return value are the same as for @write.
+ */
+struct pstore_zone_info {
+	struct module *owner;
+	const char *name;
+
+	unsigned long total_size;
+	unsigned long kmsg_size;
+	int max_reason;
+	psz_read_op read;
+	psz_write_op write;
+	psz_write_op panic_write;
+};
+
+extern int register_pstore_zone(struct pstore_zone_info *info);
+extern void unregister_pstore_zone(struct pstore_zone_info *info);
+
+#endif
-- 
2.20.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* [PATCH v4 03/12] pstore/blk: Introduce backend for block devices
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

pstore/blk is similar to pstore/ram, but uses a block device as the
storage rather than persistent ram.

The pstore/blk backend solves two common use-cases that used to preclude
using pstore/ram:
- not all devices have a battery that could be used to persist
  regular RAM across power failures.
- most embedded intelligent equipment has no persistent RAM, because it
  increases cost, and instead relies on cheaper solutions such as block
  devices.

pstore/blk provides separate configurations for the end user and for the
block drivers. User configuration determines how pstore/blk operates, such
as record sizes, max kmsg dump reasons, etc. These can be set by Kconfig
and/or module parameters, but module parameters have priority over Kconfig.
Driver configuration covers all the details about the target block device,
such as total size of the device and how to perform read/write operations.
These are provided by block drivers, calling psblk_register_blkdev(),
including an optional panic_write callback used to bypass regular IO
APIs in an effort to avoid potentially destabilized kernel code during
a panic.
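
As a rough illustration of the panic_write path described above, here is
a userspace sketch of how the byte-based zone callback is translated into
the sector-based driver callback. It only models psblk_blk_panic_write();
every demo_* name is hypothetical and exists only for this sketch, and the
caller is assumed to pass a size and offset that are multiples of 512.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define DEMO_SECTOR_SHIFT 9	/* 512-byte sectors, like SECTOR_SHIFT */

/* Hypothetical userspace stand-in for psblk_panic_write_op. */
typedef int (*demo_panic_write_op)(const char *buf,
				   unsigned long long start_sect,
				   unsigned long long sects);

static demo_panic_write_op demo_registered_op;	/* NULL: no panic write */
static unsigned long long demo_last_start, demo_last_sects;

/* Hypothetical driver callback: only records the request it was given. */
static int demo_driver_panic_write(const char *buf,
				   unsigned long long start_sect,
				   unsigned long long sects)
{
	assert(buf != NULL);
	demo_last_start = start_sect;
	demo_last_sects = sects;
	return 0;	/* zero means success, as for psblk_panic_write_op */
}

/* Models psblk_blk_panic_write(): bytes in, sectors out to the driver. */
static ssize_t demo_panic_write(const char *buf, size_t size, off_t off)
{
	int ret;

	if (!demo_registered_op)
		return -EOPNOTSUPP;

	/* size and off are assumed to be multiples of 512 here */
	ret = demo_registered_op(buf,
				 (unsigned long long)off >> DEMO_SECTOR_SHIFT,
				 (unsigned long long)size >> DEMO_SECTOR_SHIFT);
	return ret ? -EIO : (ssize_t)size;
}
```

Keeping the driver callback in units of sectors means the driver does not
have to repeat the byte-to-sector conversion in its panic path.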

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-3-git-send-email-liaoweixiong@allwinnertech.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
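For reference, the intended handling of the kmsg_size user setting (given
in KB, converted to bytes, rounded up to a 4096-byte multiple, and treated
as disabled when not positive) can be modelled in userspace roughly as
below. This is only a sketch of the arithmetic, not the code in blk.c;
demo_kb_to_aligned_bytes() is a hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a size in KB to bytes and round it up to
 * a multiple of align, which must be a power of two. A non-positive
 * input means "disabled" and yields 0.
 */
static long demo_kb_to_aligned_bytes(long kb, long align)
{
	long bytes;

	assert(align > 0 && (align & (align - 1)) == 0);
	bytes = kb <= 0 ? 0 : kb * 1024;
	return (bytes + align - 1) & ~(align - 1);
}
```

With a 4096-byte alignment, a value that is already a multiple of 4 KB is
passed through unchanged, and anything else is rounded up to the next
4 KB boundary.
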
 fs/pstore/Kconfig          |  64 ++++++
 fs/pstore/Makefile         |   3 +
 fs/pstore/blk.c            | 426 +++++++++++++++++++++++++++++++++++++
 include/linux/pstore_blk.h |  27 +++
 4 files changed, 520 insertions(+)
 create mode 100644 fs/pstore/blk.c
 create mode 100644 include/linux/pstore_blk.h

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 98d2457bdd9f..92ba73bd0b62 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -160,3 +160,67 @@ config PSTORE_ZONE
 	help
 	  The common layer for pstore/blk (and pstore/ram in the future)
 	  to manage storage in zones.
+
+config PSTORE_BLK
+	tristate "Log panic/oops to a block device"
+	depends on PSTORE
+	depends on BLOCK
+	select PSTORE_ZONE
+	default n
+	help
+	  This enables panic and oops messages to be logged to a block
+	  device where they can be read back at some later point.
+
+	  If unsure, say N.
+
+config PSTORE_BLK_BLKDEV
+	string "block device identifier"
+	depends on PSTORE_BLK
+	default ""
+	help
+	  Which block device should be used for pstore/blk.
+
+	  It accepts the following variants:
+	  1) <hex_major><hex_minor> device number in hexadecimal, with
+	     no leading 0x, for example b302.
+	  2) /dev/<disk_name> represents the device number of disk
+	  3) /dev/<disk_name><decimal> represents the device number
+	     of partition - device number of disk plus the partition number
+	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
+	     used when disk name of partitioned disk ends with a digit.
+	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+	     unique id of a partition if the partition table provides it.
+	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+	     filled hex representation of the 32-bit "NT disk signature", and PP
+	     is a zero-filled hex representation of the 1-based partition number.
+	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
+	     to a partition with a known unique id.
+	  7) <major>:<minor> major and minor number of the device separated by
+	     a colon.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_KMSG_SIZE
+	int "Size in Kbytes of kmsg dump log to store"
+	depends on PSTORE_BLK
+	default 64
+	help
+	  This sets the size of the kmsg dump (oops, panic, etc) log for
+	  pstore/blk. The size is in KB and must be a multiple of 4.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_MAX_REASON
+	int "Maximum kmsg dump reason to store"
+	depends on PSTORE_BLK
+	default 2
+	help
+	  The maximum reason for kmsg dumps to store. The default is
+	  2 (KMSG_DUMP_OOPS), see include/linux/kmsg_dump.h's
+	  enum kmsg_dump_reason for more details.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 58a967cbe4af..c270467aeece 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -15,3 +15,6 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
 
 pstore_zone-objs += zone.o
 obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
+
+pstore_blk-objs += blk.o
+obj-$(CONFIG_PSTORE_BLK)	+= pstore_blk.o
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
new file mode 100644
index 000000000000..286aa82aa483
--- /dev/null
+++ b/fs/pstore/blk.c
@@ -0,0 +1,426 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define MODNAME "pstore-blk"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include "../../block/blk.h"
+#include <linux/blkdev.h>
+#include <linux/string.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/pstore_blk.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+
+static long kmsg_size = CONFIG_PSTORE_BLK_KMSG_SIZE;
+module_param(kmsg_size, long, 0400);
+MODULE_PARM_DESC(kmsg_size, "kmsg dump record size in kbytes");
+
+static int max_reason = CONFIG_PSTORE_BLK_MAX_REASON;
+module_param(max_reason, int, 0400);
+MODULE_PARM_DESC(max_reason,
+		 "maximum reason for kmsg dump (default 2: Oops and Panic)");
+
+/*
+ * blkdev - The block device to use.
+ *
+ * Most of the time, it is a partition of a block device.
+ *
+ * blkdev accepts the following variants:
+ * 1) <hex_major><hex_minor> device number in hexadecimal, with no
+ *    leading 0x, for example b302.
+ * 2) /dev/<disk_name> represents the device number of disk
+ * 3) /dev/<disk_name><decimal> represents the device number
+ *    of partition - device number of disk plus the partition number
+ * 4) /dev/<disk_name>p<decimal> - same as the above, this form is
+ *    used when disk name of partitioned disk ends with a digit.
+ * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ *    unique id of a partition if the partition table provides it.
+ *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ *    filled hex representation of the 32-bit "NT disk signature", and PP
+ *    is a zero-filled hex representation of the 1-based partition number.
+ * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ *    a partition with a known unique id.
+ * 7) <major>:<minor> major and minor number of the device separated by
+ *    a colon.
+ */
+static char blkdev[80] = CONFIG_PSTORE_BLK_BLKDEV;
+module_param_string(blkdev, blkdev, 80, 0400);
+MODULE_PARM_DESC(blkdev, "the block device for general read/write");
+
+static DEFINE_MUTEX(psz_lock);
+static struct block_device *psblk_bdev;
+static struct pstore_zone_info *pstore_zone_info;
+static psblk_panic_write_op blkdev_panic_write;
+static struct bdev_info {
+	dev_t devt;
+	sector_t nr_sects;
+	sector_t start_sect;
+} g_bdev_info;
+
+/**
+ * struct psblk_device - back-end pstore/blk driver structure.
+ *
+ * @total_size: The total size in bytes pstore/blk can use. It must be at
+ *		least 4096 and a multiple of 4096.
+ * @read:	The general read operation. The function parameters @size and
+ *		@offset are both relative to the block device (not the
+ *		whole disk).
+ *		On success, the number of bytes read is returned; other
+ *		values mean error.
+ * @write:	The same as @read, but for writing.
+ * @panic_write:The write operation used only in the panic case. It is
+ *		optional if you do not care about the panic log. The parameters
+ *		and return value are the same as for @write.
+ */
+struct psblk_device {
+	unsigned long total_size;
+	psz_read_op read;
+	psz_write_op write;
+	psz_write_op panic_write;
+};
+
+static int psblk_register_do(struct psblk_device *dev)
+{
+	int ret;
+
+	if (!dev || !dev->total_size || !dev->read || !dev->write)
+		return -EINVAL;
+
+	mutex_lock(&psz_lock);
+
+	/* someone already registered before */
+	if (pstore_zone_info) {
+		mutex_unlock(&psz_lock);
+		return -EBUSY;
+	}
+	pstore_zone_info = kzalloc(sizeof(struct pstore_zone_info), GFP_KERNEL);
+	if (!pstore_zone_info) {
+		mutex_unlock(&psz_lock);
+		return -ENOMEM;
+	}
+
+#define verify_size(name, alignsize) {					\
+		long _##name_ = (name);					\
+		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+		if (_##name_ & ((alignsize) - 1)) {			\
+			pr_info(#name " must align to %d\n",		\
+					(alignsize));			\
+			_##name_ = ALIGN(_##name_, (alignsize));	\
+		}							\
+		name = _##name_ / 1024;					\
+		pstore_zone_info->name = _##name_;			\
+	}
+
+	verify_size(kmsg_size, 4096);
+#undef verify_size
+
+	pstore_zone_info->total_size = dev->total_size;
+	pstore_zone_info->max_reason = max_reason;
+	pstore_zone_info->read = dev->read;
+	pstore_zone_info->write = dev->write;
+	pstore_zone_info->panic_write = dev->panic_write;
+	pstore_zone_info->name = MODNAME;
+	pstore_zone_info->owner = THIS_MODULE;
+
+	ret = register_pstore_zone(pstore_zone_info);
+	if (ret) {
+		kfree(pstore_zone_info);
+		pstore_zone_info = NULL;
+	}
+	mutex_unlock(&psz_lock);
+	return ret;
+}
+
+static void psblk_unregister_do(struct psblk_device *dev)
+{
+	mutex_lock(&psz_lock);
+	if (pstore_zone_info && pstore_zone_info->read == dev->read) {
+		unregister_pstore_zone(pstore_zone_info);
+		kfree(pstore_zone_info);
+		pstore_zone_info = NULL;
+	}
+	mutex_unlock(&psz_lock);
+}
+
+/**
+ * psblk_get_bdev() - open block device
+ * @holder: exclusive holder identifier
+ *
+ * Return: pointer to block device on success and others on error.
+ *
+ * On success, the returned block_device has reference count of one.
+ */
+static struct block_device *psblk_get_bdev(void *holder)
+{
+	struct block_device *bdev = ERR_PTR(-ENODEV);
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!blkdev[0])
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&psz_lock);
+	if (pstore_zone_info)
+		goto out;
+	if (holder)
+		mode |= FMODE_EXCL;
+	bdev = blkdev_get_by_path(blkdev, mode, holder);
+	if (IS_ERR(bdev)) {
+		dev_t devt;
+
+		devt = name_to_dev_t(blkdev);
+		if (devt == 0) {
+			bdev = ERR_PTR(-ENODEV);
+			goto out;
+		}
+		bdev = blkdev_get_by_dev(devt, mode, holder);
+	}
+out:
+	mutex_unlock(&psz_lock);
+	return bdev;
+}
+
+static void psblk_put_bdev(struct block_device *bdev, void *holder)
+{
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!bdev)
+		return;
+
+	mutex_lock(&psz_lock);
+	if (holder)
+		mode |= FMODE_EXCL;
+	blkdev_put(bdev, mode);
+	mutex_unlock(&psz_lock);
+}
+
+static ssize_t psblk_generic_blk_read(char *buf, size_t bytes, loff_t pos)
+{
+	struct block_device *bdev = psblk_bdev;
+	struct file file;
+	struct kiocb kiocb;
+	struct iov_iter iter;
+	struct kvec iov = {.iov_base = buf, .iov_len = bytes};
+
+	if (!bdev)
+		return -ENODEV;
+
+	memset(&file, 0, sizeof(struct file));
+	file.f_mapping = bdev->bd_inode->i_mapping;
+	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	file.f_inode = bdev->bd_inode;
+	file_ra_state_init(&file.f_ra, file.f_mapping);
+
+	init_sync_kiocb(&kiocb, &file);
+	kiocb.ki_pos = pos;
+	iov_iter_kvec(&iter, READ, &iov, 1, bytes);
+
+	return generic_file_read_iter(&kiocb, &iter);
+}
+
+static ssize_t psblk_generic_blk_write(const char *buf, size_t bytes,
+		loff_t pos)
+{
+	struct block_device *bdev = psblk_bdev;
+	struct iov_iter iter;
+	struct kiocb kiocb;
+	struct file file;
+	ssize_t ret;
+	struct kvec iov = {.iov_base = (void *)buf, .iov_len = bytes};
+
+	if (!bdev)
+		return -ENODEV;
+
+	/* Console/Ftrace backend may handle buffer until flush dirty zones */
+	if (in_interrupt() || irqs_disabled())
+		return -EBUSY;
+
+	memset(&file, 0, sizeof(struct file));
+	file.f_mapping = bdev->bd_inode->i_mapping;
+	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	file.f_inode = bdev->bd_inode;
+
+	init_sync_kiocb(&kiocb, &file);
+	kiocb.ki_pos = pos;
+	iov_iter_kvec(&iter, WRITE, &iov, 1, bytes);
+
+	inode_lock(bdev->bd_inode);
+	ret = generic_write_checks(&kiocb, &iter);
+	if (ret > 0)
+		ret = generic_perform_write(&file, &iter, pos);
+	inode_unlock(bdev->bd_inode);
+
+	if (likely(ret > 0)) {
+		const struct file_operations f_op = {.fsync = blkdev_fsync};
+
+		file.f_op = &f_op;
+		kiocb.ki_pos += ret;
+		ret = generic_write_sync(&kiocb, ret);
+	}
+	return ret;
+}
+
+static inline unsigned long psblk_bdev_size(struct block_device *bdev)
+{
+	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
+}
+
+static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
+		loff_t off)
+{
+	int ret;
+
+	if (!blkdev_panic_write)
+		return -EOPNOTSUPP;
+
+	/* size and off must align to SECTOR_SIZE for block device */
+	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
+			size >> SECTOR_SHIFT);
+	return ret ? -EIO : size;
+}
+
+static struct bdev_info *psblk_get_bdev_info(void)
+{
+	struct bdev_info *info = &g_bdev_info;
+	struct block_device *bdev;
+
+	if (info->devt)
+		return info;
+
+	bdev = psblk_get_bdev(NULL);
+	if (IS_ERR(bdev))
+		return ERR_CAST(bdev);
+
+	info->devt = bdev->bd_dev;
+	info->nr_sects = part_nr_sects_read(bdev->bd_part);
+	info->start_sect = get_start_sect(bdev);
+
+	if (!psblk_bdev_size(bdev)) {
+		pr_err("not enough space to '%s'\n", blkdev);
+		info = ERR_PTR(-ENOSPC);
+	}
+
+	psblk_put_bdev(bdev, NULL);
+	return info;
+}
+
+/**
+ * psblk_register_blkdev() - register block device to pstore/blk
+ *
+ * @major: the major device number of registering device
+ * @panic_write: the interface for panic case.
+ *
+ * Only a driver whose major number matches @blkdev can register.
+ *
+ * If the block device does not support panic write, @panic_write can be NULL.
+ *
+ * Return:
+ * * 0		- OK
+ * * Others	- an error occurred.
+ */
+int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
+{
+	struct block_device *bdev;
+	struct psblk_device dev = {0};
+	struct bdev_info *binfo;
+	int ret = -ENODEV;
+	void *holder = blkdev;
+
+	binfo = psblk_get_bdev_info();
+	if (IS_ERR(binfo))
+		return PTR_ERR(binfo);
+
+	/* only allow driver matching the @blkdev */
+	if (!binfo->devt || MAJOR(binfo->devt) != major) {
+		pr_debug("invalid major %u (expect %u)\n",
+				major, MAJOR(binfo->devt));
+		return -ENODEV;
+	}
+
+	/* hold bdev exclusively */
+	bdev = psblk_get_bdev(holder);
+	if (IS_ERR(bdev)) {
+		pr_err("failed to open '%s'!\n", blkdev);
+		return PTR_ERR(bdev);
+	}
+
+	/* psblk_bdev must be assigned before register to pstore/blk */
+	psblk_bdev = bdev;
+	blkdev_panic_write = panic_write;
+
+	dev.total_size = psblk_bdev_size(bdev);
+	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
+	dev.read = psblk_generic_blk_read;
+	dev.write = psblk_generic_blk_write;
+
+	ret = psblk_register_do(&dev);
+	if (ret)
+		goto err_put_bdev;
+
+	pr_info("using '%s'\n", blkdev);
+	return 0;
+
+err_put_bdev:
+	psblk_bdev = NULL;
+	blkdev_panic_write = NULL;
+	psblk_put_bdev(bdev, holder);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psblk_register_blkdev);
+
+/**
+ * psblk_unregister_blkdev() - unregister block device from pstore/blk
+ *
+ * @major: the major device number of device
+ */
+void psblk_unregister_blkdev(unsigned int major)
+{
+	struct psblk_device dev = {.read = psblk_generic_blk_read};
+	void *holder = blkdev;
+
+	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
+		psblk_unregister_do(&dev);
+		psblk_put_bdev(psblk_bdev, holder);
+		blkdev_panic_write = NULL;
+		psblk_bdev = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(psblk_unregister_blkdev);
+
+/**
+ * psblk_blkdev_info() - get information of @blkdev
+ *
+ * @devt: the block device num of @blkdev
+ * @nr_sects: the sector count of @blkdev
+ * @start_sect: the start sector of @blkdev
+ *
+ * The block driver needs the following information for @panic_write.
+ *
+ * Return: 0 on success, others on failure.
+ */
+int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
+{
+	struct bdev_info *binfo;
+
+	binfo = psblk_get_bdev_info();
+	if (IS_ERR(binfo))
+		return PTR_ERR(binfo);
+
+	if (devt)
+		*devt = binfo->devt;
+	if (nr_sects)
+		*nr_sects = binfo->nr_sects;
+	if (start_sect)
+		*start_sect = binfo->start_sect;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(psblk_blkdev_info);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("pstore backend for block devices");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
new file mode 100644
index 000000000000..5ff465e3953e
--- /dev/null
+++ b/include/linux/pstore_blk.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_BLK_H_
+#define __PSTORE_BLK_H_
+
+#include <linux/types.h>
+#include <linux/pstore_zone.h>
+
+/**
+ * typedef psblk_panic_write_op - panic write operation to block device
+ *
+ * @buf: the data to write
+ * @start_sect: start sector on the block device
+ * @sects: number of sectors in @buf
+ *
+ * Return: On success, zero should be returned. Others mean error.
+ *
+ * Panic write to block device must be aligned to SECTOR_SIZE.
+ */
+typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
+		sector_t sects);
+
+int  psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write);
+void psblk_unregister_blkdev(unsigned int major);
+int  psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
+
+#endif
-- 
2.20.1



* [PATCH v4 03/12] pstore/blk: Introduce backend for block devices
@ 2020-05-08  6:39   ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Petr Mladek, Tony Luck, Kees Cook, linux-doc, Anton Vorontsov,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, linux-mtd,
	Colin Cross

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

pstore/blk is similar to pstore/ram, but uses a block device as the
storage rather than persistent ram.

The pstore/blk backend solves two common use-cases that used to preclude
using pstore/ram:
- not all devices have a battery that could be used to persist
  regular RAM across power failures.
- most embedded intelligent equipment has no persistent RAM, because it
  increases cost, and instead relies on cheaper solutions such as block
  devices.

pstore/blk provides separate configurations for the end user and for the
block drivers. User configuration determines how pstore/blk operates, such
as record sizes, max kmsg dump reasons, etc. These can be set by Kconfig
and/or module parameters, but module parameters have priority over Kconfig.
Driver configuration covers all the details about the target block device,
such as total size of the device and how to perform read/write operations.
These are provided by block drivers, calling psblk_register_blkdev(),
including an optional panic_write callback used to bypass regular IO
APIs in an effort to avoid potentially destabilized kernel code during
a panic.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-3-git-send-email-liaoweixiong@allwinnertech.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/pstore/Kconfig          |  64 ++++++
 fs/pstore/Makefile         |   3 +
 fs/pstore/blk.c            | 426 +++++++++++++++++++++++++++++++++++++
 include/linux/pstore_blk.h |  27 +++
 4 files changed, 520 insertions(+)
 create mode 100644 fs/pstore/blk.c
 create mode 100644 include/linux/pstore_blk.h

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 98d2457bdd9f..92ba73bd0b62 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -160,3 +160,67 @@ config PSTORE_ZONE
 	help
 	  The common layer for pstore/blk (and pstore/ram in the future)
 	  to manage storage in zones.
+
+config PSTORE_BLK
+	tristate "Log panic/oops to a block device"
+	depends on PSTORE
+	depends on BLOCK
+	select PSTORE_ZONE
+	default n
+	help
+	  This enables panic and oops messages to be logged to a block
+	  device where they can be read back at some later point.
+
+	  If unsure, say N.
+
+config PSTORE_BLK_BLKDEV
+	string "block device identifier"
+	depends on PSTORE_BLK
+	default ""
+	help
+	  Which block device should be used for pstore/blk.
+
+	  It accepts the following variants:
+	  1) <hex_major><hex_minor> device number in hexadecimal, with
+	     no leading 0x, for example b302.
+	  2) /dev/<disk_name> represents the device number of disk
+	  3) /dev/<disk_name><decimal> represents the device number
+	     of partition - device number of disk plus the partition number
+	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
+	     used when disk name of partitioned disk ends with a digit.
+	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+	     unique id of a partition if the partition table provides it.
+	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+	     filled hex representation of the 32-bit "NT disk signature", and PP
+	     is a zero-filled hex representation of the 1-based partition number.
+	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
+	     to a partition with a known unique id.
+	  7) <major>:<minor> major and minor number of the device separated by
+	     a colon.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_KMSG_SIZE
+	int "Size in Kbytes of kmsg dump log to store"
+	depends on PSTORE_BLK
+	default 64
+	help
+	  This sets the size of the kmsg dump (oops, panic, etc) log for
+	  pstore/blk. The size is in KB and must be a multiple of 4.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_MAX_REASON
+	int "Maximum kmsg dump reason to store"
+	depends on PSTORE_BLK
+	default 2
+	help
+	  The maximum reason for kmsg dumps to store. The default is
+	  2 (KMSG_DUMP_OOPS), see include/linux/kmsg_dump.h's
+	  enum kmsg_dump_reason for more details.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 58a967cbe4af..c270467aeece 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -15,3 +15,6 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
 
 pstore_zone-objs += zone.o
 obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
+
+pstore_blk-objs += blk.o
+obj-$(CONFIG_PSTORE_BLK)	+= pstore_blk.o
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
new file mode 100644
index 000000000000..286aa82aa483
--- /dev/null
+++ b/fs/pstore/blk.c
@@ -0,0 +1,426 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define MODNAME "pstore-blk"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include "../../block/blk.h"
+#include <linux/blkdev.h>
+#include <linux/string.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/pstore_blk.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+
+static long kmsg_size = CONFIG_PSTORE_BLK_KMSG_SIZE;
+module_param(kmsg_size, long, 0400);
+MODULE_PARM_DESC(kmsg_size, "kmsg dump record size in kbytes");
+
+static int max_reason = CONFIG_PSTORE_BLK_MAX_REASON;
+module_param(max_reason, int, 0400);
+MODULE_PARM_DESC(max_reason,
+		 "maximum reason for kmsg dump (default 2: Oops and Panic)");
+
+/*
+ * blkdev - The block device to use.
+ *
+ * Most of the time, it is a partition of a block device.
+ *
+ * blkdev accepts the following variants:
+ * 1) <hex_major><hex_minor> device number in hexadecimal, with no
+ *    leading 0x, for example b302.
+ * 2) /dev/<disk_name> represents the device number of disk
+ * 3) /dev/<disk_name><decimal> represents the device number
+ *    of partition - device number of disk plus the partition number
+ * 4) /dev/<disk_name>p<decimal> - same as the above, this form is
+ *    used when disk name of partitioned disk ends with a digit.
+ * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ *    unique id of a partition if the partition table provides it.
+ *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ *    filled hex representation of the 32-bit "NT disk signature", and PP
+ *    is a zero-filled hex representation of the 1-based partition number.
+ * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ *    a partition with a known unique id.
+ * 7) <major>:<minor> major and minor number of the device separated by
+ *    a colon.
+ */
+static char blkdev[80] = CONFIG_PSTORE_BLK_BLKDEV;
+module_param_string(blkdev, blkdev, 80, 0400);
+MODULE_PARM_DESC(blkdev, "the block device for general read/write");
+
+static DEFINE_MUTEX(psz_lock);
+static struct block_device *psblk_bdev;
+static struct pstore_zone_info *pstore_zone_info;
+static psblk_panic_write_op blkdev_panic_write;
+static struct bdev_info {
+	dev_t devt;
+	sector_t nr_sects;
+	sector_t start_sect;
+} g_bdev_info;
+
+/**
+ * struct psblk_device - back-end pstore/blk driver structure.
+ *
+ * @total_size: The total size in bytes pstore/blk can use. It must be at
+ *		least 4096 and a multiple of 4096.
+ * @read:	The general read operation. The function parameters @size and
+ *		@offset are both relative to the block device (not the
+ *		whole disk).
+ *		On success, the number of bytes read is returned; other
+ *		values mean error.
+ * @write:	The same as @read, but for writing.
+ * @panic_write:The write operation used only in the panic case. It is
+ *		optional if you do not care about the panic log. The parameters
+ *		and return value are the same as for @write.
+ */
+struct psblk_device {
+	unsigned long total_size;
+	psz_read_op read;
+	psz_write_op write;
+	psz_write_op panic_write;
+};
+
+static int psblk_register_do(struct psblk_device *dev)
+{
+	int ret;
+
+	if (!dev || !dev->total_size || !dev->read || !dev->write)
+		return -EINVAL;
+
+	mutex_lock(&psz_lock);
+
+	/* someone already registered before */
+	if (pstore_zone_info) {
+		mutex_unlock(&psz_lock);
+		return -EBUSY;
+	}
+	pstore_zone_info = kzalloc(sizeof(struct pstore_zone_info), GFP_KERNEL);
+	if (!pstore_zone_info) {
+		mutex_unlock(&psz_lock);
+		return -ENOMEM;
+	}
+
+#define verify_size(name, alignsize) {					\
+		long _##name_ = (name);					\
+		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+		if (_##name_ & ((alignsize) - 1)) {			\
+			pr_info(#name " must align to %d\n",		\
+					(alignsize));			\
+			_##name_ = ALIGN(_##name_, (alignsize));	\
+		}							\
+		name = _##name_ / 1024;					\
+		pstore_zone_info->name = _##name_;				\
+	}
+
+	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
+#undef verify_size
+
+	pstore_zone_info->total_size = dev->total_size;
+	pstore_zone_info->max_reason = max_reason;
+	pstore_zone_info->read = dev->read;
+	pstore_zone_info->write = dev->write;
+	pstore_zone_info->panic_write = dev->panic_write;
+	pstore_zone_info->name = MODNAME;
+	pstore_zone_info->owner = THIS_MODULE;
+
+	ret = register_pstore_zone(pstore_zone_info);
+	if (ret) {
+		kfree(pstore_zone_info);
+		pstore_zone_info = NULL;
+	}
+	mutex_unlock(&psz_lock);
+	return ret;
+}
+
+static void psblk_unregister_do(struct psblk_device *dev)
+{
+	mutex_lock(&psz_lock);
+	if (pstore_zone_info && pstore_zone_info->read == dev->read) {
+		unregister_pstore_zone(pstore_zone_info);
+		kfree(pstore_zone_info);
+		pstore_zone_info = NULL;
+	}
+	mutex_unlock(&psz_lock);
+}
+
+/**
+ * psblk_get_bdev() - open block device
+ * @holder: exclusive holder identifier
+ *
+ * Return: pointer to the block device on success, ERR_PTR() on error.
+ *
+ * On success, the returned block_device has reference count of one.
+ */
+static struct block_device *psblk_get_bdev(void *holder)
+{
+	struct block_device *bdev = ERR_PTR(-ENODEV);
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!blkdev[0])
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&psz_lock);
+	if (pstore_zone_info)
+		goto out;
+	if (holder)
+		mode |= FMODE_EXCL;
+	bdev = blkdev_get_by_path(blkdev, mode, holder);
+	if (IS_ERR(bdev)) {
+		dev_t devt;
+
+		devt = name_to_dev_t(blkdev);
+		if (devt == 0) {
+			bdev = ERR_PTR(-ENODEV);
+			goto out;
+		}
+		bdev = blkdev_get_by_dev(devt, mode, holder);
+	}
+out:
+	mutex_unlock(&psz_lock);
+	return bdev;
+}
+
+static void psblk_put_bdev(struct block_device *bdev, void *holder)
+{
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!bdev)
+		return;
+
+	mutex_lock(&psz_lock);
+	if (holder)
+		mode |= FMODE_EXCL;
+	blkdev_put(bdev, mode);
+	mutex_unlock(&psz_lock);
+}
+
+static ssize_t psblk_generic_blk_read(char *buf, size_t bytes, loff_t pos)
+{
+	struct block_device *bdev = psblk_bdev;
+	struct file file;
+	struct kiocb kiocb;
+	struct iov_iter iter;
+	struct kvec iov = {.iov_base = buf, .iov_len = bytes};
+
+	if (!bdev)
+		return -ENODEV;
+
+	memset(&file, 0, sizeof(struct file));
+	file.f_mapping = bdev->bd_inode->i_mapping;
+	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	file.f_inode = bdev->bd_inode;
+	file_ra_state_init(&file.f_ra, file.f_mapping);
+
+	init_sync_kiocb(&kiocb, &file);
+	kiocb.ki_pos = pos;
+	iov_iter_kvec(&iter, READ, &iov, 1, bytes);
+
+	return generic_file_read_iter(&kiocb, &iter);
+}
+
+static ssize_t psblk_generic_blk_write(const char *buf, size_t bytes,
+		loff_t pos)
+{
+	struct block_device *bdev = psblk_bdev;
+	struct iov_iter iter;
+	struct kiocb kiocb;
+	struct file file;
+	ssize_t ret;
+	struct kvec iov = {.iov_base = (void *)buf, .iov_len = bytes};
+
+	if (!bdev)
+		return -ENODEV;
+
+	/* Console/Ftrace may buffer data until dirty zones are flushed */
+	if (in_interrupt() || irqs_disabled())
+		return -EBUSY;
+
+	memset(&file, 0, sizeof(struct file));
+	file.f_mapping = bdev->bd_inode->i_mapping;
+	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	file.f_inode = bdev->bd_inode;
+
+	init_sync_kiocb(&kiocb, &file);
+	kiocb.ki_pos = pos;
+	iov_iter_kvec(&iter, WRITE, &iov, 1, bytes);
+
+	inode_lock(bdev->bd_inode);
+	ret = generic_write_checks(&kiocb, &iter);
+	if (ret > 0)
+		ret = generic_perform_write(&file, &iter, pos);
+	inode_unlock(bdev->bd_inode);
+
+	if (likely(ret > 0)) {
+		const struct file_operations f_op = {.fsync = blkdev_fsync};
+
+		file.f_op = &f_op;
+		kiocb.ki_pos += ret;
+		ret = generic_write_sync(&kiocb, ret);
+	}
+	return ret;
+}
+
+static inline unsigned long psblk_bdev_size(struct block_device *bdev)
+{
+	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
+}
+
+static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
+		loff_t off)
+{
+	int ret;
+
+	if (!blkdev_panic_write)
+		return -EOPNOTSUPP;
+
+	/* size and off must align to SECTOR_SIZE for block device */
+	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
+			size >> SECTOR_SHIFT);
+	return ret ? -EIO : size;
+}
+
+static struct bdev_info *psblk_get_bdev_info(void)
+{
+	struct bdev_info *info = &g_bdev_info;
+	struct block_device *bdev;
+
+	if (info->devt)
+		return info;
+
+	bdev = psblk_get_bdev(NULL);
+	if (IS_ERR(bdev))
+		return ERR_CAST(bdev);
+
+	info->devt = bdev->bd_dev;
+	info->nr_sects = part_nr_sects_read(bdev->bd_part);
+	info->start_sect = get_start_sect(bdev);
+
+	if (!psblk_bdev_size(bdev)) {
+		pr_err("not enough space to '%s'\n", blkdev);
+		info = ERR_PTR(-ENOSPC);
+	}
+
+	psblk_put_bdev(bdev, NULL);
+	return info;
+}
+
+/**
+ * psblk_register_blkdev() - register block device to pstore/blk
+ *
+ * @major: the major device number of registering device
+ * @panic_write: the interface for panic case.
+ *
+ * Only the matching major to @blkdev can register.
+ *
+ * If the block device does not support panic write, @panic_write can be NULL.
+ *
+ * Return:
+ * * 0		- OK
+ * * Others	- something error.
+ */
+int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
+{
+	struct block_device *bdev;
+	struct psblk_device dev = {0};
+	struct bdev_info *binfo;
+	int ret = -ENODEV;
+	void *holder = blkdev;
+
+	binfo = psblk_get_bdev_info();
+	if (IS_ERR(binfo))
+		return PTR_ERR(binfo);
+
+	/* only allow driver matching the @blkdev */
+	if (!binfo->devt || MAJOR(binfo->devt) != major) {
+		pr_debug("invalid major %u (expect %u)\n",
+				major, MAJOR(binfo->devt));
+		return -ENODEV;
+	}
+
+	/* hold bdev exclusively */
+	bdev = psblk_get_bdev(holder);
+	if (IS_ERR(bdev)) {
+		pr_err("failed to open '%s'!\n", blkdev);
+		return PTR_ERR(bdev);
+	}
+
+	/* psblk_bdev must be assigned before register to pstore/blk */
+	psblk_bdev = bdev;
+	blkdev_panic_write = panic_write;
+
+	dev.total_size = psblk_bdev_size(bdev);
+	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
+	dev.read = psblk_generic_blk_read;
+	dev.write = psblk_generic_blk_write;
+
+	ret = psblk_register_do(&dev);
+	if (ret)
+		goto err_put_bdev;
+
+	pr_info("using '%s'\n", blkdev);
+	return 0;
+
+err_put_bdev:
+	psblk_bdev = NULL;
+	blkdev_panic_write = NULL;
+	psblk_put_bdev(bdev, holder);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psblk_register_blkdev);
+
+/**
+ * psblk_unregister_blkdev() - unregister block device from pstore/blk
+ *
+ * @major: the major device number of device
+ */
+void psblk_unregister_blkdev(unsigned int major)
+{
+	struct psblk_device dev = {.read = psblk_generic_blk_read};
+	void *holder = blkdev;
+
+	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
+		psblk_unregister_do(&dev);
+		psblk_put_bdev(psblk_bdev, holder);
+		blkdev_panic_write = NULL;
+		psblk_bdev = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(psblk_unregister_blkdev);
+
+/**
+ * psblk_blkdev_info() - get information of @blkdev
+ *
+ * @devt: the block device num of @blkdev
+ * @nr_sects: the sector count of @blkdev
+ * @start_sect: the start sector of @blkdev
+ *
+ * Block driver needs the following information for @panic_write.
+ *
+ * Return: 0 on success, others on failure.
+ */
+int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
+{
+	struct bdev_info *binfo;
+
+	binfo = psblk_get_bdev_info();
+	if (IS_ERR(binfo))
+		return PTR_ERR(binfo);
+
+	if (devt)
+		*devt = binfo->devt;
+	if (nr_sects)
+		*nr_sects = binfo->nr_sects;
+	if (start_sect)
+		*start_sect = binfo->start_sect;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(psblk_blkdev_info);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("pstore backend for block devices");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
new file mode 100644
index 000000000000..5ff465e3953e
--- /dev/null
+++ b/include/linux/pstore_blk.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_BLK_H_
+#define __PSTORE_BLK_H_
+
+#include <linux/types.h>
+#include <linux/pstore_zone.h>
+
+/**
+ * typedef psblk_panic_write_op - panic write operation to block device
+ *
+ * @buf: the data to write
+ * @start_sect: start sector on the block device
+ * @sects: number of sectors in @buf
+ *
+ * Return: On success, zero should be returned. Others mean error.
+ *
+ * Panic write to block device must be aligned to SECTOR_SIZE.
+ */
+typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
+		sector_t sects);
+
+int  psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write);
+void psblk_unregister_blkdev(unsigned int major);
+int  psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
+
+#endif
-- 
2.20.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 04/12] pstore/blk: Provide way to choose pstore frontend support
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Most pstore backends lack support for all the pstore frontends, only
handling kmsg dump and not things like pmsg, console, and ftrace.
Provide a way for drivers using pstore/blk to list which frontends they
expect to support.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-4-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/pstore/blk.c            | 16 +++++++++++++---
 include/linux/pstore_blk.h |  4 +++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 286aa82aa483..d1c3074aa128 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -67,6 +67,9 @@ static struct bdev_info {
  *
  * @total_size: The total size in bytes pstore/blk can use. It must be greater
  *		than 4096 and be multiple of 4096.
+ * @flags:	Refer to the macros starting with PSTORE_FLAGS defined in
+ *		linux/pstore.h. It indicates which frontends this device
+ *		supports. Zero means all frontends, for compatibility.
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to block device (not the
  *		whole disk).
@@ -79,6 +82,7 @@ static struct bdev_info {
  */
 struct psblk_device {
 	unsigned long total_size;
+	unsigned int flags;
 	psz_read_op read;
 	psz_write_op write;
 	psz_write_op panic_write;
@@ -104,8 +108,11 @@ static int psblk_register_do(struct psblk_device *dev)
 		return -ENOMEM;
 	}
 
-#define verify_size(name, alignsize) {					\
-		long _##name_ = (name);					\
+	/* zero means all frontends, for compatibility */
+	if (!dev->flags)
+		dev->flags = UINT_MAX;
+#define verify_size(name, alignsize, enable) {				\
+		long _##name_ = (enable) ? (name) : 0;			\
 		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
 		if (_##name_ & ((alignsize) - 1)) {			\
 			pr_info(#name " must align to %d\n",		\
@@ -312,6 +319,7 @@ static struct bdev_info *psblk_get_bdev_info(void)
  * psblk_register_blkdev() - register block device to pstore/blk
  *
  * @major: the major device number of registering device
+ * @flags: refer to macro starting with PSTORE_FLAGS defined in linux/pstore.h
  * @panic_write: the interface for panic case.
  *
  * Only the matching major to @blkdev can register.
@@ -322,7 +330,8 @@ static struct bdev_info *psblk_get_bdev_info(void)
  * * 0		- OK
  * * Others	- something error.
  */
-int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
+int psblk_register_blkdev(unsigned int major, unsigned int flags,
+		psblk_panic_write_op panic_write)
 {
 	struct block_device *bdev;
 	struct psblk_device dev = {0};
@@ -353,6 +362,7 @@ int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
 	blkdev_panic_write = panic_write;
 
 	dev.total_size = psblk_bdev_size(bdev);
+	dev.flags = flags;
 	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
 	dev.read = psblk_generic_blk_read;
 	dev.write = psblk_generic_blk_write;
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 5ff465e3953e..d8f609e60288 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -4,6 +4,7 @@
 #define __PSTORE_BLK_H_
 
 #include <linux/types.h>
+#include <linux/pstore.h>
 #include <linux/pstore_zone.h>
 
 /**
@@ -20,7 +21,8 @@
 typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
 		sector_t sects);
 
-int  psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write);
+int  psblk_register_blkdev(unsigned int major, unsigned int flags,
+		psblk_panic_write_op panic_write);
 void psblk_unregister_blkdev(unsigned int major);
 int  psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 05/12] pstore/blk: Add support for pmsg frontend
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add pmsg support to pstore/blk (through pstore/zone). To enable, pmsg_size
must be greater than 0 and a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-5-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/pstore/Kconfig           |  12 ++
 fs/pstore/blk.c             |   9 ++
 fs/pstore/zone.c            | 268 ++++++++++++++++++++++++++++++++++--
 include/linux/pstore_zone.h |   2 +
 4 files changed, 281 insertions(+), 10 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 92ba73bd0b62..f18cd126d83f 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -224,3 +224,15 @@ config PSTORE_BLK_MAX_REASON
 
 	  NOTE that, both Kconfig and module parameters can configure
 	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_PMSG_SIZE
+	int "Size in Kbytes of pmsg to store"
+	depends on PSTORE_BLK
+	depends on PSTORE_PMSG
+	default 64
+	help
+	  This just sets size of pmsg (pmsg_size) for pstore/blk. The size is
+	  in KB and must be a multiple of 4.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index d1c3074aa128..401e5ba66a5f 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -24,6 +24,14 @@ module_param(max_reason, int, 0400);
 MODULE_PARM_DESC(max_reason,
 		 "maximum reason for kmsg dump (default 2: Oops and Panic)");
 
+#if IS_ENABLED(CONFIG_PSTORE_PMSG)
+static long pmsg_size = CONFIG_PSTORE_BLK_PMSG_SIZE;
+#else
+static long pmsg_size = -1;
+#endif
+module_param(pmsg_size, long, 0400);
+MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
+
 /*
  * blkdev - The block device to use.
  *
@@ -124,6 +132,7 @@ static int psblk_register_do(struct psblk_device *dev)
 	}
 
 	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
+	verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
 #undef verify_size
 
 	pstore_zone_info->total_size = dev->total_size;
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index 6c25c443c8e2..f472b06a6c14 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -23,12 +23,14 @@
  *
  * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
  * @datalen: length of data in @data
+ * @start: offset into @data where the stored bytes begin
  * @data: zone data.
  */
 struct psz_buffer {
 #define PSZ_SIG (0x43474244) /* DBGC */
 	uint32_t sig;
 	atomic_t datalen;
+	atomic_t start;
 	uint8_t data[];
 };
 
@@ -84,9 +86,11 @@ struct pstore_zone {
  * struct psz_context - all about running state of pstore/zone
  *
  * @opszs: oops/panic storage zones
+ * @ppsz: pmsg storage zone
  * @oops_max_cnt: max count of @opszs
  * @oops_read_cnt: counter to read oops zone
  * @oops_write_cnt: counter to write
+ * @pmsg_read_cnt: counter to read pmsg zone
  * @oops_counter: counter to oops
  * @panic_counter: counter to panic
  * @recovered: whether finish recovering data from storage
@@ -97,9 +101,11 @@ struct pstore_zone {
  */
 struct psz_context {
 	struct pstore_zone **opszs;
+	struct pstore_zone *ppsz;
 	unsigned int oops_max_cnt;
 	unsigned int oops_read_cnt;
 	unsigned int oops_write_cnt;
+	unsigned int pmsg_read_cnt;
 	/*
 	 * the counter should be recovered when recover.
 	 * It records the oops/panic times after burning rather than booting.
@@ -139,6 +145,11 @@ static inline int buffer_datalen(struct pstore_zone *zone)
 	return atomic_read(&zone->buffer->datalen);
 }
 
+static inline int buffer_start(struct pstore_zone *zone)
+{
+	return atomic_read(&zone->buffer->start);
+}
+
 static inline bool is_on_panic(void)
 {
 	struct psz_context *cxt = &psz_cxt;
@@ -146,10 +157,10 @@ static inline bool is_on_panic(void)
 	return atomic_read(&cxt->on_panic);
 }
 
-static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
+static ssize_t psz_zone_read_buffer(struct pstore_zone *zone, char *buf,
 		size_t len, unsigned long off)
 {
-	if (!buf || !zone->buffer)
+	if (!buf || !zone || !zone->buffer)
 		return -EINVAL;
 	if (off > zone->buffer_size)
 		return -EINVAL;
@@ -158,6 +169,18 @@ static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
 	return len;
 }
 
+static int psz_zone_read_oldbuf(struct pstore_zone *zone, char *buf,
+		size_t len, unsigned long off)
+{
+	if (!buf || !zone || !zone->oldbuf)
+		return -EINVAL;
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	len = min_t(size_t, len, zone->buffer_size - off);
+	memcpy(buf, zone->oldbuf->data + off, len);
+	return 0;
+}
+
 static int psz_zone_write(struct pstore_zone *zone,
 		enum psz_flush_mode flush_mode, const char *buf,
 		size_t len, unsigned long off)
@@ -413,6 +436,93 @@ static int psz_recover_oops(struct psz_context *cxt)
 	return ret;
 }
 
+static int psz_recover_zone(struct psz_context *cxt, struct pstore_zone *zone)
+{
+	struct pstore_zone_info *info = cxt->pstore_zone_info;
+	struct psz_buffer *oldbuf, tmpbuf;
+	int ret = 0;
+	char *buf;
+	ssize_t rcnt, len, start, off;
+
+	if (!zone || zone->oldbuf)
+		return 0;
+
+	if (is_on_panic()) {
+		/* save data as much as possible */
+		psz_flush_dirty_zone(zone);
+		return 0;
+	}
+
+	if (unlikely(!info->read))
+		return -EINVAL;
+
+	len = sizeof(struct psz_buffer);
+	rcnt = info->read((char *)&tmpbuf, len, zone->off);
+	if (rcnt != len) {
+		pr_debug("read zone %s failed\n", zone->name);
+		return (int)rcnt < 0 ? (int)rcnt : -EIO;
+	}
+
+	if (tmpbuf.sig != zone->buffer->sig) {
+		pr_debug("no valid data in zone %s\n", zone->name);
+		return 0;
+	}
+
+	if (zone->buffer_size < atomic_read(&tmpbuf.datalen) ||
+		zone->buffer_size < atomic_read(&tmpbuf.start)) {
+		pr_info("found overtop zone: %s: off %lld, size %zu\n",
+				zone->name, zone->off, zone->buffer_size);
+		/* just keep going */
+		return 0;
+	}
+
+	if (!atomic_read(&tmpbuf.datalen)) {
+		pr_debug("found erased zone: %s: off %lld, size %zu, datalen %d\n",
+				zone->name, zone->off, zone->buffer_size,
+				atomic_read(&tmpbuf.datalen));
+		return 0;
+	}
+
+	pr_debug("found nice zone: %s: off %lld, size %zu, datalen %d\n",
+			zone->name, zone->off, zone->buffer_size,
+			atomic_read(&tmpbuf.datalen));
+
+	len = atomic_read(&tmpbuf.datalen) + sizeof(*oldbuf);
+	oldbuf = kzalloc(len, GFP_KERNEL);
+	if (!oldbuf)
+		return -ENOMEM;
+
+	memcpy(oldbuf, &tmpbuf, sizeof(*oldbuf));
+	buf = (char *)oldbuf + sizeof(*oldbuf);
+	len = atomic_read(&oldbuf->datalen);
+	start = atomic_read(&oldbuf->start);
+	off = zone->off + sizeof(*oldbuf);
+
+	/* get part of data */
+	rcnt = info->read(buf, len - start, off + start);
+	if (rcnt != len - start) {
+		pr_err("read zone %s failed\n", zone->name);
+		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
+		goto free_oldbuf;
+	}
+
+	/* get the rest of data */
+	rcnt = info->read(buf + len - start, start, off);
+	if (rcnt != start) {
+		pr_err("read zone %s failed\n", zone->name);
+		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
+		goto free_oldbuf;
+	}
+
+	zone->oldbuf = oldbuf;
+	psz_flush_dirty_zone(zone);
+	return 0;
+
+free_oldbuf:
+	kfree(oldbuf);
+	return ret;
+}
+
 /**
  * psz_recovery() - recover data from storage
  * @cxt: the context of pstore/zone
@@ -432,6 +542,10 @@ static inline int psz_recovery(struct psz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = psz_recover_zone(cxt, cxt->ppsz);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -446,9 +560,17 @@ static int psz_pstore_open(struct pstore_info *psi)
 	struct psz_context *cxt = psi->data;
 
 	cxt->oops_read_cnt = 0;
+	cxt->pmsg_read_cnt = 0;
 	return 0;
 }
 
+static inline bool psz_old_ok(struct pstore_zone *zone)
+{
+	if (zone && zone->oldbuf && atomic_read(&zone->oldbuf->datalen))
+		return true;
+	return false;
+}
+
 static inline bool psz_ok(struct pstore_zone *zone)
 {
 	if (zone && zone->buffer && buffer_datalen(zone))
@@ -473,6 +595,25 @@ static inline int psz_oops_erase(struct psz_context *cxt,
 	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
+static inline int psz_record_erase(struct psz_context *cxt,
+		struct pstore_zone *zone)
+{
+	if (unlikely(!psz_old_ok(zone)))
+		return 0;
+
+	kfree(zone->oldbuf);
+	zone->oldbuf = NULL;
+	/*
+	 * If there is new data in the zone buffer, the old data is
+	 * already invalid, so there is no need to flush 0 (erase) to
+	 * the block device.
+	 */
+	if (!buffer_datalen(zone))
+		return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	psz_flush_dirty_zone(zone);
+	return 0;
+}
+
 static int psz_pstore_erase(struct pstore_record *record)
 {
 	struct psz_context *cxt = record->psi->data;
@@ -482,6 +623,8 @@ static int psz_pstore_erase(struct pstore_record *record)
 		if (record->id >= cxt->oops_max_cnt)
 			return -EINVAL;
 		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
+	case PSTORE_TYPE_PMSG:
+		return psz_record_erase(cxt, cxt->ppsz);
 	default:
 		return -EINVAL;
 	}
@@ -502,8 +645,10 @@ static void psz_write_kmsg_hdr(struct pstore_zone *zone,
 	hdr->reason = record->reason;
 	if (hdr->reason == KMSG_DUMP_OOPS)
 		hdr->counter = ++cxt->oops_counter;
-	else
+	else if (hdr->reason == KMSG_DUMP_PANIC)
 		hdr->counter = ++cxt->panic_counter;
+	else
+		hdr->counter = 0;
 }
 
 static inline int notrace psz_oops_write_record(struct psz_context *cxt,
@@ -553,6 +698,53 @@ static int notrace psz_oops_write(struct psz_context *cxt,
 	return 0;
 }
 
+static int notrace psz_record_write(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	size_t start, rem;
+	int cnt = record->size;
+	bool is_full_data = false;
+	char *buf = record->buf;
+
+	if (!zone || !record)
+		return -ENOSPC;
+
+	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
+		is_full_data = true;
+
+	if (unlikely(cnt > zone->buffer_size)) {
+		buf += cnt - zone->buffer_size;
+		cnt = zone->buffer_size;
+	}
+
+	start = buffer_start(zone);
+	rem = zone->buffer_size - start;
+	if (unlikely(rem < cnt)) {
+		psz_zone_write(zone, FLUSH_PART, buf, rem, start);
+		buf += rem;
+		cnt -= rem;
+		start = 0;
+		is_full_data = true;
+	}
+
+	atomic_set(&zone->buffer->start, cnt + start);
+	psz_zone_write(zone, FLUSH_PART, buf, cnt, start);
+
+	/*
+	 * psz_zone_write() sets datalen to start + cnt, which is correct
+	 * as long as the actual data length is less than the buffer size.
+	 * Once the data length exceeds the buffer size, pmsg wraps around
+	 * and rewrites from the beginning of the zone, which leaves
+	 * buffer->datalen wrong. So reset datalen to the buffer size once
+	 * the zone has wrapped.
+	 */
+	if (is_full_data) {
+		atomic_set(&zone->buffer->datalen, zone->buffer_size);
+		psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	}
+	return 0;
+}
+
 static int notrace psz_pstore_write(struct pstore_record *record)
 {
 	struct psz_context *cxt = record->psi->data;
@@ -564,6 +756,8 @@ static int notrace psz_pstore_write(struct pstore_record *record)
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return psz_oops_write(cxt, record);
+	case PSTORE_TYPE_PMSG:
+		return psz_record_write(cxt->ppsz, record);
 	default:
 		return -EINVAL;
 	}
@@ -579,6 +773,13 @@ static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->pmsg_read_cnt == 0) {
+		cxt->pmsg_read_cnt++;
+		zone = cxt->ppsz;
+		if (psz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -629,7 +830,7 @@ static ssize_t psz_oops_read(struct pstore_zone *zone,
 			return -ENOMEM;
 	}
 
-	size = psz_zone_read(zone, record->buf + hlen, size,
+	size = psz_zone_read_buffer(zone, record->buf + hlen, size,
 			sizeof(struct psz_oops_header) < 0);
 	if (unlikely(size < 0)) {
 		kfree(record->buf);
@@ -639,6 +840,32 @@ static ssize_t psz_oops_read(struct pstore_zone *zone,
 	return size + hlen;
 }
 
+static ssize_t psz_record_read(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	size_t len;
+	struct psz_buffer *buf;
+
+	if (!zone || !record)
+		return -ENOSPC;
+
+	buf = (struct psz_buffer *)zone->oldbuf;
+	if (!buf)
+		return -ENOMSG;
+
+	len = atomic_read(&buf->datalen);
+	record->buf = kmalloc(len, GFP_KERNEL);
+	if (!record->buf)
+		return -ENOMEM;
+
+	if (unlikely(psz_zone_read_oldbuf(zone, record->buf, len, 0))) {
+		kfree(record->buf);
+		return -ENOMSG;
+	}
+
+	return len;
+}
+
 static ssize_t psz_pstore_read(struct pstore_record *record)
 {
 	struct psz_context *cxt = record->psi->data;
@@ -663,6 +890,9 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
 		readop = psz_oops_read;
 		record->id = cxt->oops_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_PMSG:
+		readop = psz_record_read;
+		break;
 	default:
 		goto next_zone;
 	}
@@ -718,8 +948,10 @@ static struct pstore_zone *psz_init_zone(enum pstore_type_id type,
 	zone->type = type;
 	zone->buffer_size = size - sizeof(struct psz_buffer);
 	zone->buffer->sig = type ^ PSZ_SIG;
+	zone->oldbuf = NULL;
 	atomic_set(&zone->dirty, 0);
 	atomic_set(&zone->buffer->datalen, 0);
+	atomic_set(&zone->buffer->start, 0);
 
 	*off += size;
 
@@ -803,6 +1035,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
 {
 	if (cxt->opszs)
 		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
+	if (cxt->ppsz)
+		psz_free_zone(&cxt->ppsz);
 }
 
 static int psz_alloc_zones(struct psz_context *cxt)
@@ -810,18 +1044,26 @@ static int psz_alloc_zones(struct psz_context *cxt)
 	struct pstore_zone_info *info = cxt->pstore_zone_info;
 	loff_t off = 0;
 	int err;
-	size_t size;
+	size_t off_size = 0;
 
-	size = info->total_size;
-	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+	off_size += info->pmsg_size;
+	cxt->ppsz = psz_init_zone(PSTORE_TYPE_PMSG, &off, info->pmsg_size);
+	if (IS_ERR(cxt->ppsz)) {
+		err = PTR_ERR(cxt->ppsz);
+		goto free_out;
+	}
+
+	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
+			info->total_size - off_size,
 			info->kmsg_size, &cxt->oops_max_cnt);
 	if (IS_ERR(cxt->opszs)) {
 		err = PTR_ERR(cxt->opszs);
-		goto fail_out;
+		goto free_out;
 	}
 
 	return 0;
-fail_out:
+free_out:
+	psz_free_all_zones(cxt);
 	return err;
 }
 
@@ -844,7 +1086,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->kmsg_size) {
+	if (!info->kmsg_size && !info->pmsg_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -871,6 +1113,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 
 	check_size(total_size, 4096);
 	check_size(kmsg_size, SECTOR_SIZE);
+	check_size(pmsg_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -897,6 +1140,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	pr_debug("register %s with properties:\n", info->name);
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
+	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
 
 	err = psz_alloc_zones(cxt);
 	if (err) {
@@ -925,6 +1169,10 @@ int register_pstore_zone(struct pstore_zone_info *info)
 			pr_cont(",panic_write");
 		pr_cont(")");
 	}
+	if (info->pmsg_size) {
+		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
+		pr_cont(" pmsg");
+	}
 	pr_cont("\n");
 
 	err = pstore_register(&cxt->pstore);
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index a6a79ff1351b..39c2cb944123 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -17,6 +17,7 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  * @kmsg_size:	The size of oops/panic zone. Zero means disabled, otherwise,
  *		it must be multiple of SECTOR_SIZE(512 Bytes).
  * @max_reason: Maximum kmsg dump reason to store.
+ * @pmsg_size:	The size of pmsg zone which is the same as @kmsg_size.
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to storage.
  *		On success, the number of bytes should be returned, others
@@ -33,6 +34,7 @@ struct pstore_zone_info {
 	unsigned long total_size;
 	unsigned long kmsg_size;
 	int max_reason;
+	unsigned long pmsg_size;
 	psz_read_op read;
 	psz_write_op write;
 	psz_write_op panic_write;
-- 
2.20.1
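
As a reading aid for the pmsg zone handling above: psz_record_write()
treats the zone data as a ring, with "start" as the next write offset and
"datalen" pinned to the buffer size once the data has wrapped.
psz_recover_zone() then reads the bytes from "start" to the end first and
the bytes before "start" second, so the recovered record comes out
oldest-first. A minimal userspace sketch of that scheme follows. struct
ring, ring_write() and ring_linearize() are hypothetical names, and the
sketch leaves out the flushing to storage and the atomics that the kernel
code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a pmsg zone; not the kernel structs. */
struct ring {
	size_t size;     /* capacity of data[], like zone->buffer_size */
	size_t start;    /* next write offset, like buffer->start */
	size_t datalen;  /* valid bytes, never more than size */
	char *data;
};

/* Append cnt bytes, overwriting the oldest data once the ring is full. */
static void ring_write(struct ring *r, const char *buf, size_t cnt)
{
	int full = r->datalen >= r->size;
	size_t rem;

	assert(r->start <= r->size);

	/* A record larger than the ring keeps only its newest bytes. */
	if (cnt > r->size) {
		buf += cnt - r->size;
		cnt = r->size;
	}

	/* Fill the tail of the ring first, then wrap to offset 0. */
	rem = r->size - r->start;
	if (rem < cnt) {
		memcpy(r->data + r->start, buf, rem);
		buf += rem;
		cnt -= rem;
		r->start = 0;
		full = 1;
	}

	memcpy(r->data + r->start, buf, cnt);
	r->start += cnt;
	/* Once wrapped, every byte in the ring is valid. */
	r->datalen = full ? r->size : r->start;
}

/* Copy the contents oldest-first into out[]; returns the byte count. */
static size_t ring_linearize(const struct ring *r, char *out)
{
	size_t tail = r->datalen - r->start;  /* oldest bytes sit after start */

	memcpy(out, r->data + r->start, tail);
	memcpy(out + tail, r->data, r->start);
	return r->datalen;
}
```

With an 8-byte ring, writing "abcde" and then "fghij" leaves start at 2 and
datalen at 8, and ring_linearize() returns "cdefghij": the two oldest
bytes have been overwritten.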



* [PATCH v4 06/12] pstore/blk: Add console frontend support
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add console frontend support to pstore/blk (through pstore/zone). To
enable it, console_size must be greater than 0 and a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-6-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
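Note on sizes (illustrative only): the console_size module parameter and
CONFIG_PSTORE_BLK_CONSOLE_SIZE are given in KB, while the 4096 alignment
in the description above is in bytes, so a valid value is a positive
multiple of 4 KB. The sketch below spells out that rule in plain C.
zone_size_bytes() and ZONE_ALIGN are hypothetical names, this is not the
verify_size() macro from fs/pstore/blk.c, and rejecting a misaligned value
with -1 is an assumption made for the example.

```c
#include <assert.h>
#include <limits.h>

#define ZONE_ALIGN 4096L  /* byte alignment named in the description */

/*
 * Hypothetical helper: convert a size given in KB to bytes.
 * Returns 0 (frontend disabled) for a non-positive size and
 * -1 for a size that is not a multiple of ZONE_ALIGN bytes.
 */
static long zone_size_bytes(long size_kb)
{
	long bytes;

	if (size_kb <= 0)
		return 0;

	/* Guard the multiplication below against overflow. */
	assert(size_kb <= LONG_MAX / 1024);

	bytes = size_kb * 1024;
	if (bytes % ZONE_ALIGN)
		return -1;
	return bytes;
}
```

With the Kconfig default of 64, zone_size_bytes(64) gives 65536. A value
of 6 (6144 bytes) is rejected, and the -1 used when PSTORE_CONSOLE is off
maps to 0.
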
 fs/pstore/Kconfig           | 12 +++++++
 fs/pstore/blk.c             | 12 ++++++-
 fs/pstore/zone.c            | 67 +++++++++++++++++++++++++++++++++++--
 include/linux/pstore_zone.h |  4 ++-
 4 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index f18cd126d83f..f1484f751c5e 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -236,3 +236,15 @@ config PSTORE_BLK_PMSG_SIZE
 
 	  NOTE that, both Kconfig and module parameters can configure
 	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_CONSOLE_SIZE
+	int "Size in Kbytes of console to store"
+	depends on PSTORE_BLK
+	depends on PSTORE_CONSOLE
+	default 64
+	help
+	  This just sets size of console (console_size) for pstore/blk. The
+	  size is in KB and must be a multiple of 4.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 401e5ba66a5f..813025ea7edd 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -32,6 +32,14 @@ static long pmsg_size = -1;
 module_param(pmsg_size, long, 0400);
 MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
 
+#if IS_ENABLED(CONFIG_PSTORE_CONSOLE)
+static long console_size = CONFIG_PSTORE_BLK_CONSOLE_SIZE;
+#else
+static long console_size = -1;
+#endif
+module_param(console_size, long, 0400);
+MODULE_PARM_DESC(console_size, "console size in kbytes");
+
 /*
  * blkdev - The block device to use.
  *
@@ -83,7 +91,8 @@ static struct bdev_info {
  *		whole disk).
  *		On success, the number of bytes should be returned, others
  *		means error.
- * @write:	The same as @read.
+ * @write:	The same as @read, but the following error number:
+ *		-EBUSY means try to write again later.
  * @panic_write:The write operation only used for panic case. It's optional
  *		if you do not care panic log. The parameters and return value
  *		are the same as @read.
@@ -133,6 +142,7 @@ static int psblk_register_do(struct psblk_device *dev)
 
 	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
 	verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
+	verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
 #undef verify_size
 
 	pstore_zone_info->total_size = dev->total_size;
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index f472b06a6c14..0b952eea39fe 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -87,10 +87,12 @@ struct pstore_zone {
  *
  * @opszs: oops/panic storage zones
  * @ppsz: pmsg storage zone
+ * @cpsz: console storage zone
  * @oops_max_cnt: max count of @opszs
  * @oops_read_cnt: counter to read oops zone
  * @oops_write_cnt: counter to write
  * @pmsg_read_cnt: counter to read pmsg zone
+ * @console_read_cnt: counter to read console zone
  * @oops_counter: counter to oops
  * @panic_counter: counter to panic
  * @recovered: whether finish recovering data from storage
@@ -102,10 +104,12 @@ struct pstore_zone {
 struct psz_context {
 	struct pstore_zone **opszs;
 	struct pstore_zone *ppsz;
+	struct pstore_zone *cpsz;
 	unsigned int oops_max_cnt;
 	unsigned int oops_read_cnt;
 	unsigned int oops_write_cnt;
 	unsigned int pmsg_read_cnt;
+	unsigned int console_read_cnt;
 	/*
 	 * the counter should be recovered when recover.
 	 * It records the oops/panic times after burning rather than booting.
@@ -125,6 +129,9 @@ struct psz_context {
 };
 static struct psz_context psz_cxt;
 
+static void psz_flush_all_dirty_zones(struct work_struct *);
+static DECLARE_WORK(psz_cleaner, psz_flush_all_dirty_zones);
+
 /**
  * enum psz_flush_mode - flush mode for psz_zone_write()
  *
@@ -235,6 +242,9 @@ static int psz_zone_write(struct pstore_zone *zone,
 	return 0;
 dirty:
 	atomic_set(&zone->dirty, true);
+	/* flush dirty zones nicely */
+	if (wcnt == -EBUSY && !is_on_panic())
+		schedule_work(&psz_cleaner);
 	return -EBUSY;
 }
 
@@ -291,6 +301,15 @@ static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
 	return 0;
 }
 
+static void psz_flush_all_dirty_zones(struct work_struct *work)
+{
+	struct psz_context *cxt = &psz_cxt;
+
+	psz_flush_dirty_zone(cxt->ppsz);
+	psz_flush_dirty_zone(cxt->cpsz);
+	psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+}
+
 static int psz_recover_oops_data(struct psz_context *cxt)
 {
 	struct pstore_zone_info *info = cxt->pstore_zone_info;
@@ -546,6 +565,10 @@ static inline int psz_recovery(struct psz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = psz_recover_zone(cxt, cxt->cpsz);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -561,6 +584,7 @@ static int psz_pstore_open(struct pstore_info *psi)
 
 	cxt->oops_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
+	cxt->console_read_cnt = 0;
 	return 0;
 }
 
@@ -625,8 +649,9 @@ static int psz_pstore_erase(struct pstore_record *record)
 		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
 	case PSTORE_TYPE_PMSG:
 		return psz_record_erase(cxt, cxt->ppsz);
-	default:
-		return -EINVAL;
+	case PSTORE_TYPE_CONSOLE:
+		return psz_record_erase(cxt, cxt->cpsz);
+	default: return -EINVAL;
 	}
 }
 
@@ -753,9 +778,18 @@ static int notrace psz_pstore_write(struct pstore_record *record)
 			record->reason == KMSG_DUMP_PANIC)
 		atomic_set(&cxt->on_panic, 1);
 
+	/*
+	 * if on panic, do not write except panic records
+	 * Fix case that panic_write prints log which wakes up console backend.
+	 */
+	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
+		return -EBUSY;
+
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return psz_oops_write(cxt, record);
+	case PSTORE_TYPE_CONSOLE:
+		return psz_record_write(cxt->cpsz, record);
 	case PSTORE_TYPE_PMSG:
 		return psz_record_write(cxt->ppsz, record);
 	default:
@@ -780,6 +814,13 @@ static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->console_read_cnt == 0) {
+		cxt->console_read_cnt++;
+		zone = cxt->cpsz;
+		if (psz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -890,6 +931,8 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
 		readop = psz_oops_read;
 		record->id = cxt->oops_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_CONSOLE:
+		fallthrough;
 	case PSTORE_TYPE_PMSG:
 		readop = psz_record_read;
 		break;
@@ -1037,6 +1080,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
 		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
 	if (cxt->ppsz)
 		psz_free_zone(&cxt->ppsz);
+	if (cxt->cpsz)
+		psz_free_zone(&cxt->cpsz);
 }
 
 static int psz_alloc_zones(struct psz_context *cxt)
@@ -1053,6 +1098,14 @@ static int psz_alloc_zones(struct psz_context *cxt)
 		goto free_out;
 	}
 
+	off_size += info->console_size;
+	cxt->cpsz = psz_init_zone(PSTORE_TYPE_CONSOLE, &off,
+			info->console_size);
+	if (IS_ERR(cxt->cpsz)) {
+		err = PTR_ERR(cxt->cpsz);
+		goto free_out;
+	}
+
 	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->kmsg_size, &cxt->oops_max_cnt);
@@ -1086,7 +1139,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->kmsg_size && !info->pmsg_size) {
+	if (!info->kmsg_size && !info->pmsg_size && !info->console_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -1114,6 +1167,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	check_size(total_size, 4096);
 	check_size(kmsg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
+	check_size(console_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1141,6 +1195,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
+	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
 
 	err = psz_alloc_zones(cxt);
 	if (err) {
@@ -1173,6 +1228,10 @@ int register_pstore_zone(struct pstore_zone_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
 		pr_cont(" pmsg");
 	}
+	if (info->console_size) {
+		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
+		pr_cont(" console");
+	}
 	pr_cont("\n");
 
 	err = pstore_register(&cxt->pstore);
@@ -1204,6 +1263,8 @@ void unregister_pstore_zone(struct pstore_zone_info *info)
 {
 	struct psz_context *cxt = &psz_cxt;
 
+	flush_work(&psz_cleaner);
+
 	pstore_unregister(&cxt->pstore);
 	kfree(cxt->pstore.buf);
 	cxt->pstore.bufsize = 0;
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index 39c2cb944123..da294e6d7661 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -18,11 +18,12 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  *		it must be multiple of SECTOR_SIZE(512 Bytes).
  * @max_reason: Maximum kmsg dump reason to store.
  * @pmsg_size:	The size of pmsg zone which is the same as @kmsg_size.
+ * @console_size:The size of console zone which is the same as @kmsg_size.
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to storage.
  *		On success, the number of bytes should be returned, others
  *		means error.
- * @write:	The same as @read.
+ * @write:	The same as @read, but -EBUSY means try to write again later.
  * @panic_write:The write operation only used for panic case. It's optional
  *		if you do not care panic log. The parameters and return value
  *		are the same as @read.
@@ -35,6 +36,7 @@ struct pstore_zone_info {
 	unsigned long kmsg_size;
 	int max_reason;
 	unsigned long pmsg_size;
+	unsigned long console_size;
 	psz_read_op read;
 	psz_write_op write;
 	psz_write_op panic_write;
-- 
2.20.1
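
For orientation, psz_alloc_zones() in this patch lays the zones out back
to back: pmsg first, then console, then the oops/panic zones in whatever
space is left. A rough userspace sketch of that arithmetic follows. struct
zone_layout and layout_zones() are hypothetical names, and the zone count
(remaining space divided by kmsg_size, rounded down) is an assumption,
since psz_init_zones() is not part of this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of where each zone lands; offsets are in bytes. */
struct zone_layout {
	size_t pmsg_off;
	size_t console_off;
	size_t dmesg_off;
	size_t dmesg_cnt;  /* number of kmsg_size-sized oops/panic zones */
};

static struct zone_layout layout_zones(size_t total_size, size_t kmsg_size,
				       size_t pmsg_size, size_t console_size)
{
	struct zone_layout l;
	size_t off = 0;

	/* The single-record zones must fit in the device. */
	assert(pmsg_size + console_size <= total_size);

	l.pmsg_off = off;
	off += pmsg_size;
	l.console_off = off;
	off += console_size;
	l.dmesg_off = off;

	/* Assumption: the remaining space is cut into whole kmsg_size zones. */
	l.dmesg_cnt = kmsg_size ? (total_size - off) / kmsg_size : 0;
	return l;
}
```

For a 1 MiB device with 64 KiB each for kmsg, pmsg and console, this puts
the console zone at offset 65536 and leaves room for 14 oops/panic zones.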



* [PATCH v4 06/12] pstore/blk: Add console frontend support
@ 2020-05-08  6:39   ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Petr Mladek, Tony Luck, Kees Cook, linux-doc, Anton Vorontsov,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, linux-mtd,
	Colin Cross

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add console frontend support to pstore/blk (through pstore/zone). To
enable it, console_size must be greater than 0 and a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-6-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/pstore/Kconfig           | 12 +++++++
 fs/pstore/blk.c             | 12 ++++++-
 fs/pstore/zone.c            | 67 +++++++++++++++++++++++++++++++++++--
 include/linux/pstore_zone.h |  4 ++-
 4 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index f18cd126d83f..f1484f751c5e 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -236,3 +236,15 @@ config PSTORE_BLK_PMSG_SIZE
 
 	  NOTE that, both Kconfig and module parameters can configure
 	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_CONSOLE_SIZE
+	int "Size in Kbytes of console to store"
+	depends on PSTORE_BLK
+	depends on PSTORE_CONSOLE
+	default 64
+	help
+	  This just sets size of console (console_size) for pstore/blk. The
+	  size is in KB and must be a multiple of 4.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 401e5ba66a5f..813025ea7edd 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -32,6 +32,14 @@ static long pmsg_size = -1;
 module_param(pmsg_size, long, 0400);
 MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
 
+#if IS_ENABLED(CONFIG_PSTORE_CONSOLE)
+static long console_size = CONFIG_PSTORE_BLK_CONSOLE_SIZE;
+#else
+static long console_size = -1;
+#endif
+module_param(console_size, long, 0400);
+MODULE_PARM_DESC(console_size, "console size in kbytes");
+
 /*
  * blkdev - The block device to use.
  *
@@ -83,7 +91,8 @@ static struct bdev_info {
  *		whole disk).
  *		On success, the number of bytes should be returned, others
  *		means error.
- * @write:	The same as @read.
+ * @write:	The same as @read, except that a return value of
+ *		-EBUSY means try to write again later.
  * @panic_write:The write operation only used for panic case. It's optional
  *		if you do not care panic log. The parameters and return value
  *		are the same as @read.
@@ -133,6 +142,7 @@ static int psblk_register_do(struct psblk_device *dev)
 
 	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
 	verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
+	verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
 #undef verify_size
 
 	pstore_zone_info->total_size = dev->total_size;
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index f472b06a6c14..0b952eea39fe 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -87,10 +87,12 @@ struct pstore_zone {
  *
  * @opszs: oops/panic storage zones
  * @ppsz: pmsg storage zone
+ * @cpsz: console storage zone
  * @oops_max_cnt: max count of @opszs
  * @oops_read_cnt: counter to read oops zone
  * @oops_write_cnt: counter to write
  * @pmsg_read_cnt: counter to read pmsg zone
+ * @console_read_cnt: counter to read console zone
  * @oops_counter: counter to oops
  * @panic_counter: counter to panic
  * @recovered: whether finish recovering data from storage
@@ -102,10 +104,12 @@ struct pstore_zone {
 struct psz_context {
 	struct pstore_zone **opszs;
 	struct pstore_zone *ppsz;
+	struct pstore_zone *cpsz;
 	unsigned int oops_max_cnt;
 	unsigned int oops_read_cnt;
 	unsigned int oops_write_cnt;
 	unsigned int pmsg_read_cnt;
+	unsigned int console_read_cnt;
 	/*
 	 * the counter should be recovered when recover.
 	 * It records the oops/panic times after burning rather than booting.
@@ -125,6 +129,9 @@ struct psz_context {
 };
 static struct psz_context psz_cxt;
 
+static void psz_flush_all_dirty_zones(struct work_struct *);
+static DECLARE_WORK(psz_cleaner, psz_flush_all_dirty_zones);
+
 /**
  * enum psz_flush_mode - flush mode for psz_zone_write()
  *
@@ -235,6 +242,9 @@ static int psz_zone_write(struct pstore_zone *zone,
 	return 0;
 dirty:
 	atomic_set(&zone->dirty, true);
+	/* flush dirty zones nicely */
+	if (wcnt == -EBUSY && !is_on_panic())
+		schedule_work(&psz_cleaner);
 	return -EBUSY;
 }
 
@@ -291,6 +301,15 @@ static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
 	return 0;
 }
 
+static void psz_flush_all_dirty_zones(struct work_struct *work)
+{
+	struct psz_context *cxt = &psz_cxt;
+
+	psz_flush_dirty_zone(cxt->ppsz);
+	psz_flush_dirty_zone(cxt->cpsz);
+	psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+}
+
 static int psz_recover_oops_data(struct psz_context *cxt)
 {
 	struct pstore_zone_info *info = cxt->pstore_zone_info;
@@ -546,6 +565,10 @@ static inline int psz_recovery(struct psz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = psz_recover_zone(cxt, cxt->cpsz);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -561,6 +584,7 @@ static int psz_pstore_open(struct pstore_info *psi)
 
 	cxt->oops_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
+	cxt->console_read_cnt = 0;
 	return 0;
 }
 
@@ -625,8 +649,9 @@ static int psz_pstore_erase(struct pstore_record *record)
 		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
 	case PSTORE_TYPE_PMSG:
 		return psz_record_erase(cxt, cxt->ppsz);
-	default:
-		return -EINVAL;
+	case PSTORE_TYPE_CONSOLE:
+		return psz_record_erase(cxt, cxt->cpsz);
+	default: return -EINVAL;
 	}
 }
 
@@ -753,9 +778,18 @@ static int notrace psz_pstore_write(struct pstore_record *record)
 			record->reason == KMSG_DUMP_PANIC)
 		atomic_set(&cxt->on_panic, 1);
 
+	/*
+	 * On panic, write nothing except panic records. This handles the
+	 * case where panic_write prints a log that wakes the console backend.
+	 */
+	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
+		return -EBUSY;
+
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return psz_oops_write(cxt, record);
+	case PSTORE_TYPE_CONSOLE:
+		return psz_record_write(cxt->cpsz, record);
 	case PSTORE_TYPE_PMSG:
 		return psz_record_write(cxt->ppsz, record);
 	default:
@@ -780,6 +814,13 @@ static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->console_read_cnt == 0) {
+		cxt->console_read_cnt++;
+		zone = cxt->cpsz;
+		if (psz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -890,6 +931,8 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
 		readop = psz_oops_read;
 		record->id = cxt->oops_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_CONSOLE:
+		fallthrough;
 	case PSTORE_TYPE_PMSG:
 		readop = psz_record_read;
 		break;
@@ -1037,6 +1080,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
 		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
 	if (cxt->ppsz)
 		psz_free_zone(&cxt->ppsz);
+	if (cxt->cpsz)
+		psz_free_zone(&cxt->cpsz);
 }
 
 static int psz_alloc_zones(struct psz_context *cxt)
@@ -1053,6 +1098,14 @@ static int psz_alloc_zones(struct psz_context *cxt)
 		goto free_out;
 	}
 
+	off_size += info->console_size;
+	cxt->cpsz = psz_init_zone(PSTORE_TYPE_CONSOLE, &off,
+			info->console_size);
+	if (IS_ERR(cxt->cpsz)) {
+		err = PTR_ERR(cxt->cpsz);
+		goto free_out;
+	}
+
 	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->kmsg_size, &cxt->oops_max_cnt);
@@ -1086,7 +1139,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->kmsg_size && !info->pmsg_size) {
+	if (!info->kmsg_size && !info->pmsg_size && !info->console_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -1114,6 +1167,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	check_size(total_size, 4096);
 	check_size(kmsg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
+	check_size(console_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1141,6 +1195,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
+	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
 
 	err = psz_alloc_zones(cxt);
 	if (err) {
@@ -1173,6 +1228,10 @@ int register_pstore_zone(struct pstore_zone_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
 		pr_cont(" pmsg");
 	}
+	if (info->console_size) {
+		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
+		pr_cont(" console");
+	}
 	pr_cont("\n");
 
 	err = pstore_register(&cxt->pstore);
@@ -1204,6 +1263,8 @@ void unregister_pstore_zone(struct pstore_zone_info *info)
 {
 	struct psz_context *cxt = &psz_cxt;
 
+	flush_work(&psz_cleaner);
+
 	pstore_unregister(&cxt->pstore);
 	kfree(cxt->pstore.buf);
 	cxt->pstore.bufsize = 0;
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index 39c2cb944123..da294e6d7661 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -18,11 +18,12 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  *		it must be multiple of SECTOR_SIZE(512 Bytes).
  * @max_reason: Maximum kmsg dump reason to store.
  * @pmsg_size:	The size of pmsg zone which is the same as @kmsg_size.
+ * @console_size:The size of console zone which is the same as @kmsg_size.
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to storage.
  *		On success, the number of bytes should be returned, others
  *		means error.
- * @write:	The same as @read.
+ * @write:	The same as @read, but -EBUSY means try to write again later.
  * @panic_write:The write operation only used for panic case. It's optional
  *		if you do not care panic log. The parameters and return value
  *		are the same as @read.
@@ -35,6 +36,7 @@ struct pstore_zone_info {
 	unsigned long kmsg_size;
 	int max_reason;
 	unsigned long pmsg_size;
+	unsigned long console_size;
 	psz_read_op read;
 	psz_write_op write;
 	psz_write_op panic_write;
-- 
2.20.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* [PATCH v4 07/12] pstore/blk: Add ftrace frontend support
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:39   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:39 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add support for the ftrace frontend. To enable it, set ftrace_size
to a value greater than 0 that is a multiple of 4096 bytes.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-7-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
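Note (not part of the commit): the patch divides the ftrace area into
one zone per possible CPU (ftrace_size / nr_cpu_ids) and each CPU writes
to the zone indexed by its own id. A userspace sketch of that arithmetic
follows. The helper names are hypothetical, and the back-to-back layout
of the zones is an assumption that this patch does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: size of each per-CPU zone when the area is split evenly. */
static size_t ftrace_zone_size(size_t ftrace_size, unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	return ftrace_size / nr_cpus;
}

/*
 * Hypothetical: start offset of the zone for `cpu`, assuming the zones
 * are laid out back to back starting at `base`.
 */
static size_t ftrace_zone_offset(size_t base, size_t ftrace_size,
				 unsigned int nr_cpus, unsigned int cpu)
{
	assert(cpu < nr_cpus);
	return base + (size_t)cpu * ftrace_zone_size(ftrace_size, nr_cpus);
}
```

For example, a 64 KB ftrace area on a system with 4 possible CPUs gives
four 16 KB zones.
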
 fs/pstore/Kconfig           |  12 +++
 fs/pstore/blk.c             |   9 ++
 fs/pstore/zone.c            | 169 ++++++++++++++++++++++++++++++++++++
 include/linux/pstore_zone.h |   2 +
 4 files changed, 192 insertions(+)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index f1484f751c5e..16a0440d8d5a 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -248,3 +248,15 @@ config PSTORE_BLK_CONSOLE_SIZE
 
 	  NOTE that, both Kconfig and module parameters can configure
 	  pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_FTRACE_SIZE
+	int "Size in Kbytes of ftrace to store"
+	depends on PSTORE_BLK
+	depends on PSTORE_FTRACE
+	default 64
+	help
+	  This just sets size of ftrace (ftrace_size) for pstore/blk. The
+	  size is in KB and must be a multiple of 4.
+
+	  NOTE that, both Kconfig and module parameters can configure
+	  pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 813025ea7edd..5db811b7018d 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -40,6 +40,14 @@ static long console_size = -1;
 module_param(console_size, long, 0400);
 MODULE_PARM_DESC(console_size, "console size in kbytes");
 
+#if IS_ENABLED(CONFIG_PSTORE_FTRACE)
+static long ftrace_size = CONFIG_PSTORE_BLK_FTRACE_SIZE;
+#else
+static long ftrace_size = -1;
+#endif
+module_param(ftrace_size, long, 0400);
+MODULE_PARM_DESC(ftrace_size, "ftrace size in kbytes");
+
 /*
  * blkdev - The block device to use.
  *
@@ -143,6 +151,7 @@ static int psblk_register_do(struct psblk_device *dev)
 	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
 	verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
 	verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
+	verify_size(ftrace_size, 4096, dev->flags & PSTORE_FLAGS_FTRACE);
 #undef verify_size
 
 	pstore_zone_info->total_size = dev->total_size;
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index 0b952eea39fe..36d78c63bd20 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -88,11 +88,14 @@ struct pstore_zone {
  * @opszs: oops/panic storage zones
  * @ppsz: pmsg storage zone
  * @cpsz: console storage zone
+ * @fpszs: ftrace storage zones
  * @oops_max_cnt: max count of @opszs
  * @oops_read_cnt: counter to read oops zone
  * @oops_write_cnt: counter to write
  * @pmsg_read_cnt: counter to read pmsg zone
  * @console_read_cnt: counter to read console zone
+ * @ftrace_max_cnt: max count of @fpszs
+ * @ftrace_read_cnt: counter to read ftrace zone
  * @oops_counter: counter to oops
  * @panic_counter: counter to panic
  * @recovered: whether finish recovering data from storage
@@ -105,11 +108,14 @@ struct psz_context {
 	struct pstore_zone **opszs;
 	struct pstore_zone *ppsz;
 	struct pstore_zone *cpsz;
+	struct pstore_zone **fpszs;
 	unsigned int oops_max_cnt;
 	unsigned int oops_read_cnt;
 	unsigned int oops_write_cnt;
 	unsigned int pmsg_read_cnt;
 	unsigned int console_read_cnt;
+	unsigned int ftrace_max_cnt;
+	unsigned int ftrace_read_cnt;
 	/*
 	 * the counter should be recovered when recover.
 	 * It records the oops/panic times after burning rather than booting.
@@ -308,6 +314,7 @@ static void psz_flush_all_dirty_zones(struct work_struct *work)
 	psz_flush_dirty_zone(cxt->ppsz);
 	psz_flush_dirty_zone(cxt->cpsz);
 	psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+	psz_flush_dirty_zones(cxt->fpszs, cxt->ftrace_max_cnt);
 }
 
 static int psz_recover_oops_data(struct psz_context *cxt)
@@ -542,6 +549,31 @@ static int psz_recover_zone(struct psz_context *cxt, struct pstore_zone *zone)
 	return ret;
 }
 
+static int psz_recover_zones(struct psz_context *cxt,
+		struct pstore_zone **zones, unsigned int cnt)
+{
+	int ret;
+	unsigned int i;
+	struct pstore_zone *zone;
+
+	if (!zones)
+		return 0;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (unlikely(!zone))
+			continue;
+		ret = psz_recover_zone(cxt, zone);
+		if (ret)
+			goto recover_fail;
+	}
+
+	return 0;
+recover_fail:
+	pr_debug("recover %s[%u] failed\n", zone->name, i);
+	return ret;
+}
+
 /**
  * psz_recovery() - recover data from storage
  * @cxt: the context of pstore/zone
@@ -569,6 +601,10 @@ static inline int psz_recovery(struct psz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = psz_recover_zones(cxt, cxt->fpszs, cxt->ftrace_max_cnt);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -585,6 +621,7 @@ static int psz_pstore_open(struct pstore_info *psi)
 	cxt->oops_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
 	cxt->console_read_cnt = 0;
+	cxt->ftrace_read_cnt = 0;
 	return 0;
 }
 
@@ -651,6 +688,10 @@ static int psz_pstore_erase(struct pstore_record *record)
 		return psz_record_erase(cxt, cxt->ppsz);
 	case PSTORE_TYPE_CONSOLE:
 		return psz_record_erase(cxt, cxt->cpsz);
+	case PSTORE_TYPE_FTRACE:
+		if (record->id >= cxt->ftrace_max_cnt)
+			return -EINVAL;
+		return psz_record_erase(cxt, cxt->fpszs[record->id]);
 	default: return -EINVAL;
 	}
 }
@@ -792,6 +833,13 @@ static int notrace psz_pstore_write(struct pstore_record *record)
 		return psz_record_write(cxt->cpsz, record);
 	case PSTORE_TYPE_PMSG:
 		return psz_record_write(cxt->ppsz, record);
+	case PSTORE_TYPE_FTRACE: {
+		int zonenum = smp_processor_id();
+
+		if (!cxt->fpszs)
+			return -ENOSPC;
+		return psz_record_write(cxt->fpszs[zonenum], record);
+	}
 	default:
 		return -EINVAL;
 	}
@@ -807,6 +855,14 @@ static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt)
+		/*
+		 * No need for psz_old_ok() here; psz_ftrace_read() checks it
+		 * while combining. psz_ftrace_read() must traverse all zones
+		 * in case some zone has no data.
+		 */
+		return cxt->fpszs[cxt->ftrace_read_cnt++];
+
 	if (cxt->pmsg_read_cnt == 0) {
 		cxt->pmsg_read_cnt++;
 		zone = cxt->ppsz;
@@ -881,6 +937,98 @@ static ssize_t psz_oops_read(struct pstore_zone *zone,
 	return size + hlen;
 }
 
+static int psz_ftrace_combine(char *src1_buf, size_t src1_size,
+		char *src2_buf, size_t src2_size,
+		char **dest_buf, size_t *dest_size)
+{
+	size_t src1_off, src2_off, total;
+	size_t src1_idx = 0, src2_idx = 0, merged_idx = 0;
+	void *merged_buf;
+	struct pstore_ftrace_record *mrec, *s1rec, *s2rec;
+	size_t record_size = sizeof(struct pstore_ftrace_record);
+
+	src1_off = src1_size % record_size;
+	src1_size -= src1_off;
+
+	src2_off = src2_size % record_size;
+	src2_size -= src2_off;
+
+	total = src1_size + src2_size;
+	merged_buf = kmalloc(total, GFP_KERNEL);
+	if (!merged_buf)
+		return -ENOMEM;
+
+	s1rec = (struct pstore_ftrace_record *)(src1_buf + src1_off);
+	s2rec = (struct pstore_ftrace_record *)(src2_buf + src2_off);
+	mrec = (struct pstore_ftrace_record *)(merged_buf);
+
+	while (src1_size > 0 && src2_size > 0) {
+		u64 s1_ts, s2_ts;
+
+		s1_ts = pstore_ftrace_read_timestamp(&s1rec[src1_idx]);
+		s2_ts = pstore_ftrace_read_timestamp(&s2rec[src2_idx]);
+		if (s1_ts < s2_ts) {
+			mrec[merged_idx++] = s1rec[src1_idx++];
+			src1_size -= record_size;
+		} else {
+			mrec[merged_idx++] = s2rec[src2_idx++];
+			src2_size -= record_size;
+		}
+	}
+
+	while (src1_size > 0) {
+		mrec[merged_idx++] = s1rec[src1_idx++];
+		src1_size -= record_size;
+	}
+
+	while (src2_size > 0) {
+		mrec[merged_idx++] = s2rec[src2_idx++];
+		src2_size -= record_size;
+	}
+
+	*dest_buf = merged_buf;
+	*dest_size = total;
+	return 0;
+}
+
+/* try to combine all ftrace zones */
+static ssize_t psz_ftrace_read(struct pstore_zone *zone,
+		struct pstore_record *record)
+{
+	struct psz_context *cxt = record->psi->data;
+	struct psz_buffer *buf;
+	char *dest;
+	size_t dest_size;
+	int ret;
+
+	if (!zone || !record)
+		return -ENOSPC;
+
+	if (!psz_old_ok(zone))
+		goto out;
+
+	buf = (struct psz_buffer *)zone->oldbuf;
+	if (!buf)
+		return -ENOMSG;
+
+	ret = psz_ftrace_combine(record->buf, record->size,
+			(char *)buf->data, atomic_read(&buf->datalen),
+			&dest, &dest_size);
+	if (unlikely(ret))
+		return ret;
+
+	kfree(record->buf);
+	record->buf = dest;
+	record->size = dest_size;
+
+out:
+	if (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt)
+		/* then, read next ftrace zone */
+		return -ENOMSG;
+	record->id = 0;
+	return record->size ? record->size : -ENOMSG;
+}
+
 static ssize_t psz_record_read(struct pstore_zone *zone,
 		struct pstore_record *record)
 {
@@ -931,6 +1079,9 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
 		readop = psz_oops_read;
 		record->id = cxt->oops_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_FTRACE:
+		readop = psz_ftrace_read;
+		break;
 	case PSTORE_TYPE_CONSOLE:
 		fallthrough;
 	case PSTORE_TYPE_PMSG:
@@ -1082,6 +1233,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
 		psz_free_zone(&cxt->ppsz);
 	if (cxt->cpsz)
 		psz_free_zone(&cxt->cpsz);
+	if (cxt->fpszs)
+		psz_free_zones(&cxt->fpszs, &cxt->ftrace_max_cnt);
 }
 
 static int psz_alloc_zones(struct psz_context *cxt)
@@ -1106,6 +1259,16 @@ static int psz_alloc_zones(struct psz_context *cxt)
 		goto free_out;
 	}
 
+	off_size += info->ftrace_size;
+	cxt->fpszs = psz_init_zones(PSTORE_TYPE_FTRACE, &off,
+			info->ftrace_size,
+			info->ftrace_size / nr_cpu_ids,
+			&cxt->ftrace_max_cnt);
+	if (IS_ERR(cxt->fpszs)) {
+		err = PTR_ERR(cxt->fpszs);
+		goto free_out;
+	}
+
 	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->kmsg_size, &cxt->oops_max_cnt);
@@ -1168,6 +1331,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	check_size(kmsg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
 	check_size(console_size, SECTOR_SIZE);
+	check_size(ftrace_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1196,6 +1360,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
 	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
 	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
+	pr_debug("\tftrace size : %ld Bytes\n", info->ftrace_size);
 
 	err = psz_alloc_zones(cxt);
 	if (err) {
@@ -1232,6 +1397,10 @@ int register_pstore_zone(struct pstore_zone_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
 		pr_cont(" console");
 	}
+	if (info->ftrace_size) {
+		cxt->pstore.flags |= PSTORE_FLAGS_FTRACE;
+		pr_cont(" ftrace");
+	}
 	pr_cont("\n");
 
 	err = pstore_register(&cxt->pstore);
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index da294e6d7661..94f441b8b616 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -19,6 +19,7 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  * @max_reason: Maximum kmsg dump reason to store.
  * @pmsg_size:	The size of pmsg zone which is the same as @kmsg_size.
  * @console_size:The size of console zone which is the same as @kmsg_size.
+ * @ftrace_size:The size of ftrace zone which is the same as @kmsg_size.
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to storage.
  *		On success, the number of bytes should be returned, others
@@ -37,6 +38,7 @@ struct pstore_zone_info {
 	int max_reason;
 	unsigned long pmsg_size;
 	unsigned long console_size;
+	unsigned long ftrace_size;
 	psz_read_op read;
 	psz_write_op write;
 	psz_write_op panic_write;
-- 
2.20.1
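
Note (not part of the patch): psz_ftrace_combine() is a two-way merge of
per-CPU record buffers ordered by timestamp. A simplified userspace
sketch of the same idea is below. struct rec and merge_by_ts() are
hypothetical stand-ins for struct pstore_ftrace_record and the kernel
function, and the sketch assumes each input is already sorted by
timestamp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pstore_ftrace_record. */
struct rec {
	uint64_t ts;		/* timestamp used as the merge key */
	unsigned int cpu;	/* which per-CPU zone the record came from */
};

/*
 * Merge two timestamp-sorted arrays into a newly allocated array.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct rec *merge_by_ts(const struct rec *a, size_t na,
			       const struct rec *b, size_t nb, size_t *nout)
{
	size_t i = 0, j = 0, k = 0;
	size_t total = na + nb;
	struct rec *out;

	assert(nout != NULL);
	out = malloc((total ? total : 1) * sizeof(*out));
	if (!out)
		return NULL;

	/* Take the older record first; on a tie, take the one from b. */
	while (i < na && j < nb) {
		if (a[i].ts < b[j].ts)
			out[k++] = a[i++];
		else
			out[k++] = b[j++];
	}
	/* Copy whatever remains in either input. */
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];

	*nout = k;
	return out;
}
```

The patch additionally skips a leading partial record in each buffer
(size % record_size) before merging; the sketch leaves that step out.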



* [PATCH v4 08/12] Documentation: Add details for pstore/blk
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:40   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add details on using pstore/blk, the new backend of pstore to record
dumps to block devices, in Documentation/admin-guide/pstore-blk.rst.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-8-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
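Note (illustration only, not part of the patch): the chunk arithmetic that
the new document describes can be sketched in userspace C as below. This is
a minimal sketch that assumes all sizes have already been converted to
bytes. `struct layout_sketch` and `layout_sketch_compute()` are hypothetical
names, not kernel APIs, and the real allocation in fs/pstore/zone.c may
differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of how a device might be carved up; not a kernel type. */
struct layout_sketch {
	size_t ftrace_chunk_size;	/* bytes per CPU for the ftrace front-end */
	size_t oops_chunk_count;	/* number of kmsg_size chunks that fit */
};

/*
 * All sizes are in bytes. Returns 0 on success, or -1 if nr_cpus is zero or
 * the pmsg, console and ftrace areas together exceed total_size.
 */
int layout_sketch_compute(size_t total_size, size_t kmsg_size,
			  size_t pmsg_size, size_t console_size,
			  size_t ftrace_size, unsigned int nr_cpus,
			  struct layout_sketch *out)
{
	size_t fixed = pmsg_size + console_size + ftrace_size;

	assert(out != NULL);	/* a NULL result pointer is a caller bug */

	if (nr_cpus == 0 || fixed > total_size)
		return -1;

	/* The ftrace area is split evenly, one chunk per CPU. */
	out->ftrace_chunk_size = ftrace_size / nr_cpus;
	/* Whatever the other front-ends leave over is cut into oops/panic chunks. */
	out->oops_chunk_count = kmsg_size ? (total_size - fixed) / kmsg_size : 0;
	return 0;
}
```

For a 1 MiB partition with 64 KiB each for kmsg, pmsg, console and ftrace on
4 CPUs, this sketch yields 16 KiB ftrace chunks and 13 oops/panic chunks.
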
 Documentation/admin-guide/pstore-blk.rst | 229 +++++++++++++++++++++++
 MAINTAINERS                              |   1 +
 fs/pstore/Kconfig                        |   2 +
 3 files changed, 232 insertions(+)
 create mode 100644 Documentation/admin-guide/pstore-blk.rst

diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
new file mode 100644
index 000000000000..484a1502fb49
--- /dev/null
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -0,0 +1,229 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
+block device before the system crashes. You can get these log files by
+mounting the pstore filesystem like::
+
+    mount -t pstore pstore /sys/fs/pstore
+
+
+pstore block concepts
+---------------------
+
+pstore/blk aims to make configuration simple by dividing it into two
+parts: configurations for the user and configurations for the
+driver.
+
+Configurations for user determine how pstore/blk works, such as pmsg_size,
+kmsg_size and so on. All of them support both Kconfig and module parameters,
+but module parameters have priority over Kconfig.
+
+Configurations for driver are all about the block device, such as its
+total_size and its read/write operations.
+
+Configurations for user
+-----------------------
+
+All of these configurations support both Kconfig and module parameters, but
+module parameters have priority over Kconfig.
+
+Here is an example for module parameters::
+
+        pstore_blk.blkdev=179:7 pstore_blk.kmsg_size=64
+
+The details of each configuration may be of interest to you.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of a block
+device. It's required for pstore/blk.
+
+It accepts the following variants:
+
+1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
+   leading 0x, for example b302.
+#. /dev/<disk_name> represents the device number of disk
+#. /dev/<disk_name><decimal> represents the device number of partition - device
+   number of disk plus the partition number
+#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
+   name of partitioned disk ends with a digit.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
+   a partition if the partition table provides it. The UUID may be either an
+   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
+   where SSSSSSSS is a zero-filled hex representation of the 32-bit
+   "NT disk signature", and PP is a zero-filled hex representation of the
+   1-based partition number.
+#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
+   partition with a known unique id.
+#. <major>:<minor> major and minor number of the device separated by a colon.
+
+kmsg_size
+~~~~~~~~~
+
+The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the oops/panic log.
+
+The number of chunks for the oops/panic front-end depends on the space that
+remains after the other pstore front-ends have taken theirs.
+
+pstore/blk will log to oops/panic chunks one by one, and always overwrite the
+oldest chunk when no free chunk remains.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the pmsg log.
+
+Unlike oops/panic front-end, there is only one chunk for pmsg front-end.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+*/sys/fs/pstore/pmsg-pstore-blk-0*.
+
+console_size
+~~~~~~~~~~~~
+
+The chunk size in KB for console front-end.  It **MUST** be a multiple of 4.
+It's optional if you do not care about the console log.
+
+Similar to pmsg front-end, there is only one chunk for console front-end.
+
+All console output is appended to the chunk. On reboot the contents are
+available in */sys/fs/pstore/console-pstore-blk-0*.
+
+ftrace_size
+~~~~~~~~~~~
+
+The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the ftrace log.
+
+Similar to oops front-end, there are multiple chunks for ftrace front-end
+depending on the number of CPUs. Each chunk size is equal to
+ftrace_size / processors_count.
+
+All ftrace output is appended to the chunks. On reboot the contents are
+combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.
+
+Persistent function tracing might be useful for debugging software or hardware
+related hangs. Here is an example of usage::
+
+ # mount -t pstore pstore /sys/fs/pstore
+ # mount -t debugfs debugfs /sys/kernel/debug/
+ # echo 1 > /sys/kernel/debug/pstore/record_ftrace
+ # reboot -f
+ [...]
+ # mount -t pstore pstore /sys/fs/pstore
+ # tail /sys/fs/pstore/ftrace-pstore-blk-0
+ CPU:0 ts:5914676 c0063828  c0063b94  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
+ CPU:0 ts:5914678 c039ecdc  c006385c  cpuidle_enter_state <- call_cpuidle+0x44/0x48
+ CPU:0 ts:5914680 c039e9a0  c039ecf0  cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
+ CPU:0 ts:5914681 c0063870  c039ea30  sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
+ CPU:1 ts:5916720 c0160f59  c015ee04  kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
+ CPU:1 ts:5916721 c05ca625  c015ee0c  __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
+ CPU:1 ts:5916723 c05c813d  c05ca630  yield_to <- __mutex_lock_slowpath+0x314/0x358
+ CPU:1 ts:5916724 c05ca2d1  c05ca638  __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358
+
+max_reason
+~~~~~~~~~~
+
+Limiting which kinds of kmsg dumps are stored can be controlled via
+the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
+``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
+``max_reason`` should be set to 2 (KMSG_DUMP_OOPS); to store only Panics,
+``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
+(KMSG_DUMP_UNDEF) means the reason filtering will be controlled by the
+``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
+otherwise KMSG_DUMP_MAX.
+
+Configurations for driver
+-------------------------
+
+Only a block device driver cares about these configurations. A block device
+driver uses ``psblk_register_blkdev`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/blk.c
+   :identifiers: psblk_register_blkdev
+
+Compression and header
+----------------------
+
+A block device is large enough for uncompressed oops data. In fact, we do not
+recommend data compression because pstore/blk will insert some information into
+the first line of oops/panic data. For example::
+
+        Panic: Total 16 times
+
+It means that this is the 16th oops or panic since the first boot.
+Sometimes the number of oopses and panics since the first boot is
+important for judging whether the system is stable.
+
+The following line is inserted by the pstore filesystem. For example::
+
+        Oops#2 Part1
+
+It means that this is the 2nd oops of the last boot.
+
+Reading the data
+----------------
+
+The dump data can be read from the pstore filesystem. The format for these
+files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end,
+``pmsg-pstore-blk-0`` for pmsg front-end and so on.  The timestamp of the
+dump file records the trigger time. To delete a stored record from the block
+device, simply unlink the respective pstore file.
+
+Attentions in panic read/write APIs
+-----------------------------------
+
+On panic, the kernel is not going to run for much longer: tasks will not
+be scheduled and most kernel resources will be out of service. The system
+behaves like a single-threaded program running on a single-core computer.
+
+The following points require special attention for panic read/write APIs:
+
+1. Can **NOT** allocate any memory.
+   If you need memory, just allocate while the block driver is initializing
+   rather than waiting until the panic.
+#. Must be polled, **NOT** interrupt driven.
+   No tasks are scheduled any more. The block driver should delay to ensure
+   the write succeeds, but NOT sleep.
+#. Can **NOT** take any lock.
+   There is no other task, nor any shared resource; you are safe to break all
+   locks.
+#. Just use CPU to transfer.
+   Do not use DMA unless you are sure that DMA will not take any lock.
+#. Control registers directly.
+   Please control registers directly rather than using Linux kernel resources.
+   Map I/O while initializing rather than waiting until a panic occurs.
+#. Reset your block device and controller if necessary.
+   If you are not sure of the state of your block device and controller when
+   a panic occurs, you are safe to stop and reset them.
+
+pstore/blk supports psblk_blkdev_info(), which is defined in
+*linux/pstore_blk.h*, to get information about the block device in use, such
+as the device number, sector count and start sector of the whole disk.
+
+pstore block internals
+----------------------
+
+For developer reference, here are all the important structures and APIs:
+
+.. kernel-doc:: fs/pstore/zone.c
+   :internal:
+
+.. kernel-doc:: include/linux/pstore_zone.h
+   :internal:
+
+.. kernel-doc:: fs/pstore/blk.c
+   :export:
+
+.. kernel-doc:: include/linux/pstore_blk.h
+   :internal:
diff --git a/MAINTAINERS b/MAINTAINERS
index e64e5db31497..9c1f4feff418 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13660,6 +13660,7 @@ M:	Tony Luck <tony.luck@intel.com>
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
 F:	Documentation/admin-guide/ramoops.rst
+F:	Documentation/admin-guide/pstore-blk.rst
 F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
 F:	drivers/acpi/apei/erst.c
 F:	drivers/firmware/efi/efi-pstore.c
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 16a0440d8d5a..8371d29651a6 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -171,6 +171,8 @@ config PSTORE_BLK
 	  This enables panic and oops message to be logged to a block dev
 	  where it can be read back at some later point.
 
+	  For more information, see Documentation/admin-guide/pstore-blk.rst
+
 	  If unsure, say N.
 
 config PSTORE_BLK_BLKDEV
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 09/12] pstore/zone: Provide way to skip "broken" zone for MTD devices
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:40   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

One requirement to support MTD devices in pstore/zone is having a
way to declare certain regions as broken. Add this support to
pstore/zone.

The MTD driver should return -ENOMSG when encountering a bad region,
which tells pstore/zone to skip and try the next one.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-9-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
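Note (illustration only, not part of the patch): the skip-and-retry
behaviour described above can be sketched in userspace C as below. This is a
minimal sketch under stated assumptions: `zone_write_fn`,
`write_skipping_broken()` and the two `demo_*` backends are hypothetical
names made up for the example, and the old/new buffer handling done by the
real psz_oops_write_record() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-zone write callback: returns 0, -ENOMSG, or another -errno. */
typedef int (*zone_write_fn)(unsigned int zonenum, const char *buf, size_t size);

/*
 * Try each zone once, starting at *write_cnt. A zone that reports -ENOMSG is
 * treated as broken and skipped. Any other result ends the search and moves
 * *write_cnt past the zone that was used. Returns -EBUSY if every zone
 * reported -ENOMSG.
 */
int write_skipping_broken(zone_write_fn wr, unsigned int *write_cnt,
			  unsigned int max_cnt, const char *buf, size_t size)
{
	unsigned int i;

	assert(wr != NULL && write_cnt != NULL);	/* caller bugs */

	if (max_cnt == 0)
		return -EINVAL;

	for (i = 0; i < max_cnt; i++) {
		unsigned int zonenum = (*write_cnt + i) % max_cnt;
		int ret = wr(zonenum, buf, size);

		if (ret != -ENOMSG) {
			*write_cnt = (zonenum + 1) % max_cnt;
			return ret;
		}
	}

	return -EBUSY;
}

/* Example backend (made up): zone 1 is a bad block, all others accept writes. */
int demo_write_zone1_broken(unsigned int zonenum, const char *buf, size_t size)
{
	(void)buf;
	(void)size;
	return zonenum == 1 ? -ENOMSG : 0;
}

/* Example backend (made up): every zone is bad. */
int demo_write_all_broken(unsigned int zonenum, const char *buf, size_t size)
{
	(void)zonenum;
	(void)buf;
	(void)size;
	return -ENOMSG;
}
```

With the first demo backend, four zones and a write counter of 1, the sketch
skips zone 1, writes to zone 2 and leaves the counter at 3. With the second
backend it returns -EBUSY and leaves the counter untouched.
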
 fs/pstore/blk.c             | 10 ++++--
 fs/pstore/zone.c            | 65 ++++++++++++++++++++++++++++++-------
 include/linux/pstore_blk.h  |  3 +-
 include/linux/pstore_zone.h | 12 ++++---
 4 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 5db811b7018d..e33e58afd4cb 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -101,9 +101,12 @@ static struct bdev_info {
  *		means error.
  * @write:	The same as @read, but the following error number:
  *		-EBUSY means try to write again later.
+ *		-ENOMSG means to try next zone.
  * @panic_write:The write operation only used for panic case. It's optional
- *		if you do not care panic log. The parameters and return value
- *		are the same as @read.
+ *		if you do not care about the panic log. The parameters are
+ *		relative to storage.
+ *		On success, the number of bytes should be returned, others
+ *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
  */
 struct psblk_device {
 	unsigned long total_size;
@@ -315,6 +318,9 @@ static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
 	/* size and off must align to SECTOR_SIZE for block device */
 	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
 			size >> SECTOR_SHIFT);
+	/* try next zone */
+	if (ret == -ENOMSG)
+		return ret;
 	return ret ? -EIO : size;
 }
 
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index 36d78c63bd20..43d44d016039 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -247,6 +247,9 @@ static int psz_zone_write(struct pstore_zone *zone,
 
 	return 0;
 dirty:
+	/* no need to mark dirty if going to try next zone */
+	if (wcnt == -ENOMSG)
+		return -ENOMSG;
 	atomic_set(&zone->dirty, true);
 	/* flush dirty zones nicely */
 	if (wcnt == -EBUSY && !is_on_panic())
@@ -382,7 +385,11 @@ static int psz_recover_oops_meta(struct psz_context *cxt)
 			return -EINVAL;
 
 		rcnt = info->read((char *)buf, len, zone->off);
-		if (rcnt != len) {
+		if (rcnt == -ENOMSG) {
+			pr_debug("%s with id %lu may be broken, skip\n",
+					zone->name, i);
+			continue;
+		} else if (rcnt != len) {
 			pr_err("read %s with id %lu failed\n", zone->name, i);
 			return (int)rcnt < 0 ? (int)rcnt : -EIO;
 		}
@@ -717,24 +724,58 @@ static void psz_write_kmsg_hdr(struct pstore_zone *zone,
 		hdr->counter = 0;
 }
 
+/*
+ * In case a zone is broken, which may occur on MTD devices, we try each zone
+ * in turn, starting at cxt->oops_write_cnt.
+ */
 static inline int notrace psz_oops_write_record(struct psz_context *cxt,
 		struct pstore_record *record)
 {
+	int ret = -EBUSY;
 	size_t size, hlen;
 	struct pstore_zone *zone;
-	unsigned int zonenum;
+	unsigned int i;
 
-	zonenum = cxt->oops_write_cnt;
-	zone = cxt->opszs[zonenum];
-	if (unlikely(!zone))
-		return -ENOSPC;
-	cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
+	for (i = 0; i < cxt->oops_max_cnt; i++) {
+		unsigned int zonenum, len;
+
+		zonenum = (cxt->oops_write_cnt + i) % cxt->oops_max_cnt;
+		zone = cxt->opszs[zonenum];
+		if (unlikely(!zone))
+			return -ENOSPC;
 
-	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
-	psz_write_kmsg_hdr(zone, record);
-	hlen = sizeof(struct psz_oops_header);
-	size = min_t(size_t, record->size, zone->buffer_size - hlen);
-	return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		/* avoid destroying old data, allocate a new one */
+		len = zone->buffer_size + sizeof(*zone->buffer);
+		zone->oldbuf = zone->buffer;
+		zone->buffer = kzalloc(len, GFP_KERNEL);
+		if (!zone->buffer) {
+			zone->buffer = zone->oldbuf;
+			return -ENOMEM;
+		}
+		zone->buffer->sig = zone->oldbuf->sig;
+
+		pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+		psz_write_kmsg_hdr(zone, record);
+		hlen = sizeof(struct psz_oops_header);
+		size = min_t(size_t, record->size, zone->buffer_size - hlen);
+		ret = psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		if (likely(!ret || ret != -ENOMSG)) {
+			cxt->oops_write_cnt = zonenum + 1;
+			cxt->oops_write_cnt %= cxt->oops_max_cnt;
+			/* no need to try next zone, free last zone buffer */
+			kfree(zone->oldbuf);
+			zone->oldbuf = NULL;
+			return ret;
+		}
+
+		pr_debug("zone %u may be broken, try next dmesg zone\n",
+				zonenum);
+		kfree(zone->buffer);
+		zone->buffer = zone->oldbuf;
+		zone->oldbuf = NULL;
+	}
+
+	return -EBUSY;
 }
 
 static int notrace psz_oops_write(struct psz_context *cxt,
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index d8f609e60288..828b0763d477 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -14,7 +14,8 @@
  * @start_sect: start sector to block device
  * @sects: sectors count on buf
  *
- * Return: On success, zero should be returned. Others mean error.
+ * Return: On success, zero should be returned. Others excluding -ENOMSG
+ * mean error. -ENOMSG means to try next zone.
  *
  * Panic write to block device must be aligned to SECTOR_SIZE.
  */
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index 94f441b8b616..ddb3dfea4ea6 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -23,11 +23,15 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to storage.
  *		On success, the number of bytes should be returned, others
- *		means error.
- * @write:	The same as @read, but -EBUSY means try to write again later.
+ *		mean error.
+ * @write:	The same as @read, but the following error number:
+ *		-EBUSY means try to write again later.
+ *		-ENOMSG means to try next zone.
  * @panic_write:The write operation only used for panic case. It's optional
- *		if you do not care panic log. The parameters and return value
- *		are the same as @read.
+ *		if you do not care about the panic log. The parameters are
+ *		relative to storage.
+ *		On success, the number of bytes should be returned, others
+ *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
  */
 struct pstore_zone_info {
 	struct module *owner;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 09/12] pstore/zone: Provide way to skip "broken" zone for MTD devices
@ 2020-05-08  6:40   ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Petr Mladek, Tony Luck, Kees Cook, linux-doc, Anton Vorontsov,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, linux-mtd,
	Colin Cross

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

One requirement to support MTD devices in pstore/zone is having a
way to declare certain regions as broken. Add this support to
pstore/zone.

The MTD driver should return -ENOMSG when encountering a bad region,
which tells pstore/zone to skip and try the next one.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-9-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/pstore/blk.c             | 10 ++++--
 fs/pstore/zone.c            | 65 ++++++++++++++++++++++++++++++-------
 include/linux/pstore_blk.h  |  3 +-
 include/linux/pstore_zone.h | 12 ++++---
 4 files changed, 71 insertions(+), 19 deletions(-)
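
A minimal userspace sketch of the skip-on-ENOMSG behavior described in
the commit message above. It is not the kernel code: demo_ctx,
demo_zone_write() and demo_write_record() are hypothetical names, and
the "broken" flags stand in for whatever the MTD driver would detect.
Only the -ENOMSG convention, the round-robin start at the write counter
and the -EBUSY result when every zone is skipped come from this patch.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_ZONES 4

/* Hypothetical stand-in for the dmesg zone bookkeeping in pstore/zone. */
struct demo_ctx {
	unsigned int write_cnt;	/* next zone to try, like oops_write_cnt */
	unsigned int max_cnt;	/* number of zones, like oops_max_cnt */
	int broken[DEMO_ZONES];	/* non-zero: backend reports a bad region */
	int last_written;	/* zone that took the last record, or -1 */
};

/* Hypothetical backend write: a bad region answers -ENOMSG. */
static int demo_zone_write(struct demo_ctx *c, unsigned int zonenum)
{
	if (c->broken[zonenum])
		return -ENOMSG;
	c->last_written = (int)zonenum;
	return 0;
}

/* Try each zone once, starting at write_cnt, skipping zones on -ENOMSG. */
static int demo_write_record(struct demo_ctx *c)
{
	unsigned int i;

	assert(c->max_cnt > 0 && c->max_cnt <= DEMO_ZONES);

	for (i = 0; i < c->max_cnt; i++) {
		unsigned int zonenum = (c->write_cnt + i) % c->max_cnt;
		int ret = demo_zone_write(c, zonenum);

		if (ret != -ENOMSG) {
			/* success or a real error: advance past this zone */
			c->write_cnt = (zonenum + 1) % c->max_cnt;
			return ret;
		}
		/* -ENOMSG: leave the counter alone and try the next zone */
	}

	/* every zone asked to be skipped */
	return -EBUSY;
}
```

In this sketch the write counter only advances once a zone accepts the
record or fails with a real error, so a run of bad zones does not
consume slots.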

diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 5db811b7018d..e33e58afd4cb 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -101,9 +101,12 @@ static struct bdev_info {
  *		means error.
  * @write:	The same as @read, but the following error number:
  *		-EBUSY means try to write again later.
+ *		-ENOMSG means to try next zone.
  * @panic_write:The write operation only used for panic case. It's optional
- *		if you do not care panic log. The parameters and return value
- *		are the same as @read.
+ *		if you do not care panic log. The parameters are relative
+ *		value to storage.
+ *		On success, the number of bytes should be returned, others
+ *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
  */
 struct psblk_device {
 	unsigned long total_size;
@@ -315,6 +318,9 @@ static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
 	/* size and off must align to SECTOR_SIZE for block device */
 	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
 			size >> SECTOR_SHIFT);
+	/* try next zone */
+	if (ret == -ENOMSG)
+		return ret;
 	return ret ? -EIO : size;
 }
 
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index 36d78c63bd20..43d44d016039 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -247,6 +247,9 @@ static int psz_zone_write(struct pstore_zone *zone,
 
 	return 0;
 dirty:
+	/* no need to mark dirty if going to try next zone */
+	if (wcnt == -ENOMSG)
+		return -ENOMSG;
 	atomic_set(&zone->dirty, true);
 	/* flush dirty zones nicely */
 	if (wcnt == -EBUSY && !is_on_panic())
@@ -382,7 +385,11 @@ static int psz_recover_oops_meta(struct psz_context *cxt)
 			return -EINVAL;
 
 		rcnt = info->read((char *)buf, len, zone->off);
-		if (rcnt != len) {
+		if (rcnt == -ENOMSG) {
+			pr_debug("%s with id %lu may be broken, skip\n",
+					zone->name, i);
+			continue;
+		} else if (rcnt != len) {
 			pr_err("read %s with id %lu failed\n", zone->name, i);
 			return (int)rcnt < 0 ? (int)rcnt : -EIO;
 		}
@@ -717,24 +724,58 @@ static void psz_write_kmsg_hdr(struct pstore_zone *zone,
 		hdr->counter = 0;
 }
 
+/*
+ * In case a zone is broken, which may occur on an MTD device, we try each
+ * zone in turn, starting at cxt->oops_write_cnt.
+ */
 static inline int notrace psz_oops_write_record(struct psz_context *cxt,
 		struct pstore_record *record)
 {
+	int ret = -EBUSY;
 	size_t size, hlen;
 	struct pstore_zone *zone;
-	unsigned int zonenum;
+	unsigned int i;
 
-	zonenum = cxt->oops_write_cnt;
-	zone = cxt->opszs[zonenum];
-	if (unlikely(!zone))
-		return -ENOSPC;
-	cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
+	for (i = 0; i < cxt->oops_max_cnt; i++) {
+		unsigned int zonenum, len;
+
+		zonenum = (cxt->oops_write_cnt + i) % cxt->oops_max_cnt;
+		zone = cxt->opszs[zonenum];
+		if (unlikely(!zone))
+			return -ENOSPC;
 
-	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
-	psz_write_kmsg_hdr(zone, record);
-	hlen = sizeof(struct psz_oops_header);
-	size = min_t(size_t, record->size, zone->buffer_size - hlen);
-	return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		/* avoid destroying old data, allocate a new one */
+		len = zone->buffer_size + sizeof(*zone->buffer);
+		zone->oldbuf = zone->buffer;
+		zone->buffer = kzalloc(len, GFP_KERNEL);
+		if (!zone->buffer) {
+			zone->buffer = zone->oldbuf;
+			return -ENOMEM;
+		}
+		zone->buffer->sig = zone->oldbuf->sig;
+
+		pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+		psz_write_kmsg_hdr(zone, record);
+		hlen = sizeof(struct psz_oops_header);
+		size = min_t(size_t, record->size, zone->buffer_size - hlen);
+		ret = psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		if (likely(!ret || ret != -ENOMSG)) {
+			cxt->oops_write_cnt = zonenum + 1;
+			cxt->oops_write_cnt %= cxt->oops_max_cnt;
+			/* no need to try next zone, free last zone buffer */
+			kfree(zone->oldbuf);
+			zone->oldbuf = NULL;
+			return ret;
+		}
+
+		pr_debug("zone %u may be broken, try next dmesg zone\n",
+				zonenum);
+		kfree(zone->buffer);
+		zone->buffer = zone->oldbuf;
+		zone->oldbuf = NULL;
+	}
+
+	return -EBUSY;
 }
 
 static int notrace psz_oops_write(struct psz_context *cxt,
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index d8f609e60288..828b0763d477 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -14,7 +14,8 @@
  * @start_sect: start sector to block device
  * @sects: sectors count on buf
  *
- * Return: On success, zero should be returned. Others mean error.
+ * Return: On success, zero should be returned. Others excluding -ENOMSG
+ * mean error. -ENOMSG means to try next zone.
  *
  * Panic write to block device must be aligned to SECTOR_SIZE.
  */
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index 94f441b8b616..ddb3dfea4ea6 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -23,11 +23,15 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  * @read:	The general read operation. Both of the function parameters
  *		@size and @offset are relative value to storage.
  *		On success, the number of bytes should be returned, others
- *		means error.
- * @write:	The same as @read, but -EBUSY means try to write again later.
+ *		mean error.
+ * @write:	The same as @read, but the following error number:
+ *		-EBUSY means try to write again later.
+ *		-ENOMSG means to try next zone.
  * @panic_write:The write operation only used for panic case. It's optional
- *		if you do not care panic log. The parameters and return value
- *		are the same as @read.
+ *		if you do not care panic log. The parameters are relative
+ *		value to storage.
+ *		On success, the number of bytes should be returned, others
+ *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
  */
 struct pstore_zone_info {
 	struct module *owner;
-- 
2.20.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 10/12] pstore/blk: Provide way to query pstore configuration
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:40   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

In order to configure itself, the MTD backend needs to be able to query
the current pstore configuration. Introduce pstore_blk_usr_info() for
this purpose.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-10-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/pstore/blk.c            | 37 ++++++++++++++++++++++++++++++-------
 include/linux/pstore_blk.h | 10 ++++++++++
 2 files changed, 40 insertions(+), 7 deletions(-)
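
A small userspace sketch of the size normalization that check_size()
appears intended to perform for the values reported by
pstore_blk_usr_info(): sizes are given in KiB, non-positive values mean
"disabled", and the byte count is rounded up to the alignment.
demo_check_size() is a hypothetical name. Note that the macro as posted
passes the unscaled name to ALIGN(), while this sketch aligns the byte
count, which is an assumption about the intent.

```c
#include <assert.h>

/*
 * Hypothetical helper: take a size in KiB, treat non-positive values as
 * disabled (0), convert to bytes and round up to a power-of-two alignment.
 */
static long demo_check_size(long kbytes, long alignsize)
{
	long bytes;

	/* the mask arithmetic below only works for power-of-two alignments */
	assert(alignsize > 0 && (alignsize & (alignsize - 1)) == 0);

	bytes = kbytes <= 0 ? 0 : kbytes * 1024;
	if (bytes & (alignsize - 1))
		bytes = (bytes + alignsize - 1) & ~(alignsize - 1);
	return bytes;
}
```

Under these assumptions a 64 KiB request stays at 65536 bytes, while a
5 KiB request with 4096-byte alignment would be rounded up to 8192.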

diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index e33e58afd4cb..c6d99d5dcd7f 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -86,6 +86,17 @@ static struct bdev_info {
 	sector_t start_sect;
 } g_bdev_info;
 
+#define check_size(name, alignsize) ({				\
+	long _##name_ = (name);					\
+	_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+	if (_##name_ & ((alignsize) - 1)) {			\
+		pr_info(#name " must align to %d\n",		\
+				(alignsize));			\
+		_##name_ = ALIGN(name, (alignsize));		\
+	}							\
+	_##name_;						\
+})
+
 /**
  * struct psblk_device - back-end pstore/blk driver structure.
  *
@@ -140,13 +151,11 @@ static int psblk_register_do(struct psblk_device *dev)
 	if (!dev->flags)
 		dev->flags = UINT_MAX;
 #define verify_size(name, alignsize, enable) {				\
-		long _##name_ = (enable) ? (name) : 0;			\
-		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
-		if (_##name_ & ((alignsize) - 1)) {			\
-			pr_info(#name " must align to %d\n",		\
-					(alignsize));			\
-			_##name_ = ALIGN(name, (alignsize));		\
-		}							\
+		long _##name_;						\
+		if (enable)						\
+			_##name_ = check_size(name, alignsize);		\
+		else							\
+			_##name_ = 0;					\
 		name = _##name_ / 1024;					\
 		pstore_zone_info->name = _##name_;				\
 	}
@@ -465,6 +474,20 @@ int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
 }
 EXPORT_SYMBOL_GPL(psblk_blkdev_info);
 
+/* get information of pstore/blk */
+int pstore_blk_usr_info(struct pstore_blk_info *info)
+{
+	strncpy(info->device, blkdev, 80);
+	info->max_reason = max_reason;
+	info->kmsg_size = check_size(kmsg_size, 4096);
+	info->pmsg_size = check_size(pmsg_size, 4096);
+	info->ftrace_size = check_size(ftrace_size, 4096);
+	info->console_size = check_size(console_size, 4096);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pstore_blk_usr_info);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
 MODULE_DESCRIPTION("pstore backend for block devices");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 828b0763d477..dd5213044e21 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -27,4 +27,14 @@ int  psblk_register_blkdev(unsigned int major, unsigned int flags,
 void psblk_unregister_blkdev(unsigned int major);
 int  psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
 
+struct pstore_blk_info {
+	char device[80];
+	enum kmsg_dump_reason max_reason;
+	unsigned long kmsg_size;
+	unsigned long pmsg_size;
+	unsigned long console_size;
+	unsigned long ftrace_size;
+};
+int pstore_blk_usr_info(struct pstore_blk_info *info);
+
 #endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 11/12] pstore/blk: Support non-block storage devices
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:40   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add support for non-block devices (e.g. MTD). A non-block driver calls
psblk_register_device() to register itself.

In addition, pstore/zone is updated to handle non-block devices,
where an erase must be done before a write. Without this, there is no
way to remove records stored to an MTD.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-11-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 Documentation/admin-guide/pstore-blk.rst | 17 ++++++--
 fs/pstore/blk.c                          | 52 +++++++++---------------
 fs/pstore/zone.c                         |  8 +++-
 include/linux/pstore_blk.h               | 37 +++++++++++++++++
 include/linux/pstore_zone.h              |  6 +++
 5 files changed, 83 insertions(+), 37 deletions(-)
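
A minimal userspace sketch of the optional erase callback described in
the commit message above. It is not the pstore code: demo_device,
demo_write(), demo_flash_erase() and demo_remove_record() are
hypothetical, the "storage" is a small in-memory array, and off_t
stands in for the kernel's loff_t. It only illustrates the dispatch
this patch adds: use erase() when the backend provides one, otherwise
fall back to rewriting zeroed metadata through write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t, off_t */

typedef ssize_t (*demo_write_op)(const char *buf, size_t size, off_t off);
typedef ssize_t (*demo_erase_op)(size_t size, off_t off);

/* Hypothetical backend description; erase is NULL for block-like devices. */
struct demo_device {
	demo_write_op write;
	demo_erase_op erase;
};

static unsigned char demo_storage[64];

static int demo_in_range(size_t size, off_t off)
{
	return off >= 0 && (size_t)off <= sizeof(demo_storage) &&
	       size <= sizeof(demo_storage) - (size_t)off;
}

static ssize_t demo_write(const char *buf, size_t size, off_t off)
{
	if (!demo_in_range(size, off))
		return -EIO;
	memcpy(demo_storage + off, buf, size);
	return (ssize_t)size;
}

/* Flash-style erase: the whole range goes back to all ones. */
static ssize_t demo_flash_erase(size_t size, off_t off)
{
	if (!demo_in_range(size, off))
		return -EIO;
	memset(demo_storage + off, 0xff, size);
	return 0;
}

/* Remove a record: prefer erase(), else overwrite a 4-byte length field. */
static int demo_remove_record(const struct demo_device *dev, size_t size,
			      off_t off)
{
	static const char zero_len[4] = { 0 };
	ssize_t ret;

	assert(dev && dev->write);

	if (dev->erase)
		return (int)dev->erase(size, off);

	ret = dev->write(zero_len, sizeof(zero_len), off);
	return ret == (ssize_t)sizeof(zero_len) ? 0 : -EIO;
}
```

A block-like backend would leave erase as NULL and only have its
length field cleared, while a flash-like backend gets the whole record
range erased.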

diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
index 484a1502fb49..2f3602397715 100644
--- a/Documentation/admin-guide/pstore-blk.rst
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -7,8 +7,8 @@ Introduction
 ------------
 
 pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
-block device before the system crashes. You can get these log files by
-mounting pstore filesystem like::
+block device and non-block device before the system crashes. You can get
+these log files by mounting pstore filesystem like::
 
     mount -t pstore pstore /sys/fs/pstore
 
@@ -24,8 +24,8 @@ Configurations for user determine how pstore/blk works, such as pmsg_size,
 kmsg_size and so on. All of them support both Kconfig and module parameters,
 but module parameters have priority over Kconfig.
 
-Configurations for driver are all about block device, such as total_size
-of block device and read/write operations.
+Configurations for driver are all about block device and non-block device,
+such as total_size of block device and read/write operations.
 
 Configurations for user
 -----------------------
@@ -152,6 +152,15 @@ driver uses ``psblk_register_blkdev`` to register to pstore/blk.
 .. kernel-doc:: fs/pstore/blk.c
    :identifiers: psblk_register_blkdev
 
+A non-block device driver uses ``psblk_register_device`` with
+``struct psblk_device`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/blk.c
+   :identifiers: psblk_register_device
+
+.. kernel-doc:: include/linux/pstore_blk.h
+   :identifiers: psblk_device
+
 Compression and header
 ----------------------
 
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index c6d99d5dcd7f..a736555e1ed3 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -98,36 +98,15 @@ static struct bdev_info {
 })
 
 /**
- * struct psblk_device - back-end pstore/blk driver structure.
+ * psblk_register_device() - register non-block device to pstore/blk
  *
- * @total_size: The total size in bytes pstore/blk can use. It must be greater
- *		than 4096 and be multiple of 4096.
- * @flags:	Refer to macro starting with PSTORE_FLAGS defined in
- *		linux/pstore.h. It means what front-ends this device support.
- *		Zero means all backends for compatible.
- * @read:	The general read operation. Both of the function parameters
- *		@size and @offset are relative value to bock device (not the
- *		whole disk).
- *		On success, the number of bytes should be returned, others
- *		means error.
- * @write:	The same as @read, but the following error number:
- *		-EBUSY means try to write again later.
- *		-ENOMSG means to try next zone.
- * @panic_write:The write operation only used for panic case. It's optional
- *		if you do not care panic log. The parameters are relative
- *		value to storage.
- *		On success, the number of bytes should be returned, others
- *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
+ * @dev: non-block device information
+ *
+ * Return:
+ * * 0		- OK
+ * * Others	- an error occurred.
  */
-struct psblk_device {
-	unsigned long total_size;
-	unsigned int flags;
-	psz_read_op read;
-	psz_write_op write;
-	psz_write_op panic_write;
-};
-
-static int psblk_register_do(struct psblk_device *dev)
+int psblk_register_device(struct psblk_device *dev)
 {
 	int ret;
 
@@ -170,6 +149,7 @@ static int psblk_register_do(struct psblk_device *dev)
 	pstore_zone_info->max_reason = max_reason;
 	pstore_zone_info->read = dev->read;
 	pstore_zone_info->write = dev->write;
+	pstore_zone_info->erase = dev->erase;
 	pstore_zone_info->panic_write = dev->panic_write;
 	pstore_zone_info->name = MODNAME;
 	pstore_zone_info->owner = THIS_MODULE;
@@ -182,8 +162,14 @@ static int psblk_register_do(struct psblk_device *dev)
 	mutex_unlock(&psz_lock);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(psblk_register_device);
 
-static void psblk_unregister_do(struct psblk_device *dev)
+/**
+ * psblk_unregister_device() - unregister non-block device from pstore/blk
+ *
+ * @dev: non-block device information
+ */
+void psblk_unregister_device(struct psblk_device *dev)
 {
 	mutex_lock(&psz_lock);
 	if (pstore_zone_info && pstore_zone_info->read == dev->read) {
@@ -193,6 +179,7 @@ static void psblk_unregister_do(struct psblk_device *dev)
 	}
 	mutex_unlock(&psz_lock);
 }
+EXPORT_SYMBOL_GPL(psblk_unregister_device);
 
 /**
  * psblk_get_bdev() - open block device
@@ -406,11 +393,12 @@ int psblk_register_blkdev(unsigned int major, unsigned int flags,
 
 	dev.total_size = psblk_bdev_size(bdev);
 	dev.flags = flags;
-	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
 	dev.read = psblk_generic_blk_read;
 	dev.write = psblk_generic_blk_write;
+	dev.erase = NULL;
+	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
 
-	ret = psblk_register_do(&dev);
+	ret = psblk_register_device(&dev);
 	if (ret)
 		goto err_put_bdev;
 
@@ -436,7 +424,7 @@ void psblk_unregister_blkdev(unsigned int major)
 	void *holder = blkdev;
 
 	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
-		psblk_unregister_do(&dev);
+		psblk_unregister_device(&dev);
 		psblk_put_bdev(psblk_bdev, holder);
 		blkdev_panic_write = NULL;
 		psblk_bdev = NULL;
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index 43d44d016039..df5ce54eb7ea 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -652,15 +652,21 @@ static inline int psz_oops_erase(struct psz_context *cxt,
 	struct psz_buffer *buffer = zone->buffer;
 	struct psz_oops_header *hdr =
 		(struct psz_oops_header *)buffer->data;
+	size_t size;
 
 	if (unlikely(!psz_ok(zone)))
 		return 0;
+
 	/* this zone is already updated, no need to erase */
 	if (record->count != hdr->counter)
 		return 0;
 
+	size = buffer_datalen(zone) + sizeof(*zone->buffer);
 	atomic_set(&zone->buffer->datalen, 0);
-	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	if (cxt->pstore_zone_info->erase)
+		return cxt->pstore_zone_info->erase(size, zone->off);
+	else
+		return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
 static inline int psz_record_erase(struct psz_context *cxt,
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index dd5213044e21..43242e343dad 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -7,6 +7,41 @@
 #include <linux/pstore.h>
 #include <linux/pstore_zone.h>
 
+/**
+ * struct psblk_device - back-end pstore/blk driver structure.
+ *
+ * @total_size: The total size in bytes pstore/blk can use. It must be greater
+ *		than 4096 and be multiple of 4096.
+ * @flags:	Refer to the macros starting with PSTORE_FLAGS defined in
+ *		linux/pstore.h. It indicates which front-ends this device
+ *		supports. Zero means all front-ends, for compatibility.
+ * @read:	The general read operation. Both of the function parameters
+ *		@size and @offset are relative to the block device (not the
+ *		whole disk).
+ *		On success, the number of bytes should be returned; other
+ *		values mean error.
+ * @write:	The same as @read, but the following error number:
+ *		-EBUSY means try to write again later.
+ *		-ENOMSG means to try next zone.
+ * @erase:	The general erase operation for device with special removing
+ *		job. Both of the function parameters @size and @offset are
+ *		relative value to storage.
+ *		Return 0 on success and others on failure.
+ * @panic_write:The write operation only used for panic case. It's optional
+ *		if you do not care panic log. The parameters are relative
+ *		value to storage.
+ *		On success, the number of bytes should be returned, others
+ *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
+ */
+struct psblk_device {
+	unsigned long total_size;
+	unsigned int flags;
+	psz_read_op read;
+	psz_write_op write;
+	psz_erase_op erase;
+	psz_write_op panic_write;
+};
+
 /**
  * typedef psblk_panic_write_op - panic write operation to block device
  *
@@ -22,6 +57,8 @@
 typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
 		sector_t sects);
 
+int psblk_register_device(struct psblk_device *dev);
+void psblk_unregister_device(struct psblk_device *dev);
 int  psblk_register_blkdev(unsigned int major, unsigned int flags,
 		psblk_panic_write_op panic_write);
 void psblk_unregister_blkdev(unsigned int major);
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index ddb3dfea4ea6..2c031a25ee5f 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -7,6 +7,7 @@
 
 typedef ssize_t (*psz_read_op)(char *, size_t, loff_t);
 typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
+typedef ssize_t (*psz_erase_op)(size_t, loff_t);
 /**
  * struct pstore_zone_info - pstore/zone back-end driver structure
  *
@@ -27,6 +28,10 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  * @write:	The same as @read, but the following error number:
  *		-EBUSY means try to write again later.
  *		-ENOMSG means to try next zone.
+ * @erase:	The general erase operation for device with special removing
+ *		job. Both of the function parameters @size and @offset are
+ *		relative value to storage.
+ *		Return 0 on success and others on failure.
  * @panic_write:The write operation only used for panic case. It's optional
  *		if you do not care panic log. The parameters are relative
  *		value to storage.
@@ -45,6 +50,7 @@ struct pstore_zone_info {
 	unsigned long ftrace_size;
 	psz_read_op read;
 	psz_write_op write;
+	psz_erase_op erase;
 	psz_write_op panic_write;
 };
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 11/12] pstore/blk: Support non-block storage devices
@ 2020-05-08  6:40   ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Petr Mladek, Tony Luck, Kees Cook, linux-doc, Anton Vorontsov,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, linux-mtd,
	Colin Cross

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Add support for non-block devices (e.g. MTD). A non-block driver calls
psblk_register_device() to register itself.

In addition, pstore/zone is updated to handle non-block devices,
where an erase must be done before a write. Without this, there is no
way to remove records stored to an MTD.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-11-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 Documentation/admin-guide/pstore-blk.rst | 17 ++++++--
 fs/pstore/blk.c                          | 52 +++++++++---------------
 fs/pstore/zone.c                         |  8 +++-
 include/linux/pstore_blk.h               | 37 +++++++++++++++++
 include/linux/pstore_zone.h              |  6 +++
 5 files changed, 83 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
index 484a1502fb49..2f3602397715 100644
--- a/Documentation/admin-guide/pstore-blk.rst
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -7,8 +7,8 @@ Introduction
 ------------
 
 pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
-block device before the system crashes. You can get these log files by
-mounting pstore filesystem like::
+block device and non-block device before the system crashes. You can get
+these log files by mounting pstore filesystem like::
 
     mount -t pstore pstore /sys/fs/pstore
 
@@ -24,8 +24,8 @@ Configurations for user determine how pstore/blk works, such as pmsg_size,
 kmsg_size and so on. All of them support both Kconfig and module parameters,
 but module parameters have priority over Kconfig.
 
-Configurations for driver are all about block device, such as total_size
-of block device and read/write operations.
+Configurations for driver are all about block device and non-block device,
+such as total_size of block device and read/write operations.
 
 Configurations for user
 -----------------------
@@ -152,6 +152,15 @@ driver uses ``psblk_register_blkdev`` to register to pstore/blk.
 .. kernel-doc:: fs/pstore/blk.c
    :identifiers: psblk_register_blkdev
 
+A non-block device driver uses ``psblk_register_device`` with
+``struct psblk_device`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/blk.c
+   :identifiers: psblk_register_device
+
+.. kernel-doc:: include/linux/pstore_blk.h
+   :identifiers: psblk_device
+
 Compression and header
 ----------------------
 
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index c6d99d5dcd7f..a736555e1ed3 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -98,36 +98,15 @@ static struct bdev_info {
 })
 
 /**
- * struct psblk_device - back-end pstore/blk driver structure.
+ * psblk_register_device() - register non-block device to pstore/blk
  *
- * @total_size: The total size in bytes pstore/blk can use. It must be greater
- *		than 4096 and be multiple of 4096.
- * @flags:	Refer to macro starting with PSTORE_FLAGS defined in
- *		linux/pstore.h. It means what front-ends this device support.
- *		Zero means all backends for compatible.
- * @read:	The general read operation. Both of the function parameters
- *		@size and @offset are relative value to bock device (not the
- *		whole disk).
- *		On success, the number of bytes should be returned, others
- *		means error.
- * @write:	The same as @read, but the following error number:
- *		-EBUSY means try to write again later.
- *		-ENOMSG means to try next zone.
- * @panic_write:The write operation only used for panic case. It's optional
- *		if you do not care panic log. The parameters are relative
- *		value to storage.
- *		On success, the number of bytes should be returned, others
- *		excluding -ENOMSG mean error. -ENOMSG means to try next zone.
+ * @dev: non-block device information
+ *
+ * Return:
+ * * 0		- OK
+ * * Others	- an error occurred.
  */
-struct psblk_device {
-	unsigned long total_size;
-	unsigned int flags;
-	psz_read_op read;
-	psz_write_op write;
-	psz_write_op panic_write;
-};
-
-static int psblk_register_do(struct psblk_device *dev)
+int psblk_register_device(struct psblk_device *dev)
 {
 	int ret;
 
@@ -170,6 +149,7 @@ static int psblk_register_do(struct psblk_device *dev)
 	pstore_zone_info->max_reason = max_reason;
 	pstore_zone_info->read = dev->read;
 	pstore_zone_info->write = dev->write;
+	pstore_zone_info->erase = dev->erase;
 	pstore_zone_info->panic_write = dev->panic_write;
 	pstore_zone_info->name = MODNAME;
 	pstore_zone_info->owner = THIS_MODULE;
@@ -182,8 +162,14 @@ static int psblk_register_do(struct psblk_device *dev)
 	mutex_unlock(&psz_lock);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(psblk_register_device);
 
-static void psblk_unregister_do(struct psblk_device *dev)
+/**
+ * psblk_unregister_device() - unregister non-block device from pstore/blk
+ *
+ * @dev: non-block device information
+ */
+void psblk_unregister_device(struct psblk_device *dev)
 {
 	mutex_lock(&psz_lock);
 	if (pstore_zone_info && pstore_zone_info->read == dev->read) {
@@ -193,6 +179,7 @@ static void psblk_unregister_do(struct psblk_device *dev)
 	}
 	mutex_unlock(&psz_lock);
 }
+EXPORT_SYMBOL_GPL(psblk_unregister_device);
 
 /**
  * psblk_get_bdev() - open block device
@@ -406,11 +393,12 @@ int psblk_register_blkdev(unsigned int major, unsigned int flags,
 
 	dev.total_size = psblk_bdev_size(bdev);
 	dev.flags = flags;
-	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
 	dev.read = psblk_generic_blk_read;
 	dev.write = psblk_generic_blk_write;
+	dev.erase = NULL;
+	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
 
-	ret = psblk_register_do(&dev);
+	ret = psblk_register_device(&dev);
 	if (ret)
 		goto err_put_bdev;
 
@@ -436,7 +424,7 @@ void psblk_unregister_blkdev(unsigned int major)
 	void *holder = blkdev;
 
 	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
-		psblk_unregister_do(&dev);
+		psblk_unregister_device(&dev);
 		psblk_put_bdev(psblk_bdev, holder);
 		blkdev_panic_write = NULL;
 		psblk_bdev = NULL;
diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
index 43d44d016039..df5ce54eb7ea 100644
--- a/fs/pstore/zone.c
+++ b/fs/pstore/zone.c
@@ -652,15 +652,21 @@ static inline int psz_oops_erase(struct psz_context *cxt,
 	struct psz_buffer *buffer = zone->buffer;
 	struct psz_oops_header *hdr =
 		(struct psz_oops_header *)buffer->data;
+	size_t size;
 
 	if (unlikely(!psz_ok(zone)))
 		return 0;
+
 	/* this zone is already updated, no need to erase */
 	if (record->count != hdr->counter)
 		return 0;
 
+	size = buffer_datalen(zone) + sizeof(*zone->buffer);
 	atomic_set(&zone->buffer->datalen, 0);
-	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	if (cxt->pstore_zone_info->erase)
+		return cxt->pstore_zone_info->erase(size, zone->off);
+	else
+		return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
 static inline int psz_record_erase(struct psz_context *cxt,
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index dd5213044e21..43242e343dad 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -7,6 +7,41 @@
 #include <linux/pstore.h>
 #include <linux/pstore_zone.h>
 
+/**
+ * struct psblk_device - back-end pstore/blk driver structure.
+ *
+ * @total_size: The total size in bytes pstore/blk can use. It must be greater
+ *		than 4096 and be a multiple of 4096.
+ * @flags:	Refer to the macros starting with PSTORE_FLAGS defined in
+ *		linux/pstore.h. It indicates which front-ends this device
+ *		supports. Zero means all front-ends, for compatibility.
+ * @read:	The general read operation. Both of the function parameters
+ *		@size and @offset are relative to the block device (not the
+ *		whole disk).
+ *		On success, the number of bytes read should be returned; any
+ *		other value means error.
+ * @write:	The same as @read, but with the following error numbers:
+ *		-EBUSY means try to write again later.
+ *		-ENOMSG means to try the next zone.
+ * @erase:	The general erase operation for devices that need a special
+ *		removal step. Both of the function parameters @size and
+ *		@offset are relative to the storage.
+ *		Return 0 on success and other values on failure.
+ * @panic_write:The write operation used only in the panic case. It's optional
+ *		if you do not care about the panic log. The parameters are
+ *		relative to the storage.
+ *		On success, return the number of bytes written. -ENOMSG means
+ *		to try the next zone; any other value means error.
+ */
+struct psblk_device {
+	unsigned long total_size;
+	unsigned int flags;
+	psz_read_op read;
+	psz_write_op write;
+	psz_erase_op erase;
+	psz_write_op panic_write;
+};
+
 /**
  * typedef psblk_panic_write_op - panic write operation to block device
  *
@@ -22,6 +57,8 @@
 typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
 		sector_t sects);
 
+int psblk_register_device(struct psblk_device *dev);
+void psblk_unregister_device(struct psblk_device *dev);
 int  psblk_register_blkdev(unsigned int major, unsigned int flags,
 		psblk_panic_write_op panic_write);
 void psblk_unregister_blkdev(unsigned int major);
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index ddb3dfea4ea6..2c031a25ee5f 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -7,6 +7,7 @@
 
 typedef ssize_t (*psz_read_op)(char *, size_t, loff_t);
 typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
+typedef ssize_t (*psz_erase_op)(size_t, loff_t);
 /**
  * struct pstore_zone_info - pstore/zone back-end driver structure
  *
@@ -27,6 +28,10 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
  * @write:	The same as @read, but the following error number:
  *		-EBUSY means try to write again later.
  *		-ENOMSG means to try next zone.
+ * @erase:	The general erase operation for devices that need a special
+ *		removal step. Both of the function parameters @size and
+ *		@offset are relative to the storage.
+ *		Return 0 on success and other values on failure.
  * @panic_write:The write operation only used for panic case. It's optional
  *		if you do not care panic log. The parameters are relative
  *		value to storage.
@@ -45,6 +50,7 @@ struct pstore_zone_info {
 	unsigned long ftrace_size;
 	psz_read_op read;
 	psz_write_op write;
+	psz_erase_op erase;
 	psz_write_op panic_write;
 };
 
-- 
2.20.1
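
The optional erase hook added to psz_oops_erase() above can be modelled
in user space. This is only a minimal sketch under stated assumptions:
struct backend_info, erase_record(), demo_erase() and flush_meta() are
hypothetical stand-ins invented for illustration, not kernel symbols.
It shows the one decision the hunk makes: use the back-end's erase
callback when one is registered, otherwise fall back to flushing the
zeroed metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t, off_t */

/* Hypothetical stand-in for psz_erase_op: erase @size bytes at @off. */
typedef ssize_t (*erase_op)(size_t size, off_t off);

/* Hypothetical stand-in for the back-end info; only the field used here. */
struct backend_info {
	erase_op erase;		/* optional, may be NULL */
};

/* Counters that record which path was taken, for illustration only. */
static int erase_calls;
static int meta_flushes;

/* Example back-end erase callback, such as a flash driver might supply. */
static ssize_t demo_erase(size_t size, off_t off)
{
	(void)size;
	(void)off;
	erase_calls++;
	return 0;
}

/* Stand-in for the generic path: rewrite the header with datalen == 0. */
static int flush_meta(off_t off)
{
	(void)off;
	meta_flushes++;
	return 0;
}

/*
 * Drop one record: prefer the back-end's erase callback when present,
 * otherwise fall back to flushing the zeroed metadata.
 */
static int erase_record(const struct backend_info *info, size_t size,
			off_t off)
{
	assert(info != NULL);
	if (info->erase)
		return (int)info->erase(size, off);
	return flush_meta(off);
}
```

A block-device back-end would leave the callback NULL, as the hunk in
psblk_register_blkdev() does, and so always take the metadata path.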



* [PATCH v4 12/12] mtd: Support kmsg dumper based on pstore/blk
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  6:40   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

This introduces mtdpstore, which is similar to mtdoops but more
powerful. It uses pstore/blk, and aims to store panic and oops logs to
a flash partition, where pstore can later read them back and present
them as files in the mounted pstore filesystem.

To make mtdpstore work, the "blkdev" parameter of pstore/blk should be
set to the MTD device name or the MTD device number. For more details,
see Documentation/admin-guide/pstore-blk.rst

This solves a number of issues:
- Work duplication: both pstore and mtdoops do the same job of storing
  panic/oops logs. They have very similar logic, registering with the
  kmsg dumper and storing logs into several chunks one by one.
- Layer violations: drivers should provide methods instead of policies.
  MTD should provide read/write/erase operations, and allow
  higher-level drivers to provide the chunk management, kmsg dump
  configuration, etc.
- Missing features: pstore provides many additional features, including
  presenting the logs as files, logging dump time and count, and
  supporting other frontends like pmsg, console, etc.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-12-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
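
As a rough illustration of the "device name or device number" selection
described in the commit message, here is a user-space sketch. It is a
model under assumptions, not the driver code: struct mtd_desc,
parse_index() and mtd_matches() are hypothetical names, and strtol()
stands in for the kernel's kstrtoint(). A partition is selected when its
name equals the string, or when the string parses as its number.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal description of an MTD partition. */
struct mtd_desc {
	int index;		/* MTD device number */
	const char *name;	/* partition name */
};

/*
 * Parse @device as a device number. Returns the number, or -1 when
 * @device is not a plain non-negative integer and can only be a name.
 */
static int parse_index(const char *device)
{
	char *end;
	long val;

	assert(device != NULL);
	errno = 0;
	val = strtol(device, &end, 0);
	if (errno || end == device || *end != '\0')
		return -1;
	if (val < 0 || val > INT_MAX)
		return -1;
	return (int)val;
}

/* Return nonzero when @mtd is the device selected by @device. */
static int mtd_matches(const struct mtd_desc *mtd, const char *device)
{
	int index = parse_index(device);

	/* A name match always wins, even if the string looks numeric. */
	if (strcmp(mtd->name, device) == 0)
		return 1;
	return index >= 0 && mtd->index == index;
}
```

With a partition named "pstore" at index 3, both "pstore" and "3" would
select it under this model, while "4" or any other name would not.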
 Documentation/admin-guide/pstore-blk.rst |   9 +-
 drivers/mtd/Kconfig                      |  10 +
 drivers/mtd/Makefile                     |   1 +
 drivers/mtd/mtdpstore.c                  | 564 +++++++++++++++++++++++
 fs/pstore/platform.c                     |  22 +-
 5 files changed, 583 insertions(+), 23 deletions(-)
 create mode 100644 drivers/mtd/mtdpstore.c
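
The split between methods (back-end) and policy (pstore/blk) relies on
the write contract documented for struct psblk_device: -ENOMSG from
write means "try the next zone". The sketch below models that contract
in user space. All names, the zone sizes and the -ENOSPC result for a
full device are assumptions made for illustration; they are not taken
from pstore/zone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t, off_t */

#define ZONE_SIZE  16	/* hypothetical record (zone) size in bytes */
#define ZONE_COUNT 4	/* hypothetical number of zones */

static char storage[ZONE_COUNT][ZONE_SIZE];
static int zone_used[ZONE_COUNT];

/*
 * Back-end method: a used zone is refused with -ENOMSG so that the
 * caller moves on to the next zone.
 */
static ssize_t backend_write(const char *buf, size_t size, off_t off)
{
	size_t zone = (size_t)off / ZONE_SIZE;

	assert(size <= ZONE_SIZE && zone < ZONE_COUNT);
	if (zone_used[zone])
		return -ENOMSG;
	memcpy(storage[zone], buf, size);
	zone_used[zone] = 1;
	return (ssize_t)size;
}

/*
 * Caller-side policy: starting at zone @start, probe each zone once,
 * wrapping around, until one accepts the record. Returns the index of
 * the zone that took the record, or -ENOSPC when every zone is used.
 */
static int store_record(const char *buf, size_t size, size_t start)
{
	size_t i;

	for (i = 0; i < ZONE_COUNT; i++) {
		size_t zone = (start + i) % ZONE_COUNT;
		ssize_t ret = backend_write(buf, size,
					    (off_t)(zone * ZONE_SIZE));

		if (ret == -ENOMSG)
			continue;	/* zone busy, try the next one */
		if (ret < 0)
			return (int)ret;
		return (int)zone;
	}
	return -ENOSPC;
}
```

The back-end never decides where a record goes; it only reports whether
the offered zone is usable, which keeps chunk management in the caller.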

diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
index 2f3602397715..bf0b5a227e24 100644
--- a/Documentation/admin-guide/pstore-blk.rst
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -43,9 +43,9 @@ blkdev
 ~~~~~~
 
 The block device to use. Most of the time, it is a partition of block device.
-It's required for pstore/blk.
+It's required for pstore/blk. It is also used to select an MTD device.
 
-It accepts the following variants:
+It accepts the following variants for a block device:
 
 1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
    leading 0x, for example b302.
@@ -64,6 +64,11 @@ It accepts the following variants:
    partition with a known unique id.
 #. <major>:<minor> major and minor number of the device separated by a colon.
 
+It accepts the following variants for an MTD device:
+
+1. <device name> MTD device name. "pstore" is recommended.
+#. <device number> MTD device number.
+
 kmsg_size
 ~~~~~~~~~
 
diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 42d401ea60ee..6ddab796216d 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -170,6 +170,16 @@ config MTD_OOPS
 	  buffer in a flash partition where it can be read back at some
 	  later point.
 
+config MTD_PSTORE
+	tristate "Log panic/oops to an MTD buffer based on pstore"
+	depends on PSTORE_BLK
+	help
+	  This enables panic and oops messages to be logged to a circular
+	  buffer in a flash partition where they can be read back as files
+	  after mounting the pstore filesystem.
+
+	  If unsure, say N.
+
 config MTD_SWAP
 	tristate "Swap on MTD device support"
 	depends on MTD && SWAP
diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 56cc60ccc477..593d0593a038 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
 obj-$(CONFIG_SSFDC)		+= ssfdc.o
 obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
 obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
+obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
 obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
 
 nftl-objs		:= nftlcore.o nftlmount.o
diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
new file mode 100644
index 000000000000..50c8fc746f39
--- /dev/null
+++ b/drivers/mtd/mtdpstore.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define dev_fmt(fmt) "mtdoops-pstore: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pstore_blk.h>
+#include <linux/mtd/mtd.h>
+#include <linux/bitops.h>
+
+static struct mtdpstore_context {
+	int index;
+	struct pstore_blk_info info;
+	struct psblk_device dev;
+	struct mtd_info *mtd;
+	unsigned long *rmmap;		/* removed bit map */
+	unsigned long *usedmap;		/* used bit map */
+	/*
+	 * Used for panic write.
+	 * As block_isbad() cannot be called in the panic case, we record
+	 * the bad-block status beforehand so that panic_write can skip them.
+	 */
+	unsigned long *badmap;		/* bad block bit map */
+} oops_cxt;
+
+static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret;
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	ret = mtd_block_isbad(mtd, off);
+	if (ret < 0) {
+		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
+		return ret;
+	} else if (ret > 0) {
+		set_bit(blknum, cxt->badmap);
+		return true;
+	}
+	return false;
+}
+
+static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	return test_bit(blknum, cxt->badmap);
+}
+
+static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
+	set_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+	clear_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+		clear_bit(zonenum, cxt->usedmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u64 blknum = div_u64(off, cxt->mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	return test_bit(zonenum, cxt->usedmap);
+}
+
+static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->usedmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
+		size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	size_t sz;
+	int i;
+
+	sz = min_t(uint32_t, size, mtd->writesize / 4);
+	for (i = 0; i < sz; i++) {
+		if (buf[i] != (char)0xFF)
+			return false;
+	}
+	return true;
+}
+
+static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
+	set_bit(zonenum, cxt->rmmap);
+}
+
+static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		clear_bit(zonenum, cxt->rmmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->rmmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	struct erase_info erase;
+	int ret;
+
+	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
+	erase.len = cxt->mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(cxt->mtd, &erase);
+	if (!ret)
+		mtdpstore_block_clear_removed(cxt, off);
+	else
+		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
+		       (unsigned long long)erase.addr,
+		       (unsigned long long)erase.len, cxt->info.device);
+	return ret;
+}
+
+/*
+ * Called while removing a file.
+ *
+ * To avoid over-erasing, erase a block only when the whole block is unused.
+ * If the block still contains a valid log, the erase is deferred to
+ * flush_removed(), which runs at unregister time.
+ */
+static ssize_t mtdpstore_erase(size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -EIO;
+
+	mtdpstore_mark_unused(cxt, off);
+
+	/* If the block still has valid data, mtdpstore erases it lazily */
+	if (likely(mtdpstore_block_is_used(cxt, off))) {
+		mtdpstore_mark_removed(cxt, off);
+		return 0;
+	}
+
+	/* all zones are unused, erase it */
+	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
+	return mtdpstore_erase_do(cxt, off);
+}
+
+/*
+ * What is security for mtdpstore?
+ * As no erase is possible in the panic case, we must ensure that at least
+ * one zone is writable. Otherwise, the panic write will fail.
+ * If a zone is used, the write operation returns -ENOMSG, which makes
+ * pstore/blk try the zones one by one until it finds an empty one. So the
+ * next zone need not be empty, as long as at least one zone is.
+ */
+static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret = 0, i;
+	struct mtd_info *mtd = cxt->mtd;
+	u32 zonenum = (u32)div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->info.kmsg_size);
+	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
+	u32 erasesize = cxt->mtd->erasesize;
+
+	for (i = 0; i < zonecnt; i++) {
+		u32 num = (zonenum + i) % zonecnt;
+
+		/* found empty zone */
+		if (!test_bit(num, cxt->usedmap))
+			return 0;
+	}
+
+	/* If there is no empty zone, we have no choice but to erase */
+	off = ALIGN_DOWN(off, erasesize);
+	while (blkcnt--) {
+		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
+
+		if (mtdpstore_block_isbad(cxt, off))
+			continue;
+
+		ret = mtdpstore_erase_do(cxt, off);
+		if (!ret) {
+			mtdpstore_block_mark_unused(cxt, off);
+			break;
+		}
+	}
+
+	if (ret)
+		dev_err(&mtd->dev, "all blocks bad!\n");
+	dev_dbg(&mtd->dev, "end security\n");
+	return ret;
+}
+
+static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENOMSG;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENOMSG;
+
+	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
+	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || retlen != size) {
+		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static inline bool mtdpstore_is_io_error(int ret)
+{
+	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
+}
+
+/*
+ * All zones will be read, as pstore/blk reads the zones one by one during
+ * recovery.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen, done;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENOMSG;
+
+	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
+	for (done = 0, retlen = 0; done < size; done += retlen) {
+		retlen = 0;
+
+		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
+				(u_char *)buf + done);
+		if (mtdpstore_is_io_error(ret)) {
+			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
+					off + done, retlen, size - done, ret);
+			/* the zone may be broken, try next one */
+			return -ENOMSG;
+		}
+
+		/*
+		 * ECC error. The impact on log data is usually small, and
+		 * the log may still be readable. So mtdpstore just hands
+		 * over what it gets and the user can judge whether the data
+		 * is valid or not.
+		 */
+		if (mtd_is_eccerr(ret)) {
+			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
+					off + done, retlen, size - done, ret);
+			/* the driver may not set retlen on an ECC error */
+			retlen = retlen == 0 ? size - done : retlen;
+		}
+	}
+
+	if (mtdpstore_is_empty(cxt, buf, size))
+		mtdpstore_mark_unused(cxt, off);
+	else
+		mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_panic_block_isbad(cxt, off))
+		return -ENOMSG;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENOMSG;
+
+	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || size != retlen) {
+		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	return retlen;
+}
+
+static void mtdpstore_notify_add(struct mtd_info *mtd)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct pstore_blk_info *info = &cxt->info;
+	unsigned long longcnt;
+
+	if (!strcmp(mtd->name, info->device))
+		cxt->index = mtd->index;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
+
+	if (mtd->size < info->kmsg_size * 2) {
+		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
+				mtd->index);
+		return;
+	}
+	/*
+	 * kmsg_size must be aligned to 4096 bytes, as required by
+	 * psblk. The default value of kmsg_size is 64KB. If kmsg_size
+	 * is larger than erasesize, errors will occur, since mtdpstore
+	 * is designed on the assumption that it is not.
+	 */
+	if (mtd->erasesize < info->kmsg_size) {
+		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
+				mtd->index);
+		return;
+	}
+	if (unlikely(info->kmsg_size % mtd->writesize)) {
+		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
+				info->kmsg_size / 1024,
+				mtd->writesize / 1024);
+		return;
+	}
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->kmsg_size));
+	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
+	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	cxt->dev.total_size = mtd->size;
+	/* just support dmesg right now */
+	cxt->dev.flags = PSTORE_FLAGS_DMESG;
+	cxt->dev.read = mtdpstore_read;
+	cxt->dev.write = mtdpstore_write;
+	cxt->dev.erase = mtdpstore_erase;
+	cxt->dev.panic_write = mtdpstore_panic_write;
+
+	ret = psblk_register_device(&cxt->dev);
+	if (ret) {
+		dev_err(&mtd->dev, "mtd%d register to psblk failed\n",
+				mtd->index);
+		return;
+	}
+	cxt->mtd = mtd;
+	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
+}
+
+static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
+		loff_t off, size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u_char *buf;
+	int ret;
+	size_t retlen;
+	struct erase_info erase;
+
+	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* 1st. read to cache */
+	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
+	if (mtdpstore_is_io_error(ret))
+		goto free;
+
+	/* 2nd. erase block */
+	erase.len = mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(mtd, &erase);
+	if (ret)
+		goto free;
+
+	/* 3rd. write back */
+	while (size) {
+		unsigned int zonesize = cxt->info.kmsg_size;
+
+		/* there is valid data on block, write back */
+		if (mtdpstore_is_used(cxt, off)) {
+			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
+			if (ret)
+				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
+						off, retlen, zonesize, ret);
+		}
+
+		off += zonesize;
+		size -= min_t(unsigned int, zonesize, size);
+	}
+
+free:
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * What does mtdpstore_flush_removed() do?
+ * When the user removes a log file from the pstore filesystem, mtdpstore must
+ * make sure the log is really gone. If the whole block is no longer used, the
+ * block can simply be erased. However, if the block still contains valid logs,
+ * mtdpstore has to erase the block and write the valid logs back.
+ */
+static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	int ret;
+	loff_t off;
+	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
+
+	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
+		ret = mtdpstore_block_isbad(cxt, off);
+		if (ret)
+			continue;
+
+		ret = mtdpstore_block_is_removed(cxt, off);
+		if (!ret)
+			continue;
+
+		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void mtdpstore_notify_remove(struct mtd_info *mtd)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	mtdpstore_flush_removed(cxt);
+
+	psblk_unregister_device(&cxt->dev);
+	kfree(cxt->badmap);
+	kfree(cxt->usedmap);
+	kfree(cxt->rmmap);
+	cxt->mtd = NULL;
+	cxt->index = -1;
+}
+
+static struct mtd_notifier mtdpstore_notifier = {
+	.add	= mtdpstore_notify_add,
+	.remove	= mtdpstore_notify_remove,
+};
+
+static int __init mtdpstore_init(void)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	/* cxt->mtd is still NULL here, so log with pr_err(), not dev_err() */
+	struct pstore_blk_info *info = &cxt->info;
+
+	ret = pstore_blk_usr_info(info);
+	if (unlikely(ret))
+		return ret;
+
+	if (strlen(info->device) == 0) {
+		pr_err("mtdoops-pstore: mtd device must be supplied\n");
+		return -EINVAL;
+	}
+	if (!info->kmsg_size) {
+		pr_err("mtdoops-pstore: no backend enabled\n");
+		return -EINVAL;
+	}
+
+	/* Setup the MTD device to use */
+	ret = kstrtoint((char *)info->device, 0, &cxt->index);
+	if (ret)
+		cxt->index = -1;
+
+	register_mtd_user(&mtdpstore_notifier);
+	return 0;
+}
+module_init(mtdpstore_init);
+
+static void __exit mtdpstore_exit(void)
+{
+	unregister_mtd_user(&mtdpstore_notifier);
+}
+module_exit(mtdpstore_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("MTD backend for pstore/blk");
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index b882919b8784..4fb8ec9f3a1c 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -130,26 +130,6 @@ enum pstore_type_id pstore_name_to_type(const char *name)
 }
 EXPORT_SYMBOL_GPL(pstore_name_to_type);
 
-static const char *get_reason_str(enum kmsg_dump_reason reason)
-{
-	switch (reason) {
-	case KMSG_DUMP_PANIC:
-		return "Panic";
-	case KMSG_DUMP_OOPS:
-		return "Oops";
-	case KMSG_DUMP_EMERG:
-		return "Emergency";
-	case KMSG_DUMP_RESTART:
-		return "Restart";
-	case KMSG_DUMP_HALT:
-		return "Halt";
-	case KMSG_DUMP_POWEROFF:
-		return "Poweroff";
-	default:
-		return "Unknown";
-	}
-}
-
 static void pstore_timer_kick(void)
 {
 	if (pstore_update_ms < 0)
@@ -402,7 +382,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 	unsigned int	part = 1;
 	int		ret;
 
-	why = get_reason_str(reason);
+	why = kmsg_dump_reason_str(reason);
 
 	if (down_trylock(&psinfo->buf_lock)) {
 		/* Failed to acquire lock: give up if we cannot wait. */
-- 
2.20.1
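
The lazy-erase bookkeeping in mtdpstore_erase() can be summarised with a
small user-space model. This is a sketch under assumptions, not the
driver: plain bool arrays replace the kernel bitmaps, the geometry
constants are made up, and remove_record(), block_is_used() and
erase_block() are hypothetical names. It shows why removing one record
does not always erase flash: several zones share an erase block, so the
erase is deferred while any neighbour is still valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZONES_PER_BLOCK 4	/* hypothetical: erasesize / kmsg_size */
#define BLOCK_COUNT     2	/* hypothetical partition size in blocks */
#define ZONE_COUNT      (ZONES_PER_BLOCK * BLOCK_COUNT)

static bool used[ZONE_COUNT];		/* zone holds a valid record */
static bool removed[ZONE_COUNT];	/* record deleted, erase pending */
static int erase_count[BLOCK_COUNT];	/* erases issued per block */

/* True if any zone in @block still holds a valid record. */
static bool block_is_used(size_t block)
{
	size_t zone = block * ZONES_PER_BLOCK;
	size_t end = zone + ZONES_PER_BLOCK;

	for (; zone < end; zone++)
		if (used[zone])
			return true;
	return false;
}

/* Erase @block and forget any pending removals inside it. */
static void erase_block(size_t block)
{
	size_t zone = block * ZONES_PER_BLOCK;
	size_t end = zone + ZONES_PER_BLOCK;

	erase_count[block]++;
	for (; zone < end; zone++)
		removed[zone] = false;
}

/*
 * Remove the record in @zone. Returns true if the containing block was
 * erased right away, false if the erase was deferred.
 */
static bool remove_record(size_t zone)
{
	size_t block = zone / ZONES_PER_BLOCK;

	assert(zone < ZONE_COUNT);
	used[zone] = false;
	if (block_is_used(block)) {
		removed[zone] = true;	/* neighbours still valid: defer */
		return false;
	}
	erase_block(block);
	return true;
}
```

In the driver, the zones left in the "removed" state are handled later
by mtdpstore_flush_removed(), which reads the block, erases it and
writes the still-valid zones back.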



* [PATCH v4 12/12] mtd: Support kmsg dumper based on pstore/blk
@ 2020-05-08  6:40   ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  6:40 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Petr Mladek, Tony Luck, Kees Cook, linux-doc, Anton Vorontsov,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky, linux-mtd,
	Colin Cross

From: WeiXiong Liao <liaoweixiong@allwinnertech.com>

This introduces mtdpstore, which is similar to mtdoops but more
powerful. It uses pstore/blk, and aims to store panic and oops logs to
a flash partition, where pstore can later read them back and present
them as files in the mounted pstore filesystem.

To make mtdpstore work, the "blkdev" parameter of pstore/blk should be
set to the MTD device name or the MTD device number. For more details,
see Documentation/admin-guide/pstore-blk.rst

This solves a number of issues:
- Work duplication: both pstore and mtdoops do the same job of storing
  panic/oops logs. They have very similar logic, registering with the
  kmsg dumper and storing logs into several chunks one by one.
- Layer violations: drivers should provide methods instead of policies.
  MTD should provide read/write/erase operations, and allow
  higher-level drivers to provide the chunk management, kmsg dump
  configuration, etc.
- Missing features: pstore provides many additional features, including
  presenting the logs as files, logging dump time and count, and
  supporting other frontends like pmsg, console, etc.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Link: https://lore.kernel.org/r/1585126506-18635-12-git-send-email-liaoweixiong@allwinnertech.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 Documentation/admin-guide/pstore-blk.rst |   9 +-
 drivers/mtd/Kconfig                      |  10 +
 drivers/mtd/Makefile                     |   1 +
 drivers/mtd/mtdpstore.c                  | 564 +++++++++++++++++++++++
 fs/pstore/platform.c                     |  22 +-
 5 files changed, 583 insertions(+), 23 deletions(-)
 create mode 100644 drivers/mtd/mtdpstore.c

diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
index 2f3602397715..bf0b5a227e24 100644
--- a/Documentation/admin-guide/pstore-blk.rst
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -43,9 +43,9 @@ blkdev
 ~~~~~~
 
 The block device to use. Most of the time, it is a partition of block device.
-It's required for pstore/blk.
+It's required for pstore/blk. It is also used to select an MTD device.
 
-It accepts the following variants:
+It accepts the following variants for a block device:
 
 1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
    leading 0x, for example b302.
@@ -64,6 +64,11 @@ It accepts the following variants:
    partition with a known unique id.
 #. <major>:<minor> major and minor number of the device separated by a colon.
 
+It accepts the following variants for an MTD device:
+
+1. <device name> MTD device name. "pstore" is recommended.
+#. <device number> MTD device number.
+
 kmsg_size
 ~~~~~~~~~
 
diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 42d401ea60ee..6ddab796216d 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -170,6 +170,16 @@ config MTD_OOPS
 	  buffer in a flash partition where it can be read back at some
 	  later point.
 
+config MTD_PSTORE
+	tristate "Log panic/oops to an MTD buffer based on pstore"
+	depends on PSTORE_BLK
+	help
+	  This enables panic and oops messages to be logged to a circular
+	  buffer in a flash partition where they can be read back as files
+	  after mounting the pstore filesystem.
+
+	  If unsure, say N.
+
 config MTD_SWAP
 	tristate "Swap on MTD device support"
 	depends on MTD && SWAP
diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 56cc60ccc477..593d0593a038 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
 obj-$(CONFIG_SSFDC)		+= ssfdc.o
 obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
 obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
+obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
 obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
 
 nftl-objs		:= nftlcore.o nftlmount.o
diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
new file mode 100644
index 000000000000..50c8fc746f39
--- /dev/null
+++ b/drivers/mtd/mtdpstore.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define dev_fmt(fmt) "mtdoops-pstore: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pstore_blk.h>
+#include <linux/mtd/mtd.h>
+#include <linux/bitops.h>
+
+static struct mtdpstore_context {
+	int index;
+	struct pstore_blk_info info;
+	struct psblk_device dev;
+	struct mtd_info *mtd;
+	unsigned long *rmmap;		/* removed bit map */
+	unsigned long *usedmap;		/* used bit map */
+	/*
+	 * Used for panic write.
+	 * As block_isbad() cannot be called in the panic case, we record
+	 * the bad-block status beforehand so that panic_write can skip them.
+	 */
+	unsigned long *badmap;		/* bad block bit map */
+} oops_cxt;
+
+static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret;
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	ret = mtd_block_isbad(mtd, off);
+	if (ret < 0) {
+		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
+		return ret;
+	} else if (ret > 0) {
+		set_bit(blknum, cxt->badmap);
+		return true;
+	}
+	return false;
+}
+
+static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	return test_bit(blknum, cxt->badmap);
+}
+
+static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
+	set_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+	clear_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+		clear_bit(zonenum, cxt->usedmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u64 blknum = div_u64(off, cxt->mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	return test_bit(zonenum, cxt->usedmap);
+}
+
+static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->usedmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
+		size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	size_t sz;
+	int i;
+
+	sz = min_t(uint32_t, size, mtd->writesize / 4);
+	for (i = 0; i < sz; i++) {
+		if (buf[i] != (char)0xFF)
+			return false;
+	}
+	return true;
+}
+
+static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
+	set_bit(zonenum, cxt->rmmap);
+}
+
+static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		clear_bit(zonenum, cxt->rmmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->rmmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	struct erase_info erase;
+	int ret;
+
+	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
+	erase.len = cxt->mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(cxt->mtd, &erase);
+	if (!ret)
+		mtdpstore_block_clear_removed(cxt, off);
+	else
+		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
+		       (unsigned long long)erase.addr,
+		       (unsigned long long)erase.len, cxt->info.device);
+	return ret;
+}
+
+/*
+ * Called while removing a file.
+ *
+ * To avoid over-erasing, erase the block only when the whole block is unused.
+ * If the block still contains valid logs, defer the erase to
+ * mtdpstore_flush_removed(), which runs on unregister.
+ */
+static ssize_t mtdpstore_erase(size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -EIO;
+
+	mtdpstore_mark_unused(cxt, off);
+
+	/* If the block still has valid data, mtdpstore erases it lazily */
+	if (likely(mtdpstore_block_is_used(cxt, off))) {
+		mtdpstore_mark_removed(cxt, off);
+		return 0;
+	}
+
+	/* all zones are unused, erase it */
+	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
+	return mtdpstore_erase_do(cxt, off);
+}
+
+/*
+ * What is security for mtdpstore?
+ * Since nothing can be erased in the panic path, at least one zone must
+ * stay writable; otherwise the panic write will fail.
+ * If a zone is used, the write operation returns -ENOMSG, which makes
+ * pstore/blk try zones one by one until it finds an empty one. So the next
+ * zone need not be empty, but at least one zone must be.
+ */
+static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret = 0, i;
+	struct mtd_info *mtd = cxt->mtd;
+	u32 zonenum = (u32)div_u64(off, cxt->info.kmsg_size);
+	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->info.kmsg_size);
+	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
+	u32 erasesize = cxt->mtd->erasesize;
+
+	for (i = 0; i < zonecnt; i++) {
+		u32 num = (zonenum + i) % zonecnt;
+
+		/* found empty zone */
+		if (!test_bit(num, cxt->usedmap))
+			return 0;
+	}
+
+	/* If there is no empty zone, we have no choice but to erase a block */
+	off = ALIGN_DOWN(off, erasesize);
+	while (blkcnt--) {
+		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
+
+		if (mtdpstore_block_isbad(cxt, off))
+			continue;
+
+		ret = mtdpstore_erase_do(cxt, off);
+		if (!ret) {
+			mtdpstore_block_mark_unused(cxt, off);
+			break;
+		}
+	}
+
+	if (ret)
+		dev_err(&mtd->dev, "all blocks bad!\n");
+	dev_dbg(&mtd->dev, "end security\n");
+	return ret;
+}
+
+static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENOMSG;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENOMSG;
+
+	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
+	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || retlen != size) {
+		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static inline bool mtdpstore_is_io_error(int ret)
+{
+	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
+}
+
+/*
+ * All zones get read, because pstore/blk reads the zones one by one during
+ * recovery.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen, done;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENOMSG;
+
+	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
+	for (done = 0, retlen = 0; done < size; done += retlen) {
+		retlen = 0;
+
+		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
+				(u_char *)buf + done);
+		if (mtdpstore_is_io_error(ret)) {
+			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
+					off + done, retlen, size - done, ret);
+			/* the zone may be broken, try next one */
+			return -ENOMSG;
+		}
+
+		/*
+		 * ECC error. The impact on log data is usually small, so the
+		 * data may still be readable. mtdpstore hands over whatever
+		 * it got and lets the user judge whether the data is valid
+		 * or not.
+		 */
+		if (mtd_is_eccerr(ret)) {
+			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
+					off + done, retlen, size - done, ret);
+			/* the driver may not set retlen on an ECC error */
+			retlen = retlen == 0 ? size - done : retlen;
+		}
+	}
+
+	if (mtdpstore_is_empty(cxt, buf, size))
+		mtdpstore_mark_unused(cxt, off);
+	else
+		mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_panic_block_isbad(cxt, off))
+		return -ENOMSG;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENOMSG;
+
+	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || size != retlen) {
+		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	return retlen;
+}
+
+static void mtdpstore_notify_add(struct mtd_info *mtd)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct pstore_blk_info *info = &cxt->info;
+	unsigned long longcnt;
+
+	if (!strcmp(mtd->name, info->device))
+		cxt->index = mtd->index;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
+
+	if (mtd->size < info->kmsg_size * 2) {
+		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
+				mtd->index);
+		return;
+	}
+	/*
+	 * kmsg_size must be aligned to 4096 bytes, a limit imposed by
+	 * psblk. The default value of kmsg_size is 64KB. If kmsg_size
+	 * is larger than erasesize, errors will occur, since mtdpstore
+	 * is designed around kmsg_size <= erasesize.
+	 */
+	if (mtd->erasesize < info->kmsg_size) {
+		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
+				mtd->index);
+		return;
+	}
+	if (unlikely(info->kmsg_size % mtd->writesize)) {
+		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
+				info->kmsg_size / 1024,
+				mtd->writesize / 1024);
+		return;
+	}
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->kmsg_size));
+	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
+	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	cxt->dev.total_size = mtd->size;
+	/* just support dmesg right now */
+	cxt->dev.flags = PSTORE_FLAGS_DMESG;
+	cxt->dev.read = mtdpstore_read;
+	cxt->dev.write = mtdpstore_write;
+	cxt->dev.erase = mtdpstore_erase;
+	cxt->dev.panic_write = mtdpstore_panic_write;
+
+	ret = psblk_register_device(&cxt->dev);
+	if (ret) {
+		dev_err(&mtd->dev, "mtd%d register to psblk failed\n",
+				mtd->index);
+		return;
+	}
+	cxt->mtd = mtd;
+	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
+}
+
+static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
+		loff_t off, size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u_char *buf;
+	int ret;
+	size_t retlen;
+	struct erase_info erase;
+
+	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* 1st. read to cache */
+	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
+	if (mtdpstore_is_io_error(ret))
+		goto free;
+
+	/* 2nd. erase block */
+	erase.len = mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(mtd, &erase);
+	if (ret)
+		goto free;
+
+	/* 3rd. write back */
+	while (size) {
+		unsigned int zonesize = cxt->info.kmsg_size;
+
+		/* there is valid data on block, write back */
+		if (mtdpstore_is_used(cxt, off)) {
+			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
+			if (ret)
+				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
+						off, retlen, zonesize, ret);
+		}
+
+		off += zonesize;
+		size -= min_t(unsigned int, zonesize, size);
+	}
+
+free:
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * What does mtdpstore_flush_removed() do?
+ * When the user removes a log file from the pstore filesystem, mtdpstore must
+ * make sure the log is really gone. If the whole block is no longer used, the
+ * block can simply be erased. However, if the block still contains valid logs,
+ * mtdpstore has to erase the block and write the valid logs back.
+ */
+static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	int ret;
+	loff_t off;
+	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
+
+	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
+		ret = mtdpstore_block_isbad(cxt, off);
+		if (ret)
+			continue;
+
+		ret = mtdpstore_block_is_removed(cxt, off);
+		if (!ret)
+			continue;
+
+		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void mtdpstore_notify_remove(struct mtd_info *mtd)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	mtdpstore_flush_removed(cxt);
+
+	psblk_unregister_device(&cxt->dev);
+	kfree(cxt->badmap);
+	kfree(cxt->usedmap);
+	kfree(cxt->rmmap);
+	cxt->mtd = NULL;
+	cxt->index = -1;
+}
+
+static struct mtd_notifier mtdpstore_notifier = {
+	.add	= mtdpstore_notify_add,
+	.remove	= mtdpstore_notify_remove,
+};
+
+static int __init mtdpstore_init(void)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	struct pstore_blk_info *info = &cxt->info;
+
+	ret = pstore_blk_usr_info(info);
+	if (unlikely(ret))
+		return ret;
+
+	if (strlen(info->device) == 0) {
+		dev_err(&mtd->dev, "mtd device must be supplied\n");
+		return -EINVAL;
+	}
+	if (!info->kmsg_size) {
+		dev_err(&mtd->dev, "no backend enabled\n");
+		return -EINVAL;
+	}
+
+	/* Setup the MTD device to use */
+	ret = kstrtoint((char *)info->device, 0, &cxt->index);
+	if (ret)
+		cxt->index = -1;
+
+	register_mtd_user(&mtdpstore_notifier);
+	return 0;
+}
+module_init(mtdpstore_init);
+
+static void __exit mtdpstore_exit(void)
+{
+	unregister_mtd_user(&mtdpstore_notifier);
+}
+module_exit(mtdpstore_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("MTD backend for pstore/blk");
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index b882919b8784..4fb8ec9f3a1c 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -130,26 +130,6 @@ enum pstore_type_id pstore_name_to_type(const char *name)
 }
 EXPORT_SYMBOL_GPL(pstore_name_to_type);
 
-static const char *get_reason_str(enum kmsg_dump_reason reason)
-{
-	switch (reason) {
-	case KMSG_DUMP_PANIC:
-		return "Panic";
-	case KMSG_DUMP_OOPS:
-		return "Oops";
-	case KMSG_DUMP_EMERG:
-		return "Emergency";
-	case KMSG_DUMP_RESTART:
-		return "Restart";
-	case KMSG_DUMP_HALT:
-		return "Halt";
-	case KMSG_DUMP_POWEROFF:
-		return "Poweroff";
-	default:
-		return "Unknown";
-	}
-}
-
 static void pstore_timer_kick(void)
 {
 	if (pstore_update_ms < 0)
@@ -402,7 +382,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 	unsigned int	part = 1;
 	int		ret;
 
-	why = get_reason_str(reason);
+	why = kmsg_dump_reason_str(reason);
 
 	if (down_trylock(&psinfo->buf_lock)) {
 		/* Failed to acquire lock: give up if we cannot wait. */
-- 
2.20.1
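
For readers following the bitmap helpers in the mtdpstore patch above, the
offset arithmetic can be sketched in userspace roughly as below. This is only
an illustration: the geometry constants, the bool array standing in for the
kernel bitmap, and the helper names zone_of(), block_of(), block_start() and
block_is_used() are assumptions of the sketch, not part of the patch. The
sketch also rounds the offset down to the start of its erase block before
scanning, which is a choice made here for clarity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry; real values come from the MTD device and pstore/blk. */
#define ERASESIZE  (128u * 1024u)  /* bytes per erase block */
#define KMSG_SIZE  (32u * 1024u)   /* bytes per record zone */
#define NR_ZONES   16u             /* zones on the whole partition */

/* One flag per zone; the driver keeps this as a bitmap (usedmap). */
static bool usedmap[NR_ZONES];

/* Zone index: which kmsg_size-sized record an offset falls into. */
static uint64_t zone_of(uint64_t off)  { return off / KMSG_SIZE; }

/* Block index: which erase block an offset falls into. */
static uint64_t block_of(uint64_t off) { return off / ERASESIZE; }

/* Round an offset down to the start of its erase block. */
static uint64_t block_start(uint64_t off) { return off - (off % ERASESIZE); }

/* True if any zone inside the erase block containing off is still in use. */
static bool block_is_used(uint64_t off)
{
	uint64_t zone = zone_of(block_start(off));
	uint32_t cnt = ERASESIZE / KMSG_SIZE;   /* zones per erase block */

	while (cnt--) {
		assert(zone < NR_ZONES);
		if (usedmap[zone++])
			return true;
	}
	return false;
}
```

With this geometry there are four zones per erase block, so an erase can only
reclaim space once all four zones of a block are unused, which is why removal
of a single record is normally deferred.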


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device
  2020-05-08  6:39 ` Kees Cook
@ 2020-05-08  7:27   ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-08  7:27 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

On Thu, May 07, 2020 at 11:39:52PM -0700, Kees Cook wrote:
> So far, I've identified the following stuff left to do:
> [...]
>         - implement ramoops-like probe feature for pstore/blk

With the following hack, I'm able to start testing the series:

diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index a736555e1ed3..7145da079267 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -373,12 +373,14 @@ int psblk_register_blkdev(unsigned int major, unsigned int flags,
 	if (IS_ERR(binfo))
 		return PTR_ERR(binfo);
 
+#if 0
 	/* only allow driver matching the @blkdev */
 	if (!binfo->devt || MAJOR(binfo->devt) != major) {
 		pr_debug("invalid major %u (expect %u)\n",
 				major, MAJOR(binfo->devt));
 		return -ENODEV;
 	}
+#endif
 
 	/* hold bdev exclusively */
 	bdev = psblk_get_bdev(holder);
@@ -423,7 +425,7 @@ void psblk_unregister_blkdev(unsigned int major)
 	struct psblk_device dev = {.read = psblk_generic_blk_read};
 	void *holder = blkdev;
 
-	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
+	if (psblk_bdev/* && MAJOR(psblk_bdev->bd_dev) == major*/) {
 		psblk_unregister_device(&dev);
 		psblk_put_bdev(psblk_bdev, holder);
 		blkdev_panic_write = NULL;
@@ -476,6 +478,24 @@ int pstore_blk_usr_info(struct pstore_blk_info *info)
 }
 EXPORT_SYMBOL_GPL(pstore_blk_usr_info);
 
+static int __init pstore_blk_init(void)
+{
+	int ret = 0;
+
+	if (blkdev[0])
+		ret = psblk_register_blkdev(0, 0, NULL);
+
+	return ret;
+}
+postcore_initcall(pstore_blk_init);
+
+static void __exit pstore_blk_exit(void)
+{
+	psblk_unregister_blkdev(0);
+}
+module_exit(pstore_blk_exit);
+
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
 MODULE_DESCRIPTION("pstore backend for block devices");


Then I can get things up and running with:

# insmod pstore.ko compress=off
# insmod pstore_zone.ko
# truncate pstore-blk.raw --size 100M
# losetup -f --show pstore-blk.raw
/dev/loop0
# insmod pstore_blk.ko blkdev=/dev/loop0 kmsg_size=16 console_size=64

So far, I've hit a few bugs. The most obvious is that "rmmod" causes a
fault, so I think locking and other things need to be fixed up further.
After that, it looked like all the compressed files were failing to
decompress, which implies some kind of buffer offset problem. When I
loaded with pstore.compress=off I got readable logs, but there is a span
of garbage between the header and the body in
/sys/fs/pstore/dmesg-pstore-zone-1 etc.

Cool so far! It just needs a bit more testing and polish. :)

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/12] pstore/zone: Introduce common layer to manage storage zones
  2020-05-08  6:39   ` Kees Cook
@ 2020-05-09  3:09     ` WeiXiong Liao
  -1 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  3:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

Hi Kees Cook,

On 2020/5/8 PM 2:39, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> Implement a common set of APIs needed to support pstore storage zones,
> based on how ramoops is designed. This will be used by pstore/blk with
> the intention of migrating pstore/ram in the future.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-2-git-send-email-liaoweixiong@allwinnertech.com
> Co-developed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/pstore/Kconfig           |   7 +
>  fs/pstore/Makefile          |   3 +
>  fs/pstore/zone.c            | 973 ++++++++++++++++++++++++++++++++++++
>  include/linux/pstore_zone.h |  44 ++
>  4 files changed, 1027 insertions(+)
>  create mode 100644 fs/pstore/zone.c
>  create mode 100644 include/linux/pstore_zone.h
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 8f0369aad22a..98d2457bdd9f 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -153,3 +153,10 @@ config PSTORE_RAM
>  	  "ramoops.ko".
>  
>  	  For more information, see Documentation/admin-guide/ramoops.rst.
> +
> +config PSTORE_ZONE
> +	tristate
> +	depends on PSTORE
> +	help
> +	  The common layer for pstore/blk (and pstore/ram in the future)
> +	  to manage storage in zones.
> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
> index 967b5891f325..58a967cbe4af 100644
> --- a/fs/pstore/Makefile
> +++ b/fs/pstore/Makefile
> @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
>  
>  ramoops-objs += ram.o ram_core.o
>  obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
> +
> +pstore_zone-objs += zone.o
> +obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
> diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
> new file mode 100644
> index 000000000000..6c25c443c8e2
> --- /dev/null
> +++ b/fs/pstore/zone.c
> @@ -0,0 +1,973 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define MODNAME "pstore-zone"
> +#define pr_fmt(fmt) MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/pstore.h>
> +#include <linux/mount.h>
> +#include <linux/printk.h>
> +#include <linux/fs.h>
> +#include <linux/pstore_zone.h>
> +#include <linux/kdev_t.h>
> +#include <linux/device.h>
> +#include <linux/namei.h>
> +#include <linux/fcntl.h>
> +#include <linux/uio.h>
> +#include <linux/writeback.h>
> +
> +/**
> + * struct psz_buffer - header of zone to flush to storage
> + *
> + * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
> + * @datalen: length of data in @data
> + * @data: zone data.
> + */
> +struct psz_buffer {
> +#define PSZ_SIG (0x43474244) /* DBGC */
> +	uint32_t sig;
> +	atomic_t datalen;
> +	uint8_t data[];
> +};
> +
> +/**
> + * struct psz_oops_header - sub header of oops zones to flush to storage
> + *
> + * @magic: magic num for oops header
> + * @time: oops/panic trigger time
> + * @compressed: whether compressed
> + * @counter: oops/panic counter
> + * @reason: identify oops or panic
> + * @data: pointer to log data
> + *
> + * It's a sub-header of oops zone, trailing after &psz_buffer.
> + */
> +struct psz_oops_header {
> +#define OOPS_HEADER_MAGIC 0x4dfc3ae5 /* Just a random number */
> +	uint32_t magic;
> +	struct timespec64 time;
> +	bool compressed;
> +	uint32_t counter;
> +	enum kmsg_dump_reason reason;
> +	uint8_t data[];
> +};
> +
> +/**
> + * struct pstore_zone - zone information
> + *
> + * @off: zone offset of storage
> + * @type: front-end type for this zone
> + * @name: front-end name for this zone
> + * @buffer: pointer to data buffer managed by this zone
> + * @oldbuf: pointer to old data buffer.
> + * @buffer_size: bytes in @buffer->data
> + * @should_recover: whether this zone should recover from storage
> + * @dirty: whether the data in @buffer dirty
> + *
> + * zone structure in memory.
> + */
> +struct pstore_zone {
> +	loff_t off;
> +	const char *name;
> +	enum pstore_type_id type;
> +
> +	struct psz_buffer *buffer;
> +	struct psz_buffer *oldbuf;
> +	size_t buffer_size;
> +	bool should_recover;
> +	atomic_t dirty;
> +};
> +
> +/**
> + * struct psz_context - all about running state of pstore/zone
> + *
> + * @opszs: oops/panic storage zones
> + * @oops_max_cnt: max count of @opszs
> + * @oops_read_cnt: counter to read oops zone
> + * @oops_write_cnt: counter to write
> + * @oops_counter: counter to oops
> + * @panic_counter: counter to panic
> + * @recovered: whether finish recovering data from storage
> + * @on_panic: whether occur panic
> + * @pstore_zone_info_lock: lock to @pstore_zone_info
> + * @pstore_zone_info: information from back-end
> + * @pstore: structure for pstore
> + */
> +struct psz_context {
> +	struct pstore_zone **opszs;
> +	unsigned int oops_max_cnt;
> +	unsigned int oops_read_cnt;
> +	unsigned int oops_write_cnt;
> +	/*
> +	 * the counter should be recovered when recover.
> +	 * It records the oops/panic times after burning rather than booting.
> +	 */
> +	unsigned int oops_counter;
> +	unsigned int panic_counter;
> +	atomic_t recovered;
> +	atomic_t on_panic;
> +
> +	/*
> +	 * pstore_zone_info_lock just protects "pstore_zone_info" during calls to
> +	 * register_pstore_zone/unregister_pstore_zone
> +	 */
> +	struct mutex pstore_zone_info_lock;
> +	struct pstore_zone_info *pstore_zone_info;
> +	struct pstore_info pstore;
> +};
> +static struct psz_context psz_cxt;
> +
> +/**
> + * enum psz_flush_mode - flush mode for psz_zone_write()
> + *
> + * @FLUSH_NONE: do not flush to storage but update data on memory
> + * @FLUSH_PART: just flush part of data including meta data to storage
> + * @FLUSH_META: just flush meta data of zone to storage
> + * @FLUSH_ALL: flush all of zone
> + */
> +enum psz_flush_mode {
> +	FLUSH_NONE = 0,
> +	FLUSH_PART,
> +	FLUSH_META,
> +	FLUSH_ALL,
> +};
> +
> +static inline int buffer_datalen(struct pstore_zone *zone)
> +{
> +	return atomic_read(&zone->buffer->datalen);
> +}
> +
> +static inline bool is_on_panic(void)
> +{
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	return atomic_read(&cxt->on_panic);
> +}
> +
> +static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
> +		size_t len, unsigned long off)
> +{
> +	if (!buf || !zone->buffer)
> +		return -EINVAL;
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +	len = min_t(size_t, len, zone->buffer_size - off);
> +	memcpy(buf, zone->buffer->data + off, len);
> +	return len;
> +}
> +
> +static int psz_zone_write(struct pstore_zone *zone,
> +		enum psz_flush_mode flush_mode, const char *buf,
> +		size_t len, unsigned long off)
> +{
> +	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
> +	ssize_t wcnt = 0;
> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
> +	size_t wlen;
> +
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +
> +	wlen = min_t(size_t, len, zone->buffer_size - off);
> +	if (buf && wlen) {
> +		memcpy(zone->buffer->data + off, buf, wlen);
> +		atomic_set(&zone->buffer->datalen, wlen + off);
> +	}
> +
> +	/* avoid to damage old records */
> +	if (!is_on_panic() && !atomic_read(&psz_cxt.recovered))
> +		goto dirty;
> +
> +	writeop = is_on_panic() ? info->panic_write : info->write;
> +	if (!writeop)
> +		goto dirty;
> +
> +	switch (flush_mode) {
> +	case FLUSH_NONE:
> +		if (unlikely(buf && wlen))
> +			goto dirty;
> +		return 0;
> +	case FLUSH_PART:
> +		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
> +				zone->off + sizeof(*zone->buffer) + off);
> +		if (wcnt != wlen)
> +			goto dirty;
> +		fallthrough;
> +	case FLUSH_META:
> +		wlen = sizeof(struct psz_buffer);
> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
> +		if (wcnt != wlen)
> +			goto dirty;
> +		break;
> +	case FLUSH_ALL:
> +		wlen = zone->buffer_size + sizeof(*zone->buffer);
> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
> +		if (wcnt != wlen)
> +			goto dirty;
> +		break;
> +	}
> +
> +	return 0;
> +dirty:
> +	atomic_set(&zone->dirty, true);
> +	return -EBUSY;
> +}
> +
> +static int psz_flush_dirty_zone(struct pstore_zone *zone)
> +{
> +	int ret;
> +
> +	if (!zone)
> +		return -EINVAL;
> +
> +	if (!atomic_read(&zone->dirty))
> +		return 0;
> +
> +	if (!atomic_read(&psz_cxt.recovered))
> +		return -EBUSY;
> +
> +	ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
> +	if (!ret)
> +		atomic_set(&zone->dirty, false);
> +	return ret;
> +}

To prevent multiple writers from flushing the same zone concurrently in
psz_flush_dirty_zone(), I would prefer to use atomic_xchg(), as follows:

	static int psz_flush_dirty_zone(struct pstore_zone *zone)
	{
	        int ret;

	        if (unlikely(!zone))
	                return -EINVAL;

	        if (unlikely(!atomic_read(&psz_cxt.recovered)))
	                return -EBUSY;

	        if (!atomic_xchg(&zone->dirty, false))
	                return 0;

	        ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
	        if (ret)
	                atomic_set(&zone->dirty, true);
	        return ret;
	}
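
As a rough userspace illustration of the test-and-clear pattern proposed
above, the sketch below uses C11 atomic_exchange() in place of the kernel's
atomic_xchg(). The struct zone type, its write_should_fail and writes fields,
and the zone_write_all() helper are hypothetical stand-ins for
struct pstore_zone and psz_zone_write(), introduced only for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pstore_zone; only the dirty flag matters. */
struct zone {
	atomic_int dirty;
	int write_should_fail;   /* knob to simulate a backend write error */
	int writes;              /* number of flush attempts made */
};

/* Stand-in for psz_zone_write(zone, FLUSH_ALL, ...); returns 0 on success. */
static int zone_write_all(struct zone *z)
{
	z->writes++;
	return z->write_should_fail ? -1 : 0;
}

static int flush_dirty_zone(struct zone *z)
{
	int ret;

	assert(z != NULL);

	/*
	 * Read the flag and clear it in one atomic step. If two callers race,
	 * only one of them observes the old value 1 and goes on to flush.
	 */
	if (!atomic_exchange(&z->dirty, 0))
		return 0;

	ret = zone_write_all(z);
	if (ret)
		atomic_store(&z->dirty, 1);   /* flush failed: mark dirty again */
	return ret;
}
```

Compared with a separate read followed by a later set, the exchange closes the
window in which two writers could both see the zone as dirty and both issue
the flush. The cost is that the flag has to be set again on failure.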

> +
> +static int psz_flush_dirty_zones(struct pstore_zone **zones, unsigned int cnt)
> +{
> +	int i, ret;
> +	struct pstore_zone *zone;
> +
> +	if (!zones)
> +		return -EINVAL;
> +
> +	for (i = 0; i < cnt; i++) {
> +		zone = zones[i];
> +		if (!zone)
> +			return -EINVAL;
> +		ret = psz_flush_dirty_zone(zone);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
> +{
> +	const char *data = (const char *)old->buffer->data;
> +	int ret;
> +
> +	ret = psz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
> +	if (ret) {
> +		atomic_set(&new->buffer->datalen, 0);
> +		atomic_set(&new->dirty, false);
> +		return ret;
> +	}
> +	atomic_set(&old->buffer->datalen, 0);
> +	return 0;
> +}
> +
> +static int psz_recover_oops_data(struct psz_context *cxt)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	struct pstore_zone *zone = NULL;
> +	struct psz_buffer *buf;
> +	unsigned long i;
> +	ssize_t rcnt;
> +
> +	if (!info->read)
> +		return -EINVAL;
> +
> +	for (i = 0; i < cxt->oops_max_cnt; i++) {
> +		zone = cxt->opszs[i];
> +		if (unlikely(!zone))
> +			return -EINVAL;
> +		if (atomic_read(&zone->dirty)) {
> +			unsigned int wcnt = cxt->oops_write_cnt;
> +			struct pstore_zone *new = cxt->opszs[wcnt];
> +			int ret;
> +
> +			ret = psz_move_zone(zone, new);
> +			if (ret) {
> +				pr_err("move zone from %lu to %d failed\n",
> +						i, wcnt);
> +				return ret;
> +			}
> +			cxt->oops_write_cnt = (wcnt + 1) % cxt->oops_max_cnt;
> +		}
> +		if (!zone->should_recover)
> +			continue;
> +		buf = zone->buffer;
> +		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
> +				zone->off);
> +		if (rcnt != zone->buffer_size + sizeof(*buf))
> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +	}
> +	return 0;
> +}
> +
> +static int psz_recover_oops_meta(struct psz_context *cxt)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	struct pstore_zone *zone;
> +	size_t rcnt, len;
> +	struct psz_buffer *buf;
> +	struct psz_oops_header *hdr;
> +	struct timespec64 time = {0};
> +	unsigned long i;
> +	/*
> +	 * Recovery may run during a panic, when we can't allocate memory with
> +	 * kmalloc. So, we use a local array instead.
> +	 */
> +	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
> +
> +	if (!info->read)
> +		return -EINVAL;
> +
> +	len = sizeof(*buf) + sizeof(*hdr);
> +	buf = (struct psz_buffer *)buffer_header;
> +	for (i = 0; i < cxt->oops_max_cnt; i++) {
> +		zone = cxt->opszs[i];
> +		if (unlikely(!zone))
> +			return -EINVAL;
> +
> +		rcnt = info->read((char *)buf, len, zone->off);
> +		if (rcnt != len) {
> +			pr_err("read %s with id %lu failed\n", zone->name, i);
> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +		}
> +
> +		if (buf->sig != zone->buffer->sig) {
> +			pr_debug("no valid data in oops zone %lu\n", i);
> +			continue;
> +		}
> +
> +		if (zone->buffer_size < atomic_read(&buf->datalen)) {
> +			pr_info("found overtop zone: %s: id %lu, off %lld, size %zu\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size);
> +			continue;
> +		}
> +
> +		hdr = (struct psz_oops_header *)buf->data;
> +		if (hdr->magic != OOPS_HEADER_MAGIC) {
> +			pr_info("found invalid zone: %s: id %lu, off %lld, size %zu\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size);
> +			continue;
> +		}
> +
> +		/*
> +		 * we get the newest zone, and the next one must be the oldest
> +		 * or unused zone, because we do write one by one like a circle.
> +		 */
> +		if (hdr->time.tv_sec >= time.tv_sec) {
> +			time.tv_sec = hdr->time.tv_sec;
> +			cxt->oops_write_cnt = (i + 1) % cxt->oops_max_cnt;
> +		}
> +
> +		if (hdr->reason == KMSG_DUMP_OOPS)
> +			cxt->oops_counter =
> +				max(cxt->oops_counter, hdr->counter);
> +		else
> +			cxt->panic_counter =
> +				max(cxt->panic_counter, hdr->counter);
> +
> +		if (!atomic_read(&buf->datalen)) {
> +			pr_debug("found erased zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size,
> +					atomic_read(&buf->datalen));
> +			continue;
> +		}
> +
> +		if (!is_on_panic())
> +			zone->should_recover = true;
> +		pr_debug("found nice zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
> +				zone->name, i, zone->off,
> +				zone->buffer_size, atomic_read(&buf->datalen));
> +	}
> +
> +	return 0;
> +}
> +
> +static int psz_recover_oops(struct psz_context *cxt)
> +{
> +	int ret;
> +
> +	if (!cxt->opszs)
> +		return 0;
> +
> +	ret = psz_recover_oops_meta(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	ret = psz_recover_oops_data(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	return 0;
> +recover_fail:
> +	pr_debug("recover oops failed\n");
> +	return ret;
> +}
> +
> +/**
> + * psz_recovery() - recover data from storage
> + * @cxt: the context of pstore/zone
> + *
> + * recovery means reading data back from storage after rebooting
> + *
> + * Return: 0 on success, others on failure.
> + */
> +static inline int psz_recovery(struct psz_context *cxt)
> +{
> +	int ret = -EBUSY;
> +
> +	if (atomic_read(&cxt->recovered))
> +		return 0;
> +
> +	ret = psz_recover_oops(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	pr_debug("recover end!\n");
> +	atomic_set(&cxt->recovered, 1);
> +	return 0;
> +
> +recover_fail:
> +	pr_err("recover failed\n");
> +	return ret;
> +}
> +
> +static int psz_pstore_open(struct pstore_info *psi)
> +{
> +	struct psz_context *cxt = psi->data;
> +
> +	cxt->oops_read_cnt = 0;
> +	return 0;
> +}
> +
> +static inline bool psz_ok(struct pstore_zone *zone)
> +{
> +	if (zone && zone->buffer && buffer_datalen(zone))
> +		return true;
> +	return false;
> +}
> +
> +static inline int psz_oops_erase(struct psz_context *cxt,
> +		struct pstore_zone *zone, struct pstore_record *record)
> +{
> +	struct psz_buffer *buffer = zone->buffer;
> +	struct psz_oops_header *hdr =
> +		(struct psz_oops_header *)buffer->data;
> +
> +	if (unlikely(!psz_ok(zone)))
> +		return 0;
> +	/* this zone is already updated, no need to erase */
> +	if (record->count != hdr->counter)
> +		return 0;
> +
> +	atomic_set(&zone->buffer->datalen, 0);
> +	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +}
> +
> +static int psz_pstore_erase(struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		if (record->id >= cxt->oops_max_cnt)
> +			return -EINVAL;
> +		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static void psz_write_kmsg_hdr(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +	struct psz_buffer *buffer = zone->buffer;
> +	struct psz_oops_header *hdr =
> +		(struct psz_oops_header *)buffer->data;
> +
> +	hdr->magic = OOPS_HEADER_MAGIC;
> +	hdr->compressed = record->compressed;
> +	hdr->time.tv_sec = record->time.tv_sec;
> +	hdr->time.tv_nsec = record->time.tv_nsec;
> +	hdr->reason = record->reason;
> +	if (hdr->reason == KMSG_DUMP_OOPS)
> +		hdr->counter = ++cxt->oops_counter;
> +	else
> +		hdr->counter = ++cxt->panic_counter;
> +}
> +
> +static inline int notrace psz_oops_write_record(struct psz_context *cxt,
> +		struct pstore_record *record)
> +{
> +	size_t size, hlen;
> +	struct pstore_zone *zone;
> +	unsigned int zonenum;
> +
> +	zonenum = cxt->oops_write_cnt;
> +	zone = cxt->opszs[zonenum];
> +	if (unlikely(!zone))
> +		return -ENOSPC;
> +	cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
> +
> +	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
> +	psz_write_kmsg_hdr(zone, record);
> +	hlen = sizeof(struct psz_oops_header);
> +	size = min_t(size_t, record->size, zone->buffer_size - hlen);
> +	return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
> +}
> +
> +static int notrace psz_oops_write(struct psz_context *cxt,
> +		struct pstore_record *record)
> +{
> +	int ret;
> +
> +	/*
> +	 * Explicitly only take the first part of any new crash.
> +	 * If our buffer is larger than kmsg_bytes, this can never happen,
> +	 * and if our buffer is smaller than kmsg_bytes, we don't want the
> +	 * report split across multiple records.
> +	 */
> +	if (record->part != 1)
> +		return -ENOSPC;
> +
> +	if (!cxt->opszs)
> +		return -ENOSPC;
> +
> +	ret = psz_oops_write_record(cxt, record);
> +	if (!ret) {
> +		pr_debug("try to flush other dirty oops zones\n");
> +		psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
> +	}
> +
> +	/* always return 0 as we had handled it on buffer */
> +	return 0;
> +}
> +
> +static int notrace psz_pstore_write(struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +
> +	if (record->type == PSTORE_TYPE_DMESG &&
> +			record->reason == KMSG_DUMP_PANIC)
> +		atomic_set(&cxt->on_panic, 1);
> +
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		return psz_oops_write(cxt, record);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
> +{
> +	struct pstore_zone *zone = NULL;
> +
> +	while (cxt->oops_read_cnt < cxt->oops_max_cnt) {
> +		zone = cxt->opszs[cxt->oops_read_cnt++];
> +		if (psz_ok(zone))
> +			return zone;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int psz_read_oops_hdr(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	struct psz_buffer *buffer = zone->buffer;
> +	struct psz_oops_header *hdr =
> +		(struct psz_oops_header *)buffer->data;
> +
> +	if (hdr->magic != OOPS_HEADER_MAGIC)
> +		return -EINVAL;
> +	record->compressed = hdr->compressed;
> +	record->time.tv_sec = hdr->time.tv_sec;
> +	record->time.tv_nsec = hdr->time.tv_nsec;
> +	record->reason = hdr->reason;
> +	record->count = hdr->counter;
> +	return 0;
> +}
> +
> +static ssize_t psz_oops_read(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	ssize_t size, hlen = 0;
> +
> +	size = buffer_datalen(zone);
> +	/* Clear and skip this oops record if it has no valid header */
> +	if (psz_read_oops_hdr(zone, record)) {
> +		atomic_set(&zone->buffer->datalen, 0);
> +		atomic_set(&zone->dirty, 0);
> +		return -ENOMSG;
> +	}
> +	size -= sizeof(struct psz_oops_header);
> +
> +	if (!record->compressed) {
> +		char *buf = kasprintf(GFP_KERNEL, "%s: Total %d times\n",
> +				      kmsg_dump_reason_str(record->reason),
> +				      record->count);
> +		hlen = strlen(buf);
> +		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
> +		if (!record->buf) {
> +			kfree(buf);
> +			return -ENOMEM;
> +		}
> +	} else {
> +		record->buf = kmalloc(size, GFP_KERNEL);
> +		if (!record->buf)
> +			return -ENOMEM;
> +	}
> +
> +	size = psz_zone_read(zone, record->buf + hlen, size,
> +			sizeof(struct psz_oops_header) < 0);

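This should be:
	sizeof(struct psz_oops_header));

As written, "sizeof(...) < 0" is always false, so the offset passed to
psz_zone_read() is 0 and the oops header is copied into the record
along with the data. That's the reason why all the compressed files
were failing to decompress.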
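
For illustration only, here is a small userspace sketch of the offset
problem. It is not the kernel code: struct demo_header, demo_zone_read(),
buggy_offset() and fixed_offset() are hypothetical names that only mimic
the shape of psz_oops_header and psz_zone_read().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct psz_oops_header; only its size matters. */
struct demo_header {
	unsigned int magic;
	unsigned int counter;
};

/* Copy up to len bytes from zone[off..zone_size) into out. */
static size_t demo_zone_read(const char *zone, size_t zone_size,
			     char *out, size_t len, size_t off)
{
	if (off > zone_size)
		return 0;
	if (len > zone_size - off)
		len = zone_size - off;
	memcpy(out, zone + off, len);
	return len;
}

/* What the buggy call passes: the result of a comparison, not a size. */
static size_t buggy_offset(void)
{
	/* sizeof yields an unsigned value, so "< 0" is never true. */
	return sizeof(struct demo_header) < 0;
}

/* What the call should pass: skip the header, start at the payload. */
static size_t fixed_offset(void)
{
	return sizeof(struct demo_header);
}
```

Reading with buggy_offset() starts at the first header byte, while
fixed_offset() starts at the payload, which is what the decompressor
expects to be handed.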

> +	if (unlikely(size < 0)) {
> +		kfree(record->buf);
> +		return -ENOMSG;
> +	}
> +
> +	return size + hlen;
> +}
> +
> +static ssize_t psz_pstore_read(struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +	ssize_t (*readop)(struct pstore_zone *zone,
> +			struct pstore_record *record);
> +	struct pstore_zone *zone;
> +	ssize_t ret;
> +
> +	/* before read, we must recover from storage */
> +	ret = psz_recovery(cxt);
> +	if (ret)
> +		return ret;
> +
> +next_zone:
> +	zone = psz_read_next_zone(cxt);
> +	if (!zone)
> +		return 0;
> +
> +	record->type = zone->type;
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		readop = psz_oops_read;
> +		record->id = cxt->oops_read_cnt - 1;
> +		break;
> +	default:
> +		goto next_zone;
> +	}
> +
> +	ret = readop(zone, record);
> +	if (ret == -ENOMSG)
> +		goto next_zone;
> +	return ret;
> +}
> +
> +static struct psz_context psz_cxt = {
> +	.pstore_zone_info_lock = __MUTEX_INITIALIZER(psz_cxt.pstore_zone_info_lock),
> +	.recovered = ATOMIC_INIT(0),
> +	.on_panic = ATOMIC_INIT(0),
> +	.pstore = {
> +		.owner = THIS_MODULE,
> +		.name = MODNAME,
> +		.open = psz_pstore_open,
> +		.read = psz_pstore_read,
> +		.write = psz_pstore_write,
> +		.erase = psz_pstore_erase,
> +	},
> +};
> +
> +static struct pstore_zone *psz_init_zone(enum pstore_type_id type,
> +		loff_t *off, size_t size)
> +{
> +	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
> +	struct pstore_zone *zone;
> +	const char *name = pstore_type_to_name(type);
> +
> +	if (!size)
> +		return NULL;
> +
> +	if (*off + size > info->total_size) {
> +		pr_err("no room for %s (0x%zx@0x%llx over 0x%lx)\n",
> +			name, size, *off, info->total_size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	zone = kzalloc(sizeof(struct pstore_zone), GFP_KERNEL);
> +	if (!zone)
> +		return ERR_PTR(-ENOMEM);
> +
> +	zone->buffer = kmalloc(size, GFP_KERNEL);
> +	if (!zone->buffer) {
> +		kfree(zone);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memset(zone->buffer, 0xFF, size);
> +	zone->off = *off;
> +	zone->name = name;
> +	zone->type = type;
> +	zone->buffer_size = size - sizeof(struct psz_buffer);
> +	zone->buffer->sig = type ^ PSZ_SIG;
> +	atomic_set(&zone->dirty, 0);
> +	atomic_set(&zone->buffer->datalen, 0);
> +
> +	*off += size;
> +
> +	pr_debug("pszone %s: off 0x%llx, %zu header, %zu data\n", zone->name,
> +			zone->off, sizeof(*zone->buffer), zone->buffer_size);
> +	return zone;
> +}
> +
> +static struct pstore_zone **psz_init_zones(enum pstore_type_id type,
> +	loff_t *off, size_t total_size, ssize_t record_size,
> +	unsigned int *cnt)
> +{
> +	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
> +	struct pstore_zone **zones, *zone;
> +	const char *name = pstore_type_to_name(type);
> +	int c, i;
> +
> +	if (!total_size || !record_size)
> +		return NULL;
> +
> +	if (*off + total_size > info->total_size) {
> +		pr_err("no room for zones %s (0x%zx@0x%llx over 0x%lx)\n",
> +			name, total_size, *off, info->total_size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	c = total_size / record_size;
> +	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
> +	if (!zones) {
> +		pr_err("allocate for zones %s failed\n", name);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memset(zones, 0, c * sizeof(*zones));
> +
> +	for (i = 0; i < c; i++) {
> +		zone = psz_init_zone(type, off, record_size);
> +		if (!zone || IS_ERR(zone)) {
> +			pr_err("initialize zones %s failed\n", name);
> +			while (--i >= 0) {
> +				kfree(zones[i]->buffer);
> +				kfree(zones[i]);
> +			}
> +			kfree(zones);
> +			return (void *)zone;
> +		}
> +		zones[i] = zone;
> +	}
> +
> +	*cnt = c;
> +	return zones;
> +}
> +
> +static void psz_free_zone(struct pstore_zone **pszone)
> +{
> +	struct pstore_zone *zone = *pszone;
> +
> +	if (!zone)
> +		return;
> +
> +	kfree(zone->buffer);
> +	kfree(zone);
> +	*pszone = NULL;
> +}
> +
> +static void psz_free_zones(struct pstore_zone ***pszones, unsigned int *cnt)
> +{
> +	struct pstore_zone **zones = *pszones;
> +
> +	if (!zones)
> +		return;
> +
> +	while (*cnt > 0) {
> +		psz_free_zone(&zones[*cnt]);
> +		(*cnt)--;
> +	}
> +	kfree(zones);
> +	*pszones = NULL;
> +}
> +
> +static void psz_free_all_zones(struct psz_context *cxt)
> +{
> +	if (cxt->opszs)
> +		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
> +}
> +
> +static int psz_alloc_zones(struct psz_context *cxt)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	loff_t off = 0;
> +	int err;
> +	size_t size;
> +
> +	size = info->total_size;
> +	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size,
> +			info->kmsg_size, &cxt->oops_max_cnt);
> +	if (IS_ERR(cxt->opszs)) {
> +		err = PTR_ERR(cxt->opszs);
> +		goto fail_out;
> +	}
> +
> +	return 0;
> +fail_out:
> +	return err;
> +}
> +
> +/**
> + * register_pstore_zone() - register to pstore/zone
> + *
> + * @info: back-end driver information. See &struct pstore_zone_info.
> + *
> + * Only one back-end at one time.
> + *
> + * Return: 0 on success, others on failure.
> + */
> +int register_pstore_zone(struct pstore_zone_info *info)
> +{
> +	int err = -EINVAL;
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	if (!info->total_size) {
> +		pr_warn("the total size must be non-zero\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!info->kmsg_size) {
> +		pr_warn("at least one of the records be non-zero\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!info->name || !info->name[0])
> +		return -EINVAL;
> +
> +	if (info->total_size < 4096) {
> +		pr_err("total size must be greater than 4096 bytes\n");
> +		return -EINVAL;
> +	}
> +
> +#define check_size(name, size) {					\
> +		if (info->name > 0 && info->name < (size)) {		\
> +			pr_err(#name " must be over %d\n", (size));	\
> +			return -EINVAL;					\
> +		}							\
> +		if (info->name & (size - 1)) {				\
> +			pr_err(#name " must be a multiple of %d\n",	\
> +					(size));			\
> +			return -EINVAL;					\
> +		}							\
> +	}
> +
> +	check_size(total_size, 4096);
> +	check_size(kmsg_size, SECTOR_SIZE);
> +
> +#undef check_size
> +
> +	/*
> +	 * the @read and @write must be applied.
> +	 * if no @read, pstore may mount failed.
> +	 * if no @write, pstore do not support to remove record file.
> +	 */
> +	if (!info->read || !info->write) {
> +		pr_err("no valid general read/write interface\n");
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&cxt->pstore_zone_info_lock);
> +	if (cxt->pstore_zone_info) {
> +		pr_warn("'%s' already loaded: ignoring '%s'\n",
> +				cxt->pstore_zone_info->name, info->name);
> +		mutex_unlock(&cxt->pstore_zone_info_lock);
> +		return -EBUSY;
> +	}
> +	cxt->pstore_zone_info = info;
> +	mutex_unlock(&cxt->pstore_zone_info_lock);
> +
> +	pr_debug("register %s with properties:\n", info->name);
> +	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
> +	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
> +
> +	err = psz_alloc_zones(cxt);
> +	if (err) {
> +		pr_err("alloc zones failed\n");
> +		goto fail_out;
> +	}
> +
> +	if (info->kmsg_size) {
> +		cxt->pstore.bufsize = cxt->opszs[0]->buffer_size -
> +			sizeof(struct psz_oops_header);
> +		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
> +		if (!cxt->pstore.buf) {
> +			err = -ENOMEM;
> +			goto free_all_zones;
> +		}
> +	}
> +	cxt->pstore.data = cxt;
> +
> +	pr_info("registered %s as backend for", info->name);
> +	cxt->pstore.max_reason = info->max_reason;
> +	if (info->kmsg_size) {
> +		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
> +		pr_cont(" kmsg(%s",
> +			kmsg_dump_reason_str(cxt->pstore.max_reason));
> +		if (cxt->pstore_zone_info->panic_write)
> +			pr_cont(",panic_write");
> +		pr_cont(")");
> +	}
> +	pr_cont("\n");
> +
> +	err = pstore_register(&cxt->pstore);
> +	if (err) {
> +		pr_err("registering with pstore failed\n");
> +		goto free_pstore_buf;
> +	}
> +
> +	return 0;
> +
> +free_pstore_buf:
> +	kfree(cxt->pstore.buf);
> +free_all_zones:
> +	psz_free_all_zones(cxt);
> +fail_out:
> +	mutex_lock(&psz_cxt.pstore_zone_info_lock);
> +	psz_cxt.pstore_zone_info = NULL;
> +	mutex_unlock(&psz_cxt.pstore_zone_info_lock);
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(register_pstore_zone);
> +
> +/**
> + * unregister_pstore_zone() - unregister to pstore/zone
> + *
> + * @info: back-end driver information. See struct pstore_zone_info.
> + */
> +void unregister_pstore_zone(struct pstore_zone_info *info)
> +{
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	pstore_unregister(&cxt->pstore);
> +	kfree(cxt->pstore.buf);
> +	cxt->pstore.bufsize = 0;
> +
> +	mutex_lock(&cxt->pstore_zone_info_lock);
> +	cxt->pstore_zone_info = NULL;
> +	mutex_unlock(&cxt->pstore_zone_info_lock);
> +
> +	psz_free_all_zones(cxt);
> +}
> +EXPORT_SYMBOL_GPL(unregister_pstore_zone);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("Storage Manager for pstore/blk");
> diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
> new file mode 100644
> index 000000000000..a6a79ff1351b
> --- /dev/null
> +++ b/include/linux/pstore_zone.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSTORE_ZONE_H_
> +#define __PSTORE_ZONE_H_
> +
> +#include <linux/types.h>
> +
> +typedef ssize_t (*psz_read_op)(char *, size_t, loff_t);
> +typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
> +/**
> + * struct pstore_zone_info - pstore/zone back-end driver structure
> + *
> + * @owner:	Module which is responsible for this back-end driver.
> + * @name:	Name of the back-end driver.
> + * @total_size: The total size in bytes pstore/zone can use. It must be greater
> + *		than 4096 and be multiple of 4096.
> + * @kmsg_size:	The size of oops/panic zone. Zero means disabled, otherwise,
> + *		it must be multiple of SECTOR_SIZE(512 Bytes).
> + * @max_reason: Maximum kmsg dump reason to store.
> + * @read:	The general read operation. Both of the function parameters
> + *		@size and @offset are relative value to storage.
> + *		On success, the number of bytes should be returned, others
> + *		means error.
> + * @write:	The same as @read.
> + * @panic_write:The write operation only used for panic case. It's optional
> + *		if you do not care panic log. The parameters and return value
> + *		are the same as @read.
> + */
> +struct pstore_zone_info {
> +	struct module *owner;
> +	const char *name;
> +
> +	unsigned long total_size;
> +	unsigned long kmsg_size;
> +	int max_reason;
> +	psz_read_op read;
> +	psz_write_op write;
> +	psz_write_op panic_write;
> +};
> +
> +extern int register_pstore_zone(struct pstore_zone_info *info);
> +extern void unregister_pstore_zone(struct pstore_zone_info *info);
> +
> +#endif
> 

I will try to send v5 as soon as possible.

-- 
WeiXiong Liao


* Re: [PATCH v4 02/12] pstore/zone: Introduce common layer to manage storage zones
@ 2020-05-09  3:09     ` WeiXiong Liao
  0 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  3:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: Petr Mladek, Tony Luck, linux-doc, Anton Vorontsov, linux-kernel,
	Steven Rostedt, Sergey Senozhatsky, linux-mtd, Colin Cross

Hi Kees Cook,

On 2020/5/8 PM 2:39, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> Implement a common set of APIs needed to support pstore storage zones,
> based on how ramoops is designed. This will be used by pstore/blk with
> the intention of migrating pstore/ram in the future.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-2-git-send-email-liaoweixiong@allwinnertech.com
> Co-developed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/pstore/Kconfig           |   7 +
>  fs/pstore/Makefile          |   3 +
>  fs/pstore/zone.c            | 973 ++++++++++++++++++++++++++++++++++++
>  include/linux/pstore_zone.h |  44 ++
>  4 files changed, 1027 insertions(+)
>  create mode 100644 fs/pstore/zone.c
>  create mode 100644 include/linux/pstore_zone.h
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 8f0369aad22a..98d2457bdd9f 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -153,3 +153,10 @@ config PSTORE_RAM
>  	  "ramoops.ko".
>  
>  	  For more information, see Documentation/admin-guide/ramoops.rst.
> +
> +config PSTORE_ZONE
> +	tristate
> +	depends on PSTORE
> +	help
> +	  The common layer for pstore/blk (and pstore/ram in the future)
> +	  to manage storage in zones.
> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
> index 967b5891f325..58a967cbe4af 100644
> --- a/fs/pstore/Makefile
> +++ b/fs/pstore/Makefile
> @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
>  
>  ramoops-objs += ram.o ram_core.o
>  obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
> +
> +pstore_zone-objs += zone.o
> +obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
> diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
> new file mode 100644
> index 000000000000..6c25c443c8e2
> --- /dev/null
> +++ b/fs/pstore/zone.c
> @@ -0,0 +1,973 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define MODNAME "pstore-zone"
> +#define pr_fmt(fmt) MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/pstore.h>
> +#include <linux/mount.h>
> +#include <linux/printk.h>
> +#include <linux/fs.h>
> +#include <linux/pstore_zone.h>
> +#include <linux/kdev_t.h>
> +#include <linux/device.h>
> +#include <linux/namei.h>
> +#include <linux/fcntl.h>
> +#include <linux/uio.h>
> +#include <linux/writeback.h>
> +
> +/**
> + * struct psz_head - header of zone to flush to storage
> + *
> + * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
> + * @datalen: length of data in @data
> + * @data: zone data.
> + */
> +struct psz_buffer {
> +#define PSZ_SIG (0x43474244) /* DBGC */
> +	uint32_t sig;
> +	atomic_t datalen;
> +	uint8_t data[];
> +};
> +
> +/**
> + * struct psz_oops_header - sub header of oops zones to flush to storage
> + *
> + * @magic: magic num for oops header
> + * @time: oops/panic trigger time
> + * @compressed: whether conpressed
> + * @counter: oops/panic counter
> + * @reason: identify oops or panic
> + * @data: pointer to log data
> + *
> + * It's a sub-header of oops zone, trailing after &psz_buffer.
> + */
> +struct psz_oops_header {
> +#define OOPS_HEADER_MAGIC 0x4dfc3ae5 /* Just a ramdom number */
> +	uint32_t magic;
> +	struct timespec64 time;
> +	bool compressed;
> +	uint32_t counter;
> +	enum kmsg_dump_reason reason;
> +	uint8_t data[];
> +};
> +
> +/**
> + * struct pstore_zone - zone information
> + *
> + * @off: zone offset of storage
> + * @type: front-end type for this zone
> + * @name: front-end name for this zone
> + * @buffer: pointer to data buffer managed by this zone
> + * @oldbuf: pointer to old data buffer.
> + * @buffer_size: bytes in @buffer->data
> + * @should_recover: whether this zone should recover from storage
> + * @dirty: whether the data in @buffer dirty
> + *
> + * zone structure in memory.
> + */
> +struct pstore_zone {
> +	loff_t off;
> +	const char *name;
> +	enum pstore_type_id type;
> +
> +	struct psz_buffer *buffer;
> +	struct psz_buffer *oldbuf;
> +	size_t buffer_size;
> +	bool should_recover;
> +	atomic_t dirty;
> +};
> +
> +/**
> + * struct psz_context - all about running state of pstore/zone
> + *
> + * @opszs: oops/panic storage zones
> + * @oops_max_cnt: max count of @opszs
> + * @oops_read_cnt: counter to read oops zone
> + * @oops_write_cnt: counter to write
> + * @oops_counter: counter to oops
> + * @panic_counter: counter to panic
> + * @recovered: whether finish recovering data from storage
> + * @on_panic: whether occur panic
> + * @pstore_zone_info_lock: lock to @pstore_zone_info
> + * @pstore_zone_info: information from back-end
> + * @pstore: structure for pstore
> + */
> +struct psz_context {
> +	struct pstore_zone **opszs;
> +	unsigned int oops_max_cnt;
> +	unsigned int oops_read_cnt;
> +	unsigned int oops_write_cnt;
> +	/*
> +	 * the counter should be recovered when recover.
> +	 * It records the oops/panic times after burning rather than booting.
> +	 */
> +	unsigned int oops_counter;
> +	unsigned int panic_counter;
> +	atomic_t recovered;
> +	atomic_t on_panic;
> +
> +	/*
> +	 * pstore_zone_info_lock just protects "pstore_zone_info" during calls to
> +	 * register_pstore_zone/unregister_pstore_zone
> +	 */
> +	struct mutex pstore_zone_info_lock;
> +	struct pstore_zone_info *pstore_zone_info;
> +	struct pstore_info pstore;
> +};
> +static struct psz_context psz_cxt;
> +
> +/**
> + * enum psz_flush_mode - flush mode for psz_zone_write()
> + *
> + * @FLUSH_NONE: do not flush to storage but update data on memory
> + * @FLUSH_PART: just flush part of data including meta data to storage
> + * @FLUSH_META: just flush meta data of zone to storage
> + * @FLUSH_ALL: flush all of zone
> + */
> +enum psz_flush_mode {
> +	FLUSH_NONE = 0,
> +	FLUSH_PART,
> +	FLUSH_META,
> +	FLUSH_ALL,
> +};
> +
> +static inline int buffer_datalen(struct pstore_zone *zone)
> +{
> +	return atomic_read(&zone->buffer->datalen);
> +}
> +
> +static inline bool is_on_panic(void)
> +{
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	return atomic_read(&cxt->on_panic);
> +}
> +
> +static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
> +		size_t len, unsigned long off)
> +{
> +	if (!buf || !zone->buffer)
> +		return -EINVAL;
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +	len = min_t(size_t, len, zone->buffer_size - off);
> +	memcpy(buf, zone->buffer->data + off, len);
> +	return len;
> +}
> +
> +static int psz_zone_write(struct pstore_zone *zone,
> +		enum psz_flush_mode flush_mode, const char *buf,
> +		size_t len, unsigned long off)
> +{
> +	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
> +	ssize_t wcnt = 0;
> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
> +	size_t wlen;
> +
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +
> +	wlen = min_t(size_t, len, zone->buffer_size - off);
> +	if (buf && wlen) {
> +		memcpy(zone->buffer->data + off, buf, wlen);
> +		atomic_set(&zone->buffer->datalen, wlen + off);
> +	}
> +
> +	/* avoid to damage old records */
> +	if (!is_on_panic() && !atomic_read(&psz_cxt.recovered))
> +		goto dirty;
> +
> +	writeop = is_on_panic() ? info->panic_write : info->write;
> +	if (!writeop)
> +		goto dirty;
> +
> +	switch (flush_mode) {
> +	case FLUSH_NONE:
> +		if (unlikely(buf && wlen))
> +			goto dirty;
> +		return 0;
> +	case FLUSH_PART:
> +		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
> +				zone->off + sizeof(*zone->buffer) + off);
> +		if (wcnt != wlen)
> +			goto dirty;
> +		fallthrough;
> +	case FLUSH_META:
> +		wlen = sizeof(struct psz_buffer);
> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
> +		if (wcnt != wlen)
> +			goto dirty;
> +		break;
> +	case FLUSH_ALL:
> +		wlen = zone->buffer_size + sizeof(*zone->buffer);
> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
> +		if (wcnt != wlen)
> +			goto dirty;
> +		break;
> +	}
> +
> +	return 0;
> +dirty:
> +	atomic_set(&zone->dirty, true);
> +	return -EBUSY;
> +}
> +
> +static int psz_flush_dirty_zone(struct pstore_zone *zone)
> +{
> +	int ret;
> +
> +	if (!zone)
> +		return -EINVAL;
> +
> +	if (!atomic_read(&zone->dirty))
> +		return 0;
> +
> +	if (!atomic_read(&psz_cxt.recovered))
> +		return -EBUSY;
> +
> +	ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
> +	if (!ret)
> +		atomic_set(&zone->dirty, false);
> +	return ret;
> +}

To avoid multiple writers flushing the same zone at once in
psz_flush_dirty_zone(), I prefer to use atomic_xchg() as follows:

	static int psz_flush_dirty_zone(struct pstore_zone *zone)
	{
	        int ret;

	        if (unlikely(!zone))
	                return -EINVAL;

	        if (unlikely(!atomic_read(&psz_cxt.recovered)))
	                return -EBUSY;

	        if (!atomic_xchg(&zone->dirty, false))
	                return 0;

	        ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
	        if (ret)
	                atomic_set(&zone->dirty, true);
	        return ret;
	}
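
The same claim-then-restore idea can be tried out in userspace. The
sketch below uses C11 atomic_exchange() in place of the kernel's
atomic_xchg(); struct demo_zone, demo_write() and
demo_flush_dirty_zone() are hypothetical names for illustration, and
the recovered check is left out to keep it short.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pstore_zone; only the flag is modeled. */
struct demo_zone {
	atomic_int dirty;
	int flush_calls;	/* how many times the backend write ran */
	bool fail_next;		/* force the next write to fail */
};

/* Stand-in for the FLUSH_ALL write: 0 on success, -1 on failure. */
static int demo_write(struct demo_zone *zone)
{
	zone->flush_calls++;
	if (zone->fail_next) {
		zone->fail_next = false;
		return -1;
	}
	return 0;
}

static int demo_flush_dirty_zone(struct demo_zone *zone)
{
	int ret;

	/*
	 * Read and clear the flag in one step. Only the caller that
	 * observes the old value 1 goes on to write; everyone else
	 * sees 0 and returns early.
	 */
	if (!atomic_exchange(&zone->dirty, 0))
		return 0;

	ret = demo_write(zone);
	if (ret)
		atomic_store(&zone->dirty, 1);	/* failed: mark dirty again */
	return ret;
}
```

With a separate read followed by a later set, two callers can both see
the zone as dirty and both write it. With the exchange, the flag is
handed to exactly one caller, and it is set again only if that
caller's write fails.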

> +
> +static int psz_flush_dirty_zones(struct pstore_zone **zones, unsigned int cnt)
> +{
> +	int i, ret;
> +	struct pstore_zone *zone;
> +
> +	if (!zones)
> +		return -EINVAL;
> +
> +	for (i = 0; i < cnt; i++) {
> +		zone = zones[i];
> +		if (!zone)
> +			return -EINVAL;
> +		ret = psz_flush_dirty_zone(zone);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
> +{
> +	const char *data = (const char *)old->buffer->data;
> +	int ret;
> +
> +	ret = psz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
> +	if (ret) {
> +		atomic_set(&new->buffer->datalen, 0);
> +		atomic_set(&new->dirty, false);
> +		return ret;
> +	}
> +	atomic_set(&old->buffer->datalen, 0);
> +	return 0;
> +}
> +
> +static int psz_recover_oops_data(struct psz_context *cxt)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	struct pstore_zone *zone = NULL;
> +	struct psz_buffer *buf;
> +	unsigned long i;
> +	ssize_t rcnt;
> +
> +	if (!info->read)
> +		return -EINVAL;
> +
> +	for (i = 0; i < cxt->oops_max_cnt; i++) {
> +		zone = cxt->opszs[i];
> +		if (unlikely(!zone))
> +			return -EINVAL;
> +		if (atomic_read(&zone->dirty)) {
> +			unsigned int wcnt = cxt->oops_write_cnt;
> +			struct pstore_zone *new = cxt->opszs[wcnt];
> +			int ret;
> +
> +			ret = psz_move_zone(zone, new);
> +			if (ret) {
> +				pr_err("move zone from %lu to %d failed\n",
> +						i, wcnt);
> +				return ret;
> +			}
> +			cxt->oops_write_cnt = (wcnt + 1) % cxt->oops_max_cnt;
> +		}
> +		if (!zone->should_recover)
> +			continue;
> +		buf = zone->buffer;
> +		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
> +				zone->off);
> +		if (rcnt != zone->buffer_size + sizeof(*buf))
> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +	}
> +	return 0;
> +}
> +
> +static int psz_recover_oops_meta(struct psz_context *cxt)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	struct pstore_zone *zone;
> +	size_t rcnt, len;
> +	struct psz_buffer *buf;
> +	struct psz_oops_header *hdr;
> +	struct timespec64 time = {0};
> +	unsigned long i;
> +	/*
> +	 * Recover may on panic, we can't allocate any memory by kmalloc.
> +	 * So, we use local array instead.
> +	 */
> +	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
> +
> +	if (!info->read)
> +		return -EINVAL;
> +
> +	len = sizeof(*buf) + sizeof(*hdr);
> +	buf = (struct psz_buffer *)buffer_header;
> +	for (i = 0; i < cxt->oops_max_cnt; i++) {
> +		zone = cxt->opszs[i];
> +		if (unlikely(!zone))
> +			return -EINVAL;
> +
> +		rcnt = info->read((char *)buf, len, zone->off);
> +		if (rcnt != len) {
> +			pr_err("read %s with id %lu failed\n", zone->name, i);
> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +		}
> +
> +		if (buf->sig != zone->buffer->sig) {
> +			pr_debug("no valid data in oops zone %lu\n", i);
> +			continue;
> +		}
> +
> +		if (zone->buffer_size < atomic_read(&buf->datalen)) {
> +			pr_info("found overtop zone: %s: id %lu, off %lld, size %zu\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size);
> +			continue;
> +		}
> +
> +		hdr = (struct psz_oops_header *)buf->data;
> +		if (hdr->magic != OOPS_HEADER_MAGIC) {
> +			pr_info("found invalid zone: %s: id %lu, off %lld, size %zu\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size);
> +			continue;
> +		}
> +
> +		/*
> +		 * we get the newest zone, and the next one must be the oldest
> +		 * or unused zone, because we do write one by one like a circle.
> +		 */
> +		if (hdr->time.tv_sec >= time.tv_sec) {
> +			time.tv_sec = hdr->time.tv_sec;
> +			cxt->oops_write_cnt = (i + 1) % cxt->oops_max_cnt;
> +		}
> +
> +		if (hdr->reason == KMSG_DUMP_OOPS)
> +			cxt->oops_counter =
> +				max(cxt->oops_counter, hdr->counter);
> +		else
> +			cxt->panic_counter =
> +				max(cxt->panic_counter, hdr->counter);
> +
> +		if (!atomic_read(&buf->datalen)) {
> +			pr_debug("found erased zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size,
> +					atomic_read(&buf->datalen));
> +			continue;
> +		}
> +
> +		if (!is_on_panic())
> +			zone->should_recover = true;
> +		pr_debug("found nice zone: %s: id %lu, off %lld, size %zu, datalen %d\n",
> +				zone->name, i, zone->off,
> +				zone->buffer_size, atomic_read(&buf->datalen));
> +	}
> +
> +	return 0;
> +}
> +
> +static int psz_recover_oops(struct psz_context *cxt)
> +{
> +	int ret;
> +
> +	if (!cxt->opszs)
> +		return 0;
> +
> +	ret = psz_recover_oops_meta(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	ret = psz_recover_oops_data(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	return 0;
> +recover_fail:
> +	pr_debug("recover oops failed\n");
> +	return ret;
> +}
> +
> +/**
> + * psz_recovery() - recover data from storage
> + * @cxt: the context of pstore/zone
> + *
> + * recovery means reading data back from storage after rebooting
> + *
> + * Return: 0 on success, others on failure.
> + */
> +static inline int psz_recovery(struct psz_context *cxt)
> +{
> +	int ret = -EBUSY;
> +
> +	if (atomic_read(&cxt->recovered))
> +		return 0;
> +
> +	ret = psz_recover_oops(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	pr_debug("recover end!\n");
> +	atomic_set(&cxt->recovered, 1);
> +	return 0;
> +
> +recover_fail:
> +	pr_err("recover failed\n");
> +	return ret;
> +}
> +
> +static int psz_pstore_open(struct pstore_info *psi)
> +{
> +	struct psz_context *cxt = psi->data;
> +
> +	cxt->oops_read_cnt = 0;
> +	return 0;
> +}
> +
> +static inline bool psz_ok(struct pstore_zone *zone)
> +{
> +	if (zone && zone->buffer && buffer_datalen(zone))
> +		return true;
> +	return false;
> +}
> +
> +static inline int psz_oops_erase(struct psz_context *cxt,
> +		struct pstore_zone *zone, struct pstore_record *record)
> +{
> +	struct psz_buffer *buffer = zone->buffer;
> +	struct psz_oops_header *hdr =
> +		(struct psz_oops_header *)buffer->data;
> +
> +	if (unlikely(!psz_ok(zone)))
> +		return 0;
> +	/* this zone is already updated, no need to erase */
> +	if (record->count != hdr->counter)
> +		return 0;
> +
> +	atomic_set(&zone->buffer->datalen, 0);
> +	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +}
> +
> +static int psz_pstore_erase(struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		if (record->id >= cxt->oops_max_cnt)
> +			return -EINVAL;
> +		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static void psz_write_kmsg_hdr(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +	struct psz_buffer *buffer = zone->buffer;
> +	struct psz_oops_header *hdr =
> +		(struct psz_oops_header *)buffer->data;
> +
> +	hdr->magic = OOPS_HEADER_MAGIC;
> +	hdr->compressed = record->compressed;
> +	hdr->time.tv_sec = record->time.tv_sec;
> +	hdr->time.tv_nsec = record->time.tv_nsec;
> +	hdr->reason = record->reason;
> +	if (hdr->reason == KMSG_DUMP_OOPS)
> +		hdr->counter = ++cxt->oops_counter;
> +	else
> +		hdr->counter = ++cxt->panic_counter;
> +}
> +
> +static inline int notrace psz_oops_write_record(struct psz_context *cxt,
> +		struct pstore_record *record)
> +{
> +	size_t size, hlen;
> +	struct pstore_zone *zone;
> +	unsigned int zonenum;
> +
> +	zonenum = cxt->oops_write_cnt;
> +	zone = cxt->opszs[zonenum];
> +	if (unlikely(!zone))
> +		return -ENOSPC;
> +	cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
> +
> +	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
> +	psz_write_kmsg_hdr(zone, record);
> +	hlen = sizeof(struct psz_oops_header);
> +	size = min_t(size_t, record->size, zone->buffer_size - hlen);
> +	return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
> +}
> +
> +static int notrace psz_oops_write(struct psz_context *cxt,
> +		struct pstore_record *record)
> +{
> +	int ret;
> +
> +	/*
> +	 * Explicitly only take the first part of any new crash.
> +	 * If our buffer is larger than kmsg_bytes, this can never happen,
> +	 * and if our buffer is smaller than kmsg_bytes, we don't want the
> +	 * report split across multiple records.
> +	 */
> +	if (record->part != 1)
> +		return -ENOSPC;
> +
> +	if (!cxt->opszs)
> +		return -ENOSPC;
> +
> +	ret = psz_oops_write_record(cxt, record);
> +	if (!ret) {
> +		pr_debug("try to flush other dirty oops zones\n");
> +		psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
> +	}
> +
> +	/* always return 0 as we had handled it on buffer */
> +	return 0;
> +}
> +
> +static int notrace psz_pstore_write(struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +
> +	if (record->type == PSTORE_TYPE_DMESG &&
> +			record->reason == KMSG_DUMP_PANIC)
> +		atomic_set(&cxt->on_panic, 1);
> +
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		return psz_oops_write(cxt, record);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
> +{
> +	struct pstore_zone *zone = NULL;
> +
> +	while (cxt->oops_read_cnt < cxt->oops_max_cnt) {
> +		zone = cxt->opszs[cxt->oops_read_cnt++];
> +		if (psz_ok(zone))
> +			return zone;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int psz_read_oops_hdr(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	struct psz_buffer *buffer = zone->buffer;
> +	struct psz_oops_header *hdr =
> +		(struct psz_oops_header *)buffer->data;
> +
> +	if (hdr->magic != OOPS_HEADER_MAGIC)
> +		return -EINVAL;
> +	record->compressed = hdr->compressed;
> +	record->time.tv_sec = hdr->time.tv_sec;
> +	record->time.tv_nsec = hdr->time.tv_nsec;
> +	record->reason = hdr->reason;
> +	record->count = hdr->counter;
> +	return 0;
> +}
> +
> +static ssize_t psz_oops_read(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	ssize_t size, hlen = 0;
> +
> +	size = buffer_datalen(zone);
> +	/* Clear and skip this oops record if it has no valid header */
> +	if (psz_read_oops_hdr(zone, record)) {
> +		atomic_set(&zone->buffer->datalen, 0);
> +		atomic_set(&zone->dirty, 0);
> +		return -ENOMSG;
> +	}
> +	size -= sizeof(struct psz_oops_header);
> +
> +	if (!record->compressed) {
> +		char *buf = kasprintf(GFP_KERNEL, "%s: Total %d times\n",
> +				      kmsg_dump_reason_str(record->reason),
> +				      record->count);
> +		hlen = strlen(buf);
> +		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
> +		if (!record->buf) {
> +			kfree(buf);
> +			return -ENOMEM;
> +		}
> +	} else {
> +		record->buf = kmalloc(size, GFP_KERNEL);
> +		if (!record->buf)
> +			return -ENOMEM;
> +	}
> +
> +	size = psz_zone_read(zone, record->buf + hlen, size,
> +			sizeof(struct psz_oops_header) < 0);

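This should be:

	sizeof(struct psz_oops_header));

The stray "< 0" turns the offset argument into a comparison that is
always 0, so the read starts at the header instead of the payload.
That is why all the compressed records failed to decompress.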

> +	if (unlikely(size < 0)) {
> +		kfree(record->buf);
> +		return -ENOMSG;
> +	}
> +
> +	return size + hlen;
> +}
> +
> +static ssize_t psz_pstore_read(struct pstore_record *record)
> +{
> +	struct psz_context *cxt = record->psi->data;
> +	ssize_t (*readop)(struct pstore_zone *zone,
> +			struct pstore_record *record);
> +	struct pstore_zone *zone;
> +	ssize_t ret;
> +
> +	/* before read, we must recover from storage */
> +	ret = psz_recovery(cxt);
> +	if (ret)
> +		return ret;
> +
> +next_zone:
> +	zone = psz_read_next_zone(cxt);
> +	if (!zone)
> +		return 0;
> +
> +	record->type = zone->type;
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		readop = psz_oops_read;
> +		record->id = cxt->oops_read_cnt - 1;
> +		break;
> +	default:
> +		goto next_zone;
> +	}
> +
> +	ret = readop(zone, record);
> +	if (ret == -ENOMSG)
> +		goto next_zone;
> +	return ret;
> +}
> +
> +static struct psz_context psz_cxt = {
> +	.pstore_zone_info_lock = __MUTEX_INITIALIZER(psz_cxt.pstore_zone_info_lock),
> +	.recovered = ATOMIC_INIT(0),
> +	.on_panic = ATOMIC_INIT(0),
> +	.pstore = {
> +		.owner = THIS_MODULE,
> +		.name = MODNAME,
> +		.open = psz_pstore_open,
> +		.read = psz_pstore_read,
> +		.write = psz_pstore_write,
> +		.erase = psz_pstore_erase,
> +	},
> +};
> +
> +static struct pstore_zone *psz_init_zone(enum pstore_type_id type,
> +		loff_t *off, size_t size)
> +{
> +	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
> +	struct pstore_zone *zone;
> +	const char *name = pstore_type_to_name(type);
> +
> +	if (!size)
> +		return NULL;
> +
> +	if (*off + size > info->total_size) {
> +		pr_err("no room for %s (0x%zx@0x%llx over 0x%lx)\n",
> +			name, size, *off, info->total_size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	zone = kzalloc(sizeof(struct pstore_zone), GFP_KERNEL);
> +	if (!zone)
> +		return ERR_PTR(-ENOMEM);
> +
> +	zone->buffer = kmalloc(size, GFP_KERNEL);
> +	if (!zone->buffer) {
> +		kfree(zone);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memset(zone->buffer, 0xFF, size);
> +	zone->off = *off;
> +	zone->name = name;
> +	zone->type = type;
> +	zone->buffer_size = size - sizeof(struct psz_buffer);
> +	zone->buffer->sig = type ^ PSZ_SIG;
> +	atomic_set(&zone->dirty, 0);
> +	atomic_set(&zone->buffer->datalen, 0);
> +
> +	*off += size;
> +
> +	pr_debug("pszone %s: off 0x%llx, %zu header, %zu data\n", zone->name,
> +			zone->off, sizeof(*zone->buffer), zone->buffer_size);
> +	return zone;
> +}
> +
> +static struct pstore_zone **psz_init_zones(enum pstore_type_id type,
> +	loff_t *off, size_t total_size, ssize_t record_size,
> +	unsigned int *cnt)
> +{
> +	struct pstore_zone_info *info = psz_cxt.pstore_zone_info;
> +	struct pstore_zone **zones, *zone;
> +	const char *name = pstore_type_to_name(type);
> +	int c, i;
> +
> +	if (!total_size || !record_size)
> +		return NULL;
> +
> +	if (*off + total_size > info->total_size) {
> +		pr_err("no room for zones %s (0x%zx@0x%llx over 0x%lx)\n",
> +			name, total_size, *off, info->total_size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	c = total_size / record_size;
> +	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
> +	if (!zones) {
> +		pr_err("allocate for zones %s failed\n", name);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memset(zones, 0, c * sizeof(*zones));
> +
> +	for (i = 0; i < c; i++) {
> +		zone = psz_init_zone(type, off, record_size);
> +		if (!zone || IS_ERR(zone)) {
> +			pr_err("initialize zones %s failed\n", name);
> +			while (--i >= 0) {
> +				kfree(zones[i]->buffer);
> +				kfree(zones[i]);
> +			}
> +			kfree(zones);
> +			return (void *)zone;
> +		}
> +		zones[i] = zone;
> +	}
> +
> +	*cnt = c;
> +	return zones;
> +}
> +
> +static void psz_free_zone(struct pstore_zone **pszone)
> +{
> +	struct pstore_zone *zone = *pszone;
> +
> +	if (!zone)
> +		return;
> +
> +	kfree(zone->buffer);
> +	kfree(zone);
> +	*pszone = NULL;
> +}
> +
> +static void psz_free_zones(struct pstore_zone ***pszones, unsigned int *cnt)
> +{
> +	struct pstore_zone **zones = *pszones;
> +
> +	if (!zones)
> +		return;
> +
> +	while (*cnt > 0) {
> +		psz_free_zone(&zones[*cnt]);
> +		(*cnt)--;
> +	}
> +	kfree(zones);
> +	*pszones = NULL;
> +}
> +
> +static void psz_free_all_zones(struct psz_context *cxt)
> +{
> +	if (cxt->opszs)
> +		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
> +}
> +
> +static int psz_alloc_zones(struct psz_context *cxt)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	loff_t off = 0;
> +	int err;
> +	size_t size;
> +
> +	size = info->total_size;
> +	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size,
> +			info->kmsg_size, &cxt->oops_max_cnt);
> +	if (IS_ERR(cxt->opszs)) {
> +		err = PTR_ERR(cxt->opszs);
> +		goto fail_out;
> +	}
> +
> +	return 0;
> +fail_out:
> +	return err;
> +}
> +
> +/**
> + * register_pstore_zone() - register to pstore/zone
> + *
> + * @info: back-end driver information. See &struct pstore_zone_info.
> + *
> + * Only one back-end at one time.
> + *
> + * Return: 0 on success, others on failure.
> + */
> +int register_pstore_zone(struct pstore_zone_info *info)
> +{
> +	int err = -EINVAL;
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	if (!info->total_size) {
> +		pr_warn("the total size must be non-zero\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!info->kmsg_size) {
> +		pr_warn("at least one of the records be non-zero\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!info->name || !info->name[0])
> +		return -EINVAL;
> +
> +	if (info->total_size < 4096) {
> +		pr_err("total size must be greater than 4096 bytes\n");
> +		return -EINVAL;
> +	}
> +
> +#define check_size(name, size) {					\
> +		if (info->name > 0 && info->name < (size)) {		\
> +			pr_err(#name " must be over %d\n", (size));	\
> +			return -EINVAL;					\
> +		}							\
> +		if (info->name & (size - 1)) {				\
> +			pr_err(#name " must be a multiple of %d\n",	\
> +					(size));			\
> +			return -EINVAL;					\
> +		}							\
> +	}
> +
> +	check_size(total_size, 4096);
> +	check_size(kmsg_size, SECTOR_SIZE);
> +
> +#undef check_size
> +
> +	/*
> +	 * the @read and @write must be applied.
> +	 * if no @read, pstore may mount failed.
> +	 * if no @write, pstore do not support to remove record file.
> +	 */
> +	if (!info->read || !info->write) {
> +		pr_err("no valid general read/write interface\n");
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&cxt->pstore_zone_info_lock);
> +	if (cxt->pstore_zone_info) {
> +		pr_warn("'%s' already loaded: ignoring '%s'\n",
> +				cxt->pstore_zone_info->name, info->name);
> +		mutex_unlock(&cxt->pstore_zone_info_lock);
> +		return -EBUSY;
> +	}
> +	cxt->pstore_zone_info = info;
> +	mutex_unlock(&cxt->pstore_zone_info_lock);
> +
> +	pr_debug("register %s with properties:\n", info->name);
> +	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
> +	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
> +
> +	err = psz_alloc_zones(cxt);
> +	if (err) {
> +		pr_err("alloc zones failed\n");
> +		goto fail_out;
> +	}
> +
> +	if (info->kmsg_size) {
> +		cxt->pstore.bufsize = cxt->opszs[0]->buffer_size -
> +			sizeof(struct psz_oops_header);
> +		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
> +		if (!cxt->pstore.buf) {
> +			err = -ENOMEM;
> +			goto free_all_zones;
> +		}
> +	}
> +	cxt->pstore.data = cxt;
> +
> +	pr_info("registered %s as backend for", info->name);
> +	cxt->pstore.max_reason = info->max_reason;
> +	if (info->kmsg_size) {
> +		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
> +		pr_cont(" kmsg(%s",
> +			kmsg_dump_reason_str(cxt->pstore.max_reason));
> +		if (cxt->pstore_zone_info->panic_write)
> +			pr_cont(",panic_write");
> +		pr_cont(")");
> +	}
> +	pr_cont("\n");
> +
> +	err = pstore_register(&cxt->pstore);
> +	if (err) {
> +		pr_err("registering with pstore failed\n");
> +		goto free_pstore_buf;
> +	}
> +
> +	return 0;
> +
> +free_pstore_buf:
> +	kfree(cxt->pstore.buf);
> +free_all_zones:
> +	psz_free_all_zones(cxt);
> +fail_out:
> +	mutex_lock(&psz_cxt.pstore_zone_info_lock);
> +	psz_cxt.pstore_zone_info = NULL;
> +	mutex_unlock(&psz_cxt.pstore_zone_info_lock);
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(register_pstore_zone);
> +
> +/**
> + * unregister_pstore_zone() - unregister to pstore/zone
> + *
> + * @info: back-end driver information. See struct pstore_zone_info.
> + */
> +void unregister_pstore_zone(struct pstore_zone_info *info)
> +{
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	pstore_unregister(&cxt->pstore);
> +	kfree(cxt->pstore.buf);
> +	cxt->pstore.bufsize = 0;
> +
> +	mutex_lock(&cxt->pstore_zone_info_lock);
> +	cxt->pstore_zone_info = NULL;
> +	mutex_unlock(&cxt->pstore_zone_info_lock);
> +
> +	psz_free_all_zones(cxt);
> +}
> +EXPORT_SYMBOL_GPL(unregister_pstore_zone);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("Storage Manager for pstore/blk");
> diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
> new file mode 100644
> index 000000000000..a6a79ff1351b
> --- /dev/null
> +++ b/include/linux/pstore_zone.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSTORE_ZONE_H_
> +#define __PSTORE_ZONE_H_
> +
> +#include <linux/types.h>
> +
> +typedef ssize_t (*psz_read_op)(char *, size_t, loff_t);
> +typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
> +/**
> + * struct pstore_zone_info - pstore/zone back-end driver structure
> + *
> + * @owner:	Module which is responsible for this back-end driver.
> + * @name:	Name of the back-end driver.
> + * @total_size: The total size in bytes pstore/zone can use. It must be greater
> + *		than 4096 and be multiple of 4096.
> + * @kmsg_size:	The size of oops/panic zone. Zero means disabled, otherwise,
> + *		it must be multiple of SECTOR_SIZE(512 Bytes).
> + * @max_reason: Maximum kmsg dump reason to store.
> + * @read:	The general read operation. Both of the function parameters
> + *		@size and @offset are relative value to storage.
> + *		On success, the number of bytes should be returned, others
> + *		means error.
> + * @write:	The same as @read.
> + * @panic_write:The write operation only used for panic case. It's optional
> + *		if you do not care panic log. The parameters and return value
> + *		are the same as @read.
> + */
> +struct pstore_zone_info {
> +	struct module *owner;
> +	const char *name;
> +
> +	unsigned long total_size;
> +	unsigned long kmsg_size;
> +	int max_reason;
> +	psz_read_op read;
> +	psz_write_op write;
> +	psz_write_op panic_write;
> +};
> +
> +extern int register_pstore_zone(struct pstore_zone_info *info);
> +extern void unregister_pstore_zone(struct pstore_zone_info *info);
> +
> +#endif
> 

I will try to send v5 as soon as possible.

-- 
WeiXiong Liao

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: [PATCH v4 03/12] pstore/blk: Introduce backend for block devices
  2020-05-08  6:39   ` Kees Cook
@ 2020-05-09  3:48     ` WeiXiong Liao
  -1 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  3:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

Hi Kees Cook,

On 2020/5/8 PM 2:39, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> pstore/blk is similar to pstore/ram, but uses a block device as the
> storage rather than persistent ram.
> 
> The pstore/blk backend solves two common use-cases that used to preclude
> using pstore/ram:
> - not all devices have a battery that could be used to persist
>   regular RAM across power failures.
> - most embedded intelligent equipment have no persistent ram, which
>   increases costs, instead preferring cheaper solutions, like block
>   devices.
> 
> pstore/blk provides separate configurations for the end user and for the
> block drivers. User configuration determines how pstore/blk operates, such
> as record sizes, max kmsg dump reasons, etc. These can be set by Kconfig
> and/or module parameters, but module parameter have priority over Kconfig.
> Driver configuration covers all the details about the target block device,
> such as total size of the device and how to perform read/write operations.
> These are provided by block drivers, calling pstore_register_blkdev(),
> including an optional panic_write callback used to bypass regular IO
> APIs in an effort to avoid potentially destabilized kernel code during
> a panic.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-3-git-send-email-liaoweixiong@allwinnertech.com
> Co-developed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/pstore/Kconfig          |  64 ++++++
>  fs/pstore/Makefile         |   3 +
>  fs/pstore/blk.c            | 426 +++++++++++++++++++++++++++++++++++++
>  include/linux/pstore_blk.h |  27 +++
>  4 files changed, 520 insertions(+)
>  create mode 100644 fs/pstore/blk.c
>  create mode 100644 include/linux/pstore_blk.h
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 98d2457bdd9f..92ba73bd0b62 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -160,3 +160,67 @@ config PSTORE_ZONE
>  	help
>  	  The common layer for pstore/blk (and pstore/ram in the future)
>  	  to manage storage in zones.
> +
> +config PSTORE_BLK
> +	tristate "Log panic/oops to a block device"
> +	depends on PSTORE
> +	depends on BLOCK
> +	select PSTORE_ZONE
> +	default n
> +	help
> +	  This enables panic and oops message to be logged to a block dev
> +	  where it can be read back at some later point.
> +
> +	  If unsure, say N.
> +
> +config PSTORE_BLK_BLKDEV
> +	string "block device identifier"
> +	depends on PSTORE_BLK
> +	default ""
> +	help
> +	  Which block device should be used for pstore/blk.
> +
> +	  It accept the following variants:
> +	  1) <hex_major><hex_minor> device number in hexadecimal represents
> +	     itself no leading 0x, for example b302.
> +	  2) /dev/<disk_name> represents the device number of disk
> +	  3) /dev/<disk_name><decimal> represents the device number
> +	     of partition - device number of disk plus the partition number
> +	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
> +	     used when disk name of partitioned disk ends with a digit.
> +	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
> +	     unique id of a partition if the partition table provides it.
> +	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
> +	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
> +	     filled hex representation of the 32-bit "NT disk signature", and PP
> +	     is a zero-filled hex representation of the 1-based partition number.
> +	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
> +	     to a partition with a known unique id.
> +	  7) <major>:<minor> major and minor number of the device separated by
> +	     a colon.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> +
> +config PSTORE_BLK_KMSG_SIZE
> +	int "Size in Kbytes of kmsg dump log to store"
> +	depends on PSTORE_BLK
> +	default 64
> +	help
> +	  This just sets size of kmsg dump (oops, panic, etc) log for
> +	  pstore/blk. The size is in KB and must be a multiple of 4.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> +
> +config PSTORE_BLK_MAX_REASON
> +	int "Maximum kmsg dump reason to store"
> +	depends on PSTORE_BLK
> +	default 2
> +	help
> +	  The maximum reason for kmsg dumps to store. The default is
> +	  2 (KMSG_DUMP_OOPS), see include/linux/kmsg_dump.h's
> +	  enum kmsg_dump_reason for more details.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
> index 58a967cbe4af..c270467aeece 100644
> --- a/fs/pstore/Makefile
> +++ b/fs/pstore/Makefile
> @@ -15,3 +15,6 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
>  
>  pstore_zone-objs += zone.o
>  obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
> +
> +pstore_blk-objs += blk.o
> +obj-$(CONFIG_PSTORE_BLK)	+= pstore_blk.o
> diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
> new file mode 100644
> index 000000000000..286aa82aa483
> --- /dev/null
> +++ b/fs/pstore/blk.c
> @@ -0,0 +1,426 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define MODNAME "pstore-blk"
> +#define pr_fmt(fmt) MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include "../../block/blk.h"
> +#include <linux/blkdev.h>
> +#include <linux/string.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/platform_device.h>
> +#include <linux/pstore_blk.h>
> +#include <linux/mount.h>
> +#include <linux/uio.h>
> +
> +static long kmsg_size = CONFIG_PSTORE_BLK_KMSG_SIZE;
> +module_param(kmsg_size, long, 0400);
> +MODULE_PARM_DESC(kmsg_size, "kmsg dump record size in kbytes");
> +
> +static int max_reason = CONFIG_PSTORE_BLK_MAX_REASON;
> +module_param(max_reason, int, 0400);
> +MODULE_PARM_DESC(max_reason,
> +		 "maximum reason for kmsg dump (default 2: Oops and Panic)");
> +
> +/*
> + * blkdev - The block device to use.
> + *
> + * Most of the time, it is a partition of block device.
> + *
> + * blkdev accepts the following variants:
> + * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
> + *    no leading 0x, for example b302.
> + * 2) /dev/<disk_name> represents the device number of disk
> + * 3) /dev/<disk_name><decimal> represents the device number
> + *    of partition - device number of disk plus the partition number
> + * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
> + *    used when disk name of partitioned disk ends on a digit.
> + * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
> + *    unique id of a partition if the partition table provides it.
> + *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
> + *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
> + *    filled hex representation of the 32-bit "NT disk signature", and PP
> + *    is a zero-filled hex representation of the 1-based partition number.
> + * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
> + *    a partition with a known unique id.
> + * 7) <major>:<minor> major and minor number of the device separated by
> + *    a colon.
> + */
> +static char blkdev[80] = CONFIG_PSTORE_BLK_BLKDEV;
> +module_param_string(blkdev, blkdev, 80, 0400);
> +MODULE_PARM_DESC(blkdev, "the block device for general read/write");
> +
> +static DEFINE_MUTEX(psz_lock);
> +static struct block_device *psblk_bdev;
> +static struct pstore_zone_info *pstore_zone_info;
> +static psblk_panic_write_op blkdev_panic_write;
> +static struct bdev_info {
> +	dev_t devt;
> +	sector_t nr_sects;
> +	sector_t start_sect;
> +} g_bdev_info;
> +
> +/**
> + * struct psblk_device - back-end pstore/blk driver structure.
> + *
> + * @total_size: The total size in bytes pstore/blk can use. It must be greater
> + *		than 4096 and be multiple of 4096.
> + * @read:	The general read operation. Both of the function parameters
> + *		@size and @offset are relative value to block device (not the
> + *		whole disk).
> + *		On success, the number of bytes should be returned, others
> + *		means error.
> + * @write:	The same as @read.
> + * @panic_write:The write operation only used for panic case. It's optional
> + *		if you do not care panic log. The parameters and return value
> + *		are the same as @read.
> + */
> +struct psblk_device {
> +	unsigned long total_size;
> +	psz_read_op read;
> +	psz_write_op write;
> +	psz_write_op panic_write;
> +};
> +
> +static int psblk_register_do(struct psblk_device *dev)
> +{
> +	int ret;
> +
> +	if (!dev || !dev->total_size || !dev->read || !dev->write)
> +		return -EINVAL;
> +
> +	mutex_lock(&psz_lock);
> +
> +	/* someone already registered before */
> +	if (pstore_zone_info) {
> +		mutex_unlock(&psz_lock);
> +		return -EBUSY;
> +	}
> +	pstore_zone_info = kzalloc(sizeof(struct pstore_zone_info), GFP_KERNEL);
> +	if (!pstore_zone_info) {
> +		mutex_unlock(&psz_lock);
> +		return -ENOMEM;
> +	}
> +
> +#define verify_size(name, alignsize) {					\
> +		long _##name_ = (name);					\
> +		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
> +		if (_##name_ & ((alignsize) - 1)) {			\
> +			pr_info(#name " must align to %d\n",		\
> +					(alignsize));			\
> +			_##name_ = ALIGN(name, (alignsize));		\
> +		}							\
> +		name = _##name_ / 1024;					\
> +		pstore_zone_info->name = _##name_;				\
> +	}
> +
> +	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);

This leads to a compilation failure: verify_size() takes only two
arguments in this patch. I think it should be:

	verify_size(kmsg_size, 4096);

The flags field is not added until patch 4.

> +#undef verify_size
> +
> +	pstore_zone_info->total_size = dev->total_size;
> +	pstore_zone_info->max_reason = max_reason;
> +	pstore_zone_info->read = dev->read;
> +	pstore_zone_info->write = dev->write;
> +	pstore_zone_info->panic_write = dev->panic_write;
> +	pstore_zone_info->name = MODNAME;
> +	pstore_zone_info->owner = THIS_MODULE;
> +
> +	ret = register_pstore_zone(pstore_zone_info);
> +	if (ret) {
> +		kfree(pstore_zone_info);
> +		pstore_zone_info = NULL;
> +	}
> +	mutex_unlock(&psz_lock);
> +	return ret;
> +}
> +
> +static void psblk_unregister_do(struct psblk_device *dev)
> +{
> +	mutex_lock(&psz_lock);
> +	if (pstore_zone_info && pstore_zone_info->read == dev->read) {
> +		unregister_pstore_zone(pstore_zone_info);
> +		kfree(pstore_zone_info);
> +		pstore_zone_info = NULL;
> +	}
> +	mutex_unlock(&psz_lock);
> +}
> +
> +/**
> + * psblk_get_bdev() - open block device
> + * @holder: exclusive holder identifier
> + *
> + * Return: pointer to block device on success and others on error.
> + *
> + * On success, the returned block_device has reference count of one.
> + */
> +static struct block_device *psblk_get_bdev(void *holder)
> +{
> +	struct block_device *bdev = ERR_PTR(-ENODEV);
> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
> +
> +	if (!blkdev[0])
> +		return ERR_PTR(-ENODEV);
> +
> +	mutex_lock(&psz_lock);
> +	if (pstore_zone_info)
> +		goto out;
> +	if (holder)
> +		mode |= FMODE_EXCL;
> +	bdev = blkdev_get_by_path(blkdev, mode, holder);
> +	if (IS_ERR(bdev)) {
> +		dev_t devt;
> +
> +		devt = name_to_dev_t(blkdev);
> +		if (devt == 0) {
> +			bdev = ERR_PTR(-ENODEV);
> +			goto out;
> +		}
> +		bdev = blkdev_get_by_dev(devt, mode, holder);
> +	}
> +out:
> +	mutex_unlock(&psz_lock);
> +	return bdev;
> +}
> +
> +static void psblk_put_bdev(struct block_device *bdev, void *holder)
> +{
> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
> +
> +	if (!bdev)
> +		return;
> +
> +	mutex_lock(&psz_lock);
> +	if (holder)
> +		mode |= FMODE_EXCL;
> +	blkdev_put(bdev, mode);
> +	mutex_unlock(&psz_lock);
> +}
> +
> +static ssize_t psblk_generic_blk_read(char *buf, size_t bytes, loff_t pos)
> +{
> +	struct block_device *bdev = psblk_bdev;
> +	struct file file;
> +	struct kiocb kiocb;
> +	struct iov_iter iter;
> +	struct kvec iov = {.iov_base = buf, .iov_len = bytes};
> +
> +	if (!bdev)
> +		return -ENODEV;
> +
> +	memset(&file, 0, sizeof(struct file));
> +	file.f_mapping = bdev->bd_inode->i_mapping;
> +	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
> +	file.f_inode = bdev->bd_inode;
> +	file_ra_state_init(&file.f_ra, file.f_mapping);
> +
> +	init_sync_kiocb(&kiocb, &file);
> +	kiocb.ki_pos = pos;
> +	iov_iter_kvec(&iter, READ, &iov, 1, bytes);
> +
> +	return generic_file_read_iter(&kiocb, &iter);
> +}
> +
> +static ssize_t psblk_generic_blk_write(const char *buf, size_t bytes,
> +		loff_t pos)
> +{
> +	struct block_device *bdev = psblk_bdev;
> +	struct iov_iter iter;
> +	struct kiocb kiocb;
> +	struct file file;
> +	ssize_t ret;
> +	struct kvec iov = {.iov_base = (void *)buf, .iov_len = bytes};
> +
> +	if (!bdev)
> +		return -ENODEV;
> +
> +	/* Console/Ftrace backend may handle buffer until flush dirty zones */
> +	if (in_interrupt() || irqs_disabled())
> +		return -EBUSY;
> +
> +	memset(&file, 0, sizeof(struct file));
> +	file.f_mapping = bdev->bd_inode->i_mapping;
> +	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
> +	file.f_inode = bdev->bd_inode;
> +
> +	init_sync_kiocb(&kiocb, &file);
> +	kiocb.ki_pos = pos;
> +	iov_iter_kvec(&iter, WRITE, &iov, 1, bytes);
> +
> +	inode_lock(bdev->bd_inode);
> +	ret = generic_write_checks(&kiocb, &iter);
> +	if (ret > 0)
> +		ret = generic_perform_write(&file, &iter, pos);
> +	inode_unlock(bdev->bd_inode);
> +
> +	if (likely(ret > 0)) {
> +		const struct file_operations f_op = {.fsync = blkdev_fsync};
> +
> +		file.f_op = &f_op;
> +		kiocb.ki_pos += ret;
> +		ret = generic_write_sync(&kiocb, ret);
> +	}
> +	return ret;
> +}
> +
> +static inline unsigned long psblk_bdev_size(struct block_device *bdev)
> +{
> +	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
> +}
> +
> +static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
> +		loff_t off)
> +{
> +	int ret;
> +
> +	if (!blkdev_panic_write)
> +		return -EOPNOTSUPP;
> +
> +	/* size and off must align to SECTOR_SIZE for block device */
> +	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
> +			size >> SECTOR_SHIFT);
> +	return ret ? -EIO : size;
> +}
> +
> +static struct bdev_info *psblk_get_bdev_info(void)
> +{
> +	struct bdev_info *info = &g_bdev_info;
> +	struct block_device *bdev;
> +
> +	if (info->devt)
> +		return info;
> +
> +	bdev = psblk_get_bdev(NULL);
> +	if (IS_ERR(bdev))
> +		return ERR_CAST(bdev);
> +
> +	info->devt = bdev->bd_dev;
> +	info->nr_sects = part_nr_sects_read(bdev->bd_part);
> +	info->start_sect = get_start_sect(bdev);
> +
> +	if (!psblk_bdev_size(bdev)) {
> +		pr_err("not enough space to '%s'\n", blkdev);
> +		info = ERR_PTR(-ENOSPC);
> +	}
> +
> +	psblk_put_bdev(bdev, NULL);
> +	return info;
> +}
> +
> +/**
> + * psblk_register_blkdev() - register block device to pstore/blk
> + *
> + * @major: the major device number of registering device
> + * @panic_write: the interface for panic case.
> + *
> + * Only the matching major to @blkdev can register.
> + *
> + * If block device do not support panic write, @panic_write can be NULL.
> + *
> + * Return:
> + * * 0		- OK
> + * * Others	- something error.
> + */
> +int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
> +{
> +	struct block_device *bdev;
> +	struct psblk_device dev = {0};
> +	struct bdev_info *binfo;
> +	int ret = -ENODEV;
> +	void *holder = blkdev;
> +
> +	binfo = psblk_get_bdev_info();
> +	if (IS_ERR(binfo))
> +		return PTR_ERR(binfo);
> +
> +	/* only allow driver matching the @blkdev */
> +	if (!binfo->devt || MAJOR(binfo->devt) != major) {
> +		pr_debug("invalid major %u (expect %u)\n",
> +				major, MAJOR(binfo->devt));
> +		return -ENODEV;
> +	}
> +
> +	/* hold bdev exclusively */
> +	bdev = psblk_get_bdev(holder);
> +	if (IS_ERR(bdev)) {
> +		pr_err("failed to open '%s'!\n", blkdev);
> +		return PTR_ERR(bdev);
> +	}
> +
> +	/* psblk_bdev must be assigned before register to pstore/blk */
> +	psblk_bdev = bdev;
> +	blkdev_panic_write = panic_write;
> +
> +	dev.total_size = psblk_bdev_size(bdev);
> +	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
> +	dev.read = psblk_generic_blk_read;
> +	dev.write = psblk_generic_blk_write;
> +
> +	ret = psblk_register_do(&dev);
> +	if (ret)
> +		goto err_put_bdev;
> +
> +	pr_info("using '%s'\n", blkdev);
> +	return 0;
> +
> +err_put_bdev:
> +	psblk_bdev = NULL;
> +	blkdev_panic_write = NULL;
> +	psblk_put_bdev(bdev, holder);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(psblk_register_blkdev);
> +
> +/**
> + * psblk_unregister_blkdev() - unregister block device from pstore/blk
> + *
> + * @major: the major device number of device
> + */
> +void psblk_unregister_blkdev(unsigned int major)
> +{
> +	struct psblk_device dev = {.read = psblk_generic_blk_read};
> +	void *holder = blkdev;
> +
> +	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
> +		psblk_unregister_do(&dev);
> +		psblk_put_bdev(psblk_bdev, holder);
> +		blkdev_panic_write = NULL;
> +		psblk_bdev = NULL;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(psblk_unregister_blkdev);
> +
> +/**
> + * psblk_blkdev_info() - get information of @blkdev
> + *
> + * @devt: the block device num of @blkdev
> + * @nr_sects: the sector count of @blkdev
> + * @start_sect: the start sector of @blkdev
> + *
> + * Block driver needs the follow information for @panic_write.
> + *
> + * Return: 0 on success, others on failure.
> + */
> +int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
> +{
> +	struct bdev_info *binfo;
> +
> +	binfo = psblk_get_bdev_info();
> +	if (IS_ERR(binfo))
> +		return PTR_ERR(binfo);
> +
> +	if (devt)
> +		*devt = binfo->devt;
> +	if (nr_sects)
> +		*nr_sects = binfo->nr_sects;
> +	if (start_sect)
> +		*start_sect = binfo->start_sect;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(psblk_blkdev_info);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("pstore backend for block devices");
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> new file mode 100644
> index 000000000000..5ff465e3953e
> --- /dev/null
> +++ b/include/linux/pstore_blk.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSTORE_BLK_H_
> +#define __PSTORE_BLK_H_
> +
> +#include <linux/types.h>
> +#include <linux/pstore_zone.h>
> +
> +/**
> + * typedef psblk_panic_write_op - panic write operation to block device
> + *
> + * @buf: the data to write
> + * @start_sect: start sector to block device
> + * @sects: sectors count on buf
> + *
> + * Return: On success, zero should be returned. Others mean error.
> + *
> + * Panic write to block device must be aligned to SECTOR_SIZE.
> + */
> +typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
> +		sector_t sects);
> +
> +int  psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write);
> +void psblk_unregister_blkdev(unsigned int major);
> +int  psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
> +
> +#endif
> 

-- 
WeiXiong Liao


* Re: [PATCH v4 03/12] pstore/blk: Introduce backend for block devices
@ 2020-05-09  3:48     ` WeiXiong Liao
  0 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  3:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Petr Mladek, Tony Luck, linux-doc, Anton Vorontsov, linux-kernel,
	Steven Rostedt, Sergey Senozhatsky, linux-mtd, Colin Cross

Hi Kees Cook,

On 2020/5/8 PM 2:39, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> pstore/blk is similar to pstore/ram, but uses a block device as the
> storage rather than persistent ram.
> 
> The pstore/blk backend solves two common use-cases that used to preclude
> using pstore/ram:
> - not all devices have a battery that could be used to persist
>   regular RAM across power failures.
> - most embedded intelligent equipment have no persistent ram, which
>   increases costs, instead preferring cheaper solutions, like block
>   devices.
> 
> pstore/blk provides separate configurations for the end user and for the
> block drivers. User configuration determines how pstore/blk operates, such
> as record sizes, max kmsg dump reasons, etc. These can be set by Kconfig
> and/or module parameters, but module parameters have priority over Kconfig.
> Driver configuration covers all the details about the target block device,
> such as total size of the device and how to perform read/write operations.
> These are provided by block drivers, calling pstore_register_blkdev(),
> including an optional panic_write callback used to bypass regular IO
> APIs in an effort to avoid potentially destabilized kernel code during
> a panic.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-3-git-send-email-liaoweixiong@allwinnertech.com
> Co-developed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/pstore/Kconfig          |  64 ++++++
>  fs/pstore/Makefile         |   3 +
>  fs/pstore/blk.c            | 426 +++++++++++++++++++++++++++++++++++++
>  include/linux/pstore_blk.h |  27 +++
>  4 files changed, 520 insertions(+)
>  create mode 100644 fs/pstore/blk.c
>  create mode 100644 include/linux/pstore_blk.h
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 98d2457bdd9f..92ba73bd0b62 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -160,3 +160,67 @@ config PSTORE_ZONE
>  	help
>  	  The common layer for pstore/blk (and pstore/ram in the future)
>  	  to manage storage in zones.
> +
> +config PSTORE_BLK
> +	tristate "Log panic/oops to a block device"
> +	depends on PSTORE
> +	depends on BLOCK
> +	select PSTORE_ZONE
> +	default n
> +	help
> +	  This enables panic and oops message to be logged to a block dev
> +	  where it can be read back at some later point.
> +
> +	  If unsure, say N.
> +
> +config PSTORE_BLK_BLKDEV
> +	string "block device identifier"
> +	depends on PSTORE_BLK
> +	default ""
> +	help
> +	  Which block device should be used for pstore/blk.
> +
> +	  It accept the following variants:
> +	  1) <hex_major><hex_minor> device number in hexadecimal represents
> +	     itself no leading 0x, for example b302.
> +	  2) /dev/<disk_name> represents the device number of disk
> +	  3) /dev/<disk_name><decimal> represents the device number
> +	     of partition - device number of disk plus the partition number
> +	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
> +	     used when disk name of partitioned disk ends with a digit.
> +	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
> +	     unique id of a partition if the partition table provides it.
> +	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
> +	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
> +	     filled hex representation of the 32-bit "NT disk signature", and PP
> +	     is a zero-filled hex representation of the 1-based partition number.
> +	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
> +	     to a partition with a known unique id.
> +	  7) <major>:<minor> major and minor number of the device separated by
> +	     a colon.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> +
> +config PSTORE_BLK_KMSG_SIZE
> +	int "Size in Kbytes of kmsg dump log to store"
> +	depends on PSTORE_BLK
> +	default 64
> +	help
> +	  This just sets size of kmsg dump (oops, panic, etc) log for
> +	  pstore/blk. The size is in KB and must be a multiple of 4.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> +
> +config PSTORE_BLK_MAX_REASON
> +	int "Maximum kmsg dump reason to store"
> +	depends on PSTORE_BLK
> +	default 2
> +	help
> +	  The maximum reason for kmsg dumps to store. The default is
> +	  2 (KMSG_DUMP_OOPS), see include/linux/kmsg_dump.h's
> +	  enum kmsg_dump_reason for more details.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
> index 58a967cbe4af..c270467aeece 100644
> --- a/fs/pstore/Makefile
> +++ b/fs/pstore/Makefile
> @@ -15,3 +15,6 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
>  
>  pstore_zone-objs += zone.o
>  obj-$(CONFIG_PSTORE_ZONE)	+= pstore_zone.o
> +
> +pstore_blk-objs += blk.o
> +obj-$(CONFIG_PSTORE_BLK)	+= pstore_blk.o
> diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
> new file mode 100644
> index 000000000000..286aa82aa483
> --- /dev/null
> +++ b/fs/pstore/blk.c
> @@ -0,0 +1,426 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define MODNAME "pstore-blk"
> +#define pr_fmt(fmt) MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include "../../block/blk.h"
> +#include <linux/blkdev.h>
> +#include <linux/string.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/platform_device.h>
> +#include <linux/pstore_blk.h>
> +#include <linux/mount.h>
> +#include <linux/uio.h>
> +
> +static long kmsg_size = CONFIG_PSTORE_BLK_KMSG_SIZE;
> +module_param(kmsg_size, long, 0400);
> +MODULE_PARM_DESC(kmsg_size, "kmsg dump record size in kbytes");
> +
> +static int max_reason = CONFIG_PSTORE_BLK_MAX_REASON;
> +module_param(max_reason, int, 0400);
> +MODULE_PARM_DESC(max_reason,
> +		 "maximum reason for kmsg dump (default 2: Oops and Panic)");
> +
> +/*
> + * blkdev - The block device to use.
> + *
> + * Most of the time, it is a partition of block device.
> + *
> + * blkdev accepts the following variants:
> + * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
> + *    no leading 0x, for example b302.
> + * 2) /dev/<disk_name> represents the device number of disk
> + * 3) /dev/<disk_name><decimal> represents the device number
> + *    of partition - device number of disk plus the partition number
> + * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
> + *    used when disk name of partitioned disk ends on a digit.
> + * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
> + *    unique id of a partition if the partition table provides it.
> + *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
> + *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
> + *    filled hex representation of the 32-bit "NT disk signature", and PP
> + *    is a zero-filled hex representation of the 1-based partition number.
> + * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
> + *    a partition with a known unique id.
> + * 7) <major>:<minor> major and minor number of the device separated by
> + *    a colon.
> + */
> +static char blkdev[80] = CONFIG_PSTORE_BLK_BLKDEV;
> +module_param_string(blkdev, blkdev, 80, 0400);
> +MODULE_PARM_DESC(blkdev, "the block device for general read/write");
> +
> +static DEFINE_MUTEX(psz_lock);
> +static struct block_device *psblk_bdev;
> +static struct pstore_zone_info *pstore_zone_info;
> +static psblk_panic_write_op blkdev_panic_write;
> +static struct bdev_info {
> +	dev_t devt;
> +	sector_t nr_sects;
> +	sector_t start_sect;
> +} g_bdev_info;
> +
> +/**
> + * struct psblk_device - back-end pstore/blk driver structure.
> + *
> + * @total_size: The total size in bytes pstore/blk can use. It must be greater
> + *		than 4096 and be multiple of 4096.
> + * @read:	The general read operation. Both of the function parameters
> + *		@size and @offset are relative to the block device (not the
> + *		whole disk).
> + *		On success, the number of bytes should be returned, others
> + *		means error.
> + * @write:	The same as @read.
> + * @panic_write:The write operation only used for panic case. It's optional
> + *		if you do not care panic log. The parameters and return value
> + *		are the same as @read.
> + */
> +struct psblk_device {
> +	unsigned long total_size;
> +	psz_read_op read;
> +	psz_write_op write;
> +	psz_write_op panic_write;
> +};
> +
> +static int psblk_register_do(struct psblk_device *dev)
> +{
> +	int ret;
> +
> +	if (!dev || !dev->total_size || !dev->read || !dev->write)
> +		return -EINVAL;
> +
> +	mutex_lock(&psz_lock);
> +
> +	/* someone already registered before */
> +	if (pstore_zone_info) {
> +		mutex_unlock(&psz_lock);
> +		return -EBUSY;
> +	}
> +	pstore_zone_info = kzalloc(sizeof(struct pstore_zone_info), GFP_KERNEL);
> +	if (!pstore_zone_info) {
> +		mutex_unlock(&psz_lock);
> +		return -ENOMEM;
> +	}
> +
> +#define verify_size(name, alignsize) {					\
> +		long _##name_ = (name);					\
> +		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
> +		if (_##name_ & ((alignsize) - 1)) {			\
> +			pr_info(#name " must align to %d\n",		\
> +					(alignsize));			\
> +			_##name_ = ALIGN(name, (alignsize));		\
> +		}							\
> +		name = _##name_ / 1024;					\
> +		pstore_zone_info->name = _##name_;				\
> +	}
> +
> +	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);

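This leads to a compilation failure: the verify_size() macro defined above
takes only two arguments, and struct psblk_device has no flags member in
this patch. I think it should be:

	verify_size(kmsg_size, 4096);

The flags member is only introduced by patch 4.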
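
For reference, a minimal userspace sketch of what the two-argument check
appears intended to do: take a size in KB, treat non-positive values as
"disabled", and round the byte count up to the alignment. The names
verify_size_bytes() and ALIGN_UP() are hypothetical stand-ins for
illustration, not code from the patch, and rounding the byte value rather
than the KB value is an assumption about the intent.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Sketch of the two-argument size check (not the patch's macro):
 * convert a size in KB to bytes, map non-positive sizes to 0, and
 * round up to 'alignsize' bytes when the value is not aligned.
 */
static long verify_size_bytes(long size_kb, long alignsize)
{
	long bytes = size_kb <= 0 ? 0 : size_kb * 1024;

	assert(alignsize > 0 && (alignsize & (alignsize - 1)) == 0);

	if (bytes & (alignsize - 1))
		bytes = ALIGN_UP(bytes, alignsize);

	return bytes;
}
```

With a 4096-byte alignment, 64 KB stays at 65536 bytes, while 5 KB (5120
bytes) is rounded up to 8192 bytes.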

> +#undef verify_size
> +
> +	pstore_zone_info->total_size = dev->total_size;
> +	pstore_zone_info->max_reason = max_reason;
> +	pstore_zone_info->read = dev->read;
> +	pstore_zone_info->write = dev->write;
> +	pstore_zone_info->panic_write = dev->panic_write;
> +	pstore_zone_info->name = MODNAME;
> +	pstore_zone_info->owner = THIS_MODULE;
> +
> +	ret = register_pstore_zone(pstore_zone_info);
> +	if (ret) {
> +		kfree(pstore_zone_info);
> +		pstore_zone_info = NULL;
> +	}
> +	mutex_unlock(&psz_lock);
> +	return ret;
> +}
> +
> +static void psblk_unregister_do(struct psblk_device *dev)
> +{
> +	mutex_lock(&psz_lock);
> +	if (pstore_zone_info && pstore_zone_info->read == dev->read) {
> +		unregister_pstore_zone(pstore_zone_info);
> +		kfree(pstore_zone_info);
> +		pstore_zone_info = NULL;
> +	}
> +	mutex_unlock(&psz_lock);
> +}
> +
> +/**
> + * psblk_get_bdev() - open block device
> + * @holder: exclusive holder identifier
> + *
> + * Return: pointer to block device on success and others on error.
> + *
> + * On success, the returned block_device has reference count of one.
> + */
> +static struct block_device *psblk_get_bdev(void *holder)
> +{
> +	struct block_device *bdev = ERR_PTR(-ENODEV);
> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
> +
> +	if (!blkdev[0])
> +		return ERR_PTR(-ENODEV);
> +
> +	mutex_lock(&psz_lock);
> +	if (pstore_zone_info)
> +		goto out;
> +	if (holder)
> +		mode |= FMODE_EXCL;
> +	bdev = blkdev_get_by_path(blkdev, mode, holder);
> +	if (IS_ERR(bdev)) {
> +		dev_t devt;
> +
> +		devt = name_to_dev_t(blkdev);
> +		if (devt == 0) {
> +			bdev = ERR_PTR(-ENODEV);
> +			goto out;
> +		}
> +		bdev = blkdev_get_by_dev(devt, mode, holder);
> +	}
> +out:
> +	mutex_unlock(&psz_lock);
> +	return bdev;
> +}
> +
> +static void psblk_put_bdev(struct block_device *bdev, void *holder)
> +{
> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
> +
> +	if (!bdev)
> +		return;
> +
> +	mutex_lock(&psz_lock);
> +	if (holder)
> +		mode |= FMODE_EXCL;
> +	blkdev_put(bdev, mode);
> +	mutex_unlock(&psz_lock);
> +}
> +
> +static ssize_t psblk_generic_blk_read(char *buf, size_t bytes, loff_t pos)
> +{
> +	struct block_device *bdev = psblk_bdev;
> +	struct file file;
> +	struct kiocb kiocb;
> +	struct iov_iter iter;
> +	struct kvec iov = {.iov_base = buf, .iov_len = bytes};
> +
> +	if (!bdev)
> +		return -ENODEV;
> +
> +	memset(&file, 0, sizeof(struct file));
> +	file.f_mapping = bdev->bd_inode->i_mapping;
> +	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
> +	file.f_inode = bdev->bd_inode;
> +	file_ra_state_init(&file.f_ra, file.f_mapping);
> +
> +	init_sync_kiocb(&kiocb, &file);
> +	kiocb.ki_pos = pos;
> +	iov_iter_kvec(&iter, READ, &iov, 1, bytes);
> +
> +	return generic_file_read_iter(&kiocb, &iter);
> +}
> +
> +static ssize_t psblk_generic_blk_write(const char *buf, size_t bytes,
> +		loff_t pos)
> +{
> +	struct block_device *bdev = psblk_bdev;
> +	struct iov_iter iter;
> +	struct kiocb kiocb;
> +	struct file file;
> +	ssize_t ret;
> +	struct kvec iov = {.iov_base = (void *)buf, .iov_len = bytes};
> +
> +	if (!bdev)
> +		return -ENODEV;
> +
> +	/* Console/Ftrace backend may handle buffer until flush dirty zones */
> +	if (in_interrupt() || irqs_disabled())
> +		return -EBUSY;
> +
> +	memset(&file, 0, sizeof(struct file));
> +	file.f_mapping = bdev->bd_inode->i_mapping;
> +	file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
> +	file.f_inode = bdev->bd_inode;
> +
> +	init_sync_kiocb(&kiocb, &file);
> +	kiocb.ki_pos = pos;
> +	iov_iter_kvec(&iter, WRITE, &iov, 1, bytes);
> +
> +	inode_lock(bdev->bd_inode);
> +	ret = generic_write_checks(&kiocb, &iter);
> +	if (ret > 0)
> +		ret = generic_perform_write(&file, &iter, pos);
> +	inode_unlock(bdev->bd_inode);
> +
> +	if (likely(ret > 0)) {
> +		const struct file_operations f_op = {.fsync = blkdev_fsync};
> +
> +		file.f_op = &f_op;
> +		kiocb.ki_pos += ret;
> +		ret = generic_write_sync(&kiocb, ret);
> +	}
> +	return ret;
> +}
> +
> +static inline unsigned long psblk_bdev_size(struct block_device *bdev)
> +{
> +	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
> +}
> +
> +static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
> +		loff_t off)
> +{
> +	int ret;
> +
> +	if (!blkdev_panic_write)
> +		return -EOPNOTSUPP;
> +
> +	/* size and off must align to SECTOR_SIZE for block device */
> +	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
> +			size >> SECTOR_SHIFT);
> +	return ret ? -EIO : size;
> +}
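
As a side note on the alignment comment above, a small userspace sketch of
the byte-to-sector conversion, assuming the usual 512-byte sector
(SECTOR_SHIFT of 9). The helper bytes_to_sectors() is hypothetical and only
illustrates why unaligned values would be silently truncated by the shift;
the patch itself does not perform this check.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1UL << SECTOR_SHIFT)

/*
 * Hypothetical helper: convert a byte offset and size into a start
 * sector and a sector count. Returns -1 when either value is not a
 * multiple of SECTOR_SIZE, since the shift would drop the remainder.
 */
static int bytes_to_sectors(unsigned long long off, size_t size,
			    unsigned long long *start_sect,
			    unsigned long long *sects)
{
	if ((off | size) & (SECTOR_SIZE - 1))
		return -1;

	*start_sect = off >> SECTOR_SHIFT;
	*sects = size >> SECTOR_SHIFT;
	return 0;
}
```

For example, offset 4096 with size 1024 maps to start sector 8 and 2
sectors, while an offset of 100 is rejected.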
> +
> +static struct bdev_info *psblk_get_bdev_info(void)
> +{
> +	struct bdev_info *info = &g_bdev_info;
> +	struct block_device *bdev;
> +
> +	if (info->devt)
> +		return info;
> +
> +	bdev = psblk_get_bdev(NULL);
> +	if (IS_ERR(bdev))
> +		return ERR_CAST(bdev);
> +
> +	info->devt = bdev->bd_dev;
> +	info->nr_sects = part_nr_sects_read(bdev->bd_part);
> +	info->start_sect = get_start_sect(bdev);
> +
> +	if (!psblk_bdev_size(bdev)) {
> +		pr_err("not enough space to '%s'\n", blkdev);
> +		info = ERR_PTR(-ENOSPC);
> +	}
> +
> +	psblk_put_bdev(bdev, NULL);
> +	return info;
> +}
> +
> +/**
> + * psblk_register_blkdev() - register block device to pstore/blk
> + *
> + * @major: the major device number of registering device
> + * @panic_write: the interface for panic case.
> + *
> + * Only the matching major to @blkdev can register.
> + *
> + * If block device do not support panic write, @panic_write can be NULL.
> + *
> + * Return:
> + * * 0		- OK
> + * * Others	- something error.
> + */
> +int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
> +{
> +	struct block_device *bdev;
> +	struct psblk_device dev = {0};
> +	struct bdev_info *binfo;
> +	int ret = -ENODEV;
> +	void *holder = blkdev;
> +
> +	binfo = psblk_get_bdev_info();
> +	if (IS_ERR(binfo))
> +		return PTR_ERR(binfo);
> +
> +	/* only allow driver matching the @blkdev */
> +	if (!binfo->devt || MAJOR(binfo->devt) != major) {
> +		pr_debug("invalid major %u (expect %u)\n",
> +				major, MAJOR(binfo->devt));
> +		return -ENODEV;
> +	}
> +
> +	/* hold bdev exclusively */
> +	bdev = psblk_get_bdev(holder);
> +	if (IS_ERR(bdev)) {
> +		pr_err("failed to open '%s'!\n", blkdev);
> +		return PTR_ERR(bdev);
> +	}
> +
> +	/* psblk_bdev must be assigned before register to pstore/blk */
> +	psblk_bdev = bdev;
> +	blkdev_panic_write = panic_write;
> +
> +	dev.total_size = psblk_bdev_size(bdev);
> +	dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
> +	dev.read = psblk_generic_blk_read;
> +	dev.write = psblk_generic_blk_write;
> +
> +	ret = psblk_register_do(&dev);
> +	if (ret)
> +		goto err_put_bdev;
> +
> +	pr_info("using '%s'\n", blkdev);
> +	return 0;
> +
> +err_put_bdev:
> +	psblk_bdev = NULL;
> +	blkdev_panic_write = NULL;
> +	psblk_put_bdev(bdev, holder);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(psblk_register_blkdev);
> +
> +/**
> + * psblk_unregister_blkdev() - unregister block device from pstore/blk
> + *
> + * @major: the major device number of device
> + */
> +void psblk_unregister_blkdev(unsigned int major)
> +{
> +	struct psblk_device dev = {.read = psblk_generic_blk_read};
> +	void *holder = blkdev;
> +
> +	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
> +		psblk_unregister_do(&dev);
> +		psblk_put_bdev(psblk_bdev, holder);
> +		blkdev_panic_write = NULL;
> +		psblk_bdev = NULL;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(psblk_unregister_blkdev);
> +
> +/**
> + * psblk_blkdev_info() - get information of @blkdev
> + *
> + * @devt: the block device num of @blkdev
> + * @nr_sects: the sector count of @blkdev
> + * @start_sect: the start sector of @blkdev
> + *
> + * Block driver needs the follow information for @panic_write.
> + *
> + * Return: 0 on success, others on failure.
> + */
> +int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
> +{
> +	struct bdev_info *binfo;
> +
> +	binfo = psblk_get_bdev_info();
> +	if (IS_ERR(binfo))
> +		return PTR_ERR(binfo);
> +
> +	if (devt)
> +		*devt = binfo->devt;
> +	if (nr_sects)
> +		*nr_sects = binfo->nr_sects;
> +	if (start_sect)
> +		*start_sect = binfo->start_sect;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(psblk_blkdev_info);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("pstore backend for block devices");
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> new file mode 100644
> index 000000000000..5ff465e3953e
> --- /dev/null
> +++ b/include/linux/pstore_blk.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSTORE_BLK_H_
> +#define __PSTORE_BLK_H_
> +
> +#include <linux/types.h>
> +#include <linux/pstore_zone.h>
> +
> +/**
> + * typedef psblk_panic_write_op - panic write operation to block device
> + *
> + * @buf: the data to write
> + * @start_sect: start sector to block device
> + * @sects: sectors count on buf
> + *
> + * Return: On success, zero should be returned. Others mean error.
> + *
> + * Panic write to block device must be aligned to SECTOR_SIZE.
> + */
> +typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
> +		sector_t sects);
> +
> +int  psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write);
> +void psblk_unregister_blkdev(unsigned int major);
> +int  psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
> +
> +#endif
> 

-- 
WeiXiong Liao



* Re: [PATCH v4 05/12] pstore/blk: Add support for pmsg frontend
  2020-05-08  6:39   ` Kees Cook
@ 2020-05-09  4:38     ` WeiXiong Liao
  -1 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  4:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

Hi Kees Cook,

On 2020/5/8 PM 2:39, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> Add pmsg support to pstore/blk (through pstore/zone). To enable, pmsg_size
> must be greater than 0 and a multiple of 4096.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-5-git-send-email-liaoweixiong@allwinnertech.com
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/pstore/Kconfig           |  12 ++
>  fs/pstore/blk.c             |   9 ++
>  fs/pstore/zone.c            | 268 ++++++++++++++++++++++++++++++++++--
>  include/linux/pstore_zone.h |   2 +
>  4 files changed, 281 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 92ba73bd0b62..f18cd126d83f 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -224,3 +224,15 @@ config PSTORE_BLK_MAX_REASON
>  
>  	  NOTE that, both Kconfig and module parameters can configure
>  	  pstore/blk, but module parameters have priority over Kconfig.
> +
> +config PSTORE_BLK_PMSG_SIZE
> +	int "Size in Kbytes of pmsg to store"
> +	depends on PSTORE_BLK
> +	depends on PSTORE_PMSG
> +	default 64
> +	help
> +	  This just sets size of pmsg (pmsg_size) for pstore/blk. The size is
> +	  in KB and must be a multiple of 4.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
> index d1c3074aa128..401e5ba66a5f 100644
> --- a/fs/pstore/blk.c
> +++ b/fs/pstore/blk.c
> @@ -24,6 +24,14 @@ module_param(max_reason, int, 0400);
>  MODULE_PARM_DESC(max_reason,
>  		 "maximum reason for kmsg dump (default 2: Oops and Panic)");
>  
> +#if IS_ENABLED(CONFIG_PSTORE_PMSG)
> +static long pmsg_size = CONFIG_PSTORE_BLK_PMSG_SIZE;
> +#else
> +static long pmsg_size = -1;
> +#endif
> +module_param(pmsg_size, long, 0400);
> +MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
> +
>  /*
>   * blkdev - The block device to use.
>   *
> @@ -124,6 +132,7 @@ static int psblk_register_do(struct psblk_device *dev)
>  	}
>  
>  	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
> +	verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
>  #undef verify_size
>  
>  	pstore_zone_info->total_size = dev->total_size;
> diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
> index 6c25c443c8e2..f472b06a6c14 100644
> --- a/fs/pstore/zone.c
> +++ b/fs/pstore/zone.c
> @@ -23,12 +23,14 @@
>   *
>   * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
>   * @datalen: length of data in @data
> + * @start: offset into @data where the beginning of the stored bytes begin
>   * @data: zone data.
>   */
>  struct psz_buffer {
>  #define PSZ_SIG (0x43474244) /* DBGC */
>  	uint32_t sig;
>  	atomic_t datalen;
> +	atomic_t start;
>  	uint8_t data[];
>  };
>  
> @@ -84,9 +86,11 @@ struct pstore_zone {
>   * struct psz_context - all about running state of pstore/zone
>   *
>   * @opszs: oops/panic storage zones
> + * @ppsz: pmsg storage zone
>   * @oops_max_cnt: max count of @opszs
>   * @oops_read_cnt: counter to read oops zone
>   * @oops_write_cnt: counter to write
> + * @pmsg_read_cnt: counter to read pmsg zone
>   * @oops_counter: counter to oops
>   * @panic_counter: counter to panic
>   * @recovered: whether finish recovering data from storage
> @@ -97,9 +101,11 @@ struct pstore_zone {
>   */
>  struct psz_context {
>  	struct pstore_zone **opszs;
> +	struct pstore_zone *ppsz;
>  	unsigned int oops_max_cnt;
>  	unsigned int oops_read_cnt;
>  	unsigned int oops_write_cnt;
> +	unsigned int pmsg_read_cnt;
>  	/*
>  	 * the counter should be recovered when recover.
>  	 * It records the oops/panic times after burning rather than booting.
> @@ -139,6 +145,11 @@ static inline int buffer_datalen(struct pstore_zone *zone)
>  	return atomic_read(&zone->buffer->datalen);
>  }
>  
> +static inline int buffer_start(struct pstore_zone *zone)
> +{
> +	return atomic_read(&zone->buffer->start);
> +}
> +
>  static inline bool is_on_panic(void)
>  {
>  	struct psz_context *cxt = &psz_cxt;
> @@ -146,10 +157,10 @@ static inline bool is_on_panic(void)
>  	return atomic_read(&cxt->on_panic);
>  }
>  
> -static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
> +static ssize_t psz_zone_read_buffer(struct pstore_zone *zone, char *buf,
>  		size_t len, unsigned long off)
>  {
> -	if (!buf || !zone->buffer)
> +	if (!buf || !zone || !zone->buffer)
>  		return -EINVAL;
>  	if (off > zone->buffer_size)
>  		return -EINVAL;
> @@ -158,6 +169,18 @@ static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf,
>  	return len;
>  }
>  
> +static int psz_zone_read_oldbuf(struct pstore_zone *zone, char *buf,
> +		size_t len, unsigned long off)
> +{
> +	if (!buf || !zone || !zone->oldbuf)
> +		return -EINVAL;
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +	len = min_t(size_t, len, zone->buffer_size - off);
> +	memcpy(buf, zone->oldbuf->data + off, len);
> +	return 0;
> +}
> +
>  static int psz_zone_write(struct pstore_zone *zone,
>  		enum psz_flush_mode flush_mode, const char *buf,
>  		size_t len, unsigned long off)
> @@ -413,6 +436,93 @@ static int psz_recover_oops(struct psz_context *cxt)
>  	return ret;
>  }
>  
> +static int psz_recover_zone(struct psz_context *cxt, struct pstore_zone *zone)
> +{
> +	struct pstore_zone_info *info = cxt->pstore_zone_info;
> +	struct psz_buffer *oldbuf, tmpbuf;
> +	int ret = 0;
> +	char *buf;
> +	ssize_t rcnt, len, start, off;
> +
> +	if (!zone || zone->oldbuf)
> +		return 0;
> +
> +	if (is_on_panic()) {
> +		/* save data as much as possible */
> +		psz_flush_dirty_zone(zone);
> +		return 0;
> +	}
> +
> +	if (unlikely(!info->read))
> +		return -EINVAL;
> +
> +	len = sizeof(struct psz_buffer);
> +	rcnt = info->read((char *)&tmpbuf, len, zone->off);
> +	if (rcnt != len) {
> +		pr_debug("read zone %s failed\n", zone->name);
> +		return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +	}
> +
> +	if (tmpbuf.sig != zone->buffer->sig) {
> +		pr_debug("no valid data in zone %s\n", zone->name);
> +		return 0;
> +	}
> +
> +	if (zone->buffer_size < atomic_read(&tmpbuf.datalen) ||
> +		zone->buffer_size < atomic_read(&tmpbuf.start)) {
> +		pr_info("found overtop zone: %s: off %lld, size %zu\n",
> +				zone->name, zone->off, zone->buffer_size);
> +		/* just keep going */
> +		return 0;
> +	}
> +
> +	if (!atomic_read(&tmpbuf.datalen)) {
> +		pr_debug("found erased zone: %s: off %lld, size %zu, datalen %d\n",
> +				zone->name, zone->off, zone->buffer_size,
> +				atomic_read(&tmpbuf.datalen));
> +		return 0;
> +	}
> +
> +	pr_debug("found nice zone: %s: off %lld, size %zu, datalen %d\n",
> +			zone->name, zone->off, zone->buffer_size,
> +			atomic_read(&tmpbuf.datalen));
> +
> +	len = atomic_read(&tmpbuf.datalen) + sizeof(*oldbuf);
> +	oldbuf = kzalloc(len, GFP_KERNEL);
> +	if (!oldbuf)
> +		return -ENOMEM;
> +
> +	memcpy(oldbuf, &tmpbuf, sizeof(*oldbuf));
> +	buf = (char *)oldbuf + sizeof(*oldbuf);
> +	len = atomic_read(&oldbuf->datalen);
> +	start = atomic_read(&oldbuf->start);
> +	off = zone->off + sizeof(*oldbuf);
> +
> +	/* get part of data */
> +	rcnt = info->read(buf, len - start, off + start);
> +	if (rcnt != len - start) {
> +		pr_err("read zone %s failed\n", zone->name);
> +		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
> +		goto free_oldbuf;
> +	}
> +
> +	/* get the rest of data */
> +	rcnt = info->read(buf + len - start, start, off);
> +	if (rcnt != start) {
> +		pr_err("read zone %s failed\n", zone->name);
> +		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
> +		goto free_oldbuf;
> +	}
> +
> +	zone->oldbuf = oldbuf;
> +	psz_flush_dirty_zone(zone);
> +	return 0;
> +
> +free_oldbuf:
> +	kfree(oldbuf);
> +	return ret;
> +}
> +
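
The two info->read() calls above put a wrapped zone back into chronological order: the bytes from start up to datalen are the oldest and go first, and the bytes from 0 up to start are the newest and go last. Here is a minimal userspace sketch of that ordering, assuming a plain in-memory array in place of the storage read. The helper name ring_linearize and its parameters are illustrative only and are not part of the patch. The sketch asserts start <= datalen, whereas the quoted code only compares both fields against buffer_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a ring holding `datalen` valid bytes, whose next write position is
 * `start`, into `dst` in chronological order.
 * Hypothetical helper for illustration; not part of the patch.
 */
static void ring_linearize(char *dst, const char *src,
			   size_t datalen, size_t start)
{
	assert(start <= datalen);
	/* Oldest bytes: from the write position to the end of valid data. */
	memcpy(dst, src + start, datalen - start);
	/* Newest bytes: from the beginning of the zone to the write position. */
	memcpy(dst + (datalen - start), src, start);
}
```

For a zone that has not wrapped yet, start equals datalen, so the first copy is empty and the second copy returns the data unchanged.
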
>  /**
>   * psz_recovery() - recover data from storage
>   * @cxt: the context of pstore/zone
> @@ -432,6 +542,10 @@ static inline int psz_recovery(struct psz_context *cxt)
>  	if (ret)
>  		goto recover_fail;
>  
> +	ret = psz_recover_zone(cxt, cxt->ppsz);
> +	if (ret)
> +		goto recover_fail;
> +
>  	pr_debug("recover end!\n");
>  	atomic_set(&cxt->recovered, 1);
>  	return 0;
> @@ -446,9 +560,17 @@ static int psz_pstore_open(struct pstore_info *psi)
>  	struct psz_context *cxt = psi->data;
>  
>  	cxt->oops_read_cnt = 0;
> +	cxt->pmsg_read_cnt = 0;
>  	return 0;
>  }
>  
> +static inline bool psz_old_ok(struct pstore_zone *zone)
> +{
> +	if (zone && zone->oldbuf && atomic_read(&zone->oldbuf->datalen))
> +		return true;
> +	return false;
> +}
> +
>  static inline bool psz_ok(struct pstore_zone *zone)
>  {
>  	if (zone && zone->buffer && buffer_datalen(zone))
> @@ -473,6 +595,25 @@ static inline int psz_oops_erase(struct psz_context *cxt,
>  	return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>  }
>  
> +static inline int psz_record_erase(struct psz_context *cxt,
> +		struct pstore_zone *zone)
> +{
> +	if (unlikely(!psz_old_ok(zone)))
> +		return 0;
> +
> +	kfree(zone->oldbuf);
> +	zone->oldbuf = NULL;
> +	/*
> +	 * If there is new data in the zone buffer, the old data is
> +	 * already invalid, so there is no need to flush 0 (erase) to
> +	 * the block device.
> +	 */
> +	if (!buffer_datalen(zone))
> +		return psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +	psz_flush_dirty_zone(zone);
> +	return 0;
> +}
> +
>  static int psz_pstore_erase(struct pstore_record *record)
>  {
>  	struct psz_context *cxt = record->psi->data;
> @@ -482,6 +623,8 @@ static int psz_pstore_erase(struct pstore_record *record)
>  		if (record->id >= cxt->oops_max_cnt)
>  			return -EINVAL;
>  		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
> +	case PSTORE_TYPE_PMSG:
> +		return psz_record_erase(cxt, cxt->ppsz);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -502,8 +645,10 @@ static void psz_write_kmsg_hdr(struct pstore_zone *zone,
>  	hdr->reason = record->reason;
>  	if (hdr->reason == KMSG_DUMP_OOPS)
>  		hdr->counter = ++cxt->oops_counter;
> -	else
> +	else if (hdr->reason == KMSG_DUMP_PANIC)
>  		hdr->counter = ++cxt->panic_counter;
> +	else
> +		hdr->counter = 0;
>  }
>  
>  static inline int notrace psz_oops_write_record(struct psz_context *cxt,
> @@ -553,6 +698,53 @@ static int notrace psz_oops_write(struct psz_context *cxt,

I think we should also try to flush the pmsg zone if it is dirty, so that its
data is not lost in case of a panic.

@@ -690,8 +690,9 @@ static int notrace psz_oops_write(struct psz_context *cxt,

        ret = psz_oops_write_record(cxt, record);
        if (!ret) {
-               pr_debug("try to flush other dirty oops zones\n");
+               pr_debug("try to flush other dirty zones\n");
                psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+               psz_flush_dirty_zone(cxt->ppsz);
        }

        /* always return 0 as we had handled it on buffer */
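
To illustrate the intent of this change, here is a small userspace sketch with a toy zone type that carries a dirty flag. The names toy_zone, toy_flush_dirty_zone, and toy_flush_all are hypothetical and only mimic the call pattern above. The sketch also assumes that the flush helper tolerates a NULL zone, since ppsz is presumably unset when pmsg_size is 0.

```c
#include <assert.h>
#include <stddef.h>

struct toy_zone {
	int dirty;	/* non-zero while buffered data is not yet on storage */
	int flushes;	/* how many times this zone was written out */
};

/* Returns 1 if the zone was flushed, 0 if there was nothing to do. */
static int toy_flush_dirty_zone(struct toy_zone *zone)
{
	if (!zone || !zone->dirty)
		return 0;
	zone->flushes++;
	zone->dirty = 0;
	return 1;
}

/* After a successful oops write, flush every other dirty zone as well. */
static int toy_flush_all(struct toy_zone **oops, size_t cnt,
			 struct toy_zone *pmsg)
{
	int flushed = 0;
	size_t i;

	assert(oops || cnt == 0);
	for (i = 0; i < cnt; i++)
		flushed += toy_flush_dirty_zone(oops[i]);
	/* The proposed extra step: do not forget the pmsg zone. */
	flushed += toy_flush_dirty_zone(pmsg);
	return flushed;
}
```

The return value counts the zones that were actually written, so a caller can tell whether the pmsg zone had pending data.
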

>  	return 0;
>  }
>  
> +static int notrace psz_record_write(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	size_t start, rem;
> +	int cnt = record->size;
> +	bool is_full_data = false;
> +	char *buf = record->buf;
> +
> +	if (!zone || !record)
> +		return -ENOSPC;
> +
> +	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
> +		is_full_data = true;
> +
> +	if (unlikely(cnt > zone->buffer_size)) {
> +		buf += cnt - zone->buffer_size;
> +		cnt = zone->buffer_size;
> +	}
> +
> +	start = buffer_start(zone);
> +	rem = zone->buffer_size - start;
> +	if (unlikely(rem < cnt)) {
> +		psz_zone_write(zone, FLUSH_PART, buf, rem, start);
> +		buf += rem;
> +		cnt -= rem;
> +		start = 0;
> +		is_full_data = true;
> +	}
> +
> +	atomic_set(&zone->buffer->start, cnt + start);
> +	psz_zone_write(zone, FLUSH_PART, buf, cnt, start);
> +
> +	/**
> +	 * psz_zone_write() sets datalen to start + cnt.
> +	 * That works while the actual data length is less than the buffer
> +	 * size. Once the data length exceeds the buffer size, pmsg wraps
> +	 * around to the beginning of the zone, which makes buffer->datalen
> +	 * wrong. So reset datalen to the buffer size once the actual data
> +	 * length exceeds the buffer size.
> +	 */
> +	if (is_full_data) {
> +		atomic_set(&zone->buffer->datalen, zone->buffer_size);
> +		psz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +	}
> +	return 0;
> +}
> +
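
The wraparound handling in psz_record_write() can be followed more easily in a small userspace model. This is only a sketch under stated assumptions: toy_ring, toy_ring_write, and RING_SIZE are hypothetical stand-ins for the zone buffer, and memcpy() replaces psz_zone_write(), so no dirty tracking or storage flush is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8	/* stands in for zone->buffer_size */

struct toy_ring {
	char data[RING_SIZE];
	size_t start;	/* next write position */
	size_t datalen;	/* number of valid bytes, at most RING_SIZE */
};

static void toy_ring_write(struct toy_ring *r, const char *buf, size_t cnt)
{
	size_t start, rem;
	int full;

	assert(r && buf);
	full = r->datalen >= RING_SIZE;

	/* An oversized record keeps only its last RING_SIZE bytes. */
	if (cnt > RING_SIZE) {
		buf += cnt - RING_SIZE;
		cnt = RING_SIZE;
	}

	start = r->start;
	rem = RING_SIZE - start;
	if (rem < cnt) {
		/* Fill the tail, then continue from the beginning. */
		memcpy(r->data + start, buf, rem);
		buf += rem;
		cnt -= rem;
		start = 0;
		full = 1;
	}

	memcpy(r->data + start, buf, cnt);
	r->start = start + cnt;
	/* Once the ring has wrapped, every byte in it is valid. */
	r->datalen = full ? RING_SIZE : r->start;
}
```

Writing "ABCDEF" and then "GHIJ" into an empty 8-byte ring leaves "IJCDEFGH" in the buffer with start at 2 and datalen at 8, which is the state the datalen reset in the patch is meant to record.
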
>  static int notrace psz_pstore_write(struct pstore_record *record)
>  {
>  	struct psz_context *cxt = record->psi->data;
> @@ -564,6 +756,8 @@ static int notrace psz_pstore_write(struct pstore_record *record)
>  	switch (record->type) {
>  	case PSTORE_TYPE_DMESG:
>  		return psz_oops_write(cxt, record);
> +	case PSTORE_TYPE_PMSG:
> +		return psz_record_write(cxt->ppsz, record);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -579,6 +773,13 @@ static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
>  			return zone;
>  	}
>  
> +	if (cxt->pmsg_read_cnt == 0) {
> +		cxt->pmsg_read_cnt++;
> +		zone = cxt->ppsz;
> +		if (psz_old_ok(zone))
> +			return zone;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -629,7 +830,7 @@ static ssize_t psz_oops_read(struct pstore_zone *zone,
>  			return -ENOMEM;
>  	}
>  
> -	size = psz_zone_read(zone, record->buf + hlen, size,
> +	size = psz_zone_read_buffer(zone, record->buf + hlen, size,
>  			sizeof(struct psz_oops_header) < 0);
>  	if (unlikely(size < 0)) {
>  		kfree(record->buf);
> @@ -639,6 +840,32 @@ static ssize_t psz_oops_read(struct pstore_zone *zone,
>  	return size + hlen;
>  }
>  
> +static ssize_t psz_record_read(struct pstore_zone *zone,
> +		struct pstore_record *record)
> +{
> +	size_t len;
> +	struct psz_buffer *buf;
> +
> +	if (!zone || !record)
> +		return -ENOSPC;
> +
> +	buf = (struct psz_buffer *)zone->oldbuf;
> +	if (!buf)
> +		return -ENOMSG;
> +
> +	len = atomic_read(&buf->datalen);
> +	record->buf = kmalloc(len, GFP_KERNEL);
> +	if (!record->buf)
> +		return -ENOMEM;
> +
> +	if (unlikely(psz_zone_read_oldbuf(zone, record->buf, len, 0))) {
> +		kfree(record->buf);
> +		return -ENOMSG;
> +	}
> +
> +	return len;
> +}
> +
>  static ssize_t psz_pstore_read(struct pstore_record *record)
>  {
>  	struct psz_context *cxt = record->psi->data;
> @@ -663,6 +890,9 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
>  		readop = psz_oops_read;
>  		record->id = cxt->oops_read_cnt - 1;
>  		break;
> +	case PSTORE_TYPE_PMSG:
> +		readop = psz_record_read;
> +		break;
>  	default:
>  		goto next_zone;
>  	}
> @@ -718,8 +948,10 @@ static struct pstore_zone *psz_init_zone(enum pstore_type_id type,
>  	zone->type = type;
>  	zone->buffer_size = size - sizeof(struct psz_buffer);
>  	zone->buffer->sig = type ^ PSZ_SIG;
> +	zone->oldbuf = NULL;
>  	atomic_set(&zone->dirty, 0);
>  	atomic_set(&zone->buffer->datalen, 0);
> +	atomic_set(&zone->buffer->start, 0);
>  
>  	*off += size;
>  
> @@ -803,6 +1035,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
>  {
>  	if (cxt->opszs)
>  		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
> +	if (cxt->ppsz)
> +		psz_free_zone(&cxt->ppsz);
>  }
>  
>  static int psz_alloc_zones(struct psz_context *cxt)
> @@ -810,18 +1044,26 @@ static int psz_alloc_zones(struct psz_context *cxt)
>  	struct pstore_zone_info *info = cxt->pstore_zone_info;
>  	loff_t off = 0;
>  	int err;
> -	size_t size;
> +	size_t off_size = 0;
>  
> -	size = info->total_size;
> -	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size,
> +	off_size += info->pmsg_size;
> +	cxt->ppsz = psz_init_zone(PSTORE_TYPE_PMSG, &off, info->pmsg_size);
> +	if (IS_ERR(cxt->ppsz)) {
> +		err = PTR_ERR(cxt->ppsz);
> +		goto free_out;
> +	}
> +
> +	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
> +			info->total_size - off_size,
>  			info->kmsg_size, &cxt->oops_max_cnt);
>  	if (IS_ERR(cxt->opszs)) {
>  		err = PTR_ERR(cxt->opszs);
> -		goto fail_out;
> +		goto free_out;
>  	}
>  
>  	return 0;
> -fail_out:
> +free_out:
> +	psz_free_all_zones(cxt);
>  	return err;
>  }
>  
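
The layout that psz_alloc_zones() produces can be sketched as follows: the pmsg zone is placed at offset 0, and the oops/panic zones share whatever is left. This is a rough model only. It assumes that psz_init_zones() simply divides the space it is given by kmsg_size, which is not shown in this hunk. The names toy_layout and toy_layout_zones are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_layout {
	size_t pmsg_off;	/* offset of the pmsg zone */
	size_t kmsg_off;	/* offset of the first oops/panic zone */
	size_t kmsg_cnt;	/* number of oops/panic zones that fit */
};

static struct toy_layout toy_layout_zones(size_t total_size,
					  size_t pmsg_size,
					  size_t kmsg_size)
{
	struct toy_layout l = { 0, 0, 0 };

	assert(pmsg_size <= total_size);
	/* The pmsg zone comes first, so it starts at offset 0. */
	l.pmsg_off = 0;
	/* The oops/panic zones start right after the pmsg zone. */
	l.kmsg_off = pmsg_size;
	/* A kmsg_size of 0 means the dmesg frontend is disabled. */
	if (kmsg_size)
		l.kmsg_cnt = (total_size - pmsg_size) / kmsg_size;
	return l;
}
```

With a 1 MiB device, a 64 KiB pmsg zone, and 64 KiB records, this model leaves room for 15 oops/panic zones starting at offset 65536.
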
> @@ -844,7 +1086,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  		return -EINVAL;
>  	}
>  
> -	if (!info->kmsg_size) {
> +	if (!info->kmsg_size && !info->pmsg_size) {
>  		pr_warn("at least one of the records be non-zero\n");
>  		return -EINVAL;
>  	}
> @@ -871,6 +1113,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  
>  	check_size(total_size, 4096);
>  	check_size(kmsg_size, SECTOR_SIZE);
> +	check_size(pmsg_size, SECTOR_SIZE);
>  
>  #undef check_size
>  
> @@ -897,6 +1140,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  	pr_debug("register %s with properties:\n", info->name);
>  	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>  	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
> +	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
>  
>  	err = psz_alloc_zones(cxt);
>  	if (err) {
> @@ -925,6 +1169,10 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  			pr_cont(",panic_write");
>  		pr_cont(")");
>  	}
> +	if (info->pmsg_size) {
> +		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
> +		pr_cont(" pmsg");
> +	}
>  	pr_cont("\n");
>  
>  	err = pstore_register(&cxt->pstore);
> diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
> index a6a79ff1351b..39c2cb944123 100644
> --- a/include/linux/pstore_zone.h
> +++ b/include/linux/pstore_zone.h
> @@ -17,6 +17,7 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
>   * @kmsg_size:	The size of oops/panic zone. Zero means disabled, otherwise,
>   *		it must be multiple of SECTOR_SIZE(512 Bytes).
>   * @max_reason: Maximum kmsg dump reason to store.
> + * @pmsg_size:	The size of pmsg zone, subject to the same rules as @kmsg_size.
>   * @read:	The general read operation. Both of the function parameters
>   *		@size and @offset are relative value to storage.
>   *		On success, the number of bytes should be returned, others
> @@ -33,6 +34,7 @@ struct pstore_zone_info {
>  	unsigned long total_size;
>  	unsigned long kmsg_size;
>  	int max_reason;
> +	unsigned long pmsg_size;
>  	psz_read_op read;
>  	psz_write_op write;
>  	psz_write_op panic_write;
> 

-- 
WeiXiong Liao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 06/12] pstore/blk: Add console frontend support
  2020-05-08  6:39   ` Kees Cook
@ 2020-05-09  4:53     ` WeiXiong Liao
  -1 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  4:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

hi Kees Cook,

On 2020/5/8 PM 2:39, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> Add console frontend support. To enable it, set console_size to a value
> greater than 0 that is a multiple of 4096.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-6-git-send-email-liaoweixiong@allwinnertech.com
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/pstore/Kconfig           | 12 +++++++
>  fs/pstore/blk.c             | 12 ++++++-
>  fs/pstore/zone.c            | 67 +++++++++++++++++++++++++++++++++++--
>  include/linux/pstore_zone.h |  4 ++-
>  4 files changed, 90 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index f18cd126d83f..f1484f751c5e 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -236,3 +236,15 @@ config PSTORE_BLK_PMSG_SIZE
>  
>  	  NOTE that, both Kconfig and module parameters can configure
>  	  pstore/blk, but module parameters have priority over Kconfig.
> +
> +config PSTORE_BLK_CONSOLE_SIZE
> +	int "Size in Kbytes of console to store"
> +	depends on PSTORE_BLK
> +	depends on PSTORE_CONSOLE
> +	default 64
> +	help
> +	  This just sets size of console (console_size) for pstore/blk. The
> +	  size is in KB and must be a multiple of 4.
> +
> +	  NOTE that, both Kconfig and module parameters can configure
> +	  pstore/blk, but module parameters have priority over Kconfig.
> diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
> index 401e5ba66a5f..813025ea7edd 100644
> --- a/fs/pstore/blk.c
> +++ b/fs/pstore/blk.c
> @@ -32,6 +32,14 @@ static long pmsg_size = -1;
>  module_param(pmsg_size, long, 0400);
>  MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
>  
> +#if IS_ENABLED(CONFIG_PSTORE_CONSOLE)
> +static long console_size = CONFIG_PSTORE_BLK_CONSOLE_SIZE;
> +#else
> +static long console_size = -1;
> +#endif
> +module_param(console_size, long, 0400);
> +MODULE_PARM_DESC(console_size, "console size in kbytes");
> +
>  /*
>   * blkdev - The block device to use.
>   *
> @@ -83,7 +91,8 @@ static struct bdev_info {
>   *		whole disk).
>   *		On success, the number of bytes should be returned, others
>   *		means error.
> - * @write:	The same as @read.
> + * @write:	The same as @read, but the following error number:
> + *		-EBUSY means try to write again later.
>   * @panic_write:The write operation only used for panic case. It's optional
>   *		if you do not care panic log. The parameters and return value
>   *		are the same as @read.
> @@ -133,6 +142,7 @@ static int psblk_register_do(struct psblk_device *dev)
>  
>  	verify_size(kmsg_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
>  	verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
> +	verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
>  #undef verify_size
>  
>  	pstore_zone_info->total_size = dev->total_size;
> diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c
> index f472b06a6c14..0b952eea39fe 100644
> --- a/fs/pstore/zone.c
> +++ b/fs/pstore/zone.c
> @@ -87,10 +87,12 @@ struct pstore_zone {
>   *
>   * @opszs: oops/panic storage zones
>   * @ppsz: pmsg storage zone
> + * @cpsz: console storage zone
>   * @oops_max_cnt: max count of @opszs
>   * @oops_read_cnt: counter to read oops zone
>   * @oops_write_cnt: counter to write
>   * @pmsg_read_cnt: counter to read pmsg zone
> + * @console_read_cnt: counter to read console zone
>   * @oops_counter: counter to oops
>   * @panic_counter: counter to panic
>   * @recovered: whether finish recovering data from storage
> @@ -102,10 +104,12 @@ struct pstore_zone {
>  struct psz_context {
>  	struct pstore_zone **opszs;
>  	struct pstore_zone *ppsz;
> +	struct pstore_zone *cpsz;
>  	unsigned int oops_max_cnt;
>  	unsigned int oops_read_cnt;
>  	unsigned int oops_write_cnt;
>  	unsigned int pmsg_read_cnt;
> +	unsigned int console_read_cnt;
>  	/*
>  	 * the counter should be recovered when recover.
>  	 * It records the oops/panic times after burning rather than booting.
> @@ -125,6 +129,9 @@ struct psz_context {
>  };
>  static struct psz_context psz_cxt;
>  
> +static void psz_flush_all_dirty_zones(struct work_struct *);
> +static DECLARE_WORK(psz_cleaner, psz_flush_all_dirty_zones);

I think it's better to use delayed work.

	static DECLARE_DELAYED_WORK(psz_cleaner, psz_flush_all_dirty_zones);

> +
>  /**
>   * enum psz_flush_mode - flush mode for psz_zone_write()
>   *
> @@ -235,6 +242,9 @@ static int psz_zone_write(struct pstore_zone *zone,
>  	return 0;
>  dirty:
>  	atomic_set(&zone->dirty, true);
> +	/* flush dirty zones nicely */
> +	if (wcnt == -EBUSY && !is_on_panic())
> +		schedule_work(&psz_cleaner);

Change to:
	
	schedule_delayed_work(&psz_cleaner, msecs_to_jiffies(500));

Delaying by 500ms lets more console output accumulate before each flush,
which reduces the number of flush calls.
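
To illustrate the coalescing argument, here is a minimal userspace
sketch, not kernel code. It assumes that queueing a delayed work item
that is already pending is a no-op, as with schedule_delayed_work().
All sim_* names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel delayed work item (not the real API). */
struct sim_delayed_work {
	bool pending;		/* true while a flush is queued */
	unsigned long expires;	/* tick at which the queued flush runs */
	unsigned int runs;	/* number of times the flush has run */
};

/*
 * Modelled on schedule_delayed_work(): queue the work only if it is not
 * already pending, and report whether it was queued.
 */
static bool sim_schedule(struct sim_delayed_work *w, unsigned long now,
			 unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* Advance simulated time; run the flush once if its timer has expired. */
static void sim_tick(struct sim_delayed_work *w, unsigned long now)
{
	if (w->pending && now >= w->expires) {
		w->pending = false;
		w->runs++;
	}
}

/*
 * Simulate one console write per millisecond for 'writes' ms, each asking
 * for a flush after 'delay' ms, then drain. Returns the number of flushes.
 */
static unsigned int sim_console_burst(unsigned long writes, unsigned long delay)
{
	struct sim_delayed_work w = { 0 };
	unsigned long now;

	for (now = 0; now < writes; now++) {
		sim_tick(&w, now);
		sim_schedule(&w, now, delay);
	}
	sim_tick(&w, now + delay);	/* let the last queued flush fire */
	return w.runs;
}
```

Under this model, sim_console_burst(1000, 500) runs the flush twice,
while a zero delay runs it once per write.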

>  	return -EBUSY;
>  }
>  
> @@ -291,6 +301,15 @@ static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new)
>  	return 0;
>  }
>  
> +static void psz_flush_all_dirty_zones(struct work_struct *work)
> +{
> +	struct psz_context *cxt = &psz_cxt;
> +
> +	psz_flush_dirty_zone(cxt->ppsz);
> +	psz_flush_dirty_zone(cxt->cpsz);
> +	psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);


If flushing the dirty zones fails, I think it should try again later.

	int ret = 0;
	
	ret |= psz_flush_dirty_zone(cxt->ppsz);
	ret |= psz_flush_dirty_zone(cxt->cpsz);
	ret |= psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
	if (ret)
		schedule_delayed_work(&psz_cleaner, msecs_to_jiffies(1000));
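
The retry idea can be sketched in userspace as follows. This is only a
model under stated assumptions: the sim_* names and the busy_left
counter are hypothetical, and the loop stands in for the worker
rescheduling itself after each failed pass.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical zone model: a dirty flag plus how many more writes will fail. */
struct sim_zone {
	bool dirty;
	int busy_left;	/* backend returns -EBUSY this many more times */
};

/* Try to write one dirty zone back; 0 on success or if already clean. */
static int sim_flush_zone(struct sim_zone *z)
{
	if (!z->dirty)
		return 0;
	if (z->busy_left > 0) {
		z->busy_left--;
		return -EBUSY;
	}
	z->dirty = false;
	return 0;
}

/*
 * One pass of the worker: flush every zone and OR the results together.
 * The result is only meaningful as "zero or non-zero", since OR-ing
 * different negative errno values does not yield a valid errno.
 * Returns true when the caller should schedule another pass.
 */
static bool sim_flush_all(struct sim_zone *zones, int n)
{
	int ret = 0;
	int i;

	for (i = 0; i < n; i++)
		ret |= sim_flush_zone(&zones[i]);
	return ret != 0;
}

/* Run passes until everything is clean; returns the number of passes. */
static int sim_flush_until_clean(struct sim_zone *zones, int n, int max_passes)
{
	int passes = 0;

	while (passes < max_passes) {
		passes++;
		if (!sim_flush_all(zones, n))
			break;
	}
	return passes;
}
```

Note that a failing zone does not stop the others from being flushed in
the same pass; only the decision to retry depends on the combined result.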

And add this diff:

@@ -714,10 +717,10 @@ static int notrace psz_oops_write(struct psz_context *cxt,
                return -ENOSPC;

        ret = psz_oops_write_record(cxt, record);
-       if (!ret) {
+       if (!ret && is_on_panic()) {
+               /* ensure all data are flushed to storage when panic */
                pr_debug("try to flush other dirty zones\n");
-               psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
-               psz_flush_dirty_zone(cxt->ppsz);
+               psz_flush_all_dirty_zones(NULL);
        }

        /* always return 0 as we had handled it on buffer */

We should flush here only on panic, since after this patch all the dirty
zones will be flushed by the delayed work.
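
The condition in the diff above can be read as a small decision rule.
The following userspace sketch uses hypothetical sim_* names and only
models the choice. That deferred work cannot be relied on during a
panic is an inference from the !is_on_panic() check that guards the
scheduling call in psz_zone_write(), not something stated in this thread.

```c
#include <stdbool.h>

enum sim_flush_path {
	SIM_FLUSH_DEFERRED,	/* leave dirty zones to the delayed worker */
	SIM_FLUSH_INLINE,	/* flush right now, in the caller's context */
};

/*
 * Mirror of "if (!ret && is_on_panic())": flush inline only when the
 * record was written successfully and the system is panicking.
 */
static enum sim_flush_path sim_oops_flush_path(int write_ret, bool on_panic)
{
	if (!write_ret && on_panic)
		return SIM_FLUSH_INLINE;
	return SIM_FLUSH_DEFERRED;
}
```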

> +}
> +
>  static int psz_recover_oops_data(struct psz_context *cxt)
>  {
>  	struct pstore_zone_info *info = cxt->pstore_zone_info;
> @@ -546,6 +565,10 @@ static inline int psz_recovery(struct psz_context *cxt)
>  	if (ret)
>  		goto recover_fail;
>  
> +	ret = psz_recover_zone(cxt, cxt->cpsz);
> +	if (ret)
> +		goto recover_fail;
> +
>  	pr_debug("recover end!\n");
>  	atomic_set(&cxt->recovered, 1);
>  	return 0;
> @@ -561,6 +584,7 @@ static int psz_pstore_open(struct pstore_info *psi)
>  
>  	cxt->oops_read_cnt = 0;
>  	cxt->pmsg_read_cnt = 0;
> +	cxt->console_read_cnt = 0;
>  	return 0;
>  }
>  
> @@ -625,8 +649,9 @@ static int psz_pstore_erase(struct pstore_record *record)
>  		return psz_oops_erase(cxt, cxt->opszs[record->id], record);
>  	case PSTORE_TYPE_PMSG:
>  		return psz_record_erase(cxt, cxt->ppsz);
> -	default:
> -		return -EINVAL;
> +	case PSTORE_TYPE_CONSOLE:
> +		return psz_record_erase(cxt, cxt->cpsz);
> +	default: return -EINVAL;
>  	}
>  }
>  
> @@ -753,9 +778,18 @@ static int notrace psz_pstore_write(struct pstore_record *record)
>  			record->reason == KMSG_DUMP_PANIC)
>  		atomic_set(&cxt->on_panic, 1);
>  
> +	/*
> +	 * if on panic, do not write except panic records
> +	 * Fix case that panic_write prints log which wakes up console backend.
> +	 */
> +	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
> +		return -EBUSY;
> +
>  	switch (record->type) {
>  	case PSTORE_TYPE_DMESG:
>  		return psz_oops_write(cxt, record);
> +	case PSTORE_TYPE_CONSOLE:
> +		return psz_record_write(cxt->cpsz, record);
>  	case PSTORE_TYPE_PMSG:
>  		return psz_record_write(cxt->ppsz, record);
>  	default:
> @@ -780,6 +814,13 @@ static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt)
>  			return zone;
>  	}
>  
> +	if (cxt->console_read_cnt == 0) {
> +		cxt->console_read_cnt++;
> +		zone = cxt->cpsz;
> +		if (psz_old_ok(zone))
> +			return zone;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -890,6 +931,8 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
>  		readop = psz_oops_read;
>  		record->id = cxt->oops_read_cnt - 1;
>  		break;
> +	case PSTORE_TYPE_CONSOLE:
> +		fallthrough;
>  	case PSTORE_TYPE_PMSG:
>  		readop = psz_record_read;
>  		break;
> @@ -1037,6 +1080,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
>  		psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
>  	if (cxt->ppsz)
>  		psz_free_zone(&cxt->ppsz);
> +	if (cxt->cpsz)
> +		psz_free_zone(&cxt->cpsz);
>  }
>  
>  static int psz_alloc_zones(struct psz_context *cxt)
> @@ -1053,6 +1098,14 @@ static int psz_alloc_zones(struct psz_context *cxt)
>  		goto free_out;
>  	}
>  
> +	off_size += info->console_size;
> +	cxt->cpsz = psz_init_zone(PSTORE_TYPE_CONSOLE, &off,
> +			info->console_size);
> +	if (IS_ERR(cxt->cpsz)) {
> +		err = PTR_ERR(cxt->cpsz);
> +		goto free_out;
> +	}
> +
>  	cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
>  			info->total_size - off_size,
>  			info->kmsg_size, &cxt->oops_max_cnt);
> @@ -1086,7 +1139,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  		return -EINVAL;
>  	}
>  
> -	if (!info->kmsg_size && !info->pmsg_size) {
> +	if (!info->kmsg_size && !info->pmsg_size && !info->console_size) {
>  		pr_warn("at least one of the records be non-zero\n");
>  		return -EINVAL;
>  	}
> @@ -1114,6 +1167,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  	check_size(total_size, 4096);
>  	check_size(kmsg_size, SECTOR_SIZE);
>  	check_size(pmsg_size, SECTOR_SIZE);
> +	check_size(console_size, SECTOR_SIZE);
>  
>  #undef check_size
>  
> @@ -1141,6 +1195,7 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>  	pr_debug("\toops size : %ld Bytes\n", info->kmsg_size);
>  	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
> +	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
>  
>  	err = psz_alloc_zones(cxt);
>  	if (err) {
> @@ -1173,6 +1228,10 @@ int register_pstore_zone(struct pstore_zone_info *info)
>  		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
>  		pr_cont(" pmsg");
>  	}
> +	if (info->console_size) {
> +		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
> +		pr_cont(" console");
> +	}
>  	pr_cont("\n");
>  
>  	err = pstore_register(&cxt->pstore);
> @@ -1204,6 +1263,8 @@ void unregister_pstore_zone(struct pstore_zone_info *info)
>  {
>  	struct psz_context *cxt = &psz_cxt;
>  
> +	flush_work(&psz_cleaner);
> +


I think it should try to flush the dirty zones before unregistering, to
avoid losing data.

	psz_flush_all_dirty_zones(NULL);
	flush_delayed_work(&psz_cleaner);
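
A minimal userspace sketch of why the explicit flush matters, assuming
a zone can be dirty while no cleaner work is queued (for example when
the earlier write failed with something other than -EBUSY). The sim_*
names are hypothetical; sim_flush_delayed_work() models
flush_delayed_work(), which runs a pending item immediately and waits
for it.

```c
#include <stdbool.h>

struct sim_state {
	bool zone_dirty;	/* data still only in the memory buffer */
	bool work_pending;	/* delayed flush queued but not yet run */
	bool registered;
};

/* Write the buffered data back to storage (always succeeds in this model). */
static void sim_flush_dirty(struct sim_state *s)
{
	s->zone_dirty = false;
}

/* Run the pending work now, if any; do nothing when nothing is queued. */
static void sim_flush_delayed_work(struct sim_state *s)
{
	if (s->work_pending) {
		s->work_pending = false;
		sim_flush_dirty(s);
	}
}

/* Returns true if data was lost (a zone was still dirty at teardown). */
static bool sim_unregister(struct sim_state *s, bool flush_first)
{
	if (flush_first)
		sim_flush_dirty(s);
	sim_flush_delayed_work(s);
	s->registered = false;
	return s->zone_dirty;
}
```

With a dirty zone and no queued work, waiting on the work item alone
leaves the data unwritten, while the explicit flush first does not.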

>  	pstore_unregister(&cxt->pstore);
>  	kfree(cxt->pstore.buf);
>  	cxt->pstore.bufsize = 0;
> diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
> index 39c2cb944123..da294e6d7661 100644
> --- a/include/linux/pstore_zone.h
> +++ b/include/linux/pstore_zone.h
> @@ -18,11 +18,12 @@ typedef ssize_t (*psz_write_op)(const char *, size_t, loff_t);
>   *		it must be multiple of SECTOR_SIZE(512 Bytes).
>   * @max_reason: Maximum kmsg dump reason to store.
>   * @pmsg_size:	The size of pmsg zone which is the same as @kmsg_size.
> + * @console_size:The size of console zone which is the same as @kmsg_size.
>   * @read:	The general read operation. Both of the function parameters
>   *		@size and @offset are relative value to storage.
>   *		On success, the number of bytes should be returned, others
>   *		means error.
> - * @write:	The same as @read.
> + * @write:	The same as @read, but -EBUSY means try to write again later.
>   * @panic_write:The write operation only used for panic case. It's optional
>   *		if you do not care panic log. The parameters and return value
>   *		are the same as @read.
> @@ -35,6 +36,7 @@ struct pstore_zone_info {
>  	unsigned long kmsg_size;
>  	int max_reason;
>  	unsigned long pmsg_size;
> +	unsigned long console_size;
>  	psz_read_op read;
>  	psz_write_op write;
>  	psz_write_op panic_write;
> 

-- 
WeiXiong Liao


* Re: [PATCH v4 12/12] mtd: Support kmsg dumper based on pstore/blk
  2020-05-08  6:40   ` Kees Cook
@ 2020-05-09  5:14     ` WeiXiong Liao
  -1 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  5:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

Hi Kees Cook,

On 2020/5/8 PM 2:40, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> This introduces mtdpstore, which is similar to mtdoops but more
> powerful. It uses pstore/blk, and aims to store panic and oops logs to
> a flash partition, where pstore can later read back and present as files
> in the mounted pstore filesystem.
> 
> To make mtdpstore work, the "blkdev" of pstore/blk should be set
> to the MTD device name or MTD device number. For more details, see
> Documentation/admin-guide/pstore-blk.rst
> 
> This solves a number of issues:
> - Work duplication: both pstore and mtdoops do the same job of storing
>   panic/oops logs. They have very similar logic, registering to kmsg
>   dumper and storing logs to several chunks one by one.
> - Layer violations: drivers should provide methods instead of policies.
>   MTD should provide read/write/erase operations, and allow
>   higher-level drivers to provide the chunk management, kmsg dump
>   configuration, etc.
> - Missing features: pstore provides many additional features, including
>   presenting the logs as files, logging dump time and count, and
>   supporting other frontends like pmsg, console, etc.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-12-git-send-email-liaoweixiong@allwinnertech.com
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  Documentation/admin-guide/pstore-blk.rst |   9 +-
>  drivers/mtd/Kconfig                      |  10 +
>  drivers/mtd/Makefile                     |   1 +
>  drivers/mtd/mtdpstore.c                  | 564 +++++++++++++++++++++++
>  fs/pstore/platform.c                     |  22 +-
>  5 files changed, 583 insertions(+), 23 deletions(-)
>  create mode 100644 drivers/mtd/mtdpstore.c
> 
> diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
> index 2f3602397715..bf0b5a227e24 100644
> --- a/Documentation/admin-guide/pstore-blk.rst
> +++ b/Documentation/admin-guide/pstore-blk.rst
> @@ -43,9 +43,9 @@ blkdev
>  ~~~~~~
>  
>  The block device to use. Most of the time, it is a partition of block device.
> -It's required for pstore/blk.
> +It's required for pstore/blk. It is also used for MTD device.
>  
> -It accepts the following variants:
> +It accepts the following variants for block device:
>  
>  1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
>     leading 0x, for example b302.
> @@ -64,6 +64,11 @@ It accepts the following variants:
>     partition with a known unique id.
>  #. <major>:<minor> major and minor number of the device separated by a colon.
>  
> +It accepts the following variants for MTD device:
> +
> +1. <device name> MTD device name. "pstore" is recommended.
> +#. <device number> MTD device number.
> +
>  kmsg_size
>  ~~~~~~~~~
>  
> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
> index 42d401ea60ee..6ddab796216d 100644
> --- a/drivers/mtd/Kconfig
> +++ b/drivers/mtd/Kconfig
> @@ -170,6 +170,16 @@ config MTD_OOPS
>  	  buffer in a flash partition where it can be read back at some
>  	  later point.
>  
> +config MTD_PSTORE
> +	tristate "Log panic/oops to an MTD buffer based on pstore"
> +	depends on PSTORE_BLK
> +	help
> +	  This enables panic and oops messages to be logged to a circular
> +	  buffer in a flash partition where it can be read back as files after
> +	  mounting pstore filesystem.
> +
> +	  If unsure, say N.
> +
>  config MTD_SWAP
>  	tristate "Swap on MTD device support"
>  	depends on MTD && SWAP
> diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
> index 56cc60ccc477..593d0593a038 100644
> --- a/drivers/mtd/Makefile
> +++ b/drivers/mtd/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
>  obj-$(CONFIG_SSFDC)		+= ssfdc.o
>  obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
>  obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
> +obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
>  obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
>  
>  nftl-objs		:= nftlcore.o nftlmount.o
> diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
> new file mode 100644
> index 000000000000..50c8fc746f39
> --- /dev/null
> +++ b/drivers/mtd/mtdpstore.c
> @@ -0,0 +1,564 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define dev_fmt(fmt) "mtdoops-pstore: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/pstore_blk.h>
> +#include <linux/mtd/mtd.h>
> +#include <linux/bitops.h>
> +
> +static struct mtdpstore_context {
> +	int index;
> +	struct pstore_blk_info info;
> +	struct psblk_device dev;
> +	struct mtd_info *mtd;
> +	unsigned long *rmmap;		/* removed bit map */
> +	unsigned long *usedmap;		/* used bit map */
> +	/*
> +	 * used for panic write
> +	 * As there are no block_isbad for panic case, we should keep this
> +	 * status before panic to ensure panic_write not failed.
> +	 */
> +	unsigned long *badmap;		/* bad block bit map */
> +} oops_cxt;
> +
> +static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	ret = mtd_block_isbad(mtd, off);
> +	if (ret < 0) {
> +		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
> +		return ret;
> +	} else if (ret > 0) {
> +		set_bit(blknum, cxt->badmap);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	return test_bit(blknum, cxt->badmap);
> +}
> +
> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
> +	set_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
> +	clear_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
> +		clear_bit(zonenum, cxt->usedmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	return test_bit(zonenum, cxt->usedmap);
> +}
> +
> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->usedmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
> +		size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t sz;
> +	int i;
> +
> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
> +	for (i = 0; i < sz; i++) {
> +		if (buf[i] != (char)0xFF)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
> +	set_bit(zonenum, cxt->rmmap);
> +}
> +
> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		clear_bit(zonenum, cxt->rmmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->rmmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	struct erase_info erase;
> +	int ret;
> +
> +	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
> +	erase.len = cxt->mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(cxt->mtd, &erase);
> +	if (!ret)
> +		mtdpstore_block_clear_removed(cxt, off);
> +	else
> +		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
> +		       (unsigned long long)erase.addr,
> +		       (unsigned long long)erase.len, cxt->info.device);
> +	return ret;
> +}
> +
> +/*
> + * called while removing file
> + *
> + * Avoiding over erasing, do erase block only when the whole block is unused.
> + * If the block contains valid log, do erase lazily on flush_removed() when
> + * unregister.
> + */
> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -EIO;
> +
> +	mtdpstore_mark_unused(cxt, off);
> +
> +	/* If the block still has valid data, mtdpstore do erase lazily */
> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
> +		mtdpstore_mark_removed(cxt, off);
> +		return 0;
> +	}
> +
> +	/* all zones are unused, erase it */
> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
> +	return mtdpstore_erase_do(cxt, off);
> +}
> +
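
The lazy-erase policy described in the comment above can be sketched in userspace as follows. This is only an illustration under simplifying assumptions: a single erase block, plain bool arrays instead of bitmaps, and hypothetical names (struct block_state, remove_zone()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZONES_PER_BLOCK 4	/* example value: erasesize / kmsg_size */

/* Hypothetical per-block state; the patch keeps these as bitmaps. */
struct block_state {
	bool used[ZONES_PER_BLOCK];	/* zone holds a valid record */
	bool removed[ZONES_PER_BLOCK];	/* zone deleted, erase pending */
	unsigned int erase_count;	/* how often the block was erased */
};

/*
 * Remove one zone. Returns true if the block was erased right away,
 * false if the erase was deferred because other zones are still in use.
 */
static bool remove_zone(struct block_state *b, size_t zone)
{
	size_t i;

	assert(zone < ZONES_PER_BLOCK);
	b->used[zone] = false;

	for (i = 0; i < ZONES_PER_BLOCK; i++) {
		if (b->used[i]) {
			b->removed[zone] = true;	/* erase lazily, later */
			return false;
		}
	}

	/* No valid record left: erase now and drop all pending marks. */
	for (i = 0; i < ZONES_PER_BLOCK; i++)
		b->removed[i] = false;
	b->erase_count++;
	return true;
}
```

Removing the last used zone of a block triggers exactly one erase, however many zones were removed before it.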
> +/*
> + * What is security for mtdpstore?
> + * As there is no erase for panic case, we should ensure at least one zone
> + * is writable. Otherwise, panic write will fail.
> + * If zone is used, write operation will return -ENOMSG, which means that
> + * pstore/blk will try one by one until gets an empty zone. So, it is not
> + * needed to ensure the next zone is empty, but at least one.
> + */
> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret = 0, i;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u32 zonenum = (u32)div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->info.kmsg_size);
> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
> +	u32 erasesize = cxt->mtd->erasesize;
> +
> +	for (i = 0; i < zonecnt; i++) {
> +		u32 num = (zonenum + i) % zonecnt;
> +
> +		/* found empty zone */
> +		if (!test_bit(num, cxt->usedmap))
> +			return 0;
> +	}
> +
> +	/* If there is no any empty zone, we have no way but to do erase */
> +	off = ALIGN_DOWN(off, erasesize);
> +	while (blkcnt--) {
> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
> +
> +		if (mtdpstore_block_isbad(cxt, off))
> +			continue;
> +
> +		ret = mtdpstore_erase_do(cxt, off);
> +		if (!ret) {
> +			mtdpstore_block_mark_unused(cxt, off);
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		dev_err(&mtd->dev, "all blocks bad!\n");
> +	dev_dbg(&mtd->dev, "end security\n");
> +	return ret;
> +}
> +
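
The first half of the function above is a circular scan for a free zone. A minimal userspace sketch of that scan is shown below. It uses a bool array instead of the usedmap bitmap, and the name find_free_zone() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan zonecnt zones circularly, starting at start, and return the index
 * of the first zone that is not in use, or -1 if every zone is occupied.
 */
static long find_free_zone(const bool *used, size_t zonecnt, size_t start)
{
	size_t i;

	assert(used != NULL && zonecnt > 0);
	for (i = 0; i < zonecnt; i++) {
		size_t num = (start + i) % zonecnt;

		if (!used[num])
			return (long)num;
	}
	return -1;
}
```

The -1 case corresponds to the situation in which the quoted code has to erase a block, because a panic write cannot erase and would otherwise find no writable zone.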
> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENOMSG;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENOMSG;
> +
> +	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || retlen != size) {
> +		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static inline bool mtdpstore_is_io_error(int ret)
> +{
> +	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
> +}
> +
> +/*
> + * All zones will be read as pstore/blk will read zone one by one when do
> + * recover.
> + */
> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen, done;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENOMSG;
> +
> +	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
> +	for (done = 0, retlen = 0; done < size; done += retlen) {
> +		retlen = 0;
> +
> +		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
> +				(u_char *)buf + done);
> +		if (mtdpstore_is_io_error(ret)) {
> +			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
> +					off + done, retlen, size - done, ret);
> +			/* the zone may be broken, try next one */
> +			return -ENOMSG;
> +		}
> +
> +		/*
> +		 * ECC error. The impact on log data is so small. Maybe we can
> +		 * still read it and try to understand. So mtdpstore just hands
> +		 * over what it gets and user can judge whether the data is
> +		 * valid or not.
> +		 */
> +		if (mtd_is_eccerr(ret)) {
> +			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
> +					off + done, retlen, size - done, ret);
> +			/* driver may not set retlen when ecc error */
> +			retlen = retlen == 0 ? size - done : retlen;
> +		}
> +	}
> +
> +	if (mtdpstore_is_empty(cxt, buf, size))
> +		mtdpstore_mark_unused(cxt, off);
> +	else
> +		mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_panic_block_isbad(cxt, off))
> +		return -ENOMSG;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENOMSG;
> +
> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || size != retlen) {
> +		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	return retlen;
> +}
> +
> +static void mtdpstore_notify_add(struct mtd_info *mtd)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct pstore_blk_info *info = &cxt->info;
> +	unsigned long longcnt;
> +
> +	if (!strcmp(mtd->name, info->device))
> +		cxt->index = mtd->index;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
> +
> +	if (mtd->size < info->kmsg_size * 2) {
> +		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
> +				mtd->index);
> +		return;
> +	}
> +	/*
> +	 * kmsg_size must be aligned to 4096 Bytes, which is limited by
> +	 * psblk. The default value of kmsg_size is 64KB. If kmsg_size
> +	 * is larger than erasesize, some errors will occur since mtdpstore
> +	 * is designed on it.
> +	 */
> +	if (mtd->erasesize < info->kmsg_size) {
> +		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
> +				mtd->index);
> +		return;
> +	}
> +	if (unlikely(info->kmsg_size % mtd->writesize)) {
> +		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
> +				info->kmsg_size / 1024,
> +				mtd->writesize / 1024);
> +		return;
> +	}
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->kmsg_size));
> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	cxt->dev.total_size = mtd->size;
> +	/* just support dmesg right now */
> +	cxt->dev.flags = PSTORE_FLAGS_DMESG;
> +	cxt->dev.read = mtdpstore_read;
> +	cxt->dev.write = mtdpstore_write;
> +	cxt->dev.erase = mtdpstore_erase;
> +	cxt->dev.panic_write = mtdpstore_panic_write;
> +
> +	ret = psblk_register_device(&cxt->dev);
> +	if (ret) {
> +		dev_err(&mtd->dev, "mtd%d register to psblk failed\n",
> +				mtd->index);
> +		return;
> +	}
> +	cxt->mtd = mtd;
> +	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
> +}
> +
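
The size checks in the notifier above can be summarized in a small userspace sketch. The function name check_geometry() and the -EINVAL return value are assumptions of the sketch: the quoted callback returns void and only logs. The check for a zero kmsg_size is taken from the init function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if the partition can host the records, -EINVAL otherwise.
 * All sizes are in bytes.
 */
static int check_geometry(uint64_t mtd_size, uint32_t erasesize,
			  uint32_t writesize, uint64_t kmsg_size)
{
	assert(writesize != 0);

	if (kmsg_size == 0)
		return -EINVAL;	/* no record size configured */
	if (mtd_size < kmsg_size * 2)
		return -EINVAL;	/* room for fewer than two records */
	if (erasesize < kmsg_size)
		return -EINVAL;	/* a record must fit in one erase block */
	if (kmsg_size % writesize)
		return -EINVAL;	/* records must be write-size aligned */
	return 0;
}
```

For example, a 1 MiB partition with 128 KiB erase blocks, a 2 KiB write size and 64 KiB records passes all checks.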
> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
> +		loff_t off, size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u_char *buf;
> +	int ret;
> +	size_t retlen;
> +	struct erase_info erase;
> +
> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	/* 1st. read to cache */
> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
> +	if (mtdpstore_is_io_error(ret))
> +		goto free;
> +
> +	/* 2nd. erase block */
> +	erase.len = mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(mtd, &erase);
> +	if (ret)
> +		goto free;
> +
> +	/* 3rd. write back */
> +	while (size) {
> +		unsigned int zonesize = cxt->info.kmsg_size;
> +
> +		/* there is valid data on block, write back */
> +		if (mtdpstore_is_used(cxt, off)) {
> +			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
> +			if (ret)
> +				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
> +						off, retlen, zonesize, ret);
> +		}
> +
> +		off += zonesize;
> +		size -= min_t(unsigned int, zonesize, size);
> +	}
> +
> +free:
> +	kfree(buf);
> +	return ret;
> +}
> +
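
The three steps above (read to cache, erase, write back) can be sketched in userspace on a byte array standing in for one erase block. The sizes and the name flush_block() are hypothetical. The sketch writes each valid zone back from its own position in the cached copy, which is a choice of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ZONE_BYTES	4	/* example value, stands in for kmsg_size */
#define ZONES_PER_BLOCK	4	/* example value: erasesize / kmsg_size */
#define BLK_BYTES	(ZONE_BYTES * ZONES_PER_BLOCK)

/*
 * Rewrite one erase block so that only zones still marked used survive.
 * flash points at the start of the block; used has one flag per zone.
 */
static void flush_block(unsigned char *flash, const bool *used)
{
	unsigned char cache[BLK_BYTES];
	size_t zone;

	assert(flash != NULL && used != NULL);

	/* 1st: read the whole block into a cache */
	memcpy(cache, flash, BLK_BYTES);

	/* 2nd: erase the block (erased flash reads back as 0xFF) */
	memset(flash, 0xFF, BLK_BYTES);

	/* 3rd: write back each valid zone from its own slot in the cache */
	for (zone = 0; zone < ZONES_PER_BLOCK; zone++) {
		size_t pos = zone * ZONE_BYTES;

		if (used[zone])
			memcpy(flash + pos, cache + pos, ZONE_BYTES);
	}
}
```

After the call, removed zones read back as erased while the remaining records are unchanged.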
> +/*
> + * What does mtdpstore_flush_removed() do?
> + * When user remove any log file on pstore filesystem, mtdpstore should do
> + * something to ensure log file removed. If the whole block is no longer used,
> + * it's nice to erase the block. However if the block still contains valid log,
> + * what mtdpstore can do is to erase and write the valid log back.
> + */
> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	int ret;
> +	loff_t off;
> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
> +
> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
> +		ret = mtdpstore_block_isbad(cxt, off);
> +		if (ret)
> +			continue;
> +
> +		ret = mtdpstore_block_is_removed(cxt, off);
> +		if (!ret)
> +			continue;
> +
> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	mtdpstore_flush_removed(cxt);
> +
> +	psblk_unregister_device(&cxt->dev);
> +	kfree(cxt->badmap);
> +	kfree(cxt->usedmap);
> +	kfree(cxt->rmmap);
> +	cxt->mtd = NULL;
> +	cxt->index = -1;
> +}
> +
> +static struct mtd_notifier mtdpstore_notifier = {
> +	.add	= mtdpstore_notify_add,
> +	.remove	= mtdpstore_notify_remove,
> +};
> +
> +static int __init mtdpstore_init(void)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	struct pstore_blk_info *info = &cxt->info;
> +
> +	ret = pstore_blk_usr_info(info);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (strlen(info->device) == 0) {
> +		dev_err(&mtd->dev, "mtd device must be supplied\n");

We should not use dev_err() here, since mtd is still NULL at this point.

	pr_err("mtd device must be supplied\n");

> +		return -EINVAL;
> +	}
> +	if (!info->kmsg_size) {
> +		dev_err(&mtd->dev, "no backend enabled\n");

Same as above: mtd is still NULL here, so pr_err() should be used instead.

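As a rough illustration of why this matters: if the device structure is embedded at a non-zero offset, taking &mtd->dev with a NULL mtd does not give a NULL pointer that a logging helper could detect. The userspace sketch below uses hypothetical stand-in structures (fake_device, fake_mtd_info) and a hypothetical log_err() helper; it only shows the idea of checking the outer pointer and falling back to a device-less message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real structures are far larger. */
struct fake_device {
	const char *name;
};

struct fake_mtd_info {
	unsigned int index;
	struct fake_device dev;	/* not the first member */
};

/*
 * Log helper that tolerates a missing device: with no mtd bound yet it
 * falls back to a plain message. Returns what snprintf() returns, i.e.
 * the length of the formatted message.
 */
static int log_err(char *out, size_t len, const struct fake_mtd_info *mtd,
		   const char *msg)
{
	assert(out != NULL && msg != NULL);
	if (!mtd)
		return snprintf(out, len, "mtdpstore: %s", msg);
	return snprintf(out, len, "%s: %s", mtd->dev.name, msg);
}
```

In the init path there is no device yet, so only the fallback branch can apply, which is what the pr_err() suggestion amounts to.
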
> +		return -EINVAL;
> +	}
> +
> +	/* Setup the MTD device to use */
> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
> +	if (ret)
> +		cxt->index = -1;
> +
> +	register_mtd_user(&mtdpstore_notifier);
> +	return 0;
> +}
> +module_init(mtdpstore_init);
> +
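
The init code above accepts either an MTD number or an MTD name: if the string does not parse as an integer, the index stays at -1 and the notifier matches by name later. A minimal userspace sketch of that parsing step follows, using strtol() in place of kstrtoint(). The name parse_mtd_index() is hypothetical, and rejecting negative numbers here is a simplification of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Interpret the device string as an MTD number if it parses completely
 * as an integer (base auto-detected, like base 0 in the quoted code).
 * Otherwise return -1 so that the caller matches by name instead.
 */
static int parse_mtd_index(const char *device)
{
	char *end;
	long val;

	assert(device != NULL);
	errno = 0;
	val = strtol(device, &end, 0);
	if (errno || end == device || *end != '\0')
		return -1;
	if (val < 0 || val > INT_MAX)
		return -1;
	return (int)val;
}
```

So "3" and "0x10" select a device by number, while a name such as "pstore" yields -1 and is resolved when the matching device is announced.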
> +static void __exit mtdpstore_exit(void)
> +{
> +	unregister_mtd_user(&mtdpstore_notifier);
> +}
> +module_exit(mtdpstore_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("MTD backend for pstore/blk");
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index b882919b8784..4fb8ec9f3a1c 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c

How about moving the following changes to patch 1?

> @@ -130,26 +130,6 @@ enum pstore_type_id pstore_name_to_type(const char *name)
>  }
>  EXPORT_SYMBOL_GPL(pstore_name_to_type);
>  
> -static const char *get_reason_str(enum kmsg_dump_reason reason)
> -{
> -	switch (reason) {
> -	case KMSG_DUMP_PANIC:
> -		return "Panic";
> -	case KMSG_DUMP_OOPS:
> -		return "Oops";
> -	case KMSG_DUMP_EMERG:
> -		return "Emergency";
> -	case KMSG_DUMP_RESTART:
> -		return "Restart";
> -	case KMSG_DUMP_HALT:
> -		return "Halt";
> -	case KMSG_DUMP_POWEROFF:
> -		return "Poweroff";
> -	default:
> -		return "Unknown";
> -	}
> -}
> -
>  static void pstore_timer_kick(void)
>  {
>  	if (pstore_update_ms < 0)
> @@ -402,7 +382,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  	unsigned int	part = 1;
>  	int		ret;
>  
> -	why = get_reason_str(reason);
> +	why = kmsg_dump_reason_str(reason);
>  
>  	if (down_trylock(&psinfo->buf_lock)) {
>  		/* Failed to acquire lock: give up if we cannot wait. */
> 

-- 
WeiXiong Liao

* Re: [PATCH v4 12/12] mtd: Support kmsg dumper based on pstore/blk
@ 2020-05-09  5:14     ` WeiXiong Liao
  0 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09  5:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Petr Mladek, Tony Luck, linux-doc, Anton Vorontsov, linux-kernel,
	Steven Rostedt, Sergey Senozhatsky, linux-mtd, Colin Cross

Hi Kees Cook,

On 2020/5/8 PM 2:40, Kees Cook wrote:
> From: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> This introduces mtdpstore, which is similar to mtdoops but more
> powerful. It uses pstore/blk, and aims to store panic and oops logs to
> a flash partition, where pstore can later read back and present as files
> in the mounted pstore filesystem.
> 
> To make mtdpstore work, the "blkdev" of pstore/blk should be set
> as MTD device name or MTD device number. For more details, see
> Documentation/admin-guide/pstore-blk.rst
> 
> This solves a number of issues:
> - Work duplication: both of pstore and mtdoops do the same job storing
>   panic/oops log. They have very similar logic, registering to kmsg
>   dumper and storing logs to several chunks one by one.
> - Layer violations: drivers should provide methods instead of policies.
>   MTD should provide read/write/erase operations, and allow higher-level
>   drivers to provide the chunk management, kmsg dump
>   configuration, etc.
> - Missing features: pstore provides many additional features, including
>   presenting the logs as files, logging dump time and count, and
>   supporting other frontends like pmsg, console, etc.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Link: https://lore.kernel.org/r/1585126506-18635-12-git-send-email-liaoweixiong@allwinnertech.com
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  Documentation/admin-guide/pstore-blk.rst |   9 +-
>  drivers/mtd/Kconfig                      |  10 +
>  drivers/mtd/Makefile                     |   1 +
>  drivers/mtd/mtdpstore.c                  | 564 +++++++++++++++++++++++
>  fs/pstore/platform.c                     |  22 +-
>  5 files changed, 583 insertions(+), 23 deletions(-)
>  create mode 100644 drivers/mtd/mtdpstore.c
> 
> diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
> index 2f3602397715..bf0b5a227e24 100644
> --- a/Documentation/admin-guide/pstore-blk.rst
> +++ b/Documentation/admin-guide/pstore-blk.rst
> @@ -43,9 +43,9 @@ blkdev
>  ~~~~~~
>  
>  The block device to use. Most of the time, it is a partition of block device.
> -It's required for pstore/blk.
> +It's required for pstore/blk. It is also used for MTD device.
>  
> -It accepts the following variants:
> +It accepts the following variants for block device:
>  
>  1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
>     leading 0x, for example b302.
> @@ -64,6 +64,11 @@ It accepts the following variants:
>     partition with a known unique id.
>  #. <major>:<minor> major and minor number of the device separated by a colon.
>  
> +It accepts the following variants for MTD device:
> +
> +1. <device name> MTD device name. "pstore" is recommended.
> +#. <device number> MTD device number.
> +
>  kmsg_size
>  ~~~~~~~~~
>  
> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
> index 42d401ea60ee..6ddab796216d 100644
> --- a/drivers/mtd/Kconfig
> +++ b/drivers/mtd/Kconfig
> @@ -170,6 +170,16 @@ config MTD_OOPS
>  	  buffer in a flash partition where it can be read back at some
>  	  later point.
>  
> +config MTD_PSTORE
> +	tristate "Log panic/oops to an MTD buffer based on pstore"
> +	depends on PSTORE_BLK
> +	help
> +	  This enables panic and oops messages to be logged to a circular
> +	  buffer in a flash partition where it can be read back as files after
> +	  mounting pstore filesystem.
> +
> +	  If unsure, say N.
> +
>  config MTD_SWAP
>  	tristate "Swap on MTD device support"
>  	depends on MTD && SWAP
> diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
> index 56cc60ccc477..593d0593a038 100644
> --- a/drivers/mtd/Makefile
> +++ b/drivers/mtd/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
>  obj-$(CONFIG_SSFDC)		+= ssfdc.o
>  obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
>  obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
> +obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
>  obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
>  
>  nftl-objs		:= nftlcore.o nftlmount.o
> diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
> new file mode 100644
> index 000000000000..50c8fc746f39
> --- /dev/null
> +++ b/drivers/mtd/mtdpstore.c
> @@ -0,0 +1,564 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define dev_fmt(fmt) "mtdoops-pstore: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/pstore_blk.h>
> +#include <linux/mtd/mtd.h>
> +#include <linux/bitops.h>
> +
> +static struct mtdpstore_context {
> +	int index;
> +	struct pstore_blk_info info;
> +	struct psblk_device dev;
> +	struct mtd_info *mtd;
> +	unsigned long *rmmap;		/* removed bit map */
> +	unsigned long *usedmap;		/* used bit map */
> +	/*
> +	 * used for panic write
> +	 * As there are no block_isbad for panic case, we should keep this
> +	 * status before panic to ensure panic_write not failed.
> +	 */
> +	unsigned long *badmap;		/* bad block bit map */
> +} oops_cxt;
> +
> +static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	ret = mtd_block_isbad(mtd, off);
> +	if (ret < 0) {
> +		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
> +		return ret;
> +	} else if (ret > 0) {
> +		set_bit(blknum, cxt->badmap);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	return test_bit(blknum, cxt->badmap);
> +}
> +
> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
> +	set_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
> +	clear_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
> +		clear_bit(zonenum, cxt->usedmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	return test_bit(zonenum, cxt->usedmap);
> +}
> +
> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->usedmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
> +		size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t sz;
> +	int i;
> +
> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
> +	for (i = 0; i < sz; i++) {
> +		if (buf[i] != (char)0xFF)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
> +	set_bit(zonenum, cxt->rmmap);
> +}
> +
> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		clear_bit(zonenum, cxt->rmmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->info.kmsg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->rmmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	struct erase_info erase;
> +	int ret;
> +
> +	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
> +	erase.len = cxt->mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(cxt->mtd, &erase);
> +	if (!ret)
> +		mtdpstore_block_clear_removed(cxt, off);
> +	else
> +		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
> +		       (unsigned long long)erase.addr,
> +		       (unsigned long long)erase.len, cxt->info.device);
> +	return ret;
> +}
> +
> +/*
> + * called while removing file
> + *
> + * Avoiding over erasing, do erase block only when the whole block is unused.
> + * If the block contains valid log, do erase lazily on flush_removed() when
> + * unregister.
> + */
> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -EIO;
> +
> +	mtdpstore_mark_unused(cxt, off);
> +
> +	/* If the block still has valid data, mtdpstore do erase lazily */
> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
> +		mtdpstore_mark_removed(cxt, off);
> +		return 0;
> +	}
> +
> +	/* all zones are unused, erase it */
> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
> +	return mtdpstore_erase_do(cxt, off);
> +}
> +
> +/*
> + * What is security for mtdpstore?
> + * As there is no erase for panic case, we should ensure at least one zone
> + * is writable. Otherwise, panic write will fail.
> + * If zone is used, write operation will return -ENOMSG, which means that
> + * pstore/blk will try one by one until gets an empty zone. So, it is not
> + * needed to ensure the next zone is empty, but at least one.
> + */
> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret = 0, i;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u32 zonenum = (u32)div_u64(off, cxt->info.kmsg_size);
> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->info.kmsg_size);
> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
> +	u32 erasesize = cxt->mtd->erasesize;
> +
> +	for (i = 0; i < zonecnt; i++) {
> +		u32 num = (zonenum + i) % zonecnt;
> +
> +		/* found empty zone */
> +		if (!test_bit(num, cxt->usedmap))
> +			return 0;
> +	}
> +
> +	/* If there is no any empty zone, we have no way but to do erase */
> +	off = ALIGN_DOWN(off, erasesize);
> +	while (blkcnt--) {
> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
> +
> +		if (mtdpstore_block_isbad(cxt, off))
> +			continue;
> +
> +		ret = mtdpstore_erase_do(cxt, off);
> +		if (!ret) {
> +			mtdpstore_block_mark_unused(cxt, off);
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		dev_err(&mtd->dev, "all blocks bad!\n");
> +	dev_dbg(&mtd->dev, "end security\n");
> +	return ret;
> +}
> +
> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENOMSG;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENOMSG;
> +
> +	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || retlen != size) {
> +		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static inline bool mtdpstore_is_io_error(int ret)
> +{
> +	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
> +}
> +
> +/*
> + * All zones will be read as pstore/blk will read zone one by one when do
> + * recover.
> + */
> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen, done;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENOMSG;
> +
> +	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
> +	for (done = 0, retlen = 0; done < size; done += retlen) {
> +		retlen = 0;
> +
> +		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
> +				(u_char *)buf + done);
> +		if (mtdpstore_is_io_error(ret)) {
> +			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
> +					off + done, retlen, size - done, ret);
> +			/* the zone may be broken, try next one */
> +			return -ENOMSG;
> +		}
> +
> +		/*
> +		 * ECC error. The impact on log data is so small. Maybe we can
> +		 * still read it and try to understand. So mtdpstore just hands
> +		 * over what it gets and user can judge whether the data is
> +		 * valid or not.
> +		 */
> +		if (mtd_is_eccerr(ret)) {
> +			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
> +					off + done, retlen, size - done, ret);
> +			/* driver may not set retlen when ecc error */
> +			retlen = retlen == 0 ? size - done : retlen;
> +		}
> +	}
> +
> +	if (mtdpstore_is_empty(cxt, buf, size))
> +		mtdpstore_mark_unused(cxt, off);
> +	else
> +		mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_panic_block_isbad(cxt, off))
> +		return -ENOMSG;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENOMSG;
> +
> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || size != retlen) {
> +		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	return retlen;
> +}
> +
> +static void mtdpstore_notify_add(struct mtd_info *mtd)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct pstore_blk_info *info = &cxt->info;
> +	unsigned long longcnt;
> +
> +	if (!strcmp(mtd->name, info->device))
> +		cxt->index = mtd->index;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
> +
> +	if (mtd->size < info->kmsg_size * 2) {
> +		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
> +				mtd->index);
> +		return;
> +	}
> +	/*
> +	 * kmsg_size must be aligned to 4096 Bytes, which is limited by
> +	 * psblk. The default value of kmsg_size is 64KB. If kmsg_size
> +	 * is larger than erasesize, some errors will occur since mtdpstore
> +	 * is designed on it.
> +	 */
> +	if (mtd->erasesize < info->kmsg_size) {
> +		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
> +				mtd->index);
> +		return;
> +	}
> +	if (unlikely(info->kmsg_size % mtd->writesize)) {
> +		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
> +				info->kmsg_size / 1024,
> +				mtd->writesize / 1024);
> +		return;
> +	}
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->kmsg_size));
> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	cxt->dev.total_size = mtd->size;
> +	/* just support dmesg right now */
> +	cxt->dev.flags = PSTORE_FLAGS_DMESG;
> +	cxt->dev.read = mtdpstore_read;
> +	cxt->dev.write = mtdpstore_write;
> +	cxt->dev.erase = mtdpstore_erase;
> +	cxt->dev.panic_write = mtdpstore_panic_write;
> +
> +	ret = psblk_register_device(&cxt->dev);
> +	if (ret) {
> +		dev_err(&mtd->dev, "mtd%d register to psblk failed\n",
> +				mtd->index);
> +		return;
> +	}
> +	cxt->mtd = mtd;
> +	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
> +}
> +
> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
> +		loff_t off, size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u_char *buf;
> +	int ret;
> +	size_t retlen;
> +	struct erase_info erase;
> +
> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	/* 1st. read to cache */
> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
> +	if (mtdpstore_is_io_error(ret))
> +		goto free;
> +
> +	/* 2nd. erase block */
> +	erase.len = mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(mtd, &erase);
> +	if (ret)
> +		goto free;
> +
> +	/* 3rd. write back */
> +	while (size) {
> +		unsigned int zonesize = cxt->info.kmsg_size;
> +
> +		/* there is valid data on block, write back */
> +		if (mtdpstore_is_used(cxt, off)) {
> +			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
> +			if (ret)
> +				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
> +						off, retlen, zonesize, ret);
> +		}
> +
> +		off += zonesize;
> +		size -= min_t(unsigned int, zonesize, size);
> +	}
> +
> +free:
> +	kfree(buf);
> +	return ret;
> +}
> +
> +/*
> + * What does mtdpstore_flush_removed() do?
> + * When user remove any log file on pstore filesystem, mtdpstore should do
> + * something to ensure log file removed. If the whole block is no longer used,
> + * it's nice to erase the block. However if the block still contains valid log,
> + * what mtdpstore can do is to erase and write the valid log back.
> + */
> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	int ret;
> +	loff_t off;
> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
> +
> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
> +		ret = mtdpstore_block_isbad(cxt, off);
> +		if (ret)
> +			continue;
> +
> +		ret = mtdpstore_block_is_removed(cxt, off);
> +		if (!ret)
> +			continue;
> +
> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	mtdpstore_flush_removed(cxt);
> +
> +	psblk_unregister_device(&cxt->dev);
> +	kfree(cxt->badmap);
> +	kfree(cxt->usedmap);
> +	kfree(cxt->rmmap);
> +	cxt->mtd = NULL;
> +	cxt->index = -1;
> +}
> +
> +static struct mtd_notifier mtdpstore_notifier = {
> +	.add	= mtdpstore_notify_add,
> +	.remove	= mtdpstore_notify_remove,
> +};
> +
> +static int __init mtdpstore_init(void)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	struct pstore_blk_info *info = &cxt->info;
> +
> +	ret = pstore_blk_usr_info(info);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (strlen(info->device) == 0) {
> +		dev_err(&mtd->dev, "mtd device must be supplied\n");

We should not use dev_err() here, since mtd is still NULL at this point.

	pr_err("mtd device must be supplied\n");

> +		return -EINVAL;
> +	}
> +	if (!info->kmsg_size) {
> +		dev_err(&mtd->dev, "no backend enabled\n");

The same as above.
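
To make the point concrete, here is a minimal user-space sketch of the two checks, assuming the messages are emitted through a logger that needs no device (the role pr_err() would play in the driver). The names struct blk_info, log_err() and check_config() are hypothetical stand-ins for illustration, not the driver's real code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct mtd_info {
	int index;
};

struct blk_info {
	char device[80];
	unsigned long kmsg_size;
};

/* Device-less logging, playing the role pr_err() has in the kernel. */
static void log_err(const char *msg)
{
	fprintf(stderr, "mtdpstore: %s\n", msg);
}

/* Validate the user configuration before any MTD device has been attached. */
static int check_config(const struct mtd_info *mtd, const struct blk_info *info)
{
	(void)mtd; /* still NULL at init time, so it must not be dereferenced */

	if (strlen(info->device) == 0) {
		log_err("mtd device must be supplied");
		return -EINVAL;
	}
	if (!info->kmsg_size) {
		log_err("no backend enabled");
		return -EINVAL;
	}
	return 0;
}
```

Both error paths report the problem without touching mtd, which is only assigned later, once the notifier's add callback has run.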

> +		return -EINVAL;
> +	}
> +
> +	/* Setup the MTD device to use */
> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
> +	if (ret)
> +		cxt->index = -1;
> +
> +	register_mtd_user(&mtdpstore_notifier);
> +	return 0;
> +}
> +module_init(mtdpstore_init);
> +
> +static void __exit mtdpstore_exit(void)
> +{
> +	unregister_mtd_user(&mtdpstore_notifier);
> +}
> +module_exit(mtdpstore_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("MTD backend for pstore/blk");
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index b882919b8784..4fb8ec9f3a1c 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c

How about moving the following changes to patch 1?

> @@ -130,26 +130,6 @@ enum pstore_type_id pstore_name_to_type(const char *name)
>  }
>  EXPORT_SYMBOL_GPL(pstore_name_to_type);
>  
> -static const char *get_reason_str(enum kmsg_dump_reason reason)
> -{
> -	switch (reason) {
> -	case KMSG_DUMP_PANIC:
> -		return "Panic";
> -	case KMSG_DUMP_OOPS:
> -		return "Oops";
> -	case KMSG_DUMP_EMERG:
> -		return "Emergency";
> -	case KMSG_DUMP_RESTART:
> -		return "Restart";
> -	case KMSG_DUMP_HALT:
> -		return "Halt";
> -	case KMSG_DUMP_POWEROFF:
> -		return "Poweroff";
> -	default:
> -		return "Unknown";
> -	}
> -}
> -
>  static void pstore_timer_kick(void)
>  {
>  	if (pstore_update_ms < 0)
> @@ -402,7 +382,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  	unsigned int	part = 1;
>  	int		ret;
>  
> -	why = get_reason_str(reason);
> +	why = kmsg_dump_reason_str(reason);
>  
>  	if (down_trylock(&psinfo->buf_lock)) {
>  		/* Failed to acquire lock: give up if we cannot wait. */
> 

-- 
WeiXiong Liao

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

* Re: [PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device
  2020-05-08  7:27   ` Kees Cook
@ 2020-05-09 10:32     ` WeiXiong Liao
  -1 siblings, 0 replies; 42+ messages in thread
From: WeiXiong Liao @ 2020-05-09 10:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

Hi Kees Cook,

On 2020/5/8 PM 3:27, Kees Cook wrote:
> On Thu, May 07, 2020 at 11:39:52PM -0700, Kees Cook wrote:
>> So far, I've identified the following stuff left to do:
>> [...]
>>         - implement ramoops-like probe feature for pstore/blk
> 
> With the following hack, I'm able to start testing the series:
> 
> diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
> index a736555e1ed3..7145da079267 100644
> --- a/fs/pstore/blk.c
> +++ b/fs/pstore/blk.c
> @@ -373,12 +373,14 @@ int psblk_register_blkdev(unsigned int major, unsigned int flags,
>  	if (IS_ERR(binfo))
>  		return PTR_ERR(binfo);
>  
> +#if 0
>  	/* only allow driver matching the @blkdev */
>  	if (!binfo->devt || MAJOR(binfo->devt) != major) {
>  		pr_debug("invalid major %u (expect %u)\n",
>  				major, MAJOR(binfo->devt));
>  		return -ENODEV;
>  	}
> +#endif
>  
>  	/* hold bdev exclusively */
>  	bdev = psblk_get_bdev(holder);
> @@ -423,7 +425,7 @@ void psblk_unregister_blkdev(unsigned int major)
>  	struct psblk_device dev = {.read = psblk_generic_blk_read};
>  	void *holder = blkdev;
>  
> -	if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
> +	if (psblk_bdev/* && MAJOR(psblk_bdev->bd_dev) == major*/) {
>  		psblk_unregister_device(&dev);
>  		psblk_put_bdev(psblk_bdev, holder);
>  		blkdev_panic_write = NULL;
> @@ -476,6 +478,24 @@ int pstore_blk_usr_info(struct pstore_blk_info *info)
>  }
>  EXPORT_SYMBOL_GPL(pstore_blk_usr_info);
>  
> +static int __init pstore_blk_init(void)
> +{
> +	int ret = 0;
> +
> +	if (blkdev[0])
> +		ret = psblk_register_blkdev(0, 0, NULL);
> +
> +	return ret;
> +}
> +postcore_initcall(pstore_blk_init);
> +
> +static void __exit pstore_blk_exit(void)
> +{
> +	psblk_unregister_blkdev(0);
> +}
> +module_exit(pstore_blk_exit);
> +
> +
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>  MODULE_DESCRIPTION("pstore backend for block devices");
> 
> 
> Then I can get things up and running with:
> 
> # insmod pstore.ko compress=off
> # insmod pstore_zone.ko
> # truncate pstore-blk.raw --size 100M
> # losetup -f --show pstore-blk.raw
> /dev/loop0
> # insmod pstore_blk.ko blkdev=/dev/loop0 kmsg_size=16 console_size=64
> 
> So far, I've hit a few bugs. The most obvious is that "rmmod" causes a
> fault, so I think locking and other things need to be fixed up further.
> After that, it looked like all the compressed files were failing to
> decompress, which implies some kind of buffer offset problem. When I
> loaded with pstore.compress=off I got readable logs, but there is a span
> of garbage between the header and the body in
> /sys/fs/pstore/dmesg-pstore-zone-1 etc.
> 

Both of the above bugs have been fixed in series v6.

The following diff fixes the "rmmod" bug.

@@ -1273,8 +1273,8 @@ static void psz_free_zones(struct pstore_zone ***pszones, unsigned int *cnt)
                return;

        while (*cnt > 0) {
-               psz_free_zone(&zones[*cnt]);
                (*cnt)--;
+               psz_free_zone(&zones[*cnt]);
        }
        kfree(zones);
        *pszones = NULL;
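
For an array of *cnt zones the valid indexes are 0 .. *cnt - 1, so indexing before the decrement touches zones[*cnt] (one past the end) and never frees zones[0]. A minimal user-space sketch of the corrected ordering follows. It assumes a simplified struct zone; free_zone(), free_zones(), alloc_zones() and the zones_freed counter are hypothetical stand-ins, not the pstore/zone code itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct pstore_zone; only what the sketch needs. */
struct zone {
	char *buffer;
};

static int zones_freed; /* counts freed zones, for illustration only */

/* Free one zone and clear the caller's slot. */
static void free_zone(struct zone **slot)
{
	struct zone *z = *slot;

	if (!z)
		return;
	free(z->buffer);
	free(z);
	*slot = NULL;
	zones_freed++;
}

/* Free an array of *cnt zones, walking from the last valid index down to 0. */
static void free_zones(struct zone ***pzones, unsigned int *cnt)
{
	struct zone **zones = *pzones;

	if (!zones)
		return;

	while (*cnt > 0) {
		(*cnt)--;                /* step to the last valid index first */
		free_zone(&zones[*cnt]); /* never touches zones[original cnt] */
	}
	free(zones);
	*pzones = NULL;
}

/* Allocate cnt zones; returns NULL if the array itself cannot be allocated. */
static struct zone **alloc_zones(unsigned int cnt)
{
	struct zone **zones = calloc(cnt, sizeof(*zones));
	unsigned int i;

	if (!zones)
		return NULL;
	for (i = 0; i < cnt; i++) {
		zones[i] = calloc(1, sizeof(**zones));
		if (zones[i])
			zones[i]->buffer = malloc(16);
	}
	return zones;
}
```

With three zones allocated, the loop visits indexes 2, 1 and 0 and leaves the count at zero.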

> Cool so far! It just needs a bit more testing and polish. :)
> 

-- 
WeiXiong Liao

* Re: [PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device
  2020-05-09 10:32     ` WeiXiong Liao
@ 2020-05-09 19:10       ` Kees Cook
  -1 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2020-05-09 19:10 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, linux-doc, linux-mtd,
	linux-kernel

On Sat, May 09, 2020 at 06:32:28PM +0800, WeiXiong Liao wrote:
> The following diff is to fix "rmmod" bug.
> 
> @@ -1273,8 +1273,8 @@ static void psz_free_zones(struct pstore_zone
> ***pszones, unsigned int *cnt)
>                 return;
> 
>         while (*cnt > 0) {
> -               psz_free_zone(&zones[*cnt]);
>                 (*cnt)--;
> +               psz_free_zone(&zones[*cnt]);
>         }
>         kfree(zones);
>         *pszones = NULL;

Ah-ha! Thanks; I'd almost found that. I got confused because I wasn't
seeing NULL free()s, and I finally noticed that the zones had leftover
ERR_PTRs:

        if (IS_ERR(cxt->fpszs)) {
                err = PTR_ERR(cxt->fpszs);
+               cxt->fpszs = NULL;
                goto free_out;
        }
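
As an illustration of why the reset matters, here is a minimal user-space sketch. It assumes simplified re-implementations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers; struct ctx, init_zones(), ctx_cleanup() and ctx_setup() are hypothetical names, not the pstore/zone code. An error-encoded pointer is neither NULL nor a real allocation, so a cleanup path that only guards against NULL would try to free it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified user-space imitations of the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical context holding one zone allocation. */
struct ctx {
	void *zones;
};

/* Returns a real allocation, or an error encoded as a pointer. */
static void *init_zones(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return malloc(32);
}

/* Cleanup path: only safe if zones is NULL or a real allocation. */
static void ctx_cleanup(struct ctx *c)
{
	free(c->zones);
	c->zones = NULL;
}

static int ctx_setup(struct ctx *c, int fail)
{
	c->zones = init_zones(fail);
	if (IS_ERR(c->zones)) {
		int err = (int)PTR_ERR(c->zones);

		c->zones = NULL; /* drop the ERR_PTR before cleanup sees it */
		ctx_cleanup(c);
		return err;
	}
	return 0;
}
```

Without the reset, ctx_cleanup() would hand the encoded -ENOMEM value to free().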

I'll fix those and get your v5 and my latest tree merged.

-- 
Kees Cook

end of thread, other threads:[~2020-05-09 19:11 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
2020-05-08  6:39 [PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device Kees Cook
2020-05-08  6:39 ` Kees Cook
2020-05-08  6:39 ` [PATCH v4 01/12] printk: Introduce kmsg_dump_reason_str() Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-08  6:39 ` [PATCH v4 02/12] pstore/zone: Introduce common layer to manage storage zones Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-09  3:09   ` WeiXiong Liao
2020-05-09  3:09     ` WeiXiong Liao
2020-05-08  6:39 ` [PATCH v4 03/12] pstore/blk: Introduce backend for block devices Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-09  3:48   ` WeiXiong Liao
2020-05-09  3:48     ` WeiXiong Liao
2020-05-08  6:39 ` [PATCH v4 04/12] pstore/blk: Provide way to choose pstore frontend support Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-08  6:39 ` [PATCH v4 05/12] pstore/blk: Add support for pmsg frontend Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-09  4:38   ` WeiXiong Liao
2020-05-09  4:38     ` WeiXiong Liao
2020-05-08  6:39 ` [PATCH v4 06/12] pstore/blk: Add console frontend support Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-09  4:53   ` WeiXiong Liao
2020-05-09  4:53     ` WeiXiong Liao
2020-05-08  6:39 ` [PATCH v4 07/12] pstore/blk: Add ftrace " Kees Cook
2020-05-08  6:39   ` Kees Cook
2020-05-08  6:40 ` [PATCH v4 08/12] Documentation: Add details for pstore/blk Kees Cook
2020-05-08  6:40   ` Kees Cook
2020-05-08  6:40 ` [PATCH v4 09/12] pstore/zone: Provide way to skip "broken" zone for MTD devices Kees Cook
2020-05-08  6:40   ` Kees Cook
2020-05-08  6:40 ` [PATCH v4 10/12] pstore/blk: Provide way to query pstore configuration Kees Cook
2020-05-08  6:40   ` Kees Cook
2020-05-08  6:40 ` [PATCH v4 11/12] pstore/blk: Support non-block storage devices Kees Cook
2020-05-08  6:40   ` Kees Cook
2020-05-08  6:40 ` [PATCH v4 12/12] mtd: Support kmsg dumper based on pstore/blk Kees Cook
2020-05-08  6:40   ` Kees Cook
2020-05-09  5:14   ` WeiXiong Liao
2020-05-09  5:14     ` WeiXiong Liao
2020-05-08  7:27 ` [PATCH v4 00/12] pstore: mtd: support crash log to block and mtd device Kees Cook
2020-05-08  7:27   ` Kees Cook
2020-05-09 10:32   ` WeiXiong Liao
2020-05-09 10:32     ` WeiXiong Liao
2020-05-09 19:10     ` Kees Cook
2020-05-09 19:10       ` Kees Cook
