linux-mtd.lists.infradead.org archive mirror
* [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device
@ 2020-02-07 12:25 WeiXiong Liao
  2020-02-07 12:25 ` [PATCH v2 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
                   ` (10 more replies)
  0 siblings, 11 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

Why do we need to log to block (mtd) device?
1. Most embedded intelligent equipment has no persistent ram, because it
   increases cost. We prefer cheaper solutions, such as block devices.
2. Not all equipment has a battery, which means that all data in general
   ram is lost on power failure. Pstore can do little for such equipment.

Why do we need mtdpstore instead of mtdoops?
1. repetitive jobs between pstore and mtdoops
   Both pstore and mtdoops do the same job: storing the panic/oops log.
2. do what a driver should do
   To me, a driver should provide methods instead of policies. What MTD
   should do is provide read/write/erase operations, getting rid of the
   code for chunk management, kmsg dumper and configuration.
3. enhanced features
   Not only store the log, but also show it as files.
   Not only the log, but also the trigger time and trigger count.
   Not only the panic/oops log, but also log recorders for pmsg, console
   and ftrace in the future.

Before upstream submission, pstore/blk was tested on ARM and x86_64, on
block devices and mtd devices, built both as modules and into the kernel.
Here are the details:

	https://github.com/gmpy/articles/blob/master/pstore/Test-Pstore-Block.md

[PATCH v2]:
1. fix syntax errors in documents.
   Thanks to Randy Dunlap <rdunlap@infradead.org>
2. replace pr_* with dev_* for mtdpstore.
   Thanks to Vignesh Raghavendra <vigneshr@ti.com>
3. improve mtdpstore. Thanks to Miquel Raynal <mraynal@kernel.org>
[PATCH v1]:
1. fix errors and warnings reported by kbuild test robot.

WeiXiong Liao (11):
  pstore/blk: new support logger for block devices
  blkoops: add blkoops, a warpper for pstore/blk
  pstore/blk: blkoops: support pmsg recorder
  pstore/blk: blkoops: support console recorder
  pstore/blk: blkoops: support ftrace recorder
  Documentation: pstore/blk: blkoops: create document for pstore_blk
  pstore/blk: skip broken zone for mtd device
  blkoops: respect for device to pick recorders
  pstore/blk: blkoops: support special removing jobs for dmesg.
  blkoops: add interface for dirver to get information of blkoops
  mtd: new support oops logger based on pstore/blk

 Documentation/admin-guide/pstore-block.rst |  306 +++++++
 MAINTAINERS                                |    3 +-
 drivers/mtd/Kconfig                        |   10 +
 drivers/mtd/Makefile                       |    1 +
 drivers/mtd/mtdpstore.c                    |  564 ++++++++++++
 fs/pstore/Kconfig                          |  109 +++
 fs/pstore/Makefile                         |    5 +
 fs/pstore/blkoops.c                        |  475 ++++++++++
 fs/pstore/blkzone.c                        | 1328 ++++++++++++++++++++++++++++
 include/linux/blkoops.h                    |   94 ++
 include/linux/pstore_blk.h                 |   91 ++
 11 files changed, 2985 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/pstore-block.rst
 create mode 100644 drivers/mtd/mtdpstore.c
 create mode 100644 fs/pstore/blkoops.c
 create mode 100644 fs/pstore/blkzone.c
 create mode 100644 include/linux/blkoops.h
 create mode 100644 include/linux/pstore_blk.h

-- 
1.9.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-02-26  0:52   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 02/11] blkoops: add blkoops, a warpper for pstore/blk WeiXiong Liao
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

pstore/blk is similar to pstore/ram, but dumps logs to block devices
rather than persistent ram.

Why do we need pstore/blk?
1. Most embedded intelligent equipment has no persistent ram, because it
increases cost. We prefer cheaper solutions, such as block devices.
2. Not all equipment has a battery, which means that all data in general
ram is lost on power failure. Pstore can do little for such equipment.

pstore/blk is one patch of a series. It provides zone management for a
partition of a block device or of a non-block device such as an mtd
device. It only supports the dmesg recorder right now.
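
The zone management can be pictured with a small userspace sketch. It
is not the kernel code: struct zone_header, struct zone_layout and
cut_zones() are hypothetical names, and the header layout is assumed
only for illustration. The arithmetic follows the patch: the space is
cut into total_size / record_size zones, and each zone holds a small
header followed by its payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-zone header kept on storage. */
struct zone_header {
	uint32_t sig;		/* signature identifying the zone type */
	int32_t datalen;	/* bytes of valid payload after the header */
};

/* Where one zone lives and how much payload it can hold. */
struct zone_layout {
	unsigned long off;	/* offset from the start of the partition */
	size_t payload;		/* record_size minus the header */
};

/*
 * Cut total_size bytes into zones of record_size bytes each.
 * Fills at most max entries of out and returns the number of zones,
 * or 0 if record_size cannot hold a header plus some payload.
 */
static unsigned int cut_zones(unsigned long total_size, size_t record_size,
			      struct zone_layout *out, unsigned int max)
{
	unsigned long off = 0;
	unsigned int i, cnt;

	if (record_size <= sizeof(struct zone_header))
		return 0;

	cnt = total_size / record_size;
	if (cnt > max)
		cnt = max;

	for (i = 0; i < cnt; i++) {
		out[i].off = off;
		out[i].payload = record_size - sizeof(struct zone_header);
		off += record_size;
	}
	/* The zones never extend past the space that was handed in. */
	assert(off <= total_size);
	return cnt;
}
```

With these assumptions, a 4096-byte partition and 1024-byte records
give four zones at offsets 0, 1024, 2048 and 3072. Any remainder
smaller than one record is simply left unused.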

To make pstore/blk work, the block/non-block driver should call
blkz_register() and call blkz_unregister() when it exits. Later patches
in this series add blkoops, a better wrapper for pstore/blk.
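
As a rough illustration of what a driver hands over at registration,
here is a userspace mock. struct demo_blkz_info, demo_read(),
demo_write() and demo_validate() are hypothetical stand-ins, not the
kernel API. The checks only mirror the ones blkz_register() performs in
this patch: a non-empty name, a total size of at least 4096 that is a
multiple of 4096, a non-zero dmesg size that is a multiple of 512, and
non-NULL read/write callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SECTOR_SIZE 512UL

/* Hypothetical userspace mirrors of the read/write callback types. */
typedef long (*demo_read_op)(char *buf, size_t bytes, unsigned long pos);
typedef long (*demo_write_op)(const char *buf, size_t bytes,
			      unsigned long pos);

struct demo_blkz_info {
	const char *name;
	unsigned long total_size;
	unsigned long dmesg_size;
	demo_read_op read;
	demo_write_op write;
	demo_write_op panic_write;	/* optional */
};

/* An in-memory array stands in for the partition. */
static char demo_storage[8192];

/* Offsets are relative to the space provided, not the whole device. */
static long demo_read(char *buf, size_t bytes, unsigned long pos)
{
	if (pos > sizeof(demo_storage) || bytes > sizeof(demo_storage) - pos)
		return -EINVAL;
	memcpy(buf, demo_storage + pos, bytes);
	return (long)bytes;	/* number of bytes read on success */
}

static long demo_write(const char *buf, size_t bytes, unsigned long pos)
{
	if (pos > sizeof(demo_storage) || bytes > sizeof(demo_storage) - pos)
		return -EINVAL;
	memcpy(demo_storage + pos, buf, bytes);
	return (long)bytes;	/* number of bytes written on success */
}

/* Mirrors the sanity checks done at registration time. */
static int demo_validate(const struct demo_blkz_info *info)
{
	if (!info->name || !info->name[0])
		return -EINVAL;
	if (info->total_size < 4096 || (info->total_size & (4096UL - 1)))
		return -EINVAL;
	if (!info->dmesg_size || (info->dmesg_size & (DEMO_SECTOR_SIZE - 1)))
		return -EINVAL;
	if (!info->read || !info->write)
		return -EINVAL;
	return 0;
}
```

In a real driver the same fields would be filled into struct blkz_info
and passed to blkz_register(), with blkz_unregister() called on exit.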

Unlike pstore/ram, pstore/blk relies on read/write APIs from the device
driver, especially the write operation for panic records.
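
A simplified sketch of that dependency follows. The pw_* names are
hypothetical and the logic is reduced from blkz_zone_write(): the
panic write operation is chosen on panic, the general one otherwise,
and a zone stays dirty when no usable operation exists or the write
comes up short. In the patch itself the dirty flag is cleared by the
dirty-zone flush helper rather than by the write path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef long (*pw_write_op)(const char *buf, size_t bytes,
			    unsigned long pos);

struct pw_backend {
	pw_write_op write;		/* general write, required */
	pw_write_op panic_write;	/* used on panic, optional */
};

struct pw_zone {
	const char *data;
	size_t len;
	unsigned long off;
	bool dirty;			/* data not on storage yet */
};

/* Example write op: counts calls and pretends every byte was stored. */
static unsigned int pw_calls;

static long pw_ok_write(const char *buf, size_t bytes, unsigned long pos)
{
	(void)buf;
	(void)pos;
	pw_calls++;
	return (long)bytes;
}

/*
 * Flush one zone. Returns 0 when the data reached storage, or -EBUSY
 * when the zone had to be left dirty for a later retry.
 */
static int pw_flush(const struct pw_backend *be, struct pw_zone *zone,
		    bool on_panic)
{
	pw_write_op op = on_panic ? be->panic_write : be->write;

	if (!op || op(zone->data, zone->len, zone->off) != (long)zone->len) {
		zone->dirty = true;
		return -EBUSY;
	}
	zone->dirty = false;
	return 0;
}
```

A backend without a panic write operation can still record oopses
through the general path, but under these assumptions a panic record
only stays in memory, marked dirty.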

It is recommended that the block/non-block driver register with
pstore/blk only after the device has been registered with Linux and is
ready to work.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          |  10 +
 fs/pstore/Makefile         |   3 +
 fs/pstore/blkzone.c        | 948 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pstore_blk.h |  62 +++
 4 files changed, 1023 insertions(+)
 create mode 100644 fs/pstore/blkzone.c
 create mode 100644 include/linux/pstore_blk.h

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 8f0369aad22a..536fde9e13e8 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -153,3 +153,13 @@ config PSTORE_RAM
 	  "ramoops.ko".
 
 	  For more information, see Documentation/admin-guide/ramoops.rst.
+
+config PSTORE_BLK
+	tristate "Log panic/oops to a block device"
+	depends on PSTORE
+	depends on BLOCK
+	help
+	  This enables panic and oops messages to be logged to a block device
+	  where they can be read back at some later point.
+
+	  If unsure, say N.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 967b5891f325..0ee2fc8d1bfb 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
 
 ramoops-objs += ram.o ram_core.o
 obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
+
+obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
+pstore_blk-y += blkzone.o
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
new file mode 100644
index 000000000000..f77f612b50ba
--- /dev/null
+++ b/fs/pstore/blkzone.c
@@ -0,0 +1,948 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define MODNAME "pstore-blk"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/pstore.h>
+#include <linux/mount.h>
+#include <linux/printk.h>
+#include <linux/fs.h>
+#include <linux/pstore_blk.h>
+#include <linux/kdev_t.h>
+#include <linux/device.h>
+#include <linux/namei.h>
+#include <linux/fcntl.h>
+#include <linux/uio.h>
+#include <linux/writeback.h>
+
+/**
+ * struct blkz_buffer - header of zone to flush to storage
+ *
+ * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
+ * @datalen: length of data in @data
+ * @data: zone data.
+ */
+struct blkz_buffer {
+#define BLK_SIG (0x43474244) /* DBGC */
+	uint32_t sig;
+	atomic_t datalen;
+	uint8_t data[];
+};
+
+/**
+ * struct blkz_dmesg_header: dmesg information
+ *
+ * @magic: magic num for dmesg header
+ * @time: trigger time
+ * @compressed: whether compressed
+ * @counter: oops/panic counter
+ * @reason: identify oops or panic
+ */
+struct blkz_dmesg_header {
+#define DMESG_HEADER_MAGIC 0x4dfc3ae5
+	uint32_t magic;
+	struct timespec64 time;
+	bool compressed;
+	uint32_t counter;
+	enum kmsg_dump_reason reason;
+	uint8_t data[0];
+};
+
+/**
+ * struct blkz_zone - zone information
+ * @off:
+ *	zone offset of block device
+ * @type:
+ *	frontend type for this zone
+ * @name:
+ *	frontend name for this zone
+ * @buffer:
+ *	pointer to data buffer managed by this zone
+ * @oldbuf:
+ *	pointer to old data buffer.
+ * @buffer_size:
+ *	bytes in @buffer->data
+ * @should_recover:
+ *	should recover from storage
+ * @dirty:
+ *	mark whether the data in @buffer is dirty (not flushed to storage yet)
+ */
+struct blkz_zone {
+	unsigned long off;
+	const char *name;
+	enum pstore_type_id type;
+
+	struct blkz_buffer *buffer;
+	struct blkz_buffer *oldbuf;
+	size_t buffer_size;
+	bool should_recover;
+	atomic_t dirty;
+};
+
+struct blkz_context {
+	struct blkz_zone **dbzs;	/* dmesg block zones */
+	unsigned int dmesg_max_cnt;
+	unsigned int dmesg_read_cnt;
+	unsigned int dmesg_write_cnt;
+	/*
+	 * These counters must be restored during recovery.
+	 * They count oopses/panics since the image was burned, not since boot.
+	 */
+	unsigned int oops_counter;
+	unsigned int panic_counter;
+	atomic_t recovered;
+	atomic_t on_panic;
+
+	/*
+	 * bzinfo_lock just protects "bzinfo" during calls to
+	 * blkz_register/blkz_unregister
+	 */
+	spinlock_t bzinfo_lock;
+	struct blkz_info *bzinfo;
+	struct pstore_info pstore;
+};
+static struct blkz_context blkz_cxt;
+
+enum blkz_flush_mode {
+	FLUSH_NONE = 0,
+	FLUSH_PART,
+	FLUSH_META,
+	FLUSH_ALL,
+};
+
+static inline int buffer_datalen(struct blkz_zone *zone)
+{
+	return atomic_read(&zone->buffer->datalen);
+}
+
+static inline bool is_on_panic(void)
+{
+	struct blkz_context *cxt = &blkz_cxt;
+
+	return atomic_read(&cxt->on_panic);
+}
+
+static int blkz_zone_read(struct blkz_zone *zone, char *buf,
+		size_t len, unsigned long off)
+{
+	if (!buf || !zone->buffer)
+		return -EINVAL;
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	len = min_t(size_t, len, zone->buffer_size - off);
+	memcpy(buf, zone->buffer->data + off, len);
+	return 0;
+}
+
+static int blkz_zone_write(struct blkz_zone *zone,
+		enum blkz_flush_mode flush_mode, const char *buf,
+		size_t len, unsigned long off)
+{
+	struct blkz_info *info = blkz_cxt.bzinfo;
+	ssize_t wcnt = 0;
+	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
+	size_t wlen;
+
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	wlen = min_t(size_t, len, zone->buffer_size - off);
+	if (buf && wlen) {
+		memcpy(zone->buffer->data + off, buf, wlen);
+		atomic_set(&zone->buffer->datalen, wlen + off);
+	}
+
+	/* avoid damaging old records */
+	if (!is_on_panic() && !atomic_read(&blkz_cxt.recovered))
+		goto set_dirty;
+
+	writeop = is_on_panic() ? info->panic_write : info->write;
+	if (!writeop)
+		goto set_dirty;
+
+	switch (flush_mode) {
+	case FLUSH_NONE:
+		if (unlikely(buf && wlen))
+			goto set_dirty;
+		return 0;
+	case FLUSH_PART:
+		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
+				zone->off + sizeof(*zone->buffer) + off);
+		if (wcnt != wlen)
+			goto set_dirty;
+		/* fallthrough */
+	case FLUSH_META:
+		wlen = sizeof(struct blkz_buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto set_dirty;
+		break;
+	case FLUSH_ALL:
+		wlen = zone->buffer_size + sizeof(*zone->buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto set_dirty;
+		break;
+	}
+
+	return 0;
+set_dirty:
+	atomic_set(&zone->dirty, true);
+	return -EBUSY;
+}
+
+static int blkz_flush_dirty_zone(struct blkz_zone *zone)
+{
+	int ret;
+
+	if (!zone)
+		return -EINVAL;
+
+	if (!atomic_read(&zone->dirty))
+		return 0;
+
+	if (!atomic_read(&blkz_cxt.recovered))
+		return -EBUSY;
+
+	ret = blkz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
+	if (!ret)
+		atomic_set(&zone->dirty, false);
+	return ret;
+}
+
+static int blkz_flush_dirty_zones(struct blkz_zone **zones, unsigned int cnt)
+{
+	int i, ret;
+	struct blkz_zone *zone;
+
+	if (!zones)
+		return -EINVAL;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (!zone)
+			return -EINVAL;
+		ret = blkz_flush_dirty_zone(zone);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/**
+ * blkz_move_zone: move data from an old zone to a new zone
+ *
+ * @old: the old zone
+ * @new: the new zone
+ *
+ * NOTE:
+ *	Call blkz_zone_write to copy and flush data. If it fails, we
+ *	should reset new->dirty, because the new zone is not really dirty.
+ */
+static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
+{
+	const char *data = (const char *)old->buffer->data;
+	int ret;
+
+	ret = blkz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
+	if (ret) {
+		atomic_set(&new->buffer->datalen, 0);
+		atomic_set(&new->dirty, false);
+		return ret;
+	}
+	atomic_set(&old->buffer->datalen, 0);
+	return 0;
+}
+
+static int blkz_recover_dmesg_data(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	struct blkz_zone *zone = NULL;
+	struct blkz_buffer *buf;
+	unsigned long i;
+	ssize_t rcnt;
+
+	if (!info->read)
+		return -EINVAL;
+
+	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
+		zone = cxt->dbzs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+		if (atomic_read(&zone->dirty)) {
+			unsigned int wcnt = cxt->dmesg_write_cnt;
+			struct blkz_zone *new = cxt->dbzs[wcnt];
+			int ret;
+
+			ret = blkz_move_zone(zone, new);
+			if (ret) {
+				pr_err("move zone from %lu to %u failed\n",
+						i, wcnt);
+				return ret;
+			}
+			cxt->dmesg_write_cnt = (wcnt + 1) % cxt->dmesg_max_cnt;
+		}
+		if (!zone->should_recover)
+			continue;
+		buf = zone->buffer;
+		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
+				zone->off);
+		if (rcnt != zone->buffer_size + sizeof(*buf))
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+	}
+	return 0;
+}
+
+/*
+ * blkz_recover_dmesg_meta: recover metadata of dmesg
+ *
+ * Recover metadata as follow:
+ * @cxt->dmesg_write_cnt
+ * @cxt->oops_counter
+ * @cxt->panic_counter
+ */
+static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	struct blkz_zone *zone;
+	size_t rcnt, len;
+	struct blkz_buffer *buf;
+	struct blkz_dmesg_header *hdr;
+	struct timespec64 time = {0};
+	unsigned long i;
+	/*
+	 * Recovery may run on panic, when we cannot allocate memory with
+	 * kmalloc. So, we use a local array instead.
+	 */
+	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
+
+	if (!info->read)
+		return -EINVAL;
+
+	len = sizeof(*buf) + sizeof(*hdr);
+	buf = (struct blkz_buffer *)buffer_header;
+	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
+		zone = cxt->dbzs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+
+		rcnt = info->read((char *)buf, len, zone->off);
+		if (rcnt != len) {
+			pr_err("read %s with id %lu failed\n", zone->name, i);
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+		}
+
+		if (buf->sig != zone->buffer->sig) {
+			pr_debug("no valid data in dmesg zone %lu\n", i);
+			continue;
+		}
+
+		if (zone->buffer_size < atomic_read(&buf->datalen)) {
+			pr_info("found overtop zone: %s: id %lu, off %lu, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		hdr = (struct blkz_dmesg_header *)buf->data;
+		if (hdr->magic != DMESG_HEADER_MAGIC) {
+			pr_info("found invalid zone: %s: id %lu, off %lu, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		/*
+		 * we get the newest zone, and the next one must be the oldest
+		 * or unused zone, because we do write one by one like a circle.
+		 */
+		if (hdr->time.tv_sec >= time.tv_sec) {
+			time.tv_sec = hdr->time.tv_sec;
+			cxt->dmesg_write_cnt = (i + 1) % cxt->dmesg_max_cnt;
+		}
+
+		if (hdr->reason == KMSG_DUMP_OOPS)
+			cxt->oops_counter =
+				max(cxt->oops_counter, hdr->counter);
+		else
+			cxt->panic_counter =
+				max(cxt->panic_counter, hdr->counter);
+
+		if (!atomic_read(&buf->datalen)) {
+			pr_debug("found erased zone: %s: id %lu, off %lu, size %zu, datalen %d\n",
+					zone->name, i, zone->off,
+					zone->buffer_size,
+					atomic_read(&buf->datalen));
+			continue;
+		}
+
+		if (!is_on_panic())
+			zone->should_recover = true;
+		pr_debug("found nice zone: %s: id %lu, off %lu, size %zu, datalen %d\n",
+				zone->name, i, zone->off,
+				zone->buffer_size, atomic_read(&buf->datalen));
+	}
+
+	return 0;
+}
+
+static int blkz_recover_dmesg(struct blkz_context *cxt)
+{
+	int ret;
+
+	if (!cxt->dbzs)
+		return 0;
+
+	ret = blkz_recover_dmesg_meta(cxt);
+	if (ret)
+		goto recover_fail;
+
+	ret = blkz_recover_dmesg_data(cxt);
+	if (ret)
+		goto recover_fail;
+
+	return 0;
+recover_fail:
+	pr_debug("recover dmesg failed\n");
+	return ret;
+}
+
+static inline int blkz_recovery(struct blkz_context *cxt)
+{
+	int ret = -EBUSY;
+
+	if (atomic_read(&cxt->recovered))
+		return 0;
+
+	ret = blkz_recover_dmesg(cxt);
+	if (ret)
+		goto recover_fail;
+
+	pr_debug("recover end!\n");
+	atomic_set(&cxt->recovered, 1);
+	return 0;
+
+recover_fail:
+	pr_err("recover failed\n");
+	return ret;
+}
+
+static int blkz_pstore_open(struct pstore_info *psi)
+{
+	struct blkz_context *cxt = psi->data;
+
+	cxt->dmesg_read_cnt = 0;
+	return 0;
+}
+
+static inline bool blkz_ok(struct blkz_zone *zone)
+{
+	if (zone && zone->buffer && buffer_datalen(zone))
+		return true;
+	return false;
+}
+
+static inline int blkz_dmesg_erase(struct blkz_context *cxt,
+		struct blkz_zone *zone)
+{
+	if (unlikely(!blkz_ok(zone)))
+		return 0;
+
+	atomic_set(&zone->buffer->datalen, 0);
+	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+}
+
+static int blkz_pstore_erase(struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
+	default:
+		return -EINVAL;
+	}
+}
+
+static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+	struct blkz_buffer *buffer = zone->buffer;
+	struct blkz_dmesg_header *hdr =
+		(struct blkz_dmesg_header *)buffer->data;
+
+	hdr->magic = DMESG_HEADER_MAGIC;
+	hdr->compressed = record->compressed;
+	hdr->time.tv_sec = record->time.tv_sec;
+	hdr->time.tv_nsec = record->time.tv_nsec;
+	hdr->reason = record->reason;
+	if (hdr->reason == KMSG_DUMP_OOPS)
+		hdr->counter = ++cxt->oops_counter;
+	else
+		hdr->counter = ++cxt->panic_counter;
+}
+
+static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
+		struct pstore_record *record)
+{
+	size_t size, hlen;
+	struct blkz_zone *zone;
+	unsigned int zonenum;
+
+	zonenum = cxt->dmesg_write_cnt;
+	zone = cxt->dbzs[zonenum];
+	if (unlikely(!zone))
+		return -ENOSPC;
+	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
+
+	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+	blkz_write_kmsg_hdr(zone, record);
+	hlen = sizeof(struct blkz_dmesg_header);
+	size = min_t(size_t, record->size, zone->buffer_size - hlen);
+	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+}
+
+static int notrace blkz_dmesg_write(struct blkz_context *cxt,
+		struct pstore_record *record)
+{
+	int ret;
+	struct blkz_info *info = cxt->bzinfo;
+
+	/*
+	 * Out of the various dmesg dump types, pstore/blk is currently designed
+	 * to only store crash logs, rather than storing general kernel logs.
+	 */
+	if (record->reason != KMSG_DUMP_OOPS &&
+			record->reason != KMSG_DUMP_PANIC)
+		return -EINVAL;
+
+	/* Skip Oopses when configured to do so. */
+	if (record->reason == KMSG_DUMP_OOPS && !info->dump_oops)
+		return -EINVAL;
+
+	/*
+	 * Explicitly only take the first part of any new crash.
+	 * If our buffer is larger than kmsg_bytes, this can never happen,
+	 * and if our buffer is smaller than kmsg_bytes, we don't want the
+	 * report split across multiple records.
+	 */
+	if (record->part != 1)
+		return -ENOSPC;
+
+	if (!cxt->dbzs)
+		return -ENOSPC;
+
+	ret = blkz_dmesg_write_do(cxt, record);
+	if (!ret) {
+		pr_debug("try to flush other dirty dmesg zones\n");
+		blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
+	}
+
+	/* always return 0 as we have handled it in the buffer */
+	return 0;
+}
+
+static int notrace blkz_pstore_write(struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+
+	if (record->type == PSTORE_TYPE_DMESG &&
+			record->reason == KMSG_DUMP_PANIC)
+		atomic_set(&cxt->on_panic, 1);
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		return blkz_dmesg_write(cxt, record);
+	default:
+		return -EINVAL;
+	}
+}
+
+#define READ_NEXT_ZONE ((ssize_t)(-1024))
+static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
+{
+	struct blkz_zone *zone = NULL;
+
+	while (cxt->dmesg_read_cnt < cxt->dmesg_max_cnt) {
+		zone = cxt->dbzs[cxt->dmesg_read_cnt++];
+		if (blkz_ok(zone))
+			return zone;
+	}
+
+	return NULL;
+}
+
+static int blkz_read_dmesg_hdr(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	struct blkz_buffer *buffer = zone->buffer;
+	struct blkz_dmesg_header *hdr =
+		(struct blkz_dmesg_header *)buffer->data;
+
+	if (hdr->magic != DMESG_HEADER_MAGIC)
+		return -EINVAL;
+	record->compressed = hdr->compressed;
+	record->time.tv_sec = hdr->time.tv_sec;
+	record->time.tv_nsec = hdr->time.tv_nsec;
+	record->reason = hdr->reason;
+	record->count = hdr->counter;
+	return 0;
+}
+
+static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	size_t size, hlen = 0;
+
+	size = buffer_datalen(zone);
+	/* Clear and skip this DMESG record if it has no valid header */
+	if (blkz_read_dmesg_hdr(zone, record)) {
+		atomic_set(&zone->buffer->datalen, 0);
+		atomic_set(&zone->dirty, 0);
+		return READ_NEXT_ZONE;
+	}
+	size -= sizeof(struct blkz_dmesg_header);
+
+	if (!record->compressed) {
+		char *buf = kasprintf(GFP_KERNEL,
+				"%s: Total %d times\n",
+				record->reason == KMSG_DUMP_OOPS ? "Oops" :
+				"Panic", record->count);
+		hlen = strlen(buf);
+		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
+		if (!record->buf) {
+			kfree(buf);
+			return -ENOMEM;
+		}
+	} else {
+		record->buf = kmalloc(size, GFP_KERNEL);
+		if (!record->buf)
+			return -ENOMEM;
+	}
+
+	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
+				sizeof(struct blkz_dmesg_header)) < 0)) {
+		kfree(record->buf);
+		return READ_NEXT_ZONE;
+	}
+
+	return size + hlen;
+}
+
+static ssize_t blkz_pstore_read(struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+	ssize_t (*blkz_read)(struct blkz_zone *zone,
+			struct pstore_record *record);
+	struct blkz_zone *zone;
+	ssize_t ret;
+
+	/* before read, we must recover from storage */
+	ret = blkz_recovery(cxt);
+	if (ret)
+		return ret;
+
+next_zone:
+	zone = blkz_read_next_zone(cxt);
+	if (!zone)
+		return 0;
+
+	record->type = zone->type;
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		blkz_read = blkz_dmesg_read;
+		record->id = cxt->dmesg_read_cnt - 1;
+		break;
+	default:
+		goto next_zone;
+	}
+
+	ret = blkz_read(zone, record);
+	if (ret == READ_NEXT_ZONE)
+		goto next_zone;
+	return ret;
+}
+
+static struct blkz_context blkz_cxt = {
+	.bzinfo_lock = __SPIN_LOCK_UNLOCKED(blkz_cxt.bzinfo_lock),
+	.recovered = ATOMIC_INIT(0),
+	.on_panic = ATOMIC_INIT(0),
+	.pstore = {
+		.owner = THIS_MODULE,
+		.name = MODNAME,
+		.open = blkz_pstore_open,
+		.read = blkz_pstore_read,
+		.write = blkz_pstore_write,
+		.erase = blkz_pstore_erase,
+	},
+};
+
+static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
+		unsigned long *off, size_t size)
+{
+	struct blkz_info *info = blkz_cxt.bzinfo;
+	struct blkz_zone *zone;
+	const char *name = pstore_type_to_name(type);
+
+	if (!size)
+		return NULL;
+
+	if (*off + size > info->total_size) {
+		pr_err("no room for %s (0x%zx@0x%lx over 0x%lx)\n",
+			name, size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	zone = kzalloc(sizeof(struct blkz_zone), GFP_KERNEL);
+	if (!zone)
+		return ERR_PTR(-ENOMEM);
+
+	zone->buffer = kmalloc(size, GFP_KERNEL);
+	if (!zone->buffer) {
+		kfree(zone);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zone->buffer, 0xFF, size);
+	zone->off = *off;
+	zone->name = name;
+	zone->type = type;
+	zone->buffer_size = size - sizeof(struct blkz_buffer);
+	zone->buffer->sig = type ^ BLK_SIG;
+	atomic_set(&zone->dirty, 0);
+	atomic_set(&zone->buffer->datalen, 0);
+
+	*off += size;
+
+	pr_debug("blkzone %s: off 0x%lx, %zu header, %zu data\n", zone->name,
+			zone->off, sizeof(*zone->buffer), zone->buffer_size);
+	return zone;
+}
+
+static struct blkz_zone **blkz_init_zones(enum pstore_type_id type,
+	unsigned long *off, size_t total_size, ssize_t record_size,
+	unsigned int *cnt)
+{
+	struct blkz_info *info = blkz_cxt.bzinfo;
+	struct blkz_zone **zones, *zone;
+	const char *name = pstore_type_to_name(type);
+	int c, i;
+
+	if (!total_size || !record_size)
+		return NULL;
+
+	if (*off + total_size > info->total_size) {
+		pr_err("no room for zones %s (0x%zx@0x%lx over 0x%lx)\n",
+			name, total_size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	c = total_size / record_size;
+	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
+	if (!zones) {
+		pr_err("allocate for zones %s failed\n", name);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zones, 0, c * sizeof(*zones));
+
+	for (i = 0; i < c; i++) {
+		zone = blkz_init_zone(type, off, record_size);
+		if (!zone || IS_ERR(zone)) {
+			pr_err("initialize zones %s failed\n", name);
+			while (--i >= 0) {
+				kfree(zones[i]->buffer);
+				kfree(zones[i]);
+			}
+			kfree(zones);
+			return (void *)zone;
+		}
+		zones[i] = zone;
+	}
+
+	*cnt = c;
+	return zones;
+}
+
+static void blkz_free_zone(struct blkz_zone **blkzone)
+{
+	struct blkz_zone *zone = *blkzone;
+
+	if (!zone)
+		return;
+
+	kfree(zone->buffer);
+	kfree(zone);
+	*blkzone = NULL;
+}
+
+static void blkz_free_zones(struct blkz_zone ***blkzones, unsigned int *cnt)
+{
+	struct blkz_zone **zones = *blkzones;
+
+	if (!zones)
+		return;
+
+	while (*cnt > 0) {
+		(*cnt)--;
+		blkz_free_zone(&zones[*cnt]);
+	}
+	kfree(zones);
+	*blkzones = NULL;
+}
+
+static int blkz_cut_zones(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	unsigned long off = 0;
+	int err;
+	size_t size;
+
+	size = info->total_size;
+	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+			info->dmesg_size, &cxt->dmesg_max_cnt);
+	if (IS_ERR(cxt->dbzs)) {
+		err = PTR_ERR(cxt->dbzs);
+		goto fail_out;
+	}
+
+	return 0;
+fail_out:
+	return err;
+}
+
+int blkz_register(struct blkz_info *info)
+{
+	int err = -EINVAL;
+	struct blkz_context *cxt = &blkz_cxt;
+	struct module *owner = info->owner;
+
+	if (!info->total_size) {
+		pr_warn("the total size must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->dmesg_size) {
+		pr_warn("at least one of the records must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->name || !info->name[0])
+		return -EINVAL;
+
+	if (info->total_size < 4096) {
+		pr_err("total size must be at least 4096 bytes\n");
+		return -EINVAL;
+	}
+
+#define check_size(name, size) {					\
+		if (info->name > 0 && info->name < (size)) {		\
+			pr_err(#name " must be over %d\n", (size));	\
+			return -EINVAL;					\
+		}							\
+		if (info->name & ((size) - 1)) {			\
+			pr_err(#name " must be a multiple of %d\n",	\
+					(size));			\
+			return -EINVAL;					\
+		}							\
+	}
+
+	check_size(total_size, 4096);
+	check_size(dmesg_size, SECTOR_SIZE);
+
+#undef check_size
+
+	/*
+	 * @read and @write must be supplied.
+	 * Without @read, mounting pstore may fail.
+	 * Without @write, pstore cannot remove record files.
+	 */
+	if (!info->read || !info->write) {
+		pr_err("no valid general read/write interface\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&cxt->bzinfo_lock);
+	if (cxt->bzinfo) {
+		pr_warn("blk '%s' already loaded: ignoring '%s'\n",
+				cxt->bzinfo->name, info->name);
+		spin_unlock(&cxt->bzinfo_lock);
+		return -EBUSY;
+	}
+	cxt->bzinfo = info;
+	spin_unlock(&cxt->bzinfo_lock);
+
+	if (owner && !try_module_get(owner)) {
+		err = -EBUSY;
+		goto fail_out;
+	}
+
+	pr_debug("register %s with properties:\n", info->name);
+	pr_debug("\ttotal size : %lu Bytes\n", info->total_size);
+	pr_debug("\tdmesg size : %lu Bytes\n", info->dmesg_size);
+
+	err = blkz_cut_zones(cxt);
+	if (err) {
+		pr_err("cut zones failed\n");
+		goto put_module;
+	}
+
+	if (info->dmesg_size) {
+		cxt->pstore.bufsize = cxt->dbzs[0]->buffer_size -
+			sizeof(struct blkz_dmesg_header);
+		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
+		if (!cxt->pstore.buf) {
+			err = -ENOMEM;
+			goto put_module;
+		}
+	}
+	cxt->pstore.data = cxt;
+	if (info->dmesg_size)
+		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
+
+	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
+			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
+			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
+
+	err = pstore_register(&cxt->pstore);
+	if (err) {
+		pr_err("registering with pstore failed\n");
+		goto free_pstore_buf;
+	}
+
+	module_put(owner);
+	return 0;
+
+free_pstore_buf:
+	kfree(cxt->pstore.buf);
+put_module:
+	module_put(owner);
+fail_out:
+	spin_lock(&blkz_cxt.bzinfo_lock);
+	blkz_cxt.bzinfo = NULL;
+	spin_unlock(&blkz_cxt.bzinfo_lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(blkz_register);
+
+void blkz_unregister(struct blkz_info *info)
+{
+	struct blkz_context *cxt = &blkz_cxt;
+
+	pstore_unregister(&cxt->pstore);
+	kfree(cxt->pstore.buf);
+	cxt->pstore.bufsize = 0;
+
+	spin_lock(&cxt->bzinfo_lock);
+	blkz_cxt.bzinfo = NULL;
+	spin_unlock(&cxt->bzinfo_lock);
+
+	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
+}
+EXPORT_SYMBOL_GPL(blkz_unregister);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("Block device Oops/Panic logger");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
new file mode 100644
index 000000000000..589d276fa4e4
--- /dev/null
+++ b/include/linux/pstore_blk.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_BLK_H_
+#define __PSTORE_BLK_H_
+
+#include <linux/types.h>
+#include <linux/blkdev.h>
+
+/**
+ * struct blkz_info - backend blkzone driver structure
+ *
+ * @owner:
+ *	Module which is responsible for this backend driver.
+ * @name:
+ *	Name of the backend driver.
+ * @total_size:
+ *	The total size in bytes pstore/blk can use. It must be at least
+ *	4096 and a multiple of 4096.
+ * @dmesg_size:
+ *	The size of each zone for dmesg (oops & panic). Zero means disabled;
+ *	otherwise, it must be a multiple of SECTOR_SIZE (512 bytes).
+ * @dump_oops:
+ *	Dump both oops and panic logs, or only panic logs.
+ * @read, @write:
+ *	The general (not panic) read/write operations. They are required unless
+ *	you are a block device and supply a valid @bdev. In that case, blkzone
+ *	will supply its own general read/write interface.
+ *
+ *	Both the @size and @offset parameters of this interface are
+ *	relative to the space provided, not to the whole disk/flash.
+ *
+ *	On success, the number of bytes read/written should be returned.
+ *	On error, a negative number should be returned.
+ * @panic_write:
+ *	The write operation used only on panic. It is optional if you do not
+ *	care about panic records. If a panic occurs before blkzone has
+ *	recovered, the first dmesg zone is used.
+ *
+ *	Both the @size and @offset parameters of this interface are
+ *	relative to the space provided, not to the whole disk/flash.
+ *
+ *	On success, the number of bytes written should be returned.
+ *	On error, a negative number should be returned.
+ */
+typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
+typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
+struct blkz_info {
+	struct module *owner;
+	const char *name;
+
+	unsigned long total_size;
+	unsigned long dmesg_size;
+	int dump_oops;
+	blkz_read_op read;
+	blkz_write_op write;
+	blkz_write_op panic_write;
+};
+
+extern int blkz_register(struct blkz_info *info);
+extern void blkz_unregister(struct blkz_info *info);
+
+#endif
-- 
1.9.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* [PATCH v2 02/11] blkoops: add blkoops, a wrapper for pstore/blk
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
  2020-02-07 12:25 ` [PATCH v2 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:06   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder WeiXiong Liao
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

blkoops is a wrapper for pstore/blk that provides an efficient
configuration method. It divides the configuration of pstore/blk into
two parts: configuration for the user and configuration for the driver.

Configurations for the user determine how pstore/blk works, such as
dump_oops and dmesg_size. They can be set by Kconfig and module
parameters.
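
To illustrate the Kconfig/module-parameter precedence, here is a minimal
userspace sketch of how a size setting could be resolved. The helper name
resolve_size_bytes() and the ALIGN_UP macro are hypothetical and not part of
this patch; the sketch assumes, as in blkoops.c, that a negative parameter
means "not set", that sizes are given in KB, and that the byte value is
rounded up to the alignment.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((long)(a) - 1)) & ~((long)(a) - 1))

/*
 * Hypothetical helper, not part of the patch.
 * param_kb   - module parameter in KB, negative if the user did not set it
 * default_kb - Kconfig default in KB
 * align      - required alignment in bytes (power of two)
 * Returns the effective size in bytes; 0 means the record type is disabled.
 */
static long resolve_size_bytes(long param_kb, long default_kb, long align)
{
	/* The module parameter wins whenever it was set. */
	long bytes = param_kb < 0 ? default_kb : param_kb;

	/* Convert KB to bytes; zero or negative disables the record. */
	bytes = bytes <= 0 ? 0 : bytes * 1024;

	/* Round up so the zone size meets the alignment requirement. */
	return ALIGN_UP(bytes, align);
}
```

With a Kconfig default of 64 and no module parameter, the result is 65536
bytes; an explicit parameter of 5 is rounded up from 5120 to 8192 bytes.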

Configurations for the driver describe the block or non-block device, such
as the total_size of the device and its read/write operations. They should
be provided by device drivers, which call blkoops_register_device() for a
non-block device and blkoops_register_blkdev() for a block device.

If the device driver supports panic records, @panic_write must be valid.
If a panic occurs before pstore/blk has recovered, the first zone of
dmesg will be used.

Besides, a block device driver has no need to verify which partition is
used or to provide generic read/write operations, because blkoops already
does both. It also means that if users do not care about panic records but
only about records for oops/console/pmsg/ftrace, the block device driver
needs to do nothing.
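
As a rough illustration of the non-block path, the following userspace
sketch mirrors the shape of struct blkoops_device and the argument checks of
blkoops_register_device(). It is a mock, not the kernel code: off_t stands
in for loff_t, the register function only keeps the -EINVAL/-EBUSY checks,
and the demo_* backend (an 8 KiB RAM array pretending to be flash) is
entirely hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t, off_t */

typedef ssize_t (*blkz_read_op)(char *, size_t, off_t);
typedef ssize_t (*blkz_write_op)(const char *, size_t, off_t);

struct blkoops_device {
	unsigned long total_size;
	blkz_read_op read;
	blkz_write_op write;
	blkz_write_op panic_write;	/* optional, NULL if unsupported */
};

/* Mock of the kernel function: argument checks and single registration. */
static struct blkoops_device *registered;

static int blkoops_register_device(struct blkoops_device *bo_dev)
{
	if (!bo_dev || !bo_dev->total_size || !bo_dev->read || !bo_dev->write)
		return -EINVAL;
	if (registered)
		return -EBUSY;	/* someone already registered */
	registered = bo_dev;
	return 0;
}

/* Hypothetical backend: RAM standing in for a flash partition. */
#define DEMO_SIZE 8192
static char demo_storage[DEMO_SIZE];

/* Offsets are relative to the space provided, not to the whole device. */
static ssize_t demo_read(char *buf, size_t size, off_t off)
{
	if (off < 0 || (size_t)off > DEMO_SIZE || size > DEMO_SIZE - (size_t)off)
		return -EINVAL;
	memcpy(buf, demo_storage + off, size);
	return (ssize_t)size;	/* number of bytes read */
}

static ssize_t demo_write(const char *buf, size_t size, off_t off)
{
	if (off < 0 || (size_t)off > DEMO_SIZE || size > DEMO_SIZE - (size_t)off)
		return -EINVAL;
	memcpy(demo_storage + off, buf, size);
	return (ssize_t)size;	/* number of bytes written */
}

static struct blkoops_device demo_dev = {
	.total_size = DEMO_SIZE,
	.read = demo_read,
	.write = demo_write,
	.panic_write = NULL,	/* this demo does not record panics */
};
```

A driver would then call blkoops_register_device(&demo_dev) once at probe
time; a second registration is rejected with -EBUSY, matching the
single-backend rule in the patch.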

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 MAINTAINERS             |   2 +-
 fs/pstore/Kconfig       |  61 ++++++++
 fs/pstore/Makefile      |   2 +
 fs/pstore/blkoops.c     | 402 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkoops.h |  58 +++++++
 5 files changed, 524 insertions(+), 1 deletion(-)
 create mode 100644 fs/pstore/blkoops.c
 create mode 100644 include/linux/blkoops.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cc0a4a8ae06a..e4ba97130560 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13381,7 +13381,7 @@ F:	drivers/firmware/efi/efi-pstore.c
 F:	drivers/acpi/apei/erst.c
 F:	Documentation/admin-guide/ramoops.rst
 F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
-K:	\b(pstore|ramoops)
+K:	\b(pstore|ramoops|blkoops)
 
 PTP HARDWARE CLOCK SUPPORT
 M:	Richard Cochran <richardcochran@gmail.com>
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 536fde9e13e8..7a57a8edb612 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -163,3 +163,64 @@ config PSTORE_BLK
 	  where it can be read back at some later point.
 
 	  If unsure, say N.
+
+config PSTORE_BLKOOPS
+	tristate "pstore block with oops logger"
+	depends on PSTORE_BLK
+	help
+	  This is a wrapper for pstore/blk.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
+	  If unsure, say N.
+
+config PSTORE_BLKOOPS_DMESG_SIZE
+	int "dmesg size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	default 64
+	help
+	  This just sets size of dmesg (dmesg_size) for pstore/blk. The size is
+	  in KB and must be a multiple of 4.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
+config PSTORE_BLKOOPS_BLKDEV
+	string "block device for blkoops"
+	depends on PSTORE_BLKOOPS
+	default ""
+	help
+	  Which block device should be used for pstore/blk.
+
+	  It accepts the following variants:
+	  1) <hex_major><hex_minor> device number in hexadecimal represents
+	     itself no leading 0x, for example b302.
+	  2) /dev/<disk_name> represents the device number of disk
+	  3) /dev/<disk_name><decimal> represents the device number
+	     of partition - device number of disk plus the partition number
+	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
+	     used when disk name of partitioned disk ends with a digit.
+	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+	     unique id of a partition if the partition table provides it.
+	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+	     filled hex representation of the 32-bit "NT disk signature", and PP
+	     is a zero-filled hex representation of the 1-based partition number.
+	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
+	     to a partition with a known unique id.
+	  7) <major>:<minor> major and minor number of the device separated by
+	     a colon.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
+config PSTORE_BLKOOPS_DUMP_OOPS
+	bool "dump oops"
+	depends on PSTORE_BLKOOPS
+	default y
+	help
+	  Whether blkoops dumps oops or not.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 0ee2fc8d1bfb..24b3d488d2f0 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -15,3 +15,5 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
 
 obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
 pstore_blk-y += blkzone.o
+
+obj-$(CONFIG_PSTORE_BLKOOPS) += blkoops.o
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
new file mode 100644
index 000000000000..8027c3af8c8d
--- /dev/null
+++ b/fs/pstore/blkoops.c
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "blkoops : " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/blkoops.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+
+static long dmesg_size = -1;
+module_param(dmesg_size, long, 0400);
+MODULE_PARM_DESC(dmesg_size, "dmesg size in kbytes");
+
+static int dump_oops = -1;
+module_param(dump_oops, int, 0400);
+MODULE_PARM_DESC(dump_oops, "whether dump oops");
+
+/**
+ * The block device to use. Most of the time, it is a partition of a block
+ * device. It can be left unset if you are not a block device and register
+ * with blkoops via blkoops_register_device(). In that case, @blkdev is
+ * unused and @read, @write and @total_size must be supplied.
+ *
+ * @blkdev accepts the following variants:
+ * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
+ *    no leading 0x, for example b302.
+ * 2) /dev/<disk_name> represents the device number of disk
+ * 3) /dev/<disk_name><decimal> represents the device number
+ *    of partition - device number of disk plus the partition number
+ * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
+ *    used when disk name of partitioned disk ends on a digit.
+ * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ *    unique id of a partition if the partition table provides it.
+ *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ *    filled hex representation of the 32-bit "NT disk signature", and PP
+ *    is a zero-filled hex representation of the 1-based partition number.
+ * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ *    a partition with a known unique id.
+ * 7) <major>:<minor> major and minor number of the device separated by
+ *    a colon.
+ */
+static char blkdev[80];
+module_param_string(blkdev, blkdev, 80, 0400);
+MODULE_PARM_DESC(blkdev, "the block device for general read/write");
+
+static DEFINE_MUTEX(blkz_lock);
+static struct block_device *blkoops_bdev;
+static struct blkz_info *bzinfo;
+static blkoops_blk_panic_write_op blkdev_panic_write;
+
+#ifdef CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
+#define DEFAULT_DMESG_SIZE CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
+#else
+#define DEFAULT_DMESG_SIZE 0
+#endif
+
+#ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
+#define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
+#else
+#define DEFAULT_DUMP_OOPS 1
+#endif
+
+#ifdef CONFIG_PSTORE_BLKOOPS_BLKDEV
+#define DEFAULT_BLKDEV CONFIG_PSTORE_BLKOOPS_BLKDEV
+#else
+#define DEFAULT_BLKDEV ""
+#endif
+
+/**
+ * register device to blkoops
+ *
+ * Both block and non-block drivers can call this function to register
+ * with blkoops. It packages the configuration for blkzone and pstore.
+ */
+int blkoops_register_device(struct blkoops_device *bo_dev)
+{
+	int ret;
+
+	if (!bo_dev || !bo_dev->total_size || !bo_dev->read || !bo_dev->write)
+		return -EINVAL;
+
+	mutex_lock(&blkz_lock);
+
+	/* someone already registered before */
+	if (bzinfo) {
+		mutex_unlock(&blkz_lock);
+		return -EBUSY;
+	}
+	bzinfo = kzalloc(sizeof(struct blkz_info), GFP_KERNEL);
+	if (!bzinfo) {
+		mutex_unlock(&blkz_lock);
+		return -ENOMEM;
+	}
+
+#define verify_size(name, defsize, alignsize) {				\
+		long _##name_ = (name);					\
+		if (_##name_ < 0)					\
+			_##name_ = (defsize);				\
+		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+		if (_##name_ & ((alignsize) - 1)) {			\
+			pr_info(#name " must align to %d\n",		\
+					(alignsize));			\
+			_##name_ = ALIGN(_##name_, (alignsize));	\
+		}							\
+		name = _##name_ / 1024;					\
+		bzinfo->name = _##name_;				\
+	}
+
+	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
+#undef verify_size
+	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
+
+	bzinfo->total_size = bo_dev->total_size;
+	bzinfo->dump_oops = dump_oops;
+	bzinfo->read = bo_dev->read;
+	bzinfo->write = bo_dev->write;
+	bzinfo->panic_write = bo_dev->panic_write;
+	bzinfo->name = "blkoops";
+	bzinfo->owner = THIS_MODULE;
+
+	ret = blkz_register(bzinfo);
+	if (ret) {
+		kfree(bzinfo);
+		bzinfo = NULL;
+	}
+	mutex_unlock(&blkz_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkoops_register_device);
+
+void blkoops_unregister_device(struct blkoops_device *bo_dev)
+{
+	mutex_lock(&blkz_lock);
+	if (bzinfo && bzinfo->read == bo_dev->read) {
+		blkz_unregister(bzinfo);
+		kfree(bzinfo);
+		bzinfo = NULL;
+	}
+	mutex_unlock(&blkz_lock);
+}
+EXPORT_SYMBOL_GPL(blkoops_unregister_device);
+
+/**
+ * get block_device of @blkdev
+ * @holder: exclusive holder identifier
+ *
+ * On success, @blkoops_bdev will save the block_device and the returned
+ * block_device has reference count of one.
+ */
+static struct block_device *blkoops_get_bdev(void *holder)
+{
+	struct block_device *bdev = ERR_PTR(-ENODEV);
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!blkdev[0] && strlen(DEFAULT_BLKDEV))
+		snprintf(blkdev, 80, "%s", DEFAULT_BLKDEV);
+	if (!blkdev[0])
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&blkz_lock);
+	if (bzinfo)
+		goto out;
+	if (holder)
+		mode |= FMODE_EXCL;
+	bdev = blkdev_get_by_path(blkdev, mode, holder);
+	if (IS_ERR(bdev)) {
+		dev_t devt;
+
+		devt = name_to_dev_t(blkdev);
+		if (devt == 0) {
+			bdev = ERR_PTR(-ENODEV);
+			goto out;
+		}
+		bdev = blkdev_get_by_dev(devt, mode, holder);
+	}
+out:
+	mutex_unlock(&blkz_lock);
+	return bdev;
+}
+
+static void blkoops_put_bdev(struct block_device *bdev, void *holder)
+{
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!bdev)
+		return;
+
+	mutex_lock(&blkz_lock);
+	if (holder)
+		mode |= FMODE_EXCL;
+	blkdev_put(bdev, mode);
+	mutex_unlock(&blkz_lock);
+}
+
+static ssize_t blkoops_generic_blk_read(char *buf, size_t bytes, loff_t pos)
+{
+	ssize_t ret;
+	struct block_device *bdev = blkoops_bdev;
+	struct file filp;
+	mm_segment_t ofs;
+	struct kiocb kiocb;
+	struct iov_iter iter;
+	struct iovec iov = {
+		.iov_base = (void __user *)buf,
+		.iov_len = bytes
+	};
+
+	if (!bdev)
+		return -ENODEV;
+
+	memset(&filp, 0, sizeof(struct file));
+	filp.f_mapping = bdev->bd_inode->i_mapping;
+	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	filp.f_inode = bdev->bd_inode;
+
+	init_sync_kiocb(&kiocb, &filp);
+	kiocb.ki_pos = pos;
+	iov_iter_init(&iter, READ, &iov, 1, bytes);
+
+	ofs = get_fs();
+	set_fs(KERNEL_DS);
+	ret = generic_file_read_iter(&kiocb, &iter);
+	set_fs(ofs);
+	return ret;
+}
+
+static ssize_t blkoops_generic_blk_write(const char *buf, size_t bytes,
+		loff_t pos)
+{
+	struct block_device *bdev = blkoops_bdev;
+	struct iov_iter iter;
+	struct kiocb kiocb;
+	struct file filp;
+	mm_segment_t ofs;
+	ssize_t ret;
+	struct iovec iov = {
+		.iov_base = (void __user *)buf,
+		.iov_len = bytes
+	};
+
+	if (!bdev)
+		return -ENODEV;
+
+	/* Console/Ftrace data stays buffered until dirty zones are flushed */
+	if (in_interrupt() || irqs_disabled())
+		return -EBUSY;
+
+	memset(&filp, 0, sizeof(struct file));
+	filp.f_mapping = bdev->bd_inode->i_mapping;
+	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	filp.f_inode = bdev->bd_inode;
+
+	init_sync_kiocb(&kiocb, &filp);
+	kiocb.ki_pos = pos;
+	iov_iter_init(&iter, WRITE, &iov, 1, bytes);
+
+	ofs = get_fs();
+	set_fs(KERNEL_DS);
+
+	inode_lock(bdev->bd_inode);
+	ret = generic_write_checks(&kiocb, &iter);
+	if (ret > 0)
+		ret = generic_perform_write(&filp, &iter, pos);
+	inode_unlock(bdev->bd_inode);
+
+	if (likely(ret > 0)) {
+		const struct file_operations f_op = {.fsync = blkdev_fsync};
+
+		filp.f_op = &f_op;
+		kiocb.ki_pos += ret;
+		ret = generic_write_sync(&kiocb, ret);
+	}
+	set_fs(ofs);
+	return ret;
+}
+
+static inline unsigned long blkoops_bdev_size(struct block_device *bdev)
+{
+	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
+}
+
+static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
+		loff_t off)
+{
+	int ret;
+
+	if (!blkdev_panic_write)
+		return -EOPNOTSUPP;
+
+	/* size and off must align to SECTOR_SIZE for block device */
+	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
+			size >> SECTOR_SHIFT);
+	return ret ? -EIO : size;
+}
+
+/**
+ * register block device to blkoops
+ * @major: the major device number of registering device
+ * @panic_write: the write interface for panic case.
+ *
+ * It is ONLY used by block devices to register with blkoops. In this case,
+ * the module parameter @blkdev must be valid. Generic read/write interfaces
+ * will be used.
+ *
+ * The block driver has no need to verify which partition is used. It only
+ * needs to pass its major number, so blkoops can match the driver against
+ * @blkdev.
+ *
+ * If the block driver supports panic records, @panic_write must be valid.
+ * If a panic occurs before pstore/blk has recovered, the first zone of
+ * dmesg will be used.
+ */
+int blkoops_register_blkdev(unsigned int major,
+		blkoops_blk_panic_write_op panic_write)
+{
+	struct block_device *bdev;
+	struct blkoops_device bo_dev = {0};
+	int ret = -ENODEV;
+	void *holder = blkdev;
+
+	bdev = blkoops_get_bdev(holder);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	blkoops_bdev = bdev;
+	blkdev_panic_write = panic_write;
+
+	/* only allow driver matching the @blkdev */
+	if (!bdev->bd_dev || MAJOR(bdev->bd_dev) != major)
+		goto err_put_bdev;
+
+	bo_dev.total_size = blkoops_bdev_size(bdev);
+	if (bo_dev.total_size == 0)
+		goto err_put_bdev;
+	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
+	bo_dev.read = blkoops_generic_blk_read;
+	bo_dev.write = blkoops_generic_blk_write;
+
+	ret = blkoops_register_device(&bo_dev);
+	if (ret)
+		goto err_put_bdev;
+	return 0;
+
+err_put_bdev:
+	blkdev_panic_write = NULL;
+	blkoops_bdev = NULL;
+	blkoops_put_bdev(bdev, holder);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkoops_register_blkdev);
+
+void blkoops_unregister_blkdev(unsigned int major)
+{
+	struct blkoops_device bo_dev = {.read = blkoops_generic_blk_read};
+	void *holder = blkdev;
+
+	if (blkoops_bdev && MAJOR(blkoops_bdev->bd_dev) == major) {
+		blkoops_unregister_device(&bo_dev);
+		blkoops_put_bdev(blkoops_bdev, holder);
+		blkdev_panic_write = NULL;
+		blkoops_bdev = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(blkoops_unregister_blkdev);
+
+/**
+ * get information of @blkdev
+ * @devt: the block device number of @blkdev
+ * @nr_sects: the sector count of @blkdev
+ * @start_sect: the start sector of @blkdev
+ *
+ * Block drivers need the following information for @panic_write.
+ */
+int blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
+{
+	struct block_device *bdev;
+
+	bdev = blkoops_get_bdev(NULL);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	if (devt)
+		*devt = bdev->bd_dev;
+	if (nr_sects)
+		*nr_sects = part_nr_sects_read(bdev->bd_part);
+	if (start_sect)
+		*start_sect = get_start_sect(bdev);
+
+	blkoops_put_bdev(bdev, NULL);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkoops_blkdev_info);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("Wrapper for Pstore BLK with Oops logger");
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
new file mode 100644
index 000000000000..fe63739309aa
--- /dev/null
+++ b/include/linux/blkoops.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __BLKOOPS_H_
+#define __BLKOOPS_H_
+
+#include <linux/types.h>
+#include <linux/blkdev.h>
+#include <linux/pstore_blk.h>
+
+/**
+ * struct blkoops_device - backend blkoops driver structure.
+ *
+ * This structure is ONLY used for non-block devices by
+ * blkoops_register_device(). For a block device, you are strongly
+ * recommended to use blkoops_register_blkdev().
+ *
+ * @total_size:
+ *	The total size in bytes pstore/blk can use. It must be greater than
+ *	4096 and be a multiple of 4096.
+ * @read, @write:
+ *	The general (not panic) read/write operations.
+ *
+ *	Both the @size and @offset parameters of this interface are relative
+ *	to the space provided, not to the whole disk/flash.
+ *
+ *	On success, the number of bytes read/written should be returned.
+ *	On error, a negative number should be returned.
+ * @panic_write:
+ *	The write operation used only for panic.
+ *
+ *	Both the @size and @offset parameters of this interface are relative
+ *	to the space provided, not to the whole disk/flash.
+ *
+ *	On success, the number of bytes written should be returned.
+ *	On error, a negative number should be returned.
+ */
+struct blkoops_device {
+	unsigned long total_size;
+	blkz_read_op read;
+	blkz_write_op write;
+	blkz_write_op panic_write;
+};
+
+/*
+ * Panic write for block devices, whose writes must be aligned to SECTOR_SIZE.
+ * On success, zero should be returned. Any other value means an error.
+ */
+typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
+		sector_t sects);
+
+int  blkoops_register_device(struct blkoops_device *bo_dev);
+void blkoops_unregister_device(struct blkoops_device *bo_dev);
+int  blkoops_register_blkdev(unsigned int major,
+		blkoops_blk_panic_write_op panic_write);
+void blkoops_unregister_blkdev(unsigned int major);
+int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
+
+#endif
-- 
1.9.1



* [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
  2020-02-07 12:25 ` [PATCH v2 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
  2020-02-07 12:25 ` [PATCH v2 02/11] blkoops: add blkoops, a wrapper for pstore/blk WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:13   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

pmsg is a recorder for userspace messages. To enable pmsg, set pmsg_size
to a value that is greater than 0 and a multiple of 4096.
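
The pmsg zone behaves like a ring buffer tracked by a start offset and a
data length. The following userspace sketch shows that idea in isolation.
The struct ring type and the ring_write()/ring_read() helpers are
hypothetical names for illustration only; the real code works on
struct blkz_zone and flushes to the device through blkz_zone_write().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8	/* tiny on purpose; stands in for zone->buffer_size */

struct ring {
	size_t datalen;	/* valid bytes, capped at RING_SIZE once wrapped */
	size_t start;	/* offset where the next write lands */
	char data[RING_SIZE];
};

static void ring_write(struct ring *r, const char *buf, size_t cnt)
{
	size_t rem;

	/* Keep only the newest RING_SIZE bytes of an oversized record. */
	if (cnt > RING_SIZE) {
		buf += cnt - RING_SIZE;
		cnt = RING_SIZE;
	}

	rem = RING_SIZE - r->start;
	if (rem < cnt) {
		/* Fill the tail, then wrap around to the beginning. */
		memcpy(r->data + r->start, buf, rem);
		buf += rem;
		cnt -= rem;
		r->start = 0;
		r->datalen = RING_SIZE;	/* the buffer is full from now on */
	}

	memcpy(r->data + r->start, buf, cnt);
	r->start += cnt;
	if (r->datalen < r->start)
		r->datalen = r->start;	/* not wrapped yet: data is [0, start) */
}

/* Copy the stored bytes to @out, oldest first; returns the byte count. */
static size_t ring_read(const struct ring *r, char *out)
{
	size_t tail = r->datalen - r->start;	/* oldest bytes sit after start */

	memcpy(out, r->data + r->start, tail);
	memcpy(out + tail, r->data, r->start);
	return r->datalen;
}
```

For example, writing "abcde" and then "fghij" into this 8-byte ring reads
back as "cdefghij": the two oldest bytes are overwritten and the read starts
from the oldest surviving byte.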

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          |  12 +++
 fs/pstore/blkoops.c        |  11 +++
 fs/pstore/blkzone.c        | 229 +++++++++++++++++++++++++++++++++++++++++++--
 include/linux/pstore_blk.h |   4 +
 4 files changed, 246 insertions(+), 10 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 7a57a8edb612..bbf1fdb5eaa7 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -186,6 +186,18 @@ config PSTORE_BLKOOPS_DMESG_SIZE
 	  NOTE that, both kconfig and module parameters can configure blkoops,
 	  but module parameters have priority over kconfig.
 
+config PSTORE_BLKOOPS_PMSG_SIZE
+	int "pmsg size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	depends on PSTORE_PMSG
+	default 64
+	help
+	  This just sets size of pmsg (pmsg_size) for pstore/blk. The size is
+	  in KB and must be a multiple of 4.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
 config PSTORE_BLKOOPS_BLKDEV
 	string "block device for blkoops"
 	depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 8027c3af8c8d..02e6e4c1f965 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -16,6 +16,10 @@
 module_param(dmesg_size, long, 0400);
 MODULE_PARM_DESC(dmesg_size, "dmesg size in kbytes");
 
+static long pmsg_size = -1;
+module_param(pmsg_size, long, 0400);
+MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
+
 static int dump_oops = -1;
 module_param(dump_oops, int, 0400);
 MODULE_PARM_DESC(dump_oops, "whether dump oops");
@@ -60,6 +64,12 @@
 #define DEFAULT_DMESG_SIZE 0
 #endif
 
+#ifdef CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
+#define DEFAULT_PMSG_SIZE CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
+#else
+#define DEFAULT_PMSG_SIZE 0
+#endif
+
 #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #else
@@ -113,6 +123,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 	}
 
 	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
+	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index f77f612b50ba..a3464252d52e 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -24,12 +24,14 @@
  *
  * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
  * @datalen: length of data in @data
+ * @start: offset into @data where the stored bytes begin
  * @data: zone data.
  */
 struct blkz_buffer {
 #define BLK_SIG (0x43474244) /* DBGC */
 	uint32_t sig;
 	atomic_t datalen;
+	atomic_t start;
 	uint8_t data[];
 };
 
@@ -85,8 +87,10 @@ struct blkz_zone {
 
 struct blkz_context {
 	struct blkz_zone **dbzs;	/* dmesg block zones */
+	struct blkz_zone *pbz;		/* Pmsg block zone */
 	unsigned int dmesg_max_cnt;
 	unsigned int dmesg_read_cnt;
+	unsigned int pmsg_read_cnt;
 	unsigned int dmesg_write_cnt;
 	/*
 	 * the counter should be recovered when recover.
@@ -119,6 +123,11 @@ static inline int buffer_datalen(struct blkz_zone *zone)
 	return atomic_read(&zone->buffer->datalen);
 }
 
+static inline int buffer_start(struct blkz_zone *zone)
+{
+	return atomic_read(&zone->buffer->start);
+}
+
 static inline bool is_on_panic(void)
 {
 	struct blkz_context *cxt = &blkz_cxt;
@@ -410,6 +419,69 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
 	return ret;
 }
 
+static int blkz_recover_pmsg(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	struct blkz_buffer *oldbuf;
+	struct blkz_zone *zone = NULL;
+	int ret = 0;
+	ssize_t rcnt, len;
+
+	zone = cxt->pbz;
+	if (!zone || zone->oldbuf)
+		return 0;
+
+	if (is_on_panic())
+		goto out;
+
+	if (unlikely(!info->read))
+		return -EINVAL;
+
+	len = zone->buffer_size + sizeof(*oldbuf);
+	oldbuf = kzalloc(len, GFP_KERNEL);
+	if (!oldbuf)
+		return -ENOMEM;
+
+	rcnt = info->read((char *)oldbuf, len, zone->off);
+	if (rcnt != len) {
+		pr_debug("recover pmsg failed\n");
+		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
+		goto free_oldbuf;
+	}
+
+	if (oldbuf->sig != zone->buffer->sig) {
+		pr_debug("no valid data in zone %s\n", zone->name);
+		goto free_oldbuf;
+	}
+
+	if (zone->buffer_size < atomic_read(&oldbuf->datalen) ||
+		zone->buffer_size < atomic_read(&oldbuf->start)) {
+		pr_info("found overtop zone: %s: off %lu, size %zu\n",
+				zone->name, zone->off, zone->buffer_size);
+		goto free_oldbuf;
+	}
+
+	if (!atomic_read(&oldbuf->datalen)) {
+		pr_debug("found erased zone: %s: id 0, off %lu, size %zu, datalen %d\n",
+				zone->name, zone->off, zone->buffer_size,
+				atomic_read(&oldbuf->datalen));
+		kfree(oldbuf);
+		goto out;
+	}
+
+	pr_debug("found nice zone: %s: id 0, off %lu, size %zu, datalen %d\n",
+			zone->name, zone->off, zone->buffer_size,
+			atomic_read(&oldbuf->datalen));
+	zone->oldbuf = oldbuf;
+out:
+	blkz_flush_dirty_zone(zone);
+	return 0;
+
+free_oldbuf:
+	kfree(oldbuf);
+	return ret;
+}
+
 static inline int blkz_recovery(struct blkz_context *cxt)
 {
 	int ret = -EBUSY;
@@ -421,6 +493,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = blkz_recover_pmsg(cxt);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -435,9 +511,17 @@ static int blkz_pstore_open(struct pstore_info *psi)
 	struct blkz_context *cxt = psi->data;
 
 	cxt->dmesg_read_cnt = 0;
+	cxt->pmsg_read_cnt = 0;
 	return 0;
 }
 
+static inline bool blkz_old_ok(struct blkz_zone *zone)
+{
+	if (zone && zone->oldbuf && atomic_read(&zone->oldbuf->datalen))
+		return true;
+	return false;
+}
+
 static inline bool blkz_ok(struct blkz_zone *zone)
 {
 	if (zone && zone->buffer && buffer_datalen(zone))
@@ -455,6 +539,25 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
 	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
+static inline int blkz_pmsg_erase(struct blkz_context *cxt,
+		struct blkz_zone *zone)
+{
+	if (unlikely(!blkz_old_ok(zone)))
+		return 0;
+
+	kfree(zone->oldbuf);
+	zone->oldbuf = NULL;
+	/*
+	 * if there are new data in zone buffer, that means the old data
+	 * are already invalid. It is no need to flush 0 (erase) to
+	 * block device.
+	 */
+	if (!buffer_datalen(zone))
+		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	blkz_flush_dirty_zone(zone);
+	return 0;
+}
+
 static int blkz_pstore_erase(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
@@ -462,6 +565,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
+	case PSTORE_TYPE_PMSG:
+		return blkz_pmsg_erase(cxt, cxt->pbz);
 	default:
 		return -EINVAL;
 	}
@@ -482,8 +587,10 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
 	hdr->reason = record->reason;
 	if (hdr->reason == KMSG_DUMP_OOPS)
 		hdr->counter = ++cxt->oops_counter;
-	else
+	else if (hdr->reason == KMSG_DUMP_PANIC)
 		hdr->counter = ++cxt->panic_counter;
+	else
+		hdr->counter = 0;
 }
 
 static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
@@ -546,6 +653,55 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
 	return 0;
 }
 
+static int notrace blkz_pmsg_write(struct blkz_context *cxt,
+		struct pstore_record *record)
+{
+	struct blkz_zone *zone;
+	size_t start, rem;
+	int cnt = record->size;
+	bool is_full_data = false;
+	char *buf = record->buf;
+
+	zone = cxt->pbz;
+	if (!zone)
+		return -ENOSPC;
+
+	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
+		is_full_data = true;
+
+	if (unlikely(cnt > zone->buffer_size)) {
+		buf += cnt - zone->buffer_size;
+		cnt = zone->buffer_size;
+	}
+
+	start = buffer_start(zone);
+	rem = zone->buffer_size - start;
+	if (unlikely(rem < cnt)) {
+		blkz_zone_write(zone, FLUSH_PART, buf, rem, start);
+		buf += rem;
+		cnt -= rem;
+		start = 0;
+		is_full_data = true;
+	}
+
+	atomic_set(&zone->buffer->start, cnt + start);
+	blkz_zone_write(zone, FLUSH_PART, buf, cnt, start);
+
+	/*
+	 * blkz_zone_write() sets datalen to start + cnt. That is correct
+	 * while the actual data length is less than the buffer size. Once
+	 * the data exceeds the buffer size, pmsg wraps around and rewrites
+	 * from the beginning of the zone, which leaves buffer->datalen
+	 * wrong. So reset datalen to the buffer size once the actual data
+	 * length exceeds the buffer size.
+	 */
+	if (is_full_data) {
+		atomic_set(&zone->buffer->datalen, zone->buffer_size);
+		blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	}
+	return 0;
+}
+
 static int notrace blkz_pstore_write(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
@@ -557,6 +713,8 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_write(cxt, record);
+	case PSTORE_TYPE_PMSG:
+		return blkz_pmsg_write(cxt, record);
 	default:
 		return -EINVAL;
 	}
@@ -573,6 +731,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->pmsg_read_cnt == 0) {
+		cxt->pmsg_read_cnt++;
+		zone = cxt->pbz;
+		if (blkz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -611,7 +776,8 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 		char *buf = kasprintf(GFP_KERNEL,
 				"%s: Total %d times\n",
 				record->reason == KMSG_DUMP_OOPS ? "Oops" :
-				"Panic", record->count);
+				record->reason == KMSG_DUMP_PANIC ? "Panic" :
+				"Unknown", record->count);
 		hlen = strlen(buf);
 		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
 		if (!record->buf) {
@@ -633,6 +799,29 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	return size + hlen;
 }
 
+static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	size_t size, start;
+	struct blkz_buffer *buf;
+
+	buf = (struct blkz_buffer *)zone->oldbuf;
+	if (!buf)
+		return READ_NEXT_ZONE;
+
+	size = atomic_read(&buf->datalen);
+	start = atomic_read(&buf->start);
+
+	record->buf = kmalloc(size, GFP_KERNEL);
+	if (!record->buf)
+		return -ENOMEM;
+
+	memcpy(record->buf, buf->data + start, size - start);
+	memcpy(record->buf + size - start, buf->data, start);
+
+	return size;
+}
+
 static ssize_t blkz_pstore_read(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
@@ -657,6 +846,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 		blkz_read = blkz_dmesg_read;
 		record->id = cxt->dmesg_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_PMSG:
+		blkz_read = blkz_pmsg_read;
+		break;
 	default:
 		goto next_zone;
 	}
@@ -712,8 +904,10 @@ static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
 	zone->type = type;
 	zone->buffer_size = size - sizeof(struct blkz_buffer);
 	zone->buffer->sig = type ^ BLK_SIG;
+	zone->oldbuf = NULL;
 	atomic_set(&zone->dirty, 0);
 	atomic_set(&zone->buffer->datalen, 0);
+	atomic_set(&zone->buffer->start, 0);
 
 	*off += size;
 
@@ -798,17 +992,26 @@ static int blkz_cut_zones(struct blkz_context *cxt)
 	struct blkz_info *info = cxt->bzinfo;
 	unsigned long off = 0;
 	int err;
-	size_t size;
+	size_t off_size = 0;
 
-	size = info->total_size;
-	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+	off_size += info->pmsg_size;
+	cxt->pbz = blkz_init_zone(PSTORE_TYPE_PMSG, &off, info->pmsg_size);
+	if (IS_ERR(cxt->pbz)) {
+		err = PTR_ERR(cxt->pbz);
+		goto fail_out;
+	}
+
+	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
+			info->total_size - off_size,
 			info->dmesg_size, &cxt->dmesg_max_cnt);
 	if (IS_ERR(cxt->dbzs)) {
 		err = PTR_ERR(cxt->dbzs);
-		goto fail_out;
+		goto free_pmsg;
 	}
 
 	return 0;
+free_pmsg:
+	blkz_free_zone(&cxt->pbz);
 fail_out:
 	return err;
 }
@@ -824,7 +1027,7 @@ int blkz_register(struct blkz_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->dmesg_size) {
+	if (!info->dmesg_size && !info->pmsg_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -851,6 +1054,7 @@ int blkz_register(struct blkz_info *info)
 
 	check_size(total_size, 4096);
 	check_size(dmesg_size, SECTOR_SIZE);
+	check_size(pmsg_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -882,6 +1086,7 @@ int blkz_register(struct blkz_info *info)
 	pr_debug("register %s with properties:\n", info->name);
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
+	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
 
 	err = blkz_cut_zones(cxt);
 	if (err) {
@@ -900,11 +1105,14 @@ int blkz_register(struct blkz_info *info)
 	}
 	cxt->pstore.data = cxt;
 	if (info->dmesg_size)
-		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
+		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
+	if (info->pmsg_size)
+		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
 
-	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
+	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
 			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
-			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
+			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
+			cxt->pbz ? "Pmsg" : "");
 
 	err = pstore_register(&cxt->pstore);
 	if (err) {
@@ -940,6 +1148,7 @@ void blkz_unregister(struct blkz_info *info)
 	spin_unlock(&cxt->bzinfo_lock);
 
 	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
+	blkz_free_zone(&cxt->pbz);
 }
 EXPORT_SYMBOL_GPL(blkz_unregister);
 
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 589d276fa4e4..af06be25bd01 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -19,6 +19,9 @@
  * @dmesg_size:
  *	The size of each zones for dmesg (oops & panic). Zero means disabled,
  *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
+ * @pmsg_size:
+ *	The size of zone for pmsg. Zero means disabled, othewise, it must be
+ *	multiple of SECTOR_SIZE(512).
  * @dump_oops:
  *	Dump oops and panic log or only panic.
  * @read, @write:
@@ -50,6 +53,7 @@ struct blkz_info {
 
 	unsigned long total_size;
 	unsigned long dmesg_size;
+	unsigned long pmsg_size;
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
-- 
1.9.1
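
Note on the read path above: blkz_pmsg_read() returns the zone contents in
chronological order with two memcpy() calls, first the bytes from `start` to
the end of the buffer, then the bytes that wrapped around to the front. The
following is a minimal userspace sketch of that step only. The helper name
ring_linearize() is hypothetical and not part of the patch, and it assumes the
ring is full and `start` is the offset of the oldest byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): copy a full ring buffer of
 * `size` bytes into `out` in chronological order. `start` is assumed
 * to be the offset of the oldest byte, with start <= size.
 */
static void ring_linearize(char *out, const char *data,
			   size_t size, size_t start)
{
	assert(start <= size);
	/* Oldest bytes: from `start` up to the end of the buffer. */
	memcpy(out, data + start, size - start);
	/* Newest bytes: the part that wrapped around to the front. */
	memcpy(out + (size - start), data, start);
}
```

With the buffer "EFGHABCD" and start = 4, the output is "ABCDEFGH"; with
start = 0 the buffer is copied unchanged.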


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* [PATCH v2 04/11] pstore/blk: blkoops: support console recorder
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (2 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:16   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

Add support for the console recorder. To enable it, set console_size to a
value greater than 0 that is a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          |  12 ++++++
 fs/pstore/blkoops.c        |  11 +++++
 fs/pstore/blkzone.c        | 101 ++++++++++++++++++++++++++++++++++-----------
 include/linux/blkoops.h    |   6 ++-
 include/linux/pstore_blk.h |   8 +++-
 5 files changed, 112 insertions(+), 26 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index bbf1fdb5eaa7..5f0a42823028 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -198,6 +198,18 @@ config PSTORE_BLKOOPS_PMSG_SIZE
 	  NOTE that, both kconfig and module parameters can configure blkoops,
 	  but module parameters have priority over kconfig.
 
+config PSTORE_BLKOOPS_CONSOLE_SIZE
+	int "console size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	depends on PSTORE_CONSOLE
+	default 64
+	help
+	  This just sets size of console (console_size) for pstore/blk. The
+	  size is in KB and must be a multiple of 4.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
 config PSTORE_BLKOOPS_BLKDEV
 	string "block device for blkoops"
 	depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 02e6e4c1f965..05990bc3b168 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -20,6 +20,10 @@
 module_param(pmsg_size, long, 0400);
 MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
 
+static long console_size = -1;
+module_param(console_size, long, 0400);
+MODULE_PARM_DESC(console_size, "console size in kbytes");
+
 static int dump_oops = -1;
 module_param(dump_oops, int, 0400);
 MODULE_PARM_DESC(total_size, "whether dump oops");
@@ -70,6 +74,12 @@
 #define DEFAULT_PMSG_SIZE 0
 #endif
 
+#ifdef CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
+#define DEFAULT_CONSOLE_SIZE CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
+#else
+#define DEFAULT_CONSOLE_SIZE 0
+#endif
+
 #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #else
@@ -124,6 +134,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 
 	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
 	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
+	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index a3464252d52e..9a7e9b06ccf7 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -88,9 +88,11 @@ struct blkz_zone {
 struct blkz_context {
 	struct blkz_zone **dbzs;	/* dmesg block zones */
 	struct blkz_zone *pbz;		/* Pmsg block zone */
+	struct blkz_zone *cbz;		/* console block zone */
 	unsigned int dmesg_max_cnt;
 	unsigned int dmesg_read_cnt;
 	unsigned int pmsg_read_cnt;
+	unsigned int console_read_cnt;
 	unsigned int dmesg_write_cnt;
 	/*
 	 * the counter should be recovered when recover.
@@ -111,6 +113,9 @@ struct blkz_context {
 };
 static struct blkz_context blkz_cxt;
 
+static void blkz_flush_all_dirty_zones(struct work_struct *);
+static DECLARE_WORK(blkz_cleaner, blkz_flush_all_dirty_zones);
+
 enum blkz_flush_mode {
 	FLUSH_NONE = 0,
 	FLUSH_PART,
@@ -200,6 +205,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
 	return 0;
 set_dirty:
 	atomic_set(&zone->dirty, true);
+	/* flush dirty zones nicely */
+	if (wcnt == -EBUSY && !is_on_panic())
+		schedule_work(&blkz_cleaner);
 	return -EBUSY;
 }
 
@@ -266,6 +274,15 @@ static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
 	return 0;
 }
 
+static void blkz_flush_all_dirty_zones(struct work_struct *work)
+{
+	struct blkz_context *cxt = &blkz_cxt;
+
+	blkz_flush_dirty_zone(cxt->pbz);
+	blkz_flush_dirty_zone(cxt->cbz);
+	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
+}
+
 static int blkz_recover_dmesg_data(struct blkz_context *cxt)
 {
 	struct blkz_info *info = cxt->bzinfo;
@@ -419,15 +436,13 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
 	return ret;
 }
 
-static int blkz_recover_pmsg(struct blkz_context *cxt)
+static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
 {
 	struct blkz_info *info = cxt->bzinfo;
 	struct blkz_buffer *oldbuf;
-	struct blkz_zone *zone = NULL;
 	int ret = 0;
 	ssize_t rcnt, len;
 
-	zone = cxt->pbz;
 	if (!zone || zone->oldbuf)
 		return 0;
 
@@ -493,7 +508,11 @@ static inline int blkz_recovery(struct blkz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
-	ret = blkz_recover_pmsg(cxt);
+	ret = blkz_recover_zone(cxt, cxt->pbz);
+	if (ret)
+		goto recover_fail;
+
+	ret = blkz_recover_zone(cxt, cxt->cbz);
 	if (ret)
 		goto recover_fail;
 
@@ -512,6 +531,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
 
 	cxt->dmesg_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
+	cxt->console_read_cnt = 0;
 	return 0;
 }
 
@@ -539,7 +559,7 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
 	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
-static inline int blkz_pmsg_erase(struct blkz_context *cxt,
+static inline int blkz_record_erase(struct blkz_context *cxt,
 		struct blkz_zone *zone)
 {
 	if (unlikely(!blkz_old_ok(zone)))
@@ -566,9 +586,10 @@ static int blkz_pstore_erase(struct pstore_record *record)
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
 	case PSTORE_TYPE_PMSG:
-		return blkz_pmsg_erase(cxt, cxt->pbz);
-	default:
-		return -EINVAL;
+		return blkz_record_erase(cxt, cxt->pbz);
+	case PSTORE_TYPE_CONSOLE:
+		return blkz_record_erase(cxt, cxt->cbz);
+	default: return -EINVAL;
 	}
 }
 
@@ -653,17 +674,15 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
 	return 0;
 }
 
-static int notrace blkz_pmsg_write(struct blkz_context *cxt,
-		struct pstore_record *record)
+static int notrace blkz_record_write(struct blkz_context *cxt,
+		struct blkz_zone *zone, struct pstore_record *record)
 {
-	struct blkz_zone *zone;
 	size_t start, rem;
 	int cnt = record->size;
 	bool is_full_data = false;
 	char *buf = record->buf;
 
-	zone = cxt->pbz;
-	if (!zone)
+	if (!zone || !record)
 		return -ENOSPC;
 
 	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
@@ -710,11 +729,20 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 			record->reason == KMSG_DUMP_PANIC)
 		atomic_set(&cxt->on_panic, 1);
 
+	/*
+	 * if on panic, do not write except dmesg records
+	 * Fix case that panic_write prints log which wakes up console recorder.
+	 */
+	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
+		return -EBUSY;
+
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_write(cxt, record);
+	case PSTORE_TYPE_CONSOLE:
+		return blkz_record_write(cxt, cxt->cbz, record);
 	case PSTORE_TYPE_PMSG:
-		return blkz_pmsg_write(cxt, record);
+		return blkz_record_write(cxt, cxt->pbz, record);
 	default:
 		return -EINVAL;
 	}
@@ -738,6 +766,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->console_read_cnt == 0) {
+		cxt->console_read_cnt++;
+		zone = cxt->cbz;
+		if (blkz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -799,7 +834,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	return size + hlen;
 }
 
-static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
+static ssize_t blkz_record_read(struct blkz_zone *zone,
 		struct pstore_record *record)
 {
 	size_t size, start;
@@ -825,7 +860,7 @@ static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
 static ssize_t blkz_pstore_read(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
-	ssize_t (*blkz_read)(struct blkz_zone *zone,
+	ssize_t (*readop)(struct blkz_zone *zone,
 			struct pstore_record *record);
 	struct blkz_zone *zone;
 	ssize_t ret;
@@ -843,17 +878,19 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 	record->type = zone->type;
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
-		blkz_read = blkz_dmesg_read;
+		readop = blkz_dmesg_read;
 		record->id = cxt->dmesg_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_CONSOLE:
+		/* fallthrough */
 	case PSTORE_TYPE_PMSG:
-		blkz_read = blkz_pmsg_read;
+		readop = blkz_record_read;
 		break;
 	default:
 		goto next_zone;
 	}
 
-	ret = blkz_read(zone, record);
+	ret = readop(zone, record);
 	if (ret == READ_NEXT_ZONE)
 		goto next_zone;
 	return ret;
@@ -1001,15 +1038,25 @@ static int blkz_cut_zones(struct blkz_context *cxt)
 		goto fail_out;
 	}
 
+	off_size += info->console_size;
+	cxt->cbz = blkz_init_zone(PSTORE_TYPE_CONSOLE, &off,
+			info->console_size);
+	if (IS_ERR(cxt->cbz)) {
+		err = PTR_ERR(cxt->cbz);
+		goto free_pmsg;
+	}
+
 	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->dmesg_size, &cxt->dmesg_max_cnt);
 	if (IS_ERR(cxt->dbzs)) {
 		err = PTR_ERR(cxt->dbzs);
-		goto free_pmsg;
+		goto free_console;
 	}
 
 	return 0;
+free_console:
+	blkz_free_zone(&cxt->cbz);
 free_pmsg:
 	blkz_free_zone(&cxt->pbz);
 fail_out:
@@ -1027,7 +1074,7 @@ int blkz_register(struct blkz_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->dmesg_size && !info->pmsg_size) {
+	if (!info->dmesg_size && !info->pmsg_size && !info->console_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -1055,6 +1102,7 @@ int blkz_register(struct blkz_info *info)
 	check_size(total_size, 4096);
 	check_size(dmesg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
+	check_size(console_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1087,6 +1135,7 @@ int blkz_register(struct blkz_info *info)
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
+	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
 
 	err = blkz_cut_zones(cxt);
 	if (err) {
@@ -1108,11 +1157,15 @@ int blkz_register(struct blkz_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
 	if (info->pmsg_size)
 		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
+	if (info->console_size)
+		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
 
-	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
+	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
+			info->name,
 			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
 			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
-			cxt->pbz ? "Pmsg" : "");
+			cxt->pbz ? "Pmsg " : "",
+			cxt->cbz ? "Console" : "");
 
 	err = pstore_register(&cxt->pstore);
 	if (err) {
@@ -1139,6 +1192,8 @@ void blkz_unregister(struct blkz_info *info)
 {
 	struct blkz_context *cxt = &blkz_cxt;
 
+	flush_work(&blkz_cleaner);
+
 	pstore_unregister(&cxt->pstore);
 	kfree(cxt->pstore.buf);
 	cxt->pstore.bufsize = 0;
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index fe63739309aa..8f40f225545d 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -23,8 +23,10 @@
  *	Both of the @size and @offset parameters on this interface are
  *	the relative size of the space provided, not the whole disk/flash.
  *
- *	On success, the number of bytes read should be returned.
- *	On error, negative number should be returned.
+ *	On success, the number of bytes read/write should be returned.
+ *	On error, negative number should be returned. The following returning
+ *	number means more:
+ *	  -EBUSY: pstore/blk should try again later.
  * @panic_write:
  *	The write operation only used for panic.
  *
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index af06be25bd01..546375e04419 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -22,6 +22,9 @@
  * @pmsg_size:
  *	The size of zone for pmsg. Zero means disabled, othewise, it must be
  *	multiple of SECTOR_SIZE(512).
+ * @console_size:
+ *	The size of zone for console. Zero means disabled, othewise, it must
+ *	be multiple of SECTOR_SIZE(512).
  * @dump_oops:
  *	Dump oops and panic log or only panic.
  * @read, @write:
@@ -33,7 +36,9 @@
  *	the relative size of the space provided, not the whole disk/flash.
  *
  *	On success, the number of bytes read/write should be returned.
- *	On error, negative number should be returned.
+ *	On error, negative number should be returned. The following returning
+ *	number means more:
+ *	  -EBUSY: pstore/blk should try again later.
  * @panic_write:
  *	The write operation only used for panic. It's optional if you do not
  *	care panic record. If panic occur but blkzone do not recover yet, the
@@ -54,6 +59,7 @@ struct blkz_info {
 	unsigned long total_size;
 	unsigned long dmesg_size;
 	unsigned long pmsg_size;
+	unsigned long console_size;
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
-- 
1.9.1
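
Note on the write path above: once a panic has been flagged,
blkz_pstore_write() rejects every record type except dmesg, so that log output
produced by panic_write cannot wake the console recorder. The following is a
minimal userspace sketch of that gate. The enum and the function name
gate_write() are hypothetical stand-ins for the kernel's record types and for
the check in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pstore record types used in the patch. */
enum rec_type { REC_DMESG, REC_PMSG, REC_CONSOLE };

/*
 * Hypothetical gate mirroring the check added to blkz_pstore_write():
 * during a panic only dmesg records are written, everything else is
 * refused with -EBUSY. Returns 0 when the write may proceed.
 */
static int gate_write(bool on_panic, enum rec_type type)
{
	if (on_panic && type != REC_DMESG)
		return -EBUSY;
	return 0;
}
```

Outside of a panic every record type passes the gate; during a panic a console
or pmsg record gets -EBUSY while a dmesg record is still accepted.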



* [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (3 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:19   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

Add support for the ftrace recorder. To enable it, set ftrace_size to a
value greater than 0 that is a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          | 12 ++++++++
 fs/pstore/blkoops.c        | 11 +++++++
 fs/pstore/blkzone.c        | 75 ++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/pstore_blk.h |  4 +++
 4 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 5f0a42823028..308a0a4c5ee5 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -210,6 +210,18 @@ config PSTORE_BLKOOPS_CONSOLE_SIZE
 	  NOTE that, both kconfig and module parameters can configure blkoops,
 	  but module parameters have priority over kconfig.
 
+config PSTORE_BLKOOPS_FTRACE_SIZE
+	int "ftrace size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	depends on PSTORE_FTRACE
+	default 64
+	help
+	  This just sets size of ftrace (ftrace_size) for pstore/blk. The
+	  size is in KB and must be a multiple of 4.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
 config PSTORE_BLKOOPS_BLKDEV
 	string "block device for blkoops"
 	depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 05990bc3b168..c76bab671b0b 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -24,6 +24,10 @@
 module_param(console_size, long, 0400);
 MODULE_PARM_DESC(console_size, "console size in kbytes");
 
+static long ftrace_size = -1;
+module_param(ftrace_size, long, 0400);
+MODULE_PARM_DESC(ftrace_size, "ftrace size in kbytes");
+
 static int dump_oops = -1;
 module_param(dump_oops, int, 0400);
 MODULE_PARM_DESC(total_size, "whether dump oops");
@@ -80,6 +84,12 @@
 #define DEFAULT_CONSOLE_SIZE 0
 #endif
 
+#ifdef CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
+#define DEFAULT_FTRACE_SIZE CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
+#else
+#define DEFAULT_FTRACE_SIZE 0
+#endif
+
 #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #else
@@ -135,6 +145,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
 	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
 	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
+	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 9a7e9b06ccf7..442e5a5bbfda 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -89,10 +89,13 @@ struct blkz_context {
 	struct blkz_zone **dbzs;	/* dmesg block zones */
 	struct blkz_zone *pbz;		/* Pmsg block zone */
 	struct blkz_zone *cbz;		/* console block zone */
+	struct blkz_zone **fbzs;	/* Ftrace zones */
 	unsigned int dmesg_max_cnt;
 	unsigned int dmesg_read_cnt;
 	unsigned int pmsg_read_cnt;
 	unsigned int console_read_cnt;
+	unsigned int ftrace_max_cnt;
+	unsigned int ftrace_read_cnt;
 	unsigned int dmesg_write_cnt;
 	/*
 	 * the counter should be recovered when recover.
@@ -281,6 +284,7 @@ static void blkz_flush_all_dirty_zones(struct work_struct *work)
 	blkz_flush_dirty_zone(cxt->pbz);
 	blkz_flush_dirty_zone(cxt->cbz);
 	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
+	blkz_flush_dirty_zones(cxt->fbzs, cxt->ftrace_max_cnt);
 }
 
 static int blkz_recover_dmesg_data(struct blkz_context *cxt)
@@ -497,6 +501,31 @@ static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
 	return ret;
 }
 
+static int blkz_recover_zones(struct blkz_context *cxt,
+		struct blkz_zone **zones, unsigned int cnt)
+{
+	int ret;
+	unsigned int i;
+	struct blkz_zone *zone;
+
+	if (!zones)
+		return 0;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (unlikely(!zone))
+			continue;
+		ret = blkz_recover_zone(cxt, zone);
+		if (ret)
+			goto recover_fail;
+	}
+
+	return 0;
+recover_fail:
+	pr_debug("recover %s[%u] failed\n", zone->name, i);
+	return ret;
+}
+
 static inline int blkz_recovery(struct blkz_context *cxt)
 {
 	int ret = -EBUSY;
@@ -516,6 +545,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = blkz_recover_zones(cxt, cxt->fbzs, cxt->ftrace_max_cnt);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -532,6 +565,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
 	cxt->dmesg_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
 	cxt->console_read_cnt = 0;
+	cxt->ftrace_read_cnt = 0;
 	return 0;
 }
 
@@ -589,6 +623,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
 		return blkz_record_erase(cxt, cxt->pbz);
 	case PSTORE_TYPE_CONSOLE:
 		return blkz_record_erase(cxt, cxt->cbz);
+	case PSTORE_TYPE_FTRACE:
+		return blkz_record_erase(cxt, cxt->fbzs[record->id]);
 	default: return -EINVAL;
 	}
 }
@@ -743,6 +779,13 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 		return blkz_record_write(cxt, cxt->cbz, record);
 	case PSTORE_TYPE_PMSG:
 		return blkz_record_write(cxt, cxt->pbz, record);
+	case PSTORE_TYPE_FTRACE: {
+		int zonenum = smp_processor_id();
+
+		if (!cxt->fbzs)
+			return -ENOSPC;
+		return blkz_record_write(cxt, cxt->fbzs[zonenum], record);
+	}
 	default:
 		return -EINVAL;
 	}
@@ -759,6 +802,12 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 			return zone;
 	}
 
+	while (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt) {
+		zone = cxt->fbzs[cxt->ftrace_read_cnt++];
+		if (blkz_old_ok(zone))
+			return zone;
+	}
+
 	if (cxt->pmsg_read_cnt == 0) {
 		cxt->pmsg_read_cnt++;
 		zone = cxt->pbz;
@@ -881,6 +930,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 		readop = blkz_dmesg_read;
 		record->id = cxt->dmesg_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_FTRACE:
+		record->id = cxt->ftrace_read_cnt - 1;
+		/* fallthrough */
 	case PSTORE_TYPE_CONSOLE:
 		/* fallthrough */
 	case PSTORE_TYPE_PMSG:
@@ -1046,15 +1098,27 @@ static int blkz_cut_zones(struct blkz_context *cxt)
 		goto free_pmsg;
 	}
 
+	off_size += info->ftrace_size;
+	cxt->fbzs = blkz_init_zones(PSTORE_TYPE_FTRACE, &off,
+			info->ftrace_size,
+			info->ftrace_size / nr_cpu_ids,
+			&cxt->ftrace_max_cnt);
+	if (IS_ERR(cxt->fbzs)) {
+		err = PTR_ERR(cxt->fbzs);
+		goto free_console;
+	}
+
 	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->dmesg_size, &cxt->dmesg_max_cnt);
 	if (IS_ERR(cxt->dbzs)) {
 		err = PTR_ERR(cxt->dbzs);
-		goto free_console;
+		goto free_ftrace;
 	}
 
 	return 0;
+free_ftrace:
+	blkz_free_zones(&cxt->fbzs, &cxt->ftrace_max_cnt);
 free_console:
 	blkz_free_zone(&cxt->cbz);
 free_pmsg:
@@ -1103,6 +1167,7 @@ int blkz_register(struct blkz_info *info)
 	check_size(dmesg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
 	check_size(console_size, SECTOR_SIZE);
+	check_size(ftrace_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1136,6 +1201,7 @@ int blkz_register(struct blkz_info *info)
 	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
 	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
+	pr_debug("\tftrace size : %ld Bytes\n", info->ftrace_size);
 
 	err = blkz_cut_zones(cxt);
 	if (err) {
@@ -1159,13 +1225,16 @@ int blkz_register(struct blkz_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
 	if (info->console_size)
 		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
+	if (info->ftrace_size)
+		cxt->pstore.flags |= PSTORE_FLAGS_FTRACE;
 
-	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
+	pr_info("Registered %s as blkzone backend for %s%s%s%s%s\n",
 			info->name,
 			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
 			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
 			cxt->pbz ? "Pmsg " : "",
-			cxt->cbz ? "Console" : "");
+			cxt->cbz ? "Console " : "",
+			cxt->fbzs ? "Ftrace" : "");
 
 	err = pstore_register(&cxt->pstore);
 	if (err) {
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 546375e04419..77704c1b404a 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -25,6 +25,9 @@
  * @console_size:
  *	The size of zone for console. Zero means disabled, othewise, it must
  *	be multiple of SECTOR_SIZE(512).
+ * @ftrace_size:
+ *	The size of zone for ftrace. Zero means disabled, othewise, it must
+ *	be multiple of SECTOR_SIZE(512).
  * @dump_oops:
  *	Dump oops and panic log or only panic.
  * @read, @write:
@@ -60,6 +63,7 @@ struct blkz_info {
 	unsigned long dmesg_size;
 	unsigned long pmsg_size;
 	unsigned long console_size;
+	unsigned long ftrace_size;
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
-- 
1.9.1
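
Note on the zone layout above: blkz_cut_zones() carves the device in order
into one pmsg zone, one console zone, per-CPU ftrace zones of
ftrace_size / nr_cpu_ids bytes each, and then as many dmesg zones as fit in
the remaining space. The following is a minimal userspace sketch of that
arithmetic. The struct and function names are hypothetical, and it assumes
ftrace_size divides evenly by the CPU count; it does not reproduce the
kernel's allocation or error handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: byte offsets and counts of the carved zones. */
struct zone_layout {
	size_t pmsg_off, console_off, ftrace_off, dmesg_off;
	size_t ftrace_zone_size;	/* size of each per-CPU ftrace zone */
	unsigned int dmesg_cnt;		/* number of dmesg zones that fit */
};

/*
 * Hypothetical sketch of the order used by blkz_cut_zones():
 * pmsg, console, ftrace (split per CPU), then dmesg in what is left.
 * Returns 0 on success, -1 if the fixed zones do not fit.
 */
static int layout_zones(struct zone_layout *l, size_t total, size_t pmsg,
			size_t console, size_t ftrace, size_t dmesg,
			unsigned int ncpus)
{
	size_t off = 0;

	if (ncpus == 0 || pmsg + console + ftrace > total)
		return -1;

	l->pmsg_off = off;
	off += pmsg;
	l->console_off = off;
	off += console;
	l->ftrace_off = off;
	l->ftrace_zone_size = ftrace / ncpus;
	off += ftrace;	/* assumes ftrace divides evenly by ncpus */
	l->dmesg_off = off;
	/* dmesg gets all remaining space, cut into dmesg-sized chunks. */
	l->dmesg_cnt = dmesg ? (unsigned int)((total - off) / dmesg) : 0;
	return 0;
}
```

For a 256 KiB device with 64 KiB each for pmsg, console and ftrace, 16 KiB
dmesg chunks and 4 CPUs, this gives four 16 KiB ftrace zones starting at
128 KiB and four dmesg zones starting at 192 KiB.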



* [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (4 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:31   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

The document, at Documentation/admin-guide/pstore-block.rst, describes
how to use pstore/blk and blkoops.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst | 281 +++++++++++++++++++++++++++++
 MAINTAINERS                                |   1 +
 fs/pstore/Kconfig                          |   2 +
 3 files changed, 284 insertions(+)
 create mode 100644 Documentation/admin-guide/pstore-block.rst

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
new file mode 100644
index 000000000000..c8a5f68960c3
--- /dev/null
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -0,0 +1,281 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
+block device before the system crashes. It also supports non-block devices such
+as MTD devices.
+
+There is a wrapper named blkoops for pstore/blk, which makes pstore/blk
+easier for device drivers to use.
+
+Pstore block concepts
+---------------------
+
+Pstore/blk works as a zone manager as it cuts the block device or partition
+into several zones and stores data for different recorders. What device drivers
+should do is to provide read/write APIs.
+
+The entry point of pstore/blk is ``blkz_register``. Blkoops, a wrapper around
+pstore/blk, is entered through ``blkoops_register_blkdev`` for block devices and
+``blkoops_register_device`` for non-block devices. Using blkoops is recommended
+over using pstore/blk directly.
+
+Blkoops provides a convenient configuration method for pstore/blk, which divides
+all configurations of pstore/blk into two parts: configurations for the user and
+configurations for the driver.
+
+Configurations for the user determine how pstore/blk works, such as pmsg_size,
+dmesg_size and so on. All of them support both kconfig and module parameters,
+but module parameters have priority over kconfig.
+
+Configurations for the driver describe the block/non-block device, such as the
+total_size of the device and its read/write operations. The device driver passes
+them in a ``blkoops_device`` structure defined in *linux/blkoops.h*.
+
+All of the following are for blkoops.
+
+Configurations for user
+-----------------------
+
+All of these configurations support both kconfig and module parameters, but
+module parameters have priority over kconfig.
+Here is an example for module parameters::
+
+        blkoops.blkdev=179:7 blkoops.dmesg_size=64 blkoops.dump_oops=1
+
+The details of each configuration are described below.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of a block device.
+It's fine to ignore it if you are not using a block device.
+
+It accepts the following variants:
+
+1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
+   leading 0x, for example b302.
+#. /dev/<disk_name> represents the device number of disk
+#. /dev/<disk_name><decimal> represents the device number of partition - device
+   number of disk plus the partition number
+#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
+   name of partitioned disk ends with a digit.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
+   a partition if the partition table provides it. The UUID may be either an
+   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
+   where SSSSSSSS is a zero-filled hex representation of the 32-bit
+   "NT disk signature", and PP is a zero-filled hex representation of the
+   1-based partition number.
+#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
+   partition with a known unique id.
+#. <major>:<minor> major and minor number of the device separated by a colon.
+
+dmesg_size
+~~~~~~~~~~
+
+The chunk size in KB for dmesg(oops/panic). It **MUST** be a multiple of 4.
+If you don't need it, safely set it to 0 or ignore it.
+
+Note that the remaining space, apart from ``pmsg_size``, ``console_size`` and
+others, belongs to dmesg. It means that there are multiple chunks for dmesg.
+
+Pstore/blk will log to dmesg chunks one by one, and always overwrite the oldest
+chunk if there are no more free chunks.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in KB for pmsg. It **MUST** be a multiple of 4. If you do not
+need it, safely set it to 0 or ignore it.
+
+There is only one chunk for pmsg.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+/sys/fs/pstore/pmsg-pstore-blk-0.
+
+console_size
+~~~~~~~~~~~~
+
+The chunk size in KB for console. It **MUST** be a multiple of 4. If you
+do not need it, safely set it to 0 or ignore it.
+
+There is only one chunk for console.
+
+All console logs are appended to the chunk. On reboot the contents are
+available in /sys/fs/pstore/console-pstore-blk-0.
+
+ftrace_size
+~~~~~~~~~~~
+
+The chunk size in KB for ftrace. It **MUST** be a multiple of 4. If you
+do not need it, safely set it to 0 or ignore it.
+
+There may be several chunks for ftrace, depending on the number of processors
+in your system. Each chunk size is equal to (ftrace_size / processors_count).
+
+All ftrace logs are appended to the chunk. On reboot the contents are
+available in /sys/fs/pstore/ftrace-pstore-blk-[N], where N is the processor
+number.
+
+Persistent function tracing might be useful for debugging software or hardware
+related hangs. Here is an example of usage::
+
+ # mount -t pstore pstore /sys/fs/pstore
+ # mount -t debugfs debugfs /sys/kernel/debug/
+ # echo 1 > /sys/kernel/debug/pstore/record_ftrace
+ # reboot -f
+ [...]
+ # mount -t pstore pstore /sys/fs/pstore
+ # tail /sys/fs/pstore/ftrace-pstore-blk-0
+ CPU:0 ts:109860 c03a4310  c0063ebc  cpuidle_select <- cpu_startup_entry+0x1a8/0x1e0
+ CPU:0 ts:109861 c03a5878  c03a4324  menu_select <- cpuidle_select+0x24/0x2c
+ CPU:0 ts:109862 c00670e8  c03a589c  pm_qos_request <- menu_select+0x38/0x4cc
+ CPU:0 ts:109863 c0092bbc  c03a5960  tick_nohz_get_sleep_length <- menu_select+0xfc/0x4cc
+ CPU:0 ts:109865 c004b2f4  c03a59d4  get_iowait_load <- menu_select+0x170/0x4cc
+ CPU:0 ts:109868 c0063b60  c0063ecc  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
+ CPU:0 ts:109869 c03a433c  c0063b94  cpuidle_enter <- call_cpuidle+0x44/0x48
+ CPU:0 ts:109871 c03a4000  c03a4350  cpuidle_enter_state <- cpuidle_enter+0x24/0x28
+ CPU:0 ts:109873 c0063ba8  c03a4090  sched_idle_set_state <- cpuidle_enter_state+0xa4/0x314
+ CPU:0 ts:109874 c03a605c  c03a40b4  arm_enter_idle_state <- cpuidle_enter_state+0xc8/0x314
+
+dump_oops
+~~~~~~~~~
+
+Set ``dump_oops`` to 1 (non-zero) to dump both oopses and panics; set it to 0
+to dump only panics.
+
+Configurations for driver
+-------------------------
+
+Only a device driver cares about these configurations. A block device driver
+uses ``blkoops_register_blkdev`` while a non-block device driver uses
+``blkoops_register_device``.
+
+The parameters of these two APIs may be of interest to you.
+
+major
+~~~~~
+
+It is only required by block device which is registered by
+``blkoops_register_blkdev``.  It's the major device number of registered
+devices, by which blkoops can get the matching driver for @blkdev.
+
+total_size
+~~~~~~~~~~
+
+It is only required by non-block device which is registered by
+``blkoops_register_device``.  It tells pstore/blk the total size
+pstore/blk can use. It is in KB and **MUST** be greater than or equal to 4
+and a multiple of 4.
+
+For block devices, blkoops can get size of block device/partition automatically.
+
+read/write
+~~~~~~~~~~
+
+These are the generic read/write APIs for pstore/blk, which are required for
+non-block devices. They are used for all data except panic data, that is, for
+pmsg, console, oops and ftrace.
+
+The parameter @offset of these interfaces is the relative position of the device.
+
+Normally the number of bytes read/written should be returned, while for error,
+negative number will be returned. The following return numbers mean more:
+
+-EBUSY: pstore/blk should try again later.
+
+panic_write (for non-block device)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It's an interface for the panic recorder and is used only when a panic occurs.
+A non-block device driver registers it by ``blkoops_register_device``. If the
+panic log is unnecessary, it's fine to ignore it.
+
+Note that pstore/blk will recover data from device while mounting pstore
+filesystem by default. If a panic occurs before pstore/blk has recovered the
+data, the first zone of dmesg will be used.
+
+The parameter @offset of this interface is the relative position of the device.
+
+Normally the number of bytes written should be returned, while for error,
+negative number should be returned.
+
+panic_write (for block device)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It's very similar to panic_write for non-block device, but the position and
+data size of panic_write for block device must be aligned to SECTOR_SIZE,
+that's why the parameters are @sects and @start_sect. Block device driver
+should register it by ``blkoops_register_blkdev``.
+
+The parameter @start_sect is the relative position of the block device and
+partition. If block driver requires absolute position for panic_write,
+``blkoops_blkdev_info`` will be helpful, which can provide the absolute
+position of the block device (or partition) on the whole disk/flash.
+
+Normally zero should be returned, otherwise it indicates an error.
+
+Compression and header
+----------------------
+
+A block device is large enough for uncompressed dmesg data. Actually we do not
+recommend data compression because pstore/blk will insert some information into
+the first line of dmesg data. For example::
+
+        Panic: Total 16 times
+
+It means that this is the 16th oops or panic since the first boot.
+Sometimes the number of occurrences of oops|panic since the first boot is
+important to judge whether the system is stable.
+
+The following line is inserted by pstore filesystem. For example::
+
+        Oops#2 Part1
+
+It means that it's OOPS for the 2nd time on the last boot.
+
+Reading the data
+----------------
+
+The dump data can be read from the pstore filesystem. The format for these
+files is ``dmesg-pstore-blk-[N]`` for dmesg(oops|panic), ``pmsg-pstore-blk-0``
+for pmsg and so on, where N is the record number. To delete a stored
+record from block device, simply unlink the respective pstore file. The
+timestamp of the dump file records the trigger time.
+
+Attentions in panic read/write APIs
+-----------------------------------
+
+On panic, the kernel is not going to run for much longer: tasks will no longer
+be scheduled and most kernel resources will be out of service. It
+behaves like a single-threaded program running on a single-core computer.
+
+The following points require special attention for panic read/write APIs:
+
+1. Can **NOT** allocate any memory.
+   If you need memory, just allocate while the block driver is initializing
+   rather than waiting until the panic.
+#. Must be polled, **NOT** interrupt driven.
+   Tasks are not scheduled any more. The block driver should delay to ensure
+   the write succeeds, but must NOT sleep.
+#. Can **NOT** take any lock.
+   There is no other task, nor any shared resource; you are safe to break all
+   locks.
+#. Just use CPU to transfer.
+   Do not use DMA to transfer unless you are sure that DMA will not hold a lock.
+#. Control registers directly.
+   Please control registers directly rather than use Linux kernel resources.
+   Do I/O map while initializing rather than wait until a panic occurs.
+#. Reset your block device and controller if necessary.
+   If you are not sure of the state of your block device and controller when
+   a panic occurs, you are safe to stop and reset them.
+
+Blkoops provides blkoops_blkdev_info(), which is declared in *linux/blkoops.h*,
+to get information about the block device, such as the device number, sector
+count and start sector on the whole disk.
diff --git a/MAINTAINERS b/MAINTAINERS
index e4ba97130560..a5122e3aaf76 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13380,6 +13380,7 @@ F:	include/linux/pstore*
 F:	drivers/firmware/efi/efi-pstore.c
 F:	drivers/acpi/apei/erst.c
 F:	Documentation/admin-guide/ramoops.rst
+F:	Documentation/admin-guide/pstore-block.rst
 F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
 K:	\b(pstore|ramoops|blkoops)
 
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 308a0a4c5ee5..466908a242aa 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -162,6 +162,8 @@ config PSTORE_BLK
 	  This enables panic and oops message to be logged to a block dev
 	  where it can be read back at some later point.
 
+	  For more information, see Documentation/admin-guide/pstore-block.rst.
+
 	  If unsure, say N.
 
 config PSTORE_BLKOOPS
-- 
1.9.1



* [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (5 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:35   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device is not a block device. Because blocks of flash (MTD device)
can go bad, pstore/blk needs to skip the broken blocks (bad blocks).

If a device driver returns -ENEXT, pstore/blk will try the next dmesg zone.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  3 +-
 fs/pstore/blkzone.c                        | 74 +++++++++++++++++++++++-------
 include/linux/blkoops.h                    |  4 +-
 include/linux/pstore_blk.h                 |  4 ++
 4 files changed, 66 insertions(+), 19 deletions(-)
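
For readers following the blkz_dmesg_write_do() change below, the retry
policy can be sketched in plain user-space C. This is only an illustration
under assumptions: demo_ctx, demo_zone_write_fn, DEMO_ENEXT and the two
sample callbacks are hypothetical names, not part of the patch, and the
buffer swapping and NULL-zone check done by the real code are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ENEXT value this patch defines (1024). */
#define DEMO_ENEXT 1024

/*
 * Hypothetical per-zone write callback: returns 0 on success, -DEMO_ENEXT
 * when the zone is used or broken, or another negative errno on failure.
 */
typedef int (*demo_zone_write_fn)(unsigned int zonenum);

struct demo_ctx {
	unsigned int write_cnt;	/* zone to try first on the next write */
	unsigned int max_cnt;	/* number of dmesg zones */
};

/*
 * Try each zone once, starting at ctx->write_cnt and wrapping around.
 * Any result other than -DEMO_ENEXT ends the search and moves the cursor
 * past the zone that was tried.
 */
static int demo_dmesg_write(struct demo_ctx *ctx, demo_zone_write_fn write)
{
	unsigned int i;

	for (i = 0; i < ctx->max_cnt; i++) {
		unsigned int zonenum = (ctx->write_cnt + i) % ctx->max_cnt;
		int ret = write(zonenum);

		if (ret != -DEMO_ENEXT) {
			ctx->write_cnt = (zonenum + 1) % ctx->max_cnt;
			return ret;
		}
		/* zone asked to be skipped: fall through to the next one */
	}

	/* every zone asked to be skipped */
	return -EBUSY;
}

/* Sample callback: pretend zone 1 sits on a bad block. */
static int demo_write_zone1_bad(unsigned int zonenum)
{
	return zonenum == 1 ? -DEMO_ENEXT : 0;
}

/* Sample callback: pretend every zone is bad. */
static int demo_write_all_bad(unsigned int zonenum)
{
	(void)zonenum;
	return -DEMO_ENEXT;
}
```

With three zones, the cursor at zone 1 and zone 1 reported bad, the sketch
writes zone 2 and leaves the cursor at zone 0. If every zone is reported
bad, it returns -EBUSY and the cursor does not move.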

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index c8a5f68960c3..be865dfc1a28 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -188,7 +188,8 @@ The parameter @offset of these interface is the relative position of the device.
 Normally the number of bytes read/written should be returned, while for error,
 negative number will be returned. The following return numbers mean more:
 
--EBUSY: pstore/blk should try again later.
+1. -EBUSY: pstore/blk should try again later.
+#. -ENEXT: this zone is used or broken, pstore/blk should try next one.
 
 panic_write (for non-block device)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 442e5a5bbfda..205aeff28992 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -207,6 +207,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
 
 	return 0;
 set_dirty:
+	/* no need to mark dirty if going to try next zone */
+	if (wcnt == -ENEXT)
+		return -ENEXT;
 	atomic_set(&zone->dirty, true);
 	/* flush dirty zones nicely */
 	if (wcnt == -EBUSY && !is_on_panic())
@@ -360,7 +363,11 @@ static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
 			return -EINVAL;
 
 		rcnt = info->read((char *)buf, len, zone->off);
-		if (rcnt != len) {
+		if (rcnt == -ENEXT) {
+			pr_debug("%s with id %lu may be broken, skip\n",
+					zone->name, i);
+			continue;
+		} else if (rcnt != len) {
 			pr_err("read %s with id %lu failed\n", zone->name, i);
 			return (int)rcnt < 0 ? (int)rcnt : -EIO;
 		}
@@ -650,24 +657,58 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
 		hdr->counter = 0;
 }
 
+/*
+ * In case a zone is broken, which may happen on an MTD device, we try each
+ * zone in turn, starting at cxt->dmesg_write_cnt.
+ */
 static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
 		struct pstore_record *record)
 {
+	int ret = -EBUSY;
 	size_t size, hlen;
 	struct blkz_zone *zone;
-	unsigned int zonenum;
+	unsigned int i;
 
-	zonenum = cxt->dmesg_write_cnt;
-	zone = cxt->dbzs[zonenum];
-	if (unlikely(!zone))
-		return -ENOSPC;
-	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
+	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
+		unsigned int zonenum, len;
+
+		zonenum = (cxt->dmesg_write_cnt + i) % cxt->dmesg_max_cnt;
+		zone = cxt->dbzs[zonenum];
+		if (unlikely(!zone))
+			return -ENOSPC;
 
-	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
-	blkz_write_kmsg_hdr(zone, record);
-	hlen = sizeof(struct blkz_dmesg_header);
-	size = min_t(size_t, record->size, zone->buffer_size - hlen);
-	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		/* avoid destroying old data, allocate a new one */
+		len = zone->buffer_size + sizeof(*zone->buffer);
+		zone->oldbuf = zone->buffer;
+		zone->buffer = kzalloc(len, GFP_KERNEL);
+		if (!zone->buffer) {
+			zone->buffer = zone->oldbuf;
+			return -ENOMEM;
+		}
+		zone->buffer->sig = zone->oldbuf->sig;
+
+		pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+		blkz_write_kmsg_hdr(zone, record);
+		hlen = sizeof(struct blkz_dmesg_header);
+		size = min_t(size_t, record->size, zone->buffer_size - hlen);
+		ret = blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		if (likely(ret != -ENEXT)) {
+			cxt->dmesg_write_cnt = zonenum + 1;
+			cxt->dmesg_write_cnt %= cxt->dmesg_max_cnt;
+			/* no need to try next zone, free last zone buffer */
+			kfree(zone->oldbuf);
+			zone->oldbuf = NULL;
+			return ret;
+		}
+
+		pr_debug("zone %u may be broken, try next dmesg zone\n",
+				zonenum);
+		kfree(zone->buffer);
+		zone->buffer = zone->oldbuf;
+		zone->oldbuf = NULL;
+	}
+
+	return -EBUSY;
 }
 
 static int notrace blkz_dmesg_write(struct blkz_context *cxt,
@@ -791,7 +832,6 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 	}
 }
 
-#define READ_NEXT_ZONE ((ssize_t)(-1024))
 static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 {
 	struct blkz_zone *zone = NULL;
@@ -852,7 +892,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	if (blkz_read_dmesg_hdr(zone, record)) {
 		atomic_set(&zone->buffer->datalen, 0);
 		atomic_set(&zone->dirty, 0);
-		return READ_NEXT_ZONE;
+		return -ENEXT;
 	}
 	size -= sizeof(struct blkz_dmesg_header);
 
@@ -877,7 +917,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
 				sizeof(struct blkz_dmesg_header)) < 0)) {
 		kfree(record->buf);
-		return READ_NEXT_ZONE;
+		return -ENEXT;
 	}
 
 	return size + hlen;
@@ -891,7 +931,7 @@ static ssize_t blkz_record_read(struct blkz_zone *zone,
 
 	buf = (struct blkz_buffer *)zone->oldbuf;
 	if (!buf)
-		return READ_NEXT_ZONE;
+		return -ENEXT;
 
 	size = atomic_read(&buf->datalen);
 	start = atomic_read(&buf->start);
@@ -943,7 +983,7 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 	}
 
 	ret = readop(zone, record);
-	if (ret == READ_NEXT_ZONE)
+	if (ret == -ENEXT)
 		goto next_zone;
 	return ret;
 }
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index 8f40f225545d..71c596fd4cc8 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -27,6 +27,7 @@
  *	On error, negative number should be returned. The following returning
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
+ *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
  * @panic_write:
  *	The write operation only used for panic.
  *
@@ -45,7 +46,8 @@ struct blkoops_device {
 
 /*
  * Panic write for block device who should write alignmemt to SECTOR_SIZE.
- * On success, zero should be returned. Others mean error.
+ * On success, zero should be returned. Others mean error except that -ENEXT
+ * means the zone is used or broken, pstore/blk should try next one.
  */
 typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
 		sector_t sects);
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 77704c1b404a..bbbe4fe37f7c 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -6,6 +6,9 @@
 #include <linux/types.h>
 #include <linux/blkdev.h>
 
+/* a read/write function returning -ENEXT means: try the next zone */
+#define ENEXT ((ssize_t)(1024))
+
 /**
  * struct blkz_info - backend blkzone driver structure
  *
@@ -42,6 +45,7 @@
  *	On error, negative number should be returned. The following returning
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
+ *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
  * @panic_write:
  *	The write operation only used for panic. It's optional if you do not
  *	care panic record. If panic occur but blkzone do not recover yet, the
-- 
1.9.1



* [PATCH v2 08/11] blkoops: respect for device to pick recorders
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (6 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:42   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device is not a block device. A flash sector wears out once it has
been erased more than a limited number of cycles. To avoid wearing out
blocks so quickly, we must not write to a sector frequently. So the
pstore/blk recorders that write frequently, such as the console and ftrace
recorders, should not be supported on MTD.

Besides, an MTD device needs writes and erases aligned to its write/erase
size. To avoid over-erasing/writing flash, we would have to keep an aligned
cache and read old data into it before each write/erase, which makes the
code more complex. So pmsg is not supported for now because its writes are
misaligned.

How about dmesg? Luckily, pstore/blk keeps several aligned chunks for
dmesg and uses them one by one, which balances wear.

So an MTD device used for pstore should pick its recorders, which is what
this patch enables.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  9 +++++++++
 fs/pstore/blkoops.c                        | 29 +++++++++++++++++++++--------
 include/linux/blkoops.h                    | 14 +++++++++++++-
 3 files changed, 43 insertions(+), 9 deletions(-)
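
To make the effect of the new flags easier to follow, here is a small
user-space sketch of how a zero flags value is widened and how a recorder's
chunk size is forced to zero when the device does not support it. It is an
illustration only: the DEMO_* macros and demo_* functions are hypothetical
names that mirror, but are not, the code in the diff below. What the real
macro does with a misaligned size is not visible in the hunk, so returning
-1 here is purely an assumption.

```c
#include <limits.h>

/* Hypothetical mirrors of the BLKOOPS_DEV_SUPPORT_* macros. */
#define DEMO_SUPPORT_DEFAULT	0u
#define DEMO_SUPPORT_ALL	UINT_MAX
#define DEMO_SUPPORT_DMESG	(1u << 0)
#define DEMO_SUPPORT_PMSG	(1u << 1)
#define DEMO_SUPPORT_CONSOLE	(1u << 2)
#define DEMO_SUPPORT_FTRACE	(1u << 3)

/* Zero means "all recorders", kept for compatibility. */
static unsigned int demo_normalize_flags(unsigned int flags)
{
	return flags == DEMO_SUPPORT_DEFAULT ? DEMO_SUPPORT_ALL : flags;
}

/*
 * Resolve one recorder size. size_kb < 0 means "not set, use def_kb".
 * Returns the size in bytes, 0 if the recorder is disabled, or -1 if the
 * size is not a multiple of align (assumed handling, see the lead-in).
 * align must be a power of two.
 */
static long demo_verify_size(long size_kb, long def_kb, long align,
			     int enabled)
{
	long bytes;

	if (!enabled)
		bytes = 0;		/* device does not support it */
	else if (size_kb >= 0)
		bytes = size_kb;	/* explicit user setting */
	else
		bytes = def_kb;		/* fall back to the default */

	bytes = bytes <= 0 ? 0 : bytes * 1024;
	if (bytes & (align - 1))
		return -1;
	return bytes;
}
```

A device that sets only the dmesg bit therefore keeps its dmesg chunk while
pmsg, console and ftrace end up with size 0, whatever the user configured.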

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index be865dfc1a28..299142b3d8e6 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -166,6 +166,15 @@ It is only required by block device which is registered by
 ``blkoops_register_blkdev``.  It's the major device number of registered
 devices, by which blkoops can get the matching driver for @blkdev.
 
+flags
+~~~~~
+
+Refer to the macros starting with *BLKOOPS_DEV_SUPPORT_*, which are defined in
+*linux/blkoops.h*. They tell pstore/blk which recorders this device supports.
+The default, zero, means all recorders (for compatibility), which is the same
+as BLKOOPS_DEV_SUPPORT_ALL. A recorder works only when its chunk size is not
+zero and the device supports it.
+
 total_size
 ~~~~~~~~~~
 
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index c76bab671b0b..01170b344f00 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -128,9 +128,16 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 		return -ENOMEM;
 	}
 
-#define verify_size(name, defsize, alignsize) {				\
-		long _##name_ = (name);					\
-		if (_##name_ < 0)					\
+	/* zero means all recorders, for compatibility */
+	if (bo_dev->flags == BLKOOPS_DEV_SUPPORT_DEFAULT)
+		bo_dev->flags = BLKOOPS_DEV_SUPPORT_ALL;
+#define verify_size(name, defsize, alignsize, enable) {			\
+		long _##name_;						\
+		if (!(enable))						\
+			_##name_ = 0;					\
+		else if ((name) >= 0)					\
+			_##name_ = (name);				\
+		else							\
 			_##name_ = (defsize);				\
 		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
 		if (_##name_ & ((alignsize) - 1)) {			\
@@ -142,10 +149,14 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 		bzinfo->name = _##name_;				\
 	}
 
-	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
-	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
-	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
-	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
+	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_DMESG);
+	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_PMSG);
+	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_CONSOLE);
+	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_FTRACE);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
@@ -336,6 +347,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
  * register block device to blkoops
  * @major: the major device number of registering device
  * @panic_write: the write interface for panic case.
+ * @flags: Refer to macro starting with BLKOOPS_DEV_SUPPORT.
  *
  * It is ONLY used for block device to register to blkoops. In this case,
  * the module parameter @blkdev must be valid. Generic read/write interfaces
@@ -349,7 +361,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
  * panic occurs but pstore/blk does not recover yet, the first zone of dmesg
  * will be used.
  */
-int blkoops_register_blkdev(unsigned int major,
+int blkoops_register_blkdev(unsigned int major, unsigned int flags,
 		blkoops_blk_panic_write_op panic_write)
 {
 	struct block_device *bdev;
@@ -372,6 +384,7 @@ int blkoops_register_blkdev(unsigned int major,
 	if (bo_dev.total_size == 0)
 		goto err_put_bdev;
 	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
+	bo_dev.flags = flags;
 	bo_dev.read = blkoops_generic_blk_read;
 	bo_dev.write = blkoops_generic_blk_write;
 
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index 71c596fd4cc8..bc7665d14a98 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <linux/blkdev.h>
 #include <linux/pstore_blk.h>
+#include <linux/bitops.h>
 
 /**
  * struct blkoops_device - backend blkoops driver structure.
@@ -14,6 +15,10 @@
  * blkoops_register_device(). If block device, you are strongly recommended
  * to use blkoops_register_blkdev().
  *
+ * @flags:
+ *	Refer to the macros starting with BLKOOPS_DEV_SUPPORT_. These macros
+ *	tell pstore/blk which recorders this device supports. Zero means
+ *	all recorders, for compatibility.
  * @total_size:
  *	The total size in bytes pstore/blk can use. It must be greater than
  *	4096 and be multiple of 4096.
@@ -38,6 +43,13 @@
  *	On error, negative number should be returned.
  */
 struct blkoops_device {
+	unsigned int flags;
+#define BLKOOPS_DEV_SUPPORT_ALL		UINT_MAX
+#define BLKOOPS_DEV_SUPPORT_DEFAULT	(0)
+#define BLKOOPS_DEV_SUPPORT_DMESG	BIT(0)
+#define BLKOOPS_DEV_SUPPORT_PMSG	BIT(1)
+#define BLKOOPS_DEV_SUPPORT_CONSOLE	BIT(2)
+#define BLKOOPS_DEV_SUPPORT_FTRACE	BIT(3)
 	unsigned long total_size;
 	blkz_read_op read;
 	blkz_write_op write;
@@ -54,7 +66,7 @@ typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
 
 int  blkoops_register_device(struct blkoops_device *bo_dev);
 void blkoops_unregister_device(struct blkoops_device *bo_dev);
-int  blkoops_register_blkdev(unsigned int major,
+int  blkoops_register_blkdev(unsigned int major, unsigned int flags,
 		blkoops_blk_panic_write_op panic_write);
 void blkoops_unregister_blkdev(unsigned int major);
 int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
-- 
1.9.1



* [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg.
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (7 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-03-18 18:47   ` Kees Cook
  2020-02-07 12:25 ` [PATCH v2 10/11] blkoops: add interface for dirver to get information of blkoops WeiXiong Liao
  2020-02-07 12:25 ` [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
  10 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device is not a block device. Before writing to a flash device on
MTD, the target area must be erased. However, pstore/blk just sets datalen
to 0 when a record is removed, which is not enough for an MTD device. This
patch therefore adds support for special jobs when removing a pstore/blk
record.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  9 +++++++++
 fs/pstore/blkoops.c                        |  4 +++-
 fs/pstore/blkzone.c                        |  9 ++++++++-
 include/linux/blkoops.h                    | 10 ++++++++++
 include/linux/pstore_blk.h                 | 11 +++++++++++
 5 files changed, 41 insertions(+), 2 deletions(-)
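
The dispatch this patch adds to record removal can be sketched in user-space
C as follows. This is only an illustration: demo_backend, demo_record_remove
and the helper callbacks are hypothetical names, and demo_flush_meta() merely
stands in for the existing metadata flush that is used when no erase
callback is registered.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical mirror of the erase callback type added by this patch. */
typedef ssize_t (*demo_erase_op)(size_t size, off_t offset);

struct demo_backend {
	demo_erase_op erase;	/* NULL if removal needs no special job */
};

/* Bookkeeping so the effect of each path can be observed. */
static int demo_meta_flushes;
static size_t demo_erased_size;
static off_t demo_erased_off;

/* Stand-in for writing back only the zone metadata. */
static ssize_t demo_flush_meta(void)
{
	demo_meta_flushes++;
	return 0;
}

/* Sample erase callback, as an MTD-like backend might register. */
static ssize_t demo_mtd_erase(size_t size, off_t offset)
{
	demo_erased_size = size;
	demo_erased_off = offset;
	return 0;
}

/*
 * Remove a record: use the backend's erase callback when there is one,
 * otherwise fall back to flushing the metadata.
 */
static ssize_t demo_record_remove(const struct demo_backend *be,
				  size_t size, off_t offset)
{
	if (be->erase)
		return be->erase(size, offset);
	return demo_flush_meta();
}
```

A backend registered through the block-device path leaves erase as NULL, so
it keeps the old behaviour; only a backend that supplies erase sees the new
call.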

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index 299142b3d8e6..1735476621df 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -200,6 +200,15 @@ negative number will be returned. The following return numbers mean more:
 1. -EBUSY: pstore/blk should try again later.
 #. -ENEXT: this zone is used or broken, pstore/blk should try next one.
 
+erase
+~~~~~
+
+It's the generic erase API for pstore/blk, for use by non-block devices. It is
+called while a pstore record is being removed, and is required only when the
+device has special removal jobs. For example, an MTD device erases the block.
+
+Normally zero should be returned, otherwise it indicates an error.
+
 panic_write (for non-block device)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 01170b344f00..7cf4731e52f7 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -164,6 +164,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 	bzinfo->dump_oops = dump_oops;
 	bzinfo->read = bo_dev->read;
 	bzinfo->write = bo_dev->write;
+	bzinfo->erase = bo_dev->erase;
 	bzinfo->panic_write = bo_dev->panic_write;
 	bzinfo->name = "blkoops";
 	bzinfo->owner = THIS_MODULE;
@@ -383,10 +384,11 @@ int blkoops_register_blkdev(unsigned int major, unsigned int flags,
 	bo_dev.total_size = blkoops_bdev_size(bdev);
 	if (bo_dev.total_size == 0)
 		goto err_put_bdev;
-	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
 	bo_dev.flags = flags;
 	bo_dev.read = blkoops_generic_blk_read;
 	bo_dev.write = blkoops_generic_blk_write;
+	bo_dev.erase = NULL;
+	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
 
 	ret = blkoops_register_device(&bo_dev);
 	if (ret)
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 205aeff28992..a17fff77b875 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -593,11 +593,18 @@ static inline bool blkz_ok(struct blkz_zone *zone)
 static inline int blkz_dmesg_erase(struct blkz_context *cxt,
 		struct blkz_zone *zone)
 {
+	size_t size;
+
 	if (unlikely(!blkz_ok(zone)))
 		return 0;
 
 	atomic_set(&zone->buffer->datalen, 0);
-	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+
+	size = buffer_datalen(zone) + sizeof(*zone->buffer);
+	if (cxt->bzinfo->erase)
+		return cxt->bzinfo->erase(size, zone->off);
+	else
+		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
 static inline int blkz_record_erase(struct blkz_context *cxt,
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index bc7665d14a98..11cb3036ad5f 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -33,6 +33,15 @@
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
  *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
+ * @erase:
+ *	The general (not panic) erase operation. It is called while a pstore
+ *	record is being removed. It's required only when the device has special
+ *	removal jobs; for example, an MTD device needs to erase the block.
+ *
+ *	Both the @size and @offset parameters of this interface are relative
+ *	to the space provided, not to the whole disk/flash.
+ *
+ *	On success, 0 should be returned. Others mean error.
  * @panic_write:
  *	The write operation only used for panic.
  *
@@ -53,6 +62,7 @@ struct blkoops_device {
 	unsigned long total_size;
 	blkz_read_op read;
 	blkz_write_op write;
+	blkz_erase_op erase;
 	blkz_write_op panic_write;
 };
 
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index bbbe4fe37f7c..9641969f888f 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -46,6 +46,15 @@
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
  *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
+ * @erase:
+ *	The general (not panic) erase operation. It is called while a pstore
+ *	record is being removed. It's required only when the device has special
+ *	removal jobs; for example, an MTD device needs to erase the block.
+ *
+ *	Both the @size and @offset parameters of this interface are relative
+ *	to the space provided, not to the whole disk/flash.
+ *
+ *	On success, 0 should be returned. Others mean error.
  * @panic_write:
  *	The write operation only used for panic. It's optional if you do not
  *	care panic record. If panic occur but blkzone do not recover yet, the
@@ -59,6 +68,7 @@
  */
 typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
 typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
+typedef ssize_t (*blkz_erase_op)(size_t, loff_t);
 struct blkz_info {
 	struct module *owner;
 	const char *name;
@@ -71,6 +81,7 @@ struct blkz_info {
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
+	blkz_erase_op erase;
 	blkz_write_op panic_write;
 };
 
-- 
1.9.1
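
The optional @erase callback described in this patch can be modelled in
userspace as follows. This is only a minimal sketch: all demo_* names are
hypothetical, and the metadata flush is reduced to a counter. It assumes
that the record-removal path calls the device's erase callback when one is
registered and otherwise falls back to flushing the zone metadata.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t, off_t */

/* Hypothetical stand-in for blkz_erase_op from the patch. */
typedef ssize_t (*demo_erase_op)(size_t size, off_t off);

static int demo_erase_calls;	/* times the device erase callback ran */
static int demo_meta_flushes;	/* times the metadata fallback ran */

/* Device-specific erase, e.g. what an MTD backend would provide. */
static ssize_t demo_erase(size_t size, off_t off)
{
	(void)size;
	(void)off;
	demo_erase_calls++;
	return 0;		/* 0 means success, anything else is an error */
}

/* Fallback used when the device has no special removal work. */
static int demo_flush_meta(void)
{
	demo_meta_flushes++;
	return 0;
}

/* Remove a record: prefer the device callback, else flush metadata. */
static int demo_record_remove(demo_erase_op erase, size_t size, off_t off)
{
	if (erase)
		return (int)erase(size, off);
	return demo_flush_meta();
}
```

A block device backend would pass NULL for the callback and rely on the
metadata flush alone.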



* [PATCH v2 10/11] blkoops: add interface for driver to get information of blkoops
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (8 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-02-07 12:25 ` [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
  10 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device needs to check the recorder sizes and get the mtd device
index to determine which MTD device to use. Everything it needs is defined
in blkoops, so there should be an interface for the MTD driver to get all
the information it needs.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/blkoops.c     | 47 ++++++++++++++++++++++++++++++++++++-----------
 include/linux/blkoops.h | 10 ++++++++++
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 7cf4731e52f7..4fc6ac4c69c5 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -102,6 +102,20 @@
 #define DEFAULT_BLKDEV ""
 #endif
 
+#define check_size(name, defsize, alignsize) ({			\
+	long _##name_ = (name);					\
+	if ((name) < 0)						\
+		_##name_ = (defsize);				\
+	_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+	if (_##name_ & ((alignsize) - 1)) {			\
+		pr_info(#name " must align to %d\n",		\
+				(alignsize));			\
+		_##name_ = ALIGN(_##name_, (alignsize));	\
+	}							\
+	_##name_;						\
+})
+
+
 /**
  * register device to blkoops
  *
@@ -133,18 +147,10 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 		bo_dev->flags = BLKOOPS_DEV_SUPPORT_ALL;
 #define verify_size(name, defsize, alignsize, enable) {			\
 		long _##name_;						\
-		if (!(enable))						\
-			_##name_ = 0;					\
-		else if ((name) >= 0)					\
-			_##name_ = (name);				\
+		if (enable)						\
+			_##name_ = check_size(name, defsize, alignsize);\
 		else							\
-			_##name_ = (defsize);				\
-		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
-		if (_##name_ & ((alignsize) - 1)) {			\
-			pr_info(#name " must align to %d\n",		\
-					(alignsize));			\
-			_##name_ = ALIGN(name, (alignsize));		\
-		}							\
+			_##name_ = 0;					\
 		name = _##name_ / 1024;					\
 		bzinfo->name = _##name_;				\
 	}
@@ -445,6 +451,25 @@ int blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
 }
 EXPORT_SYMBOL_GPL(blkoops_blkdev_info);
 
+/* get information of blkoops */
+int  blkoops_info(struct blkoops_info *info)
+{
+	if (!blkdev[0] && strlen(DEFAULT_BLKDEV))
+		snprintf(blkdev, 80, "%s", DEFAULT_BLKDEV);
+
+	memcpy(info->device, blkdev, 80);
+	info->dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
+
+	info->dmesg_size = check_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
+	info->pmsg_size = check_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
+	info->ftrace_size = check_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
+	info->console_size = check_size(console_size, DEFAULT_CONSOLE_SIZE,
+			4096);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkoops_info);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
 MODULE_DESCRIPTION("Wrapper for Pstore BLK with Oops logger");
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index 11cb3036ad5f..ea56f3f92360 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -81,4 +81,14 @@ int  blkoops_register_blkdev(unsigned int major, unsigned int flags,
 void blkoops_unregister_blkdev(unsigned int major);
 int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
 
+struct blkoops_info {
+	int dump_oops;
+	char device[80];
+	unsigned long dmesg_size;
+	unsigned long pmsg_size;
+	unsigned long console_size;
+	unsigned long ftrace_size;
+};
+int  blkoops_info(struct blkoops_info *info);
+
 #endif
-- 
1.9.1
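
The size handling done by check_size() can be written as a plain function
for illustration. This is a sketch under assumptions, not the kernel macro:
demo_check_size() is a hypothetical name, and it assumes the intent is to
take a size in KB (negative meaning "use the default"), convert it to bytes
and round the byte count up to a power-of-two alignment.

```c
#include <stdio.h>

/*
 * kb:     configured size in KB, negative selects def_kb
 * def_kb: default size in KB
 * align:  alignment in bytes, must be a power of two
 * Returns the resulting size in bytes, 0 if the recorder is disabled.
 */
static long demo_check_size(long kb, long def_kb, long align)
{
	long bytes;

	if (kb < 0)
		kb = def_kb;
	bytes = kb <= 0 ? 0 : kb * 1024;
	if (bytes & (align - 1)) {
		fprintf(stderr, "size must align to %ld\n", align);
		/* round the byte count up to the next multiple of align */
		bytes = (bytes + align - 1) & ~(align - 1);
	}
	return bytes;
}
```

With a 4096-byte alignment, 5 KB becomes 8192 bytes and 9 KB becomes
12288 bytes, while an already aligned 4 KB stays at 4096 bytes.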



* [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
                   ` (9 preceding siblings ...)
  2020-02-07 12:25 ` [PATCH v2 10/11] blkoops: add interface for driver to get information of blkoops WeiXiong Liao
@ 2020-02-07 12:25 ` WeiXiong Liao
  2020-02-18 10:34   ` Miquel Raynal
  2020-03-18 18:57   ` Kees Cook
  10 siblings, 2 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-02-07 12:25 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-mtd, linux-kernel, linux-doc

This is the last of a series of patches adapting pstore/blk to MTD devices.

mtdpstore is similar to mtdoops but more powerful. It is based on
pstore/blk and aims to store panic and oops logs to a flash partition,
where they can be read back as files after mounting the pstore filesystem.

pstore/blk and blkoops, a wrapper for pstore/blk, were originally designed
for block devices, but they are no longer limited to them. After this
series of patches, pstore/blk also works for MTD devices. To make it work,
'blkdev' in Kconfig or the module parameter of blkoops should be set to
the MTD device name or MTD device number.
See more about pstore/blk and blkoops in:
    Documentation/admin-guide/pstore-block.rst
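
How a 'blkdev' value is interpreted as either an MTD number or an MTD name
can be sketched in userspace C as below. The demo_* helpers are
hypothetical and use strtol() in place of the kernel's kstrtoint(). The
sketch assumes that a value which parses fully as a number selects the
device by index, and that anything else is compared with the device name.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the MTD index if spec is a number, -1 if it must match by name. */
static int demo_parse_mtd_index(const char *spec)
{
	char *end;
	long v;

	if (!spec || !*spec)
		return -1;
	errno = 0;
	v = strtol(spec, &end, 0);
	if (errno || end == spec || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;
	return (int)v;
}

/* Does the device (name, index) match the configured spec? */
static int demo_mtd_matches(const char *spec, const char *name, int index)
{
	int want = demo_parse_mtd_index(spec);

	if (!spec || !name)
		return 0;
	if (want >= 0)
		return index == want;
	return strcmp(spec, name) == 0;
}
```

For example, "pstore" selects the partition named pstore whatever its
index is, while "3" selects mtd3 whatever its name is.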

Why do we need mtdpstore?
1. Duplicated work between pstore and mtdoops
   Both pstore and mtdoops do the same job of storing panic/oops logs.
   They share much of the same logic: registering a kmsg dumper and
   storing logs to several chunks one by one.
2. Do what a driver should do
   To me, a driver should provide methods instead of policies. What MTD
   should do is provide read/write/erase operations and get rid of the
   code for chunk management, the kmsg dumper and configuration.
3. Enhanced features
   Not only stores logs, but also shows them as files.
   Not only logs, but also the trigger time and trigger count.
   Not only panic/oops logs, but also recorders for pmsg, console and
   ftrace in the future.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  10 +-
 drivers/mtd/Kconfig                        |  10 +
 drivers/mtd/Makefile                       |   1 +
 drivers/mtd/mtdpstore.c                    | 564 +++++++++++++++++++++++++++++
 4 files changed, 583 insertions(+), 2 deletions(-)
 create mode 100644 drivers/mtd/mtdpstore.c

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index 1735476621df..823fe2b4b84f 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -54,9 +54,10 @@ blkdev
 ~~~~~~
 
 The block device to use. Most of the time, it is a partition of block device.
-It's fine to ignore it if you are not using a block device.
+It is also used for MTD devices. It's fine to ignore it if you are not using
+a block device or an MTD device.
 
-It accepts the following variants:
+It accepts the following variants for block devices:
 
 1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
    leading 0x, for example b302.
@@ -75,6 +76,11 @@ It accepts the following variants:
    partition with a known unique id.
 #. <major>:<minor> major and minor number of the device separated by a colon.
 
+It accepts the following variants for MTD devices:
+
+1. <device name> MTD device name. "pstore" is recommended.
+#. <device number> MTD device number.
+
 dmesg_size
 ~~~~~~~~~~
 
diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 42d401ea60ee..5d53d5cd2998 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -170,6 +170,16 @@ config MTD_OOPS
 	  buffer in a flash partition where it can be read back at some
 	  later point.
 
+config MTD_PSTORE
+	tristate "Log panic/oops to an MTD buffer based on pstore"
+	depends on PSTORE_BLKOOPS
+	help
+	  This enables panic and oops messages to be logged to a circular
+	  buffer in a flash partition where they can be read back as files
+	  after mounting the pstore filesystem.
+
+	  If unsure, say N.
+
 config MTD_SWAP
 	tristate "Swap on MTD device support"
 	depends on MTD && SWAP
diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 56cc60ccc477..593d0593a038 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
 obj-$(CONFIG_SSFDC)		+= ssfdc.o
 obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
 obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
+obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
 obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
 
 nftl-objs		:= nftlcore.o nftlmount.o
diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
new file mode 100644
index 000000000000..58b9e10ef675
--- /dev/null
+++ b/drivers/mtd/mtdpstore.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define dev_fmt(fmt) "mtdoops-pstore: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/blkoops.h>
+#include <linux/mtd/mtd.h>
+#include <linux/bitops.h>
+
+static struct mtdpstore_context {
+	int index;
+	struct blkoops_info bo_info;
+	struct blkoops_device bo_dev;
+	struct mtd_info *mtd;
+	unsigned long *rmmap;		/* removed bit map */
+	unsigned long *usedmap;		/* used bit map */
+	/*
+	 * Used for panic write.
+	 * As there is no block_isbad for the panic case, we keep this
+	 * status before panic to ensure panic_write does not fail.
+	 */
+	unsigned long *badmap;		/* bad block bit map */
+} oops_cxt;
+
+static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret;
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	ret = mtd_block_isbad(mtd, off);
+	if (ret < 0) {
+		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
+		return ret;
+	} else if (ret > 0) {
+		set_bit(blknum, cxt->badmap);
+		return true;
+	}
+	return false;
+}
+
+static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	return test_bit(blknum, cxt->badmap);
+}
+
+static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
+	set_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+	clear_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+		clear_bit(zonenum, cxt->usedmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u64 blknum = div_u64(off, cxt->mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	return test_bit(zonenum, cxt->usedmap);
+}
+
+static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->usedmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
+		size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	size_t sz;
+	int i;
+
+	sz = min_t(uint32_t, size, mtd->writesize / 4);
+	for (i = 0; i < sz; i++) {
+		if (buf[i] != (char)0xFF)
+			return false;
+	}
+	return true;
+}
+
+static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+
+	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
+	set_bit(zonenum, cxt->rmmap);
+}
+
+static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		clear_bit(zonenum, cxt->rmmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->rmmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	struct erase_info erase;
+	int ret;
+
+	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
+	erase.len = cxt->mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(cxt->mtd, &erase);
+	if (!ret)
+		mtdpstore_block_clear_removed(cxt, off);
+	else
+		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
+		       (unsigned long long)erase.addr,
+		       (unsigned long long)erase.len, cxt->bo_info.device);
+	return ret;
+}
+
+/*
+ * Called while removing a file.
+ *
+ * To avoid over-erasing, erase the block only when the whole block is unused.
+ * If the block contains a valid log, erase lazily in flush_removed() when
+ * unregistering.
+ */
+static ssize_t mtdpstore_erase(size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -EIO;
+
+	mtdpstore_mark_unused(cxt, off);
+
+	/* If the block still has valid data, mtdpstore erases it lazily */
+	if (likely(mtdpstore_block_is_used(cxt, off))) {
+		mtdpstore_mark_removed(cxt, off);
+		return 0;
+	}
+
+	/* all zones are unused, erase it */
+	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
+	return mtdpstore_erase_do(cxt, off);
+}
+
+/*
+ * What is security for mtdpstore?
+ * As there is no erase in the panic case, we should ensure that at least
+ * one zone is writable. Otherwise, panic write will fail.
+ * If a zone is used, the write operation returns -ENEXT, which means that
+ * pstore/blk will try zones one by one until it gets an empty one. So the
+ * next zone need not be empty, but at least one zone must be.
+ */
+static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret = 0, i;
+	struct mtd_info *mtd = cxt->mtd;
+	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
+	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
+	u32 erasesize = cxt->mtd->erasesize;
+
+	for (i = 0; i < zonecnt; i++) {
+		u32 num = (zonenum + i) % zonecnt;
+
+		/* found empty zone */
+		if (!test_bit(num, cxt->usedmap))
+			return 0;
+	}
+
+	/* If there is no empty zone, we have no choice but to erase */
+	off = ALIGN_DOWN(off, erasesize);
+	while (blkcnt--) {
+		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
+
+		if (mtdpstore_block_isbad(cxt, off))
+			continue;
+
+		ret = mtdpstore_erase_do(cxt, off);
+		if (!ret) {
+			mtdpstore_block_mark_unused(cxt, off);
+			break;
+		}
+	}
+
+	if (ret)
+		dev_err(&mtd->dev, "all blocks bad!\n");
+	dev_dbg(&mtd->dev, "end security\n");
+	return ret;
+}
+
+static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENEXT;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENEXT;
+
+	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
+	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || retlen != size) {
+		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static inline bool mtdpstore_is_io_error(int ret)
+{
+	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
+}
+
+/*
+ * All zones will be read, as pstore/blk reads zones one by one during
+ * recovery.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen, done;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENEXT;
+
+	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
+	for (done = 0, retlen = 0; done < size; done += retlen) {
+		retlen = 0;
+
+		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
+				(u_char *)buf + done);
+		if (mtdpstore_is_io_error(ret)) {
+			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
+					off + done, retlen, size - done, ret);
+			/* the zone may be broken, try next one */
+			return -ENEXT;
+		}
+
+		/*
+		 * ECC error. The impact on log data is so small. Maybe we can
+		 * still read it and try to understand. So mtdpstore just hands
+		 * over what it gets and user can judge whether the data is
+		 * valid or not.
+		 */
+		if (mtd_is_eccerr(ret)) {
+			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
+					off + done, retlen, size - done, ret);
+			/* driver may not set retlen when ecc error */
+			retlen = retlen == 0 ? size - done : retlen;
+		}
+	}
+
+	if (mtdpstore_is_empty(cxt, buf, size))
+		mtdpstore_mark_unused(cxt, off);
+	else
+		mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct mtd_info *mtd = cxt->mtd;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_panic_block_isbad(cxt, off))
+		return -ENEXT;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENEXT;
+
+	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || size != retlen) {
+		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	return retlen;
+}
+
+static void mtdpstore_notify_add(struct mtd_info *mtd)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct blkoops_info *info = &cxt->bo_info;
+	unsigned long longcnt;
+
+	if (!strcmp(mtd->name, info->device))
+		cxt->index = mtd->index;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
+
+	if (mtd->size < info->dmesg_size * 2) {
+		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
+				mtd->index);
+		return;
+	}
+	/*
+	 * dmesg_size must be aligned to 4096 bytes, which is required by
+	 * blkoops. The default value of dmesg_size is 64KB. If dmesg_size
+	 * is larger than erasesize, errors will occur, since mtdpstore is
+	 * designed on the assumption that it is not.
+	 */
+	if (mtd->erasesize < info->dmesg_size) {
+		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
+				mtd->index);
+		return;
+	}
+	if (unlikely(info->dmesg_size % mtd->writesize)) {
+		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
+				info->dmesg_size / 1024,
+				mtd->writesize / 1024);
+		return;
+	}
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
+	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
+	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	cxt->bo_dev.total_size = mtd->size;
+	/* just support dmesg right now */
+	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
+	cxt->bo_dev.read = mtdpstore_read;
+	cxt->bo_dev.write = mtdpstore_write;
+	cxt->bo_dev.erase = mtdpstore_erase;
+	cxt->bo_dev.panic_write = mtdpstore_panic_write;
+
+	ret = blkoops_register_device(&cxt->bo_dev);
+	if (ret) {
+		dev_err(&mtd->dev, "mtd%d register to blkoops failed\n",
+				mtd->index);
+		return;
+	}
+	cxt->mtd = mtd;
+	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
+}
+
+static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
+		loff_t off, size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u_char *buf;
+	int ret;
+	size_t retlen;
+	struct erase_info erase;
+
+	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* 1st. read to cache */
+	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
+	if (mtdpstore_is_io_error(ret))
+		goto free;
+
+	/* 2nd. erase block */
+	erase.len = mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(mtd, &erase);
+	if (ret)
+		goto free;
+
+	/* 3rd. write back */
+	while (size) {
+		unsigned int zonesize = cxt->bo_info.dmesg_size;
+
+		/* there is valid data on block, write back */
+		if (mtdpstore_is_used(cxt, off)) {
+			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
+			if (ret)
+				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
+						off, retlen, zonesize, ret);
+		}
+
+		off += zonesize;
+		size -= min_t(unsigned int, zonesize, size);
+	}
+
+free:
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * What does mtdpstore_flush_removed() do?
+ * When the user removes a log file on the pstore filesystem, mtdpstore must
+ * ensure that the log file is really removed. If the whole block is no longer
+ * used, it simply erases the block. However, if the block still contains valid
+ * logs, mtdpstore erases the block and writes the valid logs back.
+ */
+static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	int ret;
+	loff_t off;
+	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
+
+	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
+		ret = mtdpstore_block_isbad(cxt, off);
+		if (ret)
+			continue;
+
+		ret = mtdpstore_block_is_removed(cxt, off);
+		if (!ret)
+			continue;
+
+		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void mtdpstore_notify_remove(struct mtd_info *mtd)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	mtdpstore_flush_removed(cxt);
+
+	blkoops_unregister_device(&cxt->bo_dev);
+	kfree(cxt->badmap);
+	kfree(cxt->usedmap);
+	kfree(cxt->rmmap);
+	cxt->mtd = NULL;
+	cxt->index = -1;
+}
+
+static struct mtd_notifier mtdpstore_notifier = {
+	.add	= mtdpstore_notify_add,
+	.remove	= mtdpstore_notify_remove,
+};
+
+static int __init mtdpstore_init(void)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct blkoops_info *info = &cxt->bo_info;
+
+	ret = blkoops_info(info);
+	if (unlikely(ret))
+		return ret;
+
+	/* cxt->mtd is not set yet, so dev_err() cannot be used here */
+	if (strlen(info->device) == 0) {
+		pr_err("mtd device must be supplied\n");
+		return -EINVAL;
+	}
+	if (!info->dmesg_size) {
+		pr_err("no recorder enabled\n");
+		return -EINVAL;
+	}
+
+	/* Setup the MTD device to use */
+	ret = kstrtoint((char *)info->device, 0, &cxt->index);
+	if (ret)
+		cxt->index = -1;
+
+	register_mtd_user(&mtdpstore_notifier);
+	return 0;
+}
+module_init(mtdpstore_init);
+
+static void __exit mtdpstore_exit(void)
+{
+	unregister_mtd_user(&mtdpstore_notifier);
+}
+module_exit(mtdpstore_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
-- 
1.9.1
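
The lazy-erase policy used by mtdpstore_erase() can be illustrated with a
small userspace model. This is a sketch only: the demo_* names and the
fixed geometry (two erase blocks of four zones each) are made up for the
example, and real flash I/O is replaced by an erase counter. Removing a
zone erases its block immediately only when no other zone in that block
is still in use. Otherwise the zone is just marked as removed, to be
handled later.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ZONES_PER_BLOCK	4
#define DEMO_BLOCKS		2
#define DEMO_ZONES		(DEMO_ZONES_PER_BLOCK * DEMO_BLOCKS)

struct demo_flash {
	bool used[DEMO_ZONES];		/* zone holds a valid record */
	bool removed[DEMO_ZONES];	/* zone removed, erase still pending */
	int erase_count[DEMO_BLOCKS];	/* erases performed per block */
};

/* True if any zone of the given erase block still holds a record. */
static bool demo_block_is_used(const struct demo_flash *f, size_t block)
{
	size_t first = block * DEMO_ZONES_PER_BLOCK;
	size_t i;

	for (i = 0; i < DEMO_ZONES_PER_BLOCK; i++)
		if (f->used[first + i])
			return true;
	return false;
}

/* Remove one zone. Returns true if its block was erased right away. */
static bool demo_remove_zone(struct demo_flash *f, size_t zone)
{
	size_t block = zone / DEMO_ZONES_PER_BLOCK;
	size_t first = block * DEMO_ZONES_PER_BLOCK;
	size_t i;

	f->used[zone] = false;

	/* Other records still live in this block: defer the erase. */
	if (demo_block_is_used(f, block)) {
		f->removed[zone] = true;
		return false;
	}

	/* Whole block is unused: erase it and clear the pending marks. */
	f->erase_count[block]++;
	for (i = 0; i < DEMO_ZONES_PER_BLOCK; i++)
		f->removed[first + i] = false;
	return true;
}
```

The deferred zones correspond to the bits in rmmap that
mtdpstore_flush_removed() walks when the device is unregistered.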



* Re: [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-07 12:25 ` [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
@ 2020-02-18 10:34   ` Miquel Raynal
  2020-02-19  1:13     ` liaoweixiong
  2020-03-18 18:57   ` Kees Cook
  1 sibling, 1 reply; 43+ messages in thread
From: Miquel Raynal @ 2020-02-18 10:34 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Kees Cook, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, linux-mtd, Jonathan Cameron,
	Colin Cross, Mauro Carvalho Chehab, David S. Miller,
	Vignesh Raghavendra

Hi WeiXiong,

WeiXiong Liao <liaoweixiong@allwinnertech.com> wrote on Fri,  7 Feb
2020 20:25:55 +0800:

> It's the last one of a series of patches for adaptive to MTD device.
> 
> The mtdpstore is similar to mtdoops but more powerful. It bases on
> pstore/blk, aims to store panic and oops logs to a flash partition,
> where it can be read back as files after mounting pstore filesystem.
> 
> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
> block device at the very beginning, but now, compatible to not only
> block device. After this series of patches, pstore/blk can also work
> for MTD device. To make it work, 'blkdev' on kconfig or module
> parameter of blkoops should be set as mtd device name or mtd number.
> See more about pstore/blk and blkoops on:
>     Documentation/admin-guide/pstore-block.rst
> 
> Why do we need mtdpstore?
> 1. repetitive jobs between pstore and mtdoops
>    Both of pstore and mtdoops do the same jobs that store panic/oops log.
>    They have much similar logic that register to kmsg dumper and store
>    log to several chunks one by one.
> 2. do what a driver should do
>    To me, a driver should provide methods instead of policies. What MTD
>    should do is to provide read/write/erase operations, geting rid of codes
>    about chunk management, kmsg dumper and configuration.
> 3. enhanced feature
>    Not only store log, but also show it as files.
>    Not only log, but also trigger time and trigger count.
>    Not only panic/oops log, but also log recorder for pmsg, console and
>    ftrace in the future.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>

Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>

Richard, your PoV on this is welcome.

I suppose this patch depends on the others to work correctly so maybe
we should wait for the next release before applying it.

Thanks,
Miquèl


* Re: [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-18 10:34   ` Miquel Raynal
@ 2020-02-19  1:13     ` liaoweixiong
  0 siblings, 0 replies; 43+ messages in thread
From: liaoweixiong @ 2020-02-19  1:13 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Rob Herring, Tony Luck, Kees Cook, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, linux-mtd, Jonathan Cameron,
	Colin Cross, Mauro Carvalho Chehab, David S. Miller,
	Vignesh Raghavendra

hi Miquel Raynal,

On 2020/2/18 6:34 PM, Miquel Raynal wrote:
> Hi WeiXiong,
> 
> WeiXiong Liao <liaoweixiong@allwinnertech.com> wrote on Fri,  7 Feb
> 2020 20:25:55 +0800:
> 
>> It's the last one of a series of patches for adaptive to MTD device.
>>
>> The mtdpstore is similar to mtdoops but more powerful. It bases on
>> pstore/blk, aims to store panic and oops logs to a flash partition,
>> where it can be read back as files after mounting pstore filesystem.
>>
>> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
>> block device at the very beginning, but now, compatible to not only
>> block device. After this series of patches, pstore/blk can also work
>> for MTD device. To make it work, 'blkdev' on kconfig or module
>> parameter of blkoops should be set as mtd device name or mtd number.
>> See more about pstore/blk and blkoops on:
>>     Documentation/admin-guide/pstore-block.rst
>>
>> Why do we need mtdpstore?
>> 1. repetitive jobs between pstore and mtdoops
>>    Both of pstore and mtdoops do the same jobs that store panic/oops log.
>>    They have much similar logic that register to kmsg dumper and store
>>    log to several chunks one by one.
>> 2. do what a driver should do
>>    To me, a driver should provide methods instead of policies. What MTD
>>    should do is to provide read/write/erase operations, geting rid of codes
>>    about chunk management, kmsg dumper and configuration.
>> 3. enhanced feature
>>    Not only store log, but also show it as files.
>>    Not only log, but also trigger time and trigger count.
>>    Not only panic/oops log, but also log recorder for pmsg, console and
>>    ftrace in the future.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> 
> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
> 
> Richard, your PoV on this is welcome.
> 
> I suppose this patch depends on the others to work correctly so maybe
> we should wait for the next release before applying it.
> 

Of course, thank you for your review.

> Thanks,
> Miquèl
> 

-- 
liaoweixiong


* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-02-07 12:25 ` [PATCH v2 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
@ 2020-02-26  0:52   ` Kees Cook
  2020-02-27  8:21     ` liaoweixiong
  2020-03-09  0:52     ` WeiXiong Liao
  0 siblings, 2 replies; 43+ messages in thread
From: Kees Cook @ 2020-02-26  0:52 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
> pstore/blk is similar to pstore/ram, but dump log to block devices
> rather than persistent ram.

Thanks for waiting for me to get to this review! Notes below...

> 
> Why do we need pstore/blk?
> 1. Most embedded intelligent equipment have no persistent ram, which
> increases costs. We perfer to cheaper solutions, like block devices.
> 2. Do not any equipment have battery, which means that it lost all data
> on general ram if power failure. Pstore has little to do for these
> equipments.
> 
> pstore/blk is one of series patches, and provides the zones management
> of partition of block device or non-block device likes mtd devices. It
> only supports dmesg recorder right now.
> 
> To make pstore/blk work, the block/non-block driver should calls
> blkz_register() and call blkz_unregister() when exits. On other patches
> of series, a better wrapper for pstore/blk, named blkoops, will be
> there.
> 
> It's different with pstore/ram, pstore/blk relies on read/write APIs
> from device driver, especially, write operation for panic record.
> 
> Recommend that, the block/non-block driver should register to pstore/blk
> only after devices have registered to Linux and ready to work.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  fs/pstore/Kconfig          |  10 +
>  fs/pstore/Makefile         |   3 +
>  fs/pstore/blkzone.c        | 948 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pstore_blk.h |  62 +++
>  4 files changed, 1023 insertions(+)
>  create mode 100644 fs/pstore/blkzone.c
>  create mode 100644 include/linux/pstore_blk.h
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 8f0369aad22a..536fde9e13e8 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -153,3 +153,13 @@ config PSTORE_RAM
>  	  "ramoops.ko".
>  
>  	  For more information, see Documentation/admin-guide/ramoops.rst.
> +
> +config PSTORE_BLK
> +	tristate "Log panic/oops to a block device"
> +	depends on PSTORE
> +	depends on BLOCK
> +	help
> +	  This enables panic and oops message to be logged to a block dev
> +	  where it can be read back at some later point.

I think more accurate would be:
"... read back on the next boot via pstorefs."

> +
> +	  If unsure, say N.
> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
> index 967b5891f325..0ee2fc8d1bfb 100644
> --- a/fs/pstore/Makefile
> +++ b/fs/pstore/Makefile
> @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
>  
>  ramoops-objs += ram.o ram_core.o
>  obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
> +
> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
> +pstore_blk-y += blkzone.o

Why this dance with files? I would just expect:

obj-$(CONFIG_PSTORE_BLK)     += blkzone.o

(Regardless, please keep tabs lined up in this file)

> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
> new file mode 100644
> index 000000000000..f77f612b50ba
> --- /dev/null
> +++ b/fs/pstore/blkzone.c
> @@ -0,0 +1,948 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define MODNAME "pstore-blk"
> +#define pr_fmt(fmt) MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/blkdev.h>
> +#include <linux/pstore.h>
> +#include <linux/mount.h>
> +#include <linux/printk.h>
> +#include <linux/fs.h>
> +#include <linux/pstore_blk.h>
> +#include <linux/kdev_t.h>
> +#include <linux/device.h>
> +#include <linux/namei.h>
> +#include <linux/fcntl.h>
> +#include <linux/uio.h>
> +#include <linux/writeback.h>
> +
> +/**
> + * struct blkz_head - head of zone to flush to storage
> + *
> + * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
> + * @datalen: length of data in @data
> + * @data: zone data.
> + */
> +struct blkz_buffer {
> +#define BLK_SIG (0x43474244) /* DBGC */

I was going to suggest extracting PERSISTENT_RAM_SIG, renaming it and
using it in here and in ram_core.c, but then I realized they're not
marking the same structure. How about choosing a new magic sig for the
blkzone data header?

> +	uint32_t sig;
> +	atomic_t datalen;
> +	uint8_t data[];
> +};
> +
> +/**
> + * struct blkz_dmesg_header: dmesg information

Is this also the on-disk structure?

> + *
> + * @magic: magic num for dmesg header
> + * @time: trigger time
> + * @compressed: whether conpressed
> + * @count: oops/panic counter
> + * @reason: identify oops or panic
> + */
> +struct blkz_dmesg_header {
> +#define DMESG_HEADER_MAGIC 0x4dfc3ae5

How was this magic chosen?

> +	uint32_t magic;
> +	struct timespec64 time;
> +	bool compressed;
> +	uint32_t counter;
> +	enum kmsg_dump_reason reason;
> +	uint8_t data[0];

Please use [] instead of [0].
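
For reference, a minimal userspace sketch of the C99 flexible array
member form. The names demo_header and demo_alloc are hypothetical
stand-ins, not part of the patch, and the magic value is arbitrary:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a header that is followed by payload bytes. */
struct demo_header {
	uint32_t magic;
	uint32_t counter;
	uint8_t data[];		/* C99 flexible array member, not data[0] */
};

/* Allocate a header plus len payload bytes and copy the payload in. */
static struct demo_header *demo_alloc(const void *payload, size_t len)
{
	struct demo_header *hdr;

	assert(payload || !len);
	/* sizeof(*hdr) does not include the flexible array member. */
	hdr = malloc(sizeof(*hdr) + len);
	if (!hdr)
		return NULL;
	hdr->magic = 0x12345678;	/* arbitrary value for this sketch */
	hdr->counter = 0;
	if (len)
		memcpy(hdr->data, payload, len);
	return hdr;
}
```

The layout is the same as with [0]; the difference is that [] is
standard C and lets the compiler diagnose misuse such as a member
declared after the array.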

> +};
> +
> +/**
> + * struct blkz_zone - zone information
> + * @off:
> + *	zone offset of block device
> + * @type:
> + *	frontent type for this zone
> + * @name:
> + *	frontent name for this zone
> + * @buffer:
> + *	pointer to data buffer managed by this zone
> + * @oldbuf:
> + *	pointer to old data buffer.
> + * @buffer_size:
> + *	bytes in @buffer->data
> + * @should_recover:
> + *	should recover from storage
> + * @dirty:
> + *	mark whether the data in @buffer are dirty (not flush to storage yet)
> + */

Thank you for the kernel-doc! :) Is it linked to from any .rst files?

> +struct blkz_zone {
> +	unsigned long off;

Should this be loff_t?
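
To illustrate why the type could matter: unsigned long is 32 bits wide
on 32-bit architectures, so a byte offset past 4 GiB would wrap there,
while loff_t is 64 bits everywhere. A userspace sketch that models the
two widths with fixed-size types (uint32_t stands in for a 32-bit
unsigned long, int64_t for loff_t; the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model a zone offset stored in a 32-bit unsigned long (wraps past 4 GiB). */
static uint32_t demo_off_32bit(uint64_t base, uint64_t zone_size,
			       unsigned int index)
{
	return (uint32_t)(base + zone_size * index);
}

/* Model the same offset stored in a 64-bit signed type such as loff_t. */
static int64_t demo_off_64bit(uint64_t base, uint64_t zone_size,
			      unsigned int index)
{
	uint64_t off = base + zone_size * index;

	assert(off <= INT64_MAX);
	return (int64_t)off;
}
```

Whether this can happen in practice depends on how large the space
handed to pstore/blk is allowed to be; the sketch only shows the
truncation itself.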

> +	const char *name;
> +	enum pstore_type_id type;
> +
> +	struct blkz_buffer *buffer;
> +	struct blkz_buffer *oldbuf;
> +	size_t buffer_size;
> +	bool should_recover;
> +	atomic_t dirty;
> +};
> +
> +struct blkz_context {
> +	struct blkz_zone **dbzs;	/* dmesg block zones */
> +	unsigned int dmesg_max_cnt;
> +	unsigned int dmesg_read_cnt;
> +	unsigned int dmesg_write_cnt;
> +	/*
> +	 * the counter should be recovered when recover.
> +	 * It records the oops/panic times after burning rather than booting.
> +	 */
> +	unsigned int oops_counter;
> +	unsigned int panic_counter;
> +	atomic_t recovered;
> +	atomic_t on_panic;
> +
> +	/*
> +	 * bzinfo_lock just protects "bzinfo" during calls to
> +	 * blkz_register/blkz_unregister
> +	 */
> +	spinlock_t bzinfo_lock;
> +	struct blkz_info *bzinfo;
> +	struct pstore_info pstore;
> +};
> +static struct blkz_context blkz_cxt;
> +
> +enum blkz_flush_mode {
> +	FLUSH_NONE = 0,
> +	FLUSH_PART,
> +	FLUSH_META,
> +	FLUSH_ALL,
> +};
> +
> +static inline int buffer_datalen(struct blkz_zone *zone)
> +{
> +	return atomic_read(&zone->buffer->datalen);
> +}
> +
> +static inline bool is_on_panic(void)
> +{
> +	struct blkz_context *cxt = &blkz_cxt;
> +
> +	return atomic_read(&cxt->on_panic);
> +}
> +
> +static int blkz_zone_read(struct blkz_zone *zone, char *buf,
> +		size_t len, unsigned long off)
> +{
> +	if (!buf || !zone->buffer)
> +		return -EINVAL;
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +	len = min_t(size_t, len, zone->buffer_size - off);
> +	memcpy(buf, zone->buffer->data + off, len);

Should the remainder of the buffer be zeroed if
	len > zone->buffer_size - off
? If not, I was expecting this to return how much was copied.
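
Something along these lines is what I had in mind for the second
option. This is a userspace sketch only: demo_zone and demo_zone_read
are hypothetical, simplified stand-ins for the zone and the read helper:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified model of a zone's in-memory buffer. */
struct demo_zone {
	const char *data;
	size_t size;
};

/*
 * Copy up to len bytes starting at off into buf.
 * Returns the number of bytes actually copied, or -EINVAL if off is
 * beyond the zone, so the caller can detect a short read.
 */
static ssize_t demo_zone_read(const struct demo_zone *zone, char *buf,
			      size_t len, size_t off)
{
	assert(zone && zone->data && buf);
	if (off > zone->size)
		return -EINVAL;
	if (len > zone->size - off)
		len = zone->size - off;
	memcpy(buf, zone->data + off, len);
	return (ssize_t)len;
}
```

A caller that asked for more than was available then sees a smaller
return value instead of silently getting a partly filled buffer.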

> +	return 0;
> +}
> +
> +static int blkz_zone_write(struct blkz_zone *zone,
> +		enum blkz_flush_mode flush_mode, const char *buf,
> +		size_t len, unsigned long off)
> +{
> +	struct blkz_info *info = blkz_cxt.bzinfo;
> +	ssize_t wcnt = 0;
> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
> +	size_t wlen;
> +
> +	if (off > zone->buffer_size)
> +		return -EINVAL;
> +	wlen = min_t(size_t, len, zone->buffer_size - off);
> +	if (buf && wlen) {
> +		memcpy(zone->buffer->data + off, buf, wlen);
> +		atomic_set(&zone->buffer->datalen, wlen + off);
> +	}

If you're expecting concurrent writers (given the use of atomic_set()),
I would expect the whole write to be locked instead (i.e. what happens
if multiple callers call blkz_zone_write() at the same time?).

> +
> +	/* avoid to damage old records */
> +	if (!is_on_panic() && !atomic_read(&blkz_cxt.recovered))
> +		goto set_dirty;
> +
> +	writeop = is_on_panic() ? info->panic_write : info->write;
> +	if (!writeop)
> +		goto set_dirty;
> +
> +	switch (flush_mode) {
> +	case FLUSH_NONE:
> +		if (unlikely(buf && wlen))
> +			goto set_dirty;
> +		return 0;
> +	case FLUSH_PART:
> +		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
> +				zone->off + sizeof(*zone->buffer) + off);
> +		if (wcnt != wlen)
> +			goto set_dirty;
> +		/* fallthrough */
> +	case FLUSH_META:
> +		wlen = sizeof(struct blkz_buffer);
> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
> +		if (wcnt != wlen)
> +			goto set_dirty;
> +		break;
> +	case FLUSH_ALL:
> +		wlen = zone->buffer_size + sizeof(*zone->buffer);
> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
> +		if (wcnt != wlen)
> +			goto set_dirty;
> +		break;
> +	}
> +
> +	return 0;
> +set_dirty:
> +	atomic_set(&zone->dirty, true);
> +	return -EBUSY;
> +}
> +
> +static int blkz_flush_dirty_zone(struct blkz_zone *zone)
> +{
> +	int ret;
> +
> +	if (!zone)
> +		return -EINVAL;
> +
> +	if (!atomic_read(&zone->dirty))
> +		return 0;
> +
> +	if (!atomic_read(&blkz_cxt.recovered))
> +		return -EBUSY;
> +
> +	ret = blkz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
> +	if (!ret)
> +		atomic_set(&zone->dirty, false);
> +	return ret;
> +}
> +
> +static int blkz_flush_dirty_zones(struct blkz_zone **zones, unsigned int cnt)
> +{
> +	int i, ret;
> +	struct blkz_zone *zone;
> +
> +	if (!zones)
> +		return -EINVAL;
> +
> +	for (i = 0; i < cnt; i++) {
> +		zone = zones[i];
> +		if (!zone)
> +			return -EINVAL;
> +		ret = blkz_flush_dirty_zone(zone);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +/**
> + * blkz_move_zone: move data from a old zone to a new zone
> + *
> + * @old: the old zone
> + * @new: the new zone
> + *
> + * NOTE:
> + *	Call blkz_zone_write to copy and flush data. If it failed, we
> + *	should reset new->dirty, because the new zone not really dirty.
> + */
> +static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
> +{
> +	const char *data = (const char *)old->buffer->data;
> +	int ret;
> +
> +	ret = blkz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
> +	if (ret) {
> +		atomic_set(&new->buffer->datalen, 0);
> +		atomic_set(&new->dirty, false);
> +		return ret;
> +	}
> +	atomic_set(&old->buffer->datalen, 0);
> +	return 0;
> +}
> +
> +static int blkz_recover_dmesg_data(struct blkz_context *cxt)

What does "recover" mean in this context? Is this "read from storage"?

> +{
> +	struct blkz_info *info = cxt->bzinfo;
> +	struct blkz_zone *zone = NULL;
> +	struct blkz_buffer *buf;
> +	unsigned long i;
> +	ssize_t rcnt;
> +
> +	if (!info->read)
> +		return -EINVAL;
> +
> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
> +		zone = cxt->dbzs[i];
> +		if (unlikely(!zone))
> +			return -EINVAL;
> +		if (atomic_read(&zone->dirty)) {
> +			unsigned int wcnt = cxt->dmesg_write_cnt;
> +			struct blkz_zone *new = cxt->dbzs[wcnt];
> +			int ret;
> +
> +			ret = blkz_move_zone(zone, new);
> +			if (ret) {
> +				pr_err("move zone from %lu to %d failed\n",
> +						i, wcnt);
> +				return ret;
> +			}
> +			cxt->dmesg_write_cnt = (wcnt + 1) % cxt->dmesg_max_cnt;
> +		}
> +		if (!zone->should_recover)
> +			continue;
> +		buf = zone->buffer;
> +		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
> +				zone->off);
> +		if (rcnt != zone->buffer_size + sizeof(*buf))
> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * blkz_recover_dmesg_meta: recover metadata of dmesg
> + *
> + * Recover metadata as follow:
> + * @cxt->dmesg_write_cnt
> + * @cxt->oops_counter
> + * @cxt->panic_counter
> + */
> +static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
> +{
> +	struct blkz_info *info = cxt->bzinfo;
> +	struct blkz_zone *zone;
> +	size_t rcnt, len;
> +	struct blkz_buffer *buf;
> +	struct blkz_dmesg_header *hdr;
> +	struct timespec64 time = {0};
> +	unsigned long i;
> +	/*
> +	 * Recover may on panic, we can't allocate any memory by kmalloc.
> +	 * So, we use local array instead.
> +	 */
> +	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
> +
> +	if (!info->read)
> +		return -EINVAL;
> +
> +	len = sizeof(*buf) + sizeof(*hdr);
> +	buf = (struct blkz_buffer *)buffer_header;
> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
> +		zone = cxt->dbzs[i];
> +		if (unlikely(!zone))
> +			return -EINVAL;
> +
> +		rcnt = info->read((char *)buf, len, zone->off);
> +		if (rcnt != len) {
> +			pr_err("read %s with id %lu failed\n", zone->name, i);
> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
> +		}
> +
> +		if (buf->sig != zone->buffer->sig) {
> +			pr_debug("no valid data in dmesg zone %lu\n", i);
> +			continue;
> +		}
> +
> +		if (zone->buffer_size < atomic_read(&buf->datalen)) {
> +			pr_info("found overtop zone: %s: id %lu, off %lu, size %zu\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size);
> +			continue;
> +		}
> +
> +		hdr = (struct blkz_dmesg_header *)buf->data;
> +		if (hdr->magic != DMESG_HEADER_MAGIC) {
> +			pr_info("found invalid zone: %s: id %lu, off %lu, size %zu\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size);
> +			continue;
> +		}
> +
> +		/*
> +		 * we get the newest zone, and the next one must be the oldest
> +		 * or unused zone, because we do write one by one like a circle.
> +		 */
> +		if (hdr->time.tv_sec >= time.tv_sec) {
> +			time.tv_sec = hdr->time.tv_sec;
> +			cxt->dmesg_write_cnt = (i + 1) % cxt->dmesg_max_cnt;
> +		}
> +
> +		if (hdr->reason == KMSG_DUMP_OOPS)
> +			cxt->oops_counter =
> +				max(cxt->oops_counter, hdr->counter);
> +		else
> +			cxt->panic_counter =
> +				max(cxt->panic_counter, hdr->counter);
> +
> +		if (!atomic_read(&buf->datalen)) {
> +			pr_debug("found erased zone: %s: id %ld, off %lu, size %zu, datalen %d\n",
> +					zone->name, i, zone->off,
> +					zone->buffer_size,
> +					atomic_read(&buf->datalen));
> +			continue;
> +		}
> +
> +		if (!is_on_panic())
> +			zone->should_recover = true;
> +		pr_debug("found nice zone: %s: id %ld, off %lu, size %zu, datalen %d\n",
> +				zone->name, i, zone->off,
> +				zone->buffer_size, atomic_read(&buf->datalen));
> +	}
> +
> +	return 0;
> +}
> +
> +static int blkz_recover_dmesg(struct blkz_context *cxt)
> +{
> +	int ret;
> +
> +	if (!cxt->dbzs)
> +		return 0;
> +
> +	ret = blkz_recover_dmesg_meta(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	ret = blkz_recover_dmesg_data(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	return 0;
> +recover_fail:
> +	pr_debug("recover dmesg failed\n");
> +	return ret;
> +}
> +
> +static inline int blkz_recovery(struct blkz_context *cxt)
> +{
> +	int ret = -EBUSY;
> +
> +	if (atomic_read(&cxt->recovered))
> +		return 0;
> +
> +	ret = blkz_recover_dmesg(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
> +	pr_debug("recover end!\n");
> +	atomic_set(&cxt->recovered, 1);
> +	return 0;
> +
> +recover_fail:
> +	pr_err("recover failed\n");
> +	return ret;
> +}
> +
> +static int blkz_pstore_open(struct pstore_info *psi)
> +{
> +	struct blkz_context *cxt = psi->data;
> +
> +	cxt->dmesg_read_cnt = 0;
> +	return 0;
> +}
> +
> +static inline bool blkz_ok(struct blkz_zone *zone)
> +{
> +	if (zone && zone->buffer && buffer_datalen(zone))
> +		return true;
> +	return false;
> +}
> +
> +static inline int blkz_dmesg_erase(struct blkz_context *cxt,
> +		struct blkz_zone *zone)
> +{
> +	if (unlikely(!blkz_ok(zone)))
> +		return 0;
> +
> +	atomic_set(&zone->buffer->datalen, 0);
> +	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +}

cxt is unused?

> +
> +static int blkz_pstore_erase(struct pstore_record *record)
> +{
> +	struct blkz_context *cxt = record->psi->data;
> +

Please sanity-check the record->id is in bounds before using it.
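
Roughly like the following userspace sketch. The demo_context,
demo_record and demo_lookup_zone names are hypothetical stand-ins, and
the zones are modelled as plain int pointers to keep it short:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the backend context. */
struct demo_context {
	int **zones;		/* array of zone pointers */
	unsigned int max_cnt;	/* number of entries in zones */
};

/* Hypothetical stand-in for the record; only the id matters here. */
struct demo_record {
	uint64_t id;	/* set on the read path, checked defensively here */
};

/* Look up the zone for a record, rejecting out-of-range ids. */
static int demo_lookup_zone(const struct demo_context *cxt,
			    const struct demo_record *rec, int **zone)
{
	assert(cxt && rec && zone);
	if (!cxt->zones || rec->id >= cxt->max_cnt)
		return -EINVAL;
	*zone = cxt->zones[rec->id];
	return 0;
}
```

Doing the comparison on the full 64-bit id before indexing avoids any
truncation when the id is later used as an array index.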

> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
> +		struct pstore_record *record)
> +{
> +	struct blkz_context *cxt = record->psi->data;
> +	struct blkz_buffer *buffer = zone->buffer;
> +	struct blkz_dmesg_header *hdr =
> +		(struct blkz_dmesg_header *)buffer->data;
> +
> +	hdr->magic = DMESG_HEADER_MAGIC;
> +	hdr->compressed = record->compressed;
> +	hdr->time.tv_sec = record->time.tv_sec;
> +	hdr->time.tv_nsec = record->time.tv_nsec;
> +	hdr->reason = record->reason;
> +	if (hdr->reason == KMSG_DUMP_OOPS)
> +		hdr->counter = ++cxt->oops_counter;
> +	else
> +		hdr->counter = ++cxt->panic_counter;
> +}
> +
> +static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
> +		struct pstore_record *record)

Instead of "..._do" maybe name this "..._record", since it writes one
record?

> +{
> +	size_t size, hlen;
> +	struct blkz_zone *zone;
> +	unsigned int zonenum;
> +
> +	zonenum = cxt->dmesg_write_cnt;
> +	zone = cxt->dbzs[zonenum];
> +	if (unlikely(!zone))
> +		return -ENOSPC;
> +	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
> +
> +	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
> +	blkz_write_kmsg_hdr(zone, record);
> +	hlen = sizeof(struct blkz_dmesg_header);
> +	size = min_t(size_t, record->size, zone->buffer_size - hlen);
> +	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
> +}
> +
> +static int notrace blkz_dmesg_write(struct blkz_context *cxt,
> +		struct pstore_record *record)
> +{
> +	int ret;
> +	struct blkz_info *info = cxt->bzinfo;
> +
> +	/*
> +	 * Out of the various dmesg dump types, pstore/blk is currently designed
> +	 * to only store crash logs, rather than storing general kernel logs.
> +	 */
> +	if (record->reason != KMSG_DUMP_OOPS &&
> +			record->reason != KMSG_DUMP_PANIC)
> +		return -EINVAL;
> +
> +	/* Skip Oopes when configured to do so. */
> +	if (record->reason == KMSG_DUMP_OOPS && !info->dump_oops)
> +		return -EINVAL;
> +
> +	/*
> +	 * Explicitly only take the first part of any new crash.
> +	 * If our buffer is larger than kmsg_bytes, this can never happen,
> +	 * and if our buffer is smaller than kmsg_bytes, we don't want the
> +	 * report split across multiple records.
> +	 */
> +	if (record->part != 1)
> +		return -ENOSPC;
> +
> +	if (!cxt->dbzs)
> +		return -ENOSPC;
> +
> +	ret = blkz_dmesg_write_do(cxt, record);
> +	if (!ret) {
> +		pr_debug("try to flush other dirty dmesg zones\n");
> +		blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
> +	}
> +
> +	/* alway return 0 as we had handled it on buffer */
> +	return 0;
> +}
> +
> +static int notrace blkz_pstore_write(struct pstore_record *record)
> +{
> +	struct blkz_context *cxt = record->psi->data;
> +
> +	if (record->type == PSTORE_TYPE_DMESG &&
> +			record->reason == KMSG_DUMP_PANIC)
> +		atomic_set(&cxt->on_panic, 1);
> +
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		return blkz_dmesg_write(cxt, record);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +#define READ_NEXT_ZONE ((ssize_t)(-1024))
> +static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
> +{
> +	struct blkz_zone *zone = NULL;
> +
> +	while (cxt->dmesg_read_cnt < cxt->dmesg_max_cnt) {
> +		zone = cxt->dbzs[cxt->dmesg_read_cnt++];
> +		if (blkz_ok(zone))
> +			return zone;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int blkz_read_dmesg_hdr(struct blkz_zone *zone,
> +		struct pstore_record *record)
> +{
> +	struct blkz_buffer *buffer = zone->buffer;
> +	struct blkz_dmesg_header *hdr =
> +		(struct blkz_dmesg_header *)buffer->data;
> +
> +	if (hdr->magic != DMESG_HEADER_MAGIC)
> +		return -EINVAL;
> +	record->compressed = hdr->compressed;
> +	record->time.tv_sec = hdr->time.tv_sec;
> +	record->time.tv_nsec = hdr->time.tv_nsec;
> +	record->reason = hdr->reason;
> +	record->count = hdr->counter;
> +	return 0;
> +}
> +
> +static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
> +		struct pstore_record *record)
> +{
> +	size_t size, hlen = 0;
> +
> +	size = buffer_datalen(zone);
> +	/* Clear and skip this DMESG record if it has no valid header */
> +	if (blkz_read_dmesg_hdr(zone, record)) {
> +		atomic_set(&zone->buffer->datalen, 0);
> +		atomic_set(&zone->dirty, 0);
> +		return READ_NEXT_ZONE;
> +	}
> +	size -= sizeof(struct blkz_dmesg_header);
> +
> +	if (!record->compressed) {
> +		char *buf = kasprintf(GFP_KERNEL,
> +				"%s: Total %d times\n",
> +				record->reason == KMSG_DUMP_OOPS ? "Oops" :
> +				"Panic", record->count);
> +		hlen = strlen(buf);
> +		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
> +		if (!record->buf) {
> +			kfree(buf);
> +			return -ENOMEM;
> +		}
> +	} else {
> +		record->buf = kmalloc(size, GFP_KERNEL);
> +		if (!record->buf)
> +			return -ENOMEM;
> +	}
> +
> +	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
> +				sizeof(struct blkz_dmesg_header)) < 0)) {
> +		kfree(record->buf);
> +		return READ_NEXT_ZONE;
> +	}
> +
> +	return size + hlen;
> +}
> +
> +static ssize_t blkz_pstore_read(struct pstore_record *record)
> +{
> +	struct blkz_context *cxt = record->psi->data;
> +	ssize_t (*blkz_read)(struct blkz_zone *zone,
> +			struct pstore_record *record);
> +	struct blkz_zone *zone;
> +	ssize_t ret;
> +
> +	/* before read, we must recover from storage */
> +	ret = blkz_recovery(cxt);
> +	if (ret)
> +		return ret;
> +
> +next_zone:
> +	zone = blkz_read_next_zone(cxt);
> +	if (!zone)
> +		return 0;
> +
> +	record->type = zone->type;
> +	switch (record->type) {
> +	case PSTORE_TYPE_DMESG:
> +		blkz_read = blkz_dmesg_read;
> +		record->id = cxt->dmesg_read_cnt - 1;
> +		break;
> +	default:
> +		goto next_zone;
> +	}
> +
> +	ret = blkz_read(zone, record);
> +	if (ret == READ_NEXT_ZONE)
> +		goto next_zone;
> +	return ret;
> +}
> +
> +static struct blkz_context blkz_cxt = {
> +	.bzinfo_lock = __SPIN_LOCK_UNLOCKED(blkz_cxt.bzinfo_lock),
> +	.recovered = ATOMIC_INIT(0),
> +	.on_panic = ATOMIC_INIT(0),
> +	.pstore = {
> +		.owner = THIS_MODULE,
> +		.name = MODNAME,
> +		.open = blkz_pstore_open,
> +		.read = blkz_pstore_read,
> +		.write = blkz_pstore_write,
> +		.erase = blkz_pstore_erase,
> +	},
> +};
> +
> +static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
> +		unsigned long *off, size_t size)
> +{
> +	struct blkz_info *info = blkz_cxt.bzinfo;
> +	struct blkz_zone *zone;
> +	const char *name = pstore_type_to_name(type);
> +
> +	if (!size)
> +		return NULL;
> +
> +	if (*off + size > info->total_size) {
> +		pr_err("no room for %s (0x%zx@0x%lx over 0x%lx)\n",
> +			name, size, *off, info->total_size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	zone = kzalloc(sizeof(struct blkz_zone), GFP_KERNEL);
> +	if (!zone)
> +		return ERR_PTR(-ENOMEM);
> +
> +	zone->buffer = kmalloc(size, GFP_KERNEL);
> +	if (!zone->buffer) {
> +		kfree(zone);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memset(zone->buffer, 0xFF, size);
> +	zone->off = *off;
> +	zone->name = name;
> +	zone->type = type;
> +	zone->buffer_size = size - sizeof(struct blkz_buffer);
> +	zone->buffer->sig = type ^ BLK_SIG;
> +	atomic_set(&zone->dirty, 0);
> +	atomic_set(&zone->buffer->datalen, 0);
> +
> +	*off += size;
> +
> +	pr_debug("blkzone %s: off 0x%lx, %zu header, %zu data\n", zone->name,
> +			zone->off, sizeof(*zone->buffer), zone->buffer_size);
> +	return zone;
> +}
> +
> +static struct blkz_zone **blkz_init_zones(enum pstore_type_id type,
> +	unsigned long *off, size_t total_size, ssize_t record_size,
> +	unsigned int *cnt)
> +{
> +	struct blkz_info *info = blkz_cxt.bzinfo;
> +	struct blkz_zone **zones, *zone;
> +	const char *name = pstore_type_to_name(type);
> +	int c, i;
> +
> +	if (!total_size || !record_size)
> +		return NULL;
> +
> +	if (*off + total_size > info->total_size) {
> +		pr_err("no room for zones %s (0x%zx@0x%lx over 0x%lx)\n",
> +			name, total_size, *off, info->total_size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	c = total_size / record_size;
> +	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
> +	if (!zones) {
> +		pr_err("allocate for zones %s failed\n", name);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memset(zones, 0, c * sizeof(*zones));
> +
> +	for (i = 0; i < c; i++) {
> +		zone = blkz_init_zone(type, off, record_size);
> +		if (!zone || IS_ERR(zone)) {
> +			pr_err("initialize zones %s failed\n", name);
> +			while (--i >= 0) {
> +				kfree(zones[i]->buffer);
> +				kfree(zones[i]);
> +			}
> +			kfree(zones);
> +			return (void *)zone;
> +		}
> +		zones[i] = zone;
> +	}
> +
> +	*cnt = c;
> +	return zones;
> +}
> +
> +static void blkz_free_zone(struct blkz_zone **blkzone)
> +{
> +	struct blkz_zone *zone = *blkzone;
> +
> +	if (!zone)
> +		return;
> +
> +	kfree(zone->buffer);
> +	kfree(zone);
> +	*blkzone = NULL;
> +}
> +
> +static void blkz_free_zones(struct blkz_zone ***blkzones, unsigned int *cnt)
> +{
> +	struct blkz_zone **zones = *blkzones;
> +
> +	if (!zones)
> +		return;
> +
> +	while (*cnt > 0) {
> +		blkz_free_zone(&zones[*cnt]);
> +		(*cnt)--;
> +	}
> +	kfree(zones);
> +	*blkzones = NULL;
> +}
> +
> +static int blkz_cut_zones(struct blkz_context *cxt)

What does "cut" mean here? Maybe "alloc" instead?

> +{
> +	struct blkz_info *info = cxt->bzinfo;
> +	unsigned long off = 0;
> +	int err;
> +	size_t size;
> +
> +	size = info->total_size;
> +	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
> +			info->dmesg_size, &cxt->dmesg_max_cnt);
> +	if (IS_ERR(cxt->dbzs)) {
> +		err = PTR_ERR(cxt->dbzs);
> +		goto fail_out;
> +	}
> +
> +	return 0;
> +fail_out:
> +	return err;
> +}
> +
> +int blkz_register(struct blkz_info *info)
> +{
> +	int err = -EINVAL;
> +	struct blkz_context *cxt = &blkz_cxt;
> +	struct module *owner = info->owner;
> +
> +	if (!info->total_size) {
> +		pr_warn("the total size must be non-zero\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!info->dmesg_size) {
> +		pr_warn("at least one of the records be non-zero\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!info->name || !info->name[0])
> +		return -EINVAL;
> +
> +	if (info->total_size < 4096) {
> +		pr_err("total size must be greater than 4096 bytes\n");
> +		return -EINVAL;
> +	}
> +
> +#define check_size(name, size) {					\
> +		if (info->name > 0 && info->name < (size)) {		\
> +			pr_err(#name " must be over %d\n", (size));	\
> +			return -EINVAL;					\
> +		}							\
> +		if (info->name & (size - 1)) {				\
> +			pr_err(#name " must be a multiple of %d\n",	\
> +					(size));			\
> +			return -EINVAL;					\
> +		}							\
> +	}
> +
> +	check_size(total_size, 4096);
> +	check_size(dmesg_size, SECTOR_SIZE);
> +
> +#undef check_size
> +
> +	/*
> +	 * the @read and @write must be applied.
> +	 * if no @read, pstore may mount failed.
> +	 * if no @write, pstore do not support to remove record file.
> +	 */
> +	if (!info->read || !info->write) {
> +		pr_err("no valid general read/write interface\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock(&cxt->bzinfo_lock);
> +	if (cxt->bzinfo) {
> +		pr_warn("blk '%s' already loaded: ignoring '%s'\n",
> +				cxt->bzinfo->name, info->name);
> +		spin_unlock(&cxt->bzinfo_lock);
> +		return -EBUSY;
> +	}
> +	cxt->bzinfo = info;
> +	spin_unlock(&cxt->bzinfo_lock);
> +
> +	if (owner && !try_module_get(owner)) {
> +		err = -EBUSY;
> +		goto fail_out;
> +	}
> +
> +	pr_debug("register %s with properties:\n", info->name);
> +	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
> +	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
> +
> +	err = blkz_cut_zones(cxt);
> +	if (err) {
> +		pr_err("cut zones fialed\n");

typo: "failed"

> +		goto put_module;
> +	}
> +
> +	if (info->dmesg_size) {
> +		cxt->pstore.bufsize = cxt->dbzs[0]->buffer_size -
> +			sizeof(struct blkz_dmesg_header);
> +		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
> +		if (!cxt->pstore.buf) {
> +			err = -ENOMEM;

I think the allocated zones need to be freed here.

> +			goto put_module;
> +		}
> +	}
> +	cxt->pstore.data = cxt;
> +	if (info->dmesg_size)
> +		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
> +
> +	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
> +			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
> +			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
> +
> +	err = pstore_register(&cxt->pstore);
> +	if (err) {
> +		pr_err("registering with pstore failed\n");

Also here?
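
For both error paths, the usual pattern is to unwind with labels in
the reverse order of acquisition. A userspace sketch, where demo_state
and demo_register are hypothetical and fail_at only exists to exercise
the failure paths:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state built up during registration. */
struct demo_state {
	void *zones;
	void *buf;
};

/*
 * Acquire resources in order and release them in reverse on failure.
 * fail_at selects which step fails: 1 = buffer allocation,
 * 2 = final registration step, 0 = no injected failure.
 */
static int demo_register(struct demo_state *st, int fail_at)
{
	int err;

	assert(st);
	st->zones = malloc(64);
	if (!st->zones)
		return -ENOMEM;

	st->buf = fail_at == 1 ? NULL : malloc(64);
	if (!st->buf) {
		err = -ENOMEM;
		goto free_zones;
	}

	if (fail_at == 2) {
		err = -EBUSY;
		goto free_buf;
	}
	return 0;

free_buf:
	free(st->buf);
	st->buf = NULL;
free_zones:
	free(st->zones);
	st->zones = NULL;
	return err;
}
```

In the patch that would mean adding a label that frees the zones and
having the later failure paths fall through it.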

> +		goto free_pstore_buf;
> +	}
> +
> +	module_put(owner);
> +	return 0;
> +
> +free_pstore_buf:
> +	kfree(cxt->pstore.buf);
> +put_module:
> +	module_put(owner);
> +fail_out:
> +	spin_lock(&blkz_cxt.bzinfo_lock);
> +	blkz_cxt.bzinfo = NULL;
> +	spin_unlock(&blkz_cxt.bzinfo_lock);
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(blkz_register);
> +
> +void blkz_unregister(struct blkz_info *info)
> +{
> +	struct blkz_context *cxt = &blkz_cxt;
> +
> +	pstore_unregister(&cxt->pstore);
> +	kfree(cxt->pstore.buf);
> +	cxt->pstore.bufsize = 0;
> +
> +	spin_lock(&cxt->bzinfo_lock);
> +	blkz_cxt.bzinfo = NULL;
> +	spin_unlock(&cxt->bzinfo_lock);
> +
> +	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
> +}
> +EXPORT_SYMBOL_GPL(blkz_unregister);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("Block device Oops/Panic logger");
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> new file mode 100644
> index 000000000000..589d276fa4e4
> --- /dev/null
> +++ b/include/linux/pstore_blk.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSTORE_BLK_H_
> +#define __PSTORE_BLK_H_
> +
> +#include <linux/types.h>
> +#include <linux/blkdev.h>
> +
> +/**
> + * struct blkz_info - backend blkzone driver structure
> + *
> + * @owner:
> + *	Module which is responsible for this backend driver.
> + * @name:
> + *	Name of the backend driver.
> + * @total_size:
> + *	The total size in bytes pstore/blk can use. It must be greater than
> + *	4096 and be multiple of 4096.
> + * @dmesg_size:
> + *	The size of each zones for dmesg (oops & panic). Zero means disabled,
> + *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
> + * @dump_oops:
> + *	Dump oops and panic log or only panic.
> + * @read, @write:
> + *	The general (not panic) read/write operation. It's required unless you
> + *	are block device and supply valid @bdev. In this case, blkzone will
> + *	replace it as a general read/write interface.
> + *
> + *	Both of the @size and @offset parameters on this interface are
> + *	the relative size of the space provided, not the whole disk/flash.
> + *
> + *	On success, the number of bytes read/write should be returned.
> + *	On error, negative number should be returned.
> + * @panic_write:
> + *	The write operation only used for panic. It's optional if you do not
> + *	care panic record. If panic occur but blkzone do not recover yet, the
> + *	first zone of dmesg is used.
> + *
> + *	Both of the @size and @offset parameters on this interface are
> + *	the relative size of the space provided, not the whole disk/flash.
> + *
> + *	On success, the number of bytes write should be returned.
> + *	On error, negative number should be returned.
> + */
> +typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
> +typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
> +struct blkz_info {
> +	struct module *owner;
> +	const char *name;
> +
> +	unsigned long total_size;
> +	unsigned long dmesg_size;
> +	int dump_oops;
> +	blkz_read_op read;
> +	blkz_write_op write;
> +	blkz_write_op panic_write;
> +};
> +
> +extern int blkz_register(struct blkz_info *info);
> +extern void blkz_unregister(struct blkz_info *info);
> +
> +#endif
> -- 
> 1.9.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-02-26  0:52   ` Kees Cook
@ 2020-02-27  8:21     ` liaoweixiong
  2020-03-18 17:23       ` Kees Cook
  2020-03-09  0:52     ` WeiXiong Liao
  1 sibling, 1 reply; 43+ messages in thread
From: liaoweixiong @ 2020-02-27  8:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

Hi Kees Cook,

On 2020/2/26 AM 8:52, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
>> pstore/blk is similar to pstore/ram, but dump log to block devices
>> rather than persistent ram.
> 
> Thanks for waiting for me to get to this review! Notes below...
> 
>>
>> Why do we need pstore/blk?
>> 1. Most embedded intelligent equipment have no persistent ram, which
>> increases costs. We perfer to cheaper solutions, like block devices.
>> 2. Do not any equipment have battery, which means that it lost all data
>> on general ram if power failure. Pstore has little to do for these
>> equipments.
>>
>> pstore/blk is one of series patches, and provides the zones management
>> of partition of block device or non-block device likes mtd devices. It
>> only supports dmesg recorder right now.
>>
>> To make pstore/blk work, the block/non-block driver should calls
>> blkz_register() and call blkz_unregister() when exits. On other patches
>> of series, a better wrapper for pstore/blk, named blkoops, will be
>> there.
>>
>> It's different with pstore/ram, pstore/blk relies on read/write APIs
>> from device driver, especially, write operation for panic record.
>>
>> Recommend that, the block/non-block driver should register to pstore/blk
>> only after devices have registered to Linux and ready to work.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  fs/pstore/Kconfig          |  10 +
>>  fs/pstore/Makefile         |   3 +
>>  fs/pstore/blkzone.c        | 948 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pstore_blk.h |  62 +++
>>  4 files changed, 1023 insertions(+)
>>  create mode 100644 fs/pstore/blkzone.c
>>  create mode 100644 include/linux/pstore_blk.h
>>
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 8f0369aad22a..536fde9e13e8 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -153,3 +153,13 @@ config PSTORE_RAM
>>  	  "ramoops.ko".
>>  
>>  	  For more information, see Documentation/admin-guide/ramoops.rst.
>> +
>> +config PSTORE_BLK
>> +	tristate "Log panic/oops to a block device"
>> +	depends on PSTORE
>> +	depends on BLOCK
>> +	help
>> +	  This enables panic and oops message to be logged to a block dev
>> +	  where it can be read back at some later point.
> 
> I think more accurate would be:
> "... read back on the next boot via pstorefs."
> 

I will fix it later.

>> +
>> +	  If unsure, say N.
>> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
>> index 967b5891f325..0ee2fc8d1bfb 100644
>> --- a/fs/pstore/Makefile
>> +++ b/fs/pstore/Makefile
>> @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
>>  
>>  ramoops-objs += ram.o ram_core.o
>>  obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
>> +
>> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>> +pstore_blk-y += blkzone.o
> 
> Why this dance with files? I would just expect:
> 
> obj-$(CONFIG_PSTORE_BLK)     += blkzone.o
> 

Your suggestion would name the built module blkzone.ko rather than
pstore_blk.ko.

> (Regardless, please keep tabs lined up in this file)
>

OK.

>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> new file mode 100644
>> index 000000000000..f77f612b50ba
>> --- /dev/null
>> +++ b/fs/pstore/blkzone.c
>> @@ -0,0 +1,948 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#define MODNAME "pstore-blk"
>> +#define pr_fmt(fmt) MODNAME ": " fmt
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/slab.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/pstore.h>
>> +#include <linux/mount.h>
>> +#include <linux/printk.h>
>> +#include <linux/fs.h>
>> +#include <linux/pstore_blk.h>
>> +#include <linux/kdev_t.h>
>> +#include <linux/device.h>
>> +#include <linux/namei.h>
>> +#include <linux/fcntl.h>
>> +#include <linux/uio.h>
>> +#include <linux/writeback.h>
>> +
>> +/**
>> + * struct blkz_head - head of zone to flush to storage
>> + *
>> + * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
>> + * @datalen: length of data in @data
>> + * @data: zone data.
>> + */
>> +struct blkz_buffer {
>> +#define BLK_SIG (0x43474244) /* DBGC */
> 
> I was going to suggest extracting PERSISTENT_RAM_SIG, renaming it and
> using it in here and in ram_core.c, but then I realize they're not
> marking the same structure. How about choosing a new magic sig for the
> blkzone data header?
> 

That's OK with me. Is there a rule for choosing a new magic?
In addition, all members of this structure are the same as those of
struct persistent_ram_buffer after patch 2. It may be a good idea to
extract it if you want to merge ramoops and pstore/blk.

>> +	uint32_t sig;
>> +	atomic_t datalen;
>> +	uint8_t data[];
>> +};
>> +
>> +/**
>> + * struct blkz_dmesg_header: dmesg information
> 
> This is the on-disk structure also?
> 
Yes. The structure blkz_buffer is a generic header for all recorder
zones, and the structure blkz_dmesg_header is a header for dmesg, saved
in blkz_buffer->data. The dmesg recorder uses it to save its specific
attributes.
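
A minimal userspace sketch of the nesting described above. The struct
names and field types are simplified stand-ins (int32_t instead of
atomic_t, two int64_t fields instead of timespec64), and dmesg_hdr() and
dmesg_payload_offset() are hypothetical helpers that are not part of the
patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic zone header: simplified stand-in for struct blkz_buffer. */
struct zone_buffer {
	uint32_t sig;
	int32_t datalen;	/* atomic_t in the patch */
	uint8_t data[];		/* recorder-specific content starts here */
};

/* dmesg header: simplified stand-in for struct blkz_dmesg_header. */
struct dmesg_header {
	uint32_t magic;
	int64_t tv_sec;		/* struct timespec64 in the patch */
	int64_t tv_nsec;
	uint8_t compressed;
	uint32_t counter;
	int32_t reason;
};

/* Hypothetical helper: the dmesg header lives at the start of ->data. */
static struct dmesg_header *dmesg_hdr(struct zone_buffer *buf)
{
	return (struct dmesg_header *)buf->data;
}

/* Hypothetical helper: offset of the log text from the start of the zone. */
static size_t dmesg_payload_offset(void)
{
	return sizeof(struct zone_buffer) + sizeof(struct dmesg_header);
}
```

The log text itself starts dmesg_payload_offset() bytes into the zone,
after both headers.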

>> + *
>> + * @magic: magic num for dmesg header
>> + * @time: trigger time
>> + * @compressed: whether conpressed
>> + * @count: oops/panic counter
>> + * @reason: identify oops or panic
>> + */
>> +struct blkz_dmesg_header {
>> +#define DMESG_HEADER_MAGIC 0x4dfc3ae5
> 
> How was this magic chosen?
> 

It's a random number. Should I choose a meaningful magic instead?

>> +	uint32_t magic;
>> +	struct timespec64 time;
>> +	bool compressed;
>> +	uint32_t counter;
>> +	enum kmsg_dump_reason reason;
>> +	uint8_t data[0];
> 
> Please use [] instead of [0].
> 

OK, I will fix it later.
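
For reference, a minimal userspace sketch of the C99 flexible array
member form requested above. struct record and record_new() are
hypothetical names used only for illustration; only the magic value is
borrowed from the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record using a flexible array member ([] rather than [0]). */
struct record {
	uint32_t magic;
	uint32_t len;
	uint8_t data[];	/* must be the last member; adds no size of its own */
};

/* Allocate the header plus len payload bytes in one block. */
static struct record *record_new(const void *src, uint32_t len)
{
	struct record *r = malloc(sizeof(*r) + len);

	if (!r)
		return NULL;
	r->magic = 0x4dfc3ae5;	/* DMESG_HEADER_MAGIC from the patch */
	r->len = len;
	memcpy(r->data, src, len);
	return r;
}
```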

>> +};
>> +
>> +/**
>> + * struct blkz_zone - zone information
>> + * @off:
>> + *	zone offset of block device
>> + * @type:
>> + *	frontent type for this zone
>> + * @name:
>> + *	frontent name for this zone
>> + * @buffer:
>> + *	pointer to data buffer managed by this zone
>> + * @oldbuf:
>> + *	pointer to old data buffer.
>> + * @buffer_size:
>> + *	bytes in @buffer->data
>> + * @should_recover:
>> + *	should recover from storage
>> + * @dirty:
>> + *	mark whether the data in @buffer are dirty (not flush to storage yet)
>> + */
> 
> Thank you for the kerndoc! :) Is it linked to from any .rst files?
> 

I don't quite understand. There is a document in the 6th patch; is that
what you are asking about?

>> +struct blkz_zone {
>> +	unsigned long off;
> 
> Should this be loff_t?
> 

Yes. I will fix it and the related code.
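
A small sketch of why the type matters, assuming a 32-bit target where
unsigned long is 32 bits wide. uint32_t and int64_t stand in for
unsigned long and loff_t, and both function names are hypothetical.

```c
#include <stdint.h>

/* Offset arithmetic as a 32-bit unsigned long would do it. */
static uint32_t zone_end_32(uint32_t off, uint32_t size)
{
	return off + size;	/* wraps modulo 2^32 */
}

/* The same arithmetic with a 64-bit signed offset, like loff_t. */
static int64_t zone_end_64(int64_t off, int64_t size)
{
	return off + size;
}
```

With an offset just below 4 GiB, the 32-bit version wraps around to a
small value, while the 64-bit version keeps the real position.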

>> +	const char *name;
>> +	enum pstore_type_id type;
>> +
>> +	struct blkz_buffer *buffer;
>> +	struct blkz_buffer *oldbuf;
>> +	size_t buffer_size;
>> +	bool should_recover;
>> +	atomic_t dirty;
>> +};
>> +
>> +struct blkz_context {
>> +	struct blkz_zone **dbzs;	/* dmesg block zones */
>> +	unsigned int dmesg_max_cnt;
>> +	unsigned int dmesg_read_cnt;
>> +	unsigned int dmesg_write_cnt;
>> +	/*
>> +	 * the counter should be recovered when recover.
>> +	 * It records the oops/panic times after burning rather than booting.
>> +	 */
>> +	unsigned int oops_counter;
>> +	unsigned int panic_counter;
>> +	atomic_t recovered;
>> +	atomic_t on_panic;
>> +
>> +	/*
>> +	 * bzinfo_lock just protects "bzinfo" during calls to
>> +	 * blkz_register/blkz_unregister
>> +	 */
>> +	spinlock_t bzinfo_lock;
>> +	struct blkz_info *bzinfo;
>> +	struct pstore_info pstore;
>> +};
>> +static struct blkz_context blkz_cxt;
>> +
>> +enum blkz_flush_mode {
>> +	FLUSH_NONE = 0,
>> +	FLUSH_PART,
>> +	FLUSH_META,
>> +	FLUSH_ALL,
>> +};
>> +
>> +static inline int buffer_datalen(struct blkz_zone *zone)
>> +{
>> +	return atomic_read(&zone->buffer->datalen);
>> +}
>> +
>> +static inline bool is_on_panic(void)
>> +{
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +
>> +	return atomic_read(&cxt->on_panic);
>> +}
>> +
>> +static int blkz_zone_read(struct blkz_zone *zone, char *buf,
>> +		size_t len, unsigned long off)
>> +{
>> +	if (!buf || !zone->buffer)
>> +		return -EINVAL;
>> +	if (off > zone->buffer_size)
>> +		return -EINVAL;
>> +	len = min_t(size_t, len, zone->buffer_size - off);
>> +	memcpy(buf, zone->buffer->data + off, len);
> 
> Should the remainder of the buffer be zeroed if
> 	len > zone->buffer_size - off
> ? If not, I was expecting this to return how much was copied.
> 

You are right. It should return how much was copied.
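
A minimal userspace sketch of a read that reports how much was copied.
struct zone and zone_read() are simplified stand-ins for the patch's
blkz_zone and blkz_zone_read(), not the code that will be sent in the
next version.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Simplified stand-in for the zone: only what the read path needs. */
struct zone {
	const char *data;	/* zone->buffer->data in the patch */
	size_t size;		/* zone->buffer_size in the patch */
};

/*
 * Copy at most len bytes starting at off and report how many bytes were
 * actually copied, so the caller can detect a short read.
 */
static ssize_t zone_read(const struct zone *zone, char *buf, size_t len,
			 size_t off)
{
	if (!buf || !zone->data)
		return -EINVAL;
	if (off > zone->size)
		return -EINVAL;
	if (len > zone->size - off)
		len = zone->size - off;
	memcpy(buf, zone->data + off, len);
	return (ssize_t)len;
}
```

A caller asking for 8 bytes at offset 4 of a 6-byte zone gets 2 back and
knows the rest of its buffer was not filled.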

>> +	return 0;
>> +}
>> +
>> +static int blkz_zone_write(struct blkz_zone *zone,
>> +		enum blkz_flush_mode flush_mode, const char *buf,
>> +		size_t len, unsigned long off)
>> +{
>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>> +	ssize_t wcnt = 0;
>> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
>> +	size_t wlen;
>> +
>> +	if (off > zone->buffer_size)
>> +		return -EINVAL;
>> +	wlen = min_t(size_t, len, zone->buffer_size - off);
>> +	if (buf && wlen) {
>> +		memcpy(zone->buffer->data + off, buf, wlen);
>> +		atomic_set(&zone->buffer->datalen, wlen + off);
>> +	}
> 
> If you're expecting concurrent writers (use of atomic_set()), I would
> expect the whole write to be locked instead. (i.e. what happens if
> multiple callers call blkz_zone_write()?)
> 

I don't agree. The datalen is updated in many places, so locking only
here would not help.

One more thing. During the analysis, I found another problem.
Removing old files causes new logs to be lost. Take the console recorder
as an example. After a reboot, new logs are saved to buf while old logs
are saved to old_buf. If we remove the old file at that time, not only is
old_buf freed, but the length of buf for new data is also reset to zero.
ramoops may have this problem too.
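
A toy userspace model of the problem, assuming a zone that tracks the
length of new data and a separate copy of the old data. struct zone,
erase_lossy() and erase_old_only() are hypothetical; the second function
is only one possible direction, not a fix proposed in this series.

```c
#include <stdlib.h>

/* Toy model of a zone after reboot. */
struct zone {
	size_t datalen;	/* bytes of new data logged since this boot */
	char *oldbuf;	/* data recovered from the previous boot, or NULL */
};

/* The behaviour described above: removing the old file drops new data. */
static void erase_lossy(struct zone *z)
{
	free(z->oldbuf);
	z->oldbuf = NULL;
	z->datalen = 0;		/* new logs are discarded as a side effect */
}

/* One possible alternative: release only the old copy. */
static void erase_old_only(struct zone *z)
{
	free(z->oldbuf);
	z->oldbuf = NULL;
}
```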

>> +
>> +	/* avoid to damage old records */
>> +	if (!is_on_panic() && !atomic_read(&blkz_cxt.recovered))
>> +		goto set_dirty;
>> +
>> +	writeop = is_on_panic() ? info->panic_write : info->write;
>> +	if (!writeop)
>> +		goto set_dirty;
>> +
>> +	switch (flush_mode) {
>> +	case FLUSH_NONE:
>> +		if (unlikely(buf && wlen))
>> +			goto set_dirty;
>> +		return 0;
>> +	case FLUSH_PART:
>> +		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
>> +				zone->off + sizeof(*zone->buffer) + off);
>> +		if (wcnt != wlen)
>> +			goto set_dirty;
>> +		/* fallthrough */
>> +	case FLUSH_META:
>> +		wlen = sizeof(struct blkz_buffer);
>> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
>> +		if (wcnt != wlen)
>> +			goto set_dirty;
>> +		break;
>> +	case FLUSH_ALL:
>> +		wlen = zone->buffer_size + sizeof(*zone->buffer);
>> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
>> +		if (wcnt != wlen)
>> +			goto set_dirty;
>> +		break;
>> +	}
>> +
>> +	return 0;
>> +set_dirty:
>> +	atomic_set(&zone->dirty, true);
>> +	return -EBUSY;
>> +}
>> +
>> +static int blkz_flush_dirty_zone(struct blkz_zone *zone)
>> +{
>> +	int ret;
>> +
>> +	if (!zone)
>> +		return -EINVAL;
>> +
>> +	if (!atomic_read(&zone->dirty))
>> +		return 0;
>> +
>> +	if (!atomic_read(&blkz_cxt.recovered))
>> +		return -EBUSY;
>> +
>> +	ret = blkz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
>> +	if (!ret)
>> +		atomic_set(&zone->dirty, false);
>> +	return ret;
>> +}
>> +
>> +static int blkz_flush_dirty_zones(struct blkz_zone **zones, unsigned int cnt)
>> +{
>> +	int i, ret;
>> +	struct blkz_zone *zone;
>> +
>> +	if (!zones)
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < cnt; i++) {
>> +		zone = zones[i];
>> +		if (!zone)
>> +			return -EINVAL;
>> +		ret = blkz_flush_dirty_zone(zone);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * blkz_move_zone: move data from a old zone to a new zone
>> + *
>> + * @old: the old zone
>> + * @new: the new zone
>> + *
>> + * NOTE:
>> + *	Call blkz_zone_write to copy and flush data. If it failed, we
>> + *	should reset new->dirty, because the new zone not really dirty.
>> + */
>> +static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
>> +{
>> +	const char *data = (const char *)old->buffer->data;
>> +	int ret;
>> +
>> +	ret = blkz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
>> +	if (ret) {
>> +		atomic_set(&new->buffer->datalen, 0);
>> +		atomic_set(&new->dirty, false);
>> +		return ret;
>> +	}
>> +	atomic_set(&old->buffer->datalen, 0);
>> +	return 0;
>> +}
>> +
>> +static int blkz_recover_dmesg_data(struct blkz_context *cxt)
> 
> What does "recover" mean in this context? Is this "read from storage"?

Yes. "recover" means reading data back from storage.

> 
>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	struct blkz_zone *zone = NULL;
>> +	struct blkz_buffer *buf;
>> +	unsigned long i;
>> +	ssize_t rcnt;
>> +
>> +	if (!info->read)
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
>> +		zone = cxt->dbzs[i];
>> +		if (unlikely(!zone))
>> +			return -EINVAL;
>> +		if (atomic_read(&zone->dirty)) {
>> +			unsigned int wcnt = cxt->dmesg_write_cnt;
>> +			struct blkz_zone *new = cxt->dbzs[wcnt];
>> +			int ret;
>> +
>> +			ret = blkz_move_zone(zone, new);
>> +			if (ret) {
>> +				pr_err("move zone from %lu to %d failed\n",
>> +						i, wcnt);
>> +				return ret;
>> +			}
>> +			cxt->dmesg_write_cnt = (wcnt + 1) % cxt->dmesg_max_cnt;
>> +		}
>> +		if (!zone->should_recover)
>> +			continue;
>> +		buf = zone->buffer;
>> +		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
>> +				zone->off);
>> +		if (rcnt != zone->buffer_size + sizeof(*buf))
>> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
>> +	}
>> +	return 0;
>> +}
>> +
>> +/*
>> + * blkz_recover_dmesg_meta: recover metadata of dmesg
>> + *
>> + * Recover metadata as follow:
>> + * @cxt->dmesg_write_cnt
>> + * @cxt->oops_counter
>> + * @cxt->panic_counter
>> + */
>> +static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	struct blkz_zone *zone;
>> +	size_t rcnt, len;
>> +	struct blkz_buffer *buf;
>> +	struct blkz_dmesg_header *hdr;
>> +	struct timespec64 time = {0};
>> +	unsigned long i;
>> +	/*
>> +	 * Recover may on panic, we can't allocate any memory by kmalloc.
>> +	 * So, we use local array instead.
>> +	 */
>> +	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
>> +
>> +	if (!info->read)
>> +		return -EINVAL;
>> +
>> +	len = sizeof(*buf) + sizeof(*hdr);
>> +	buf = (struct blkz_buffer *)buffer_header;
>> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
>> +		zone = cxt->dbzs[i];
>> +		if (unlikely(!zone))
>> +			return -EINVAL;
>> +
>> +		rcnt = info->read((char *)buf, len, zone->off);
>> +		if (rcnt != len) {
>> +			pr_err("read %s with id %lu failed\n", zone->name, i);
>> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
>> +		}
>> +
>> +		if (buf->sig != zone->buffer->sig) {
>> +			pr_debug("no valid data in dmesg zone %lu\n", i);
>> +			continue;
>> +		}
>> +
>> +		if (zone->buffer_size < atomic_read(&buf->datalen)) {
>> +			pr_info("found overtop zone: %s: id %lu, off %lu, size %zu\n",
>> +					zone->name, i, zone->off,
>> +					zone->buffer_size);
>> +			continue;
>> +		}
>> +
>> +		hdr = (struct blkz_dmesg_header *)buf->data;
>> +		if (hdr->magic != DMESG_HEADER_MAGIC) {
>> +			pr_info("found invalid zone: %s: id %lu, off %lu, size %zu\n",
>> +					zone->name, i, zone->off,
>> +					zone->buffer_size);
>> +			continue;
>> +		}
>> +
>> +		/*
>> +		 * we get the newest zone, and the next one must be the oldest
>> +		 * or unused zone, because we do write one by one like a circle.
>> +		 */
>> +		if (hdr->time.tv_sec >= time.tv_sec) {
>> +			time.tv_sec = hdr->time.tv_sec;
>> +			cxt->dmesg_write_cnt = (i + 1) % cxt->dmesg_max_cnt;
>> +		}
>> +
>> +		if (hdr->reason == KMSG_DUMP_OOPS)
>> +			cxt->oops_counter =
>> +				max(cxt->oops_counter, hdr->counter);
>> +		else
>> +			cxt->panic_counter =
>> +				max(cxt->panic_counter, hdr->counter);
>> +
>> +		if (!atomic_read(&buf->datalen)) {
>> +			pr_debug("found erased zone: %s: id %ld, off %lu, size %zu, datalen %d\n",
>> +					zone->name, i, zone->off,
>> +					zone->buffer_size,
>> +					atomic_read(&buf->datalen));
>> +			continue;
>> +		}
>> +
>> +		if (!is_on_panic())
>> +			zone->should_recover = true;
>> +		pr_debug("found nice zone: %s: id %ld, off %lu, size %zu, datalen %d\n",
>> +				zone->name, i, zone->off,
>> +				zone->buffer_size, atomic_read(&buf->datalen));
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int blkz_recover_dmesg(struct blkz_context *cxt)
>> +{
>> +	int ret;
>> +
>> +	if (!cxt->dbzs)
>> +		return 0;
>> +
>> +	ret = blkz_recover_dmesg_meta(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	ret = blkz_recover_dmesg_data(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	return 0;
>> +recover_fail:
>> +	pr_debug("recover dmesg failed\n");
>> +	return ret;
>> +}
>> +
>> +static inline int blkz_recovery(struct blkz_context *cxt)
>> +{
>> +	int ret = -EBUSY;
>> +
>> +	if (atomic_read(&cxt->recovered))
>> +		return 0;
>> +
>> +	ret = blkz_recover_dmesg(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	pr_debug("recover end!\n");
>> +	atomic_set(&cxt->recovered, 1);
>> +	return 0;
>> +
>> +recover_fail:
>> +	pr_err("recover failed\n");
>> +	return ret;
>> +}
>> +
>> +static int blkz_pstore_open(struct pstore_info *psi)
>> +{
>> +	struct blkz_context *cxt = psi->data;
>> +
>> +	cxt->dmesg_read_cnt = 0;
>> +	return 0;
>> +}
>> +
>> +static inline bool blkz_ok(struct blkz_zone *zone)
>> +{
>> +	if (zone && zone->buffer && buffer_datalen(zone))
>> +		return true;
>> +	return false;
>> +}
>> +
>> +static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>> +		struct blkz_zone *zone)
>> +{
>> +	if (unlikely(!blkz_ok(zone)))
>> +		return 0;
>> +
>> +	atomic_set(&zone->buffer->datalen, 0);
>> +	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>> +}
> 
> cxt is unused?
> 

I keep this for the future. As you can see, it is used in patch 9.

>> +
>> +static int blkz_pstore_erase(struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +
> 
> Please sanity-check the record->id is in bounds before using it.
> 

I will fix it later.
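
A minimal userspace sketch of the bounds check being asked for. struct
ctx and erase_by_id() are simplified, hypothetical stand-ins for
blkz_context and the erase path; the id is modelled as a 64-bit unsigned
value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the context used by the erase path. */
struct ctx {
	int *zones;		/* stands in for cxt->dbzs */
	unsigned int max_cnt;	/* stands in for cxt->dmesg_max_cnt */
};

/* Return 0 and clear the zone, or -EINVAL when id does not name a zone. */
static int erase_by_id(struct ctx *cxt, uint64_t id)
{
	if (!cxt->zones || id >= cxt->max_cnt)
		return -EINVAL;
	cxt->zones[id] = 0;
	return 0;
}
```

The check runs before the array is indexed, so an out-of-range id never
touches memory outside the zone table.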

>> +	switch (record->type) {
>> +	case PSTORE_TYPE_DMESG:
>> +		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +	struct blkz_buffer *buffer = zone->buffer;
>> +	struct blkz_dmesg_header *hdr =
>> +		(struct blkz_dmesg_header *)buffer->data;
>> +
>> +	hdr->magic = DMESG_HEADER_MAGIC;
>> +	hdr->compressed = record->compressed;
>> +	hdr->time.tv_sec = record->time.tv_sec;
>> +	hdr->time.tv_nsec = record->time.tv_nsec;
>> +	hdr->reason = record->reason;
>> +	if (hdr->reason == KMSG_DUMP_OOPS)
>> +		hdr->counter = ++cxt->oops_counter;
>> +	else
>> +		hdr->counter = ++cxt->panic_counter;
>> +}
>> +
>> +static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
>> +		struct pstore_record *record)
> 
> Instead of "..._do" maybe name this "..._record", since it writes one
> record?
> 

That's a good idea.

>> +{
>> +	size_t size, hlen;
>> +	struct blkz_zone *zone;
>> +	unsigned int zonenum;
>> +
>> +	zonenum = cxt->dmesg_write_cnt;
>> +	zone = cxt->dbzs[zonenum];
>> +	if (unlikely(!zone))
>> +		return -ENOSPC;
>> +	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
>> +
>> +	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
>> +	blkz_write_kmsg_hdr(zone, record);
>> +	hlen = sizeof(struct blkz_dmesg_header);
>> +	size = min_t(size_t, record->size, zone->buffer_size - hlen);
>> +	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
>> +}
>> +
>> +static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>> +		struct pstore_record *record)
>> +{
>> +	int ret;
>> +	struct blkz_info *info = cxt->bzinfo;
>> +
>> +	/*
>> +	 * Out of the various dmesg dump types, pstore/blk is currently designed
>> +	 * to only store crash logs, rather than storing general kernel logs.
>> +	 */
>> +	if (record->reason != KMSG_DUMP_OOPS &&
>> +			record->reason != KMSG_DUMP_PANIC)
>> +		return -EINVAL;
>> +
>> +	/* Skip Oopes when configured to do so. */
>> +	if (record->reason == KMSG_DUMP_OOPS && !info->dump_oops)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * Explicitly only take the first part of any new crash.
>> +	 * If our buffer is larger than kmsg_bytes, this can never happen,
>> +	 * and if our buffer is smaller than kmsg_bytes, we don't want the
>> +	 * report split across multiple records.
>> +	 */
>> +	if (record->part != 1)
>> +		return -ENOSPC;
>> +
>> +	if (!cxt->dbzs)
>> +		return -ENOSPC;
>> +
>> +	ret = blkz_dmesg_write_do(cxt, record);
>> +	if (!ret) {
>> +		pr_debug("try to flush other dirty dmesg zones\n");
>> +		blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
>> +	}
>> +
>> +	/* alway return 0 as we had handled it on buffer */
>> +	return 0;
>> +}
>> +
>> +static int notrace blkz_pstore_write(struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +
>> +	if (record->type == PSTORE_TYPE_DMESG &&
>> +			record->reason == KMSG_DUMP_PANIC)
>> +		atomic_set(&cxt->on_panic, 1);
>> +
>> +	switch (record->type) {
>> +	case PSTORE_TYPE_DMESG:
>> +		return blkz_dmesg_write(cxt, record);
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +#define READ_NEXT_ZONE ((ssize_t)(-1024))
>> +static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>> +{
>> +	struct blkz_zone *zone = NULL;
>> +
>> +	while (cxt->dmesg_read_cnt < cxt->dmesg_max_cnt) {
>> +		zone = cxt->dbzs[cxt->dmesg_read_cnt++];
>> +		if (blkz_ok(zone))
>> +			return zone;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int blkz_read_dmesg_hdr(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	struct blkz_buffer *buffer = zone->buffer;
>> +	struct blkz_dmesg_header *hdr =
>> +		(struct blkz_dmesg_header *)buffer->data;
>> +
>> +	if (hdr->magic != DMESG_HEADER_MAGIC)
>> +		return -EINVAL;
>> +	record->compressed = hdr->compressed;
>> +	record->time.tv_sec = hdr->time.tv_sec;
>> +	record->time.tv_nsec = hdr->time.tv_nsec;
>> +	record->reason = hdr->reason;
>> +	record->count = hdr->counter;
>> +	return 0;
>> +}
>> +
>> +static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	size_t size, hlen = 0;
>> +
>> +	size = buffer_datalen(zone);
>> +	/* Clear and skip this DMESG record if it has no valid header */
>> +	if (blkz_read_dmesg_hdr(zone, record)) {
>> +		atomic_set(&zone->buffer->datalen, 0);
>> +		atomic_set(&zone->dirty, 0);
>> +		return READ_NEXT_ZONE;
>> +	}
>> +	size -= sizeof(struct blkz_dmesg_header);
>> +
>> +	if (!record->compressed) {
>> +		char *buf = kasprintf(GFP_KERNEL,
>> +				"%s: Total %d times\n",
>> +				record->reason == KMSG_DUMP_OOPS ? "Oops" :
>> +				"Panic", record->count);
>> +		hlen = strlen(buf);
>> +		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
>> +		if (!record->buf) {
>> +			kfree(buf);
>> +			return -ENOMEM;
>> +		}
>> +	} else {
>> +		record->buf = kmalloc(size, GFP_KERNEL);
>> +		if (!record->buf)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
>> +				sizeof(struct blkz_dmesg_header)) < 0)) {
>> +		kfree(record->buf);
>> +		return READ_NEXT_ZONE;
>> +	}
>> +
>> +	return size + hlen;
>> +}
>> +
>> +static ssize_t blkz_pstore_read(struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +	ssize_t (*blkz_read)(struct blkz_zone *zone,
>> +			struct pstore_record *record);
>> +	struct blkz_zone *zone;
>> +	ssize_t ret;
>> +
>> +	/* before read, we must recover from storage */
>> +	ret = blkz_recovery(cxt);
>> +	if (ret)
>> +		return ret;
>> +
>> +next_zone:
>> +	zone = blkz_read_next_zone(cxt);
>> +	if (!zone)
>> +		return 0;
>> +
>> +	record->type = zone->type;
>> +	switch (record->type) {
>> +	case PSTORE_TYPE_DMESG:
>> +		blkz_read = blkz_dmesg_read;
>> +		record->id = cxt->dmesg_read_cnt - 1;
>> +		break;
>> +	default:
>> +		goto next_zone;
>> +	}
>> +
>> +	ret = blkz_read(zone, record);
>> +	if (ret == READ_NEXT_ZONE)
>> +		goto next_zone;
>> +	return ret;
>> +}
>> +
>> +static struct blkz_context blkz_cxt = {
>> +	.bzinfo_lock = __SPIN_LOCK_UNLOCKED(blkz_cxt.bzinfo_lock),
>> +	.recovered = ATOMIC_INIT(0),
>> +	.on_panic = ATOMIC_INIT(0),
>> +	.pstore = {
>> +		.owner = THIS_MODULE,
>> +		.name = MODNAME,
>> +		.open = blkz_pstore_open,
>> +		.read = blkz_pstore_read,
>> +		.write = blkz_pstore_write,
>> +		.erase = blkz_pstore_erase,
>> +	},
>> +};
>> +
>> +static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
>> +		unsigned long *off, size_t size)
>> +{
>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>> +	struct blkz_zone *zone;
>> +	const char *name = pstore_type_to_name(type);
>> +
>> +	if (!size)
>> +		return NULL;
>> +
>> +	if (*off + size > info->total_size) {
>> +		pr_err("no room for %s (0x%zx@0x%lx over 0x%lx)\n",
>> +			name, size, *off, info->total_size);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	zone = kzalloc(sizeof(struct blkz_zone), GFP_KERNEL);
>> +	if (!zone)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	zone->buffer = kmalloc(size, GFP_KERNEL);
>> +	if (!zone->buffer) {
>> +		kfree(zone);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +	memset(zone->buffer, 0xFF, size);
>> +	zone->off = *off;
>> +	zone->name = name;
>> +	zone->type = type;
>> +	zone->buffer_size = size - sizeof(struct blkz_buffer);
>> +	zone->buffer->sig = type ^ BLK_SIG;
>> +	atomic_set(&zone->dirty, 0);
>> +	atomic_set(&zone->buffer->datalen, 0);
>> +
>> +	*off += size;
>> +
>> +	pr_debug("blkzone %s: off 0x%lx, %zu header, %zu data\n", zone->name,
>> +			zone->off, sizeof(*zone->buffer), zone->buffer_size);
>> +	return zone;
>> +}
>> +
>> +static struct blkz_zone **blkz_init_zones(enum pstore_type_id type,
>> +	unsigned long *off, size_t total_size, ssize_t record_size,
>> +	unsigned int *cnt)
>> +{
>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>> +	struct blkz_zone **zones, *zone;
>> +	const char *name = pstore_type_to_name(type);
>> +	int c, i;
>> +
>> +	if (!total_size || !record_size)
>> +		return NULL;
>> +
>> +	if (*off + total_size > info->total_size) {
>> +		pr_err("no room for zones %s (0x%zx@0x%lx over 0x%lx)\n",
>> +			name, total_size, *off, info->total_size);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	c = total_size / record_size;
>> +	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
>> +	if (!zones) {
>> +		pr_err("allocate for zones %s failed\n", name);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +	memset(zones, 0, c * sizeof(*zones));
>> +
>> +	for (i = 0; i < c; i++) {
>> +		zone = blkz_init_zone(type, off, record_size);
>> +		if (!zone || IS_ERR(zone)) {
>> +			pr_err("initialize zones %s failed\n", name);
>> +			while (--i >= 0) {
>> +				kfree(zones[i]->buffer);
>> +				kfree(zones[i]);
>> +			}
>> +			kfree(zones);
>> +			return (void *)zone;
>> +		}
>> +		zones[i] = zone;
>> +	}
>> +
>> +	*cnt = c;
>> +	return zones;
>> +}
>> +
>> +static void blkz_free_zone(struct blkz_zone **blkzone)
>> +{
>> +	struct blkz_zone *zone = *blkzone;
>> +
>> +	if (!zone)
>> +		return;
>> +
>> +	kfree(zone->buffer);
>> +	kfree(zone);
>> +	*blkzone = NULL;
>> +}
>> +
>> +static void blkz_free_zones(struct blkz_zone ***blkzones, unsigned int *cnt)
>> +{
>> +	struct blkz_zone **zones = *blkzones;
>> +
>> +	if (!zones)
>> +		return;
>> +
>> +	while (*cnt > 0) {
>> +		blkz_free_zone(&zones[*cnt]);
>> +		(*cnt)--;
>> +	}
>> +	kfree(zones);
>> +	*blkzones = NULL;
>> +}
>> +
>> +static int blkz_cut_zones(struct blkz_context *cxt)
> 
> What does "cut" mean here? Maybe "alloc" instead?
> 

It is like cutting a cake. Renaming it to "alloc" is fine.

>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	unsigned long off = 0;
>> +	int err;
>> +	size_t size;
>> +
>> +	size = info->total_size;
>> +	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
>> +			info->dmesg_size, &cxt->dmesg_max_cnt);
>> +	if (IS_ERR(cxt->dbzs)) {
>> +		err = PTR_ERR(cxt->dbzs);
>> +		goto fail_out;
>> +	}
>> +
>> +	return 0;
>> +fail_out:
>> +	return err;
>> +}
>> +
>> +int blkz_register(struct blkz_info *info)
>> +{
>> +	int err = -EINVAL;
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +	struct module *owner = info->owner;
>> +
>> +	if (!info->total_size) {
>> +		pr_warn("the total size must be non-zero\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!info->dmesg_size) {
>> +		pr_warn("at least one of the records be non-zero\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!info->name || !info->name[0])
>> +		return -EINVAL;
>> +
>> +	if (info->total_size < 4096) {
>> +		pr_err("total size must be greater than 4096 bytes\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +#define check_size(name, size) {					\
>> +		if (info->name > 0 && info->name < (size)) {		\
>> +			pr_err(#name " must be over %d\n", (size));	\
>> +			return -EINVAL;					\
>> +		}							\
>> +		if (info->name & (size - 1)) {				\
>> +			pr_err(#name " must be a multiple of %d\n",	\
>> +					(size));			\
>> +			return -EINVAL;					\
>> +		}							\
>> +	}
>> +
>> +	check_size(total_size, 4096);
>> +	check_size(dmesg_size, SECTOR_SIZE);
>> +
>> +#undef check_size
>> +
>> +	/*
>> +	 * the @read and @write must be applied.
>> +	 * if no @read, pstore may mount failed.
>> +	 * if no @write, pstore do not support to remove record file.
>> +	 */
>> +	if (!info->read || !info->write) {
>> +		pr_err("no valid general read/write interface\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	spin_lock(&cxt->bzinfo_lock);
>> +	if (cxt->bzinfo) {
>> +		pr_warn("blk '%s' already loaded: ignoring '%s'\n",
>> +				cxt->bzinfo->name, info->name);
>> +		spin_unlock(&cxt->bzinfo_lock);
>> +		return -EBUSY;
>> +	}
>> +	cxt->bzinfo = info;
>> +	spin_unlock(&cxt->bzinfo_lock);
>> +
>> +	if (owner && !try_module_get(owner)) {
>> +		err = -EBUSY;
>> +		goto fail_out;
>> +	}
>> +
>> +	pr_debug("register %s with properties:\n", info->name);
>> +	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>> +	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>> +
>> +	err = blkz_cut_zones(cxt);
>> +	if (err) {
>> +		pr_err("cut zones fialed\n");
> 
> typo: "failed"
> 

I will fix it later.

>> +		goto put_module;
>> +	}
>> +
>> +	if (info->dmesg_size) {
>> +		cxt->pstore.bufsize = cxt->dbzs[0]->buffer_size -
>> +			sizeof(struct blkz_dmesg_header);
>> +		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
>> +		if (!cxt->pstore.buf) {
>> +			err = -ENOMEM;
> 
> I think the allocated zones need to be freed here.
> 

You are right. I will fix it later.
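
A minimal userspace sketch of the unwinding order this implies: whatever
was set up first is released last. struct ctx and ctx_setup() are
hypothetical, and the fail_buf argument exists only to simulate the
second allocation failing.

```c
#include <errno.h>
#include <stdlib.h>

struct ctx {
	void *zones;	/* stands in for cxt->dbzs */
	void *buf;	/* stands in for cxt->pstore.buf */
};

/*
 * Acquire resources in order and release them in reverse order on failure.
 * fail_buf is a test hook that simulates the second allocation failing.
 */
static int ctx_setup(struct ctx *c, int fail_buf)
{
	int err;

	c->zones = malloc(64);
	if (!c->zones)
		return -ENOMEM;

	c->buf = fail_buf ? NULL : malloc(32);
	if (!c->buf) {
		err = -ENOMEM;
		goto free_zones;
	}
	return 0;

free_zones:
	free(c->zones);
	c->zones = NULL;
	return err;
}
```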

>> +			goto put_module;
>> +		}
>> +	}
>> +	cxt->pstore.data = cxt;
>> +	if (info->dmesg_size)
>> +		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
>> +
>> +	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
>> +			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>> +			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
>> +
>> +	err = pstore_register(&cxt->pstore);
>> +	if (err) {
>> +		pr_err("registering with pstore failed\n");
> 
> Also here?
> 

You are right. I will fix it later.

>> +		goto free_pstore_buf;
>> +	}
>> +
>> +	module_put(owner);
>> +	return 0;
>> +
>> +free_pstore_buf:
>> +	kfree(cxt->pstore.buf);
>> +put_module:
>> +	module_put(owner);
>> +fail_out:
>> +	spin_lock(&blkz_cxt.bzinfo_lock);
>> +	blkz_cxt.bzinfo = NULL;
>> +	spin_unlock(&blkz_cxt.bzinfo_lock);
>> +	return err;
>> +}
>> +EXPORT_SYMBOL_GPL(blkz_register);
>> +
>> +void blkz_unregister(struct blkz_info *info)
>> +{
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +
>> +	pstore_unregister(&cxt->pstore);
>> +	kfree(cxt->pstore.buf);
>> +	cxt->pstore.bufsize = 0;
>> +
>> +	spin_lock(&cxt->bzinfo_lock);
>> +	blkz_cxt.bzinfo = NULL;
>> +	spin_unlock(&cxt->bzinfo_lock);
>> +
>> +	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
>> +}
>> +EXPORT_SYMBOL_GPL(blkz_unregister);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>> +MODULE_DESCRIPTION("Block device Oops/Panic logger");
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> new file mode 100644
>> index 000000000000..589d276fa4e4
>> --- /dev/null
>> +++ b/include/linux/pstore_blk.h
>> @@ -0,0 +1,62 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __PSTORE_BLK_H_
>> +#define __PSTORE_BLK_H_
>> +
>> +#include <linux/types.h>
>> +#include <linux/blkdev.h>
>> +
>> +/**
>> + * struct blkz_info - backend blkzone driver structure
>> + *
>> + * @owner:
>> + *	Module which is responsible for this backend driver.
>> + * @name:
>> + *	Name of the backend driver.
>> + * @total_size:
>> + *	The total size in bytes pstore/blk can use. It must be greater than
>> + *	4096 and be multiple of 4096.
>> + * @dmesg_size:
>> + *	The size of each zones for dmesg (oops & panic). Zero means disabled,
>> + *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
>> + * @dump_oops:
>> + *	Dump oops and panic log or only panic.
>> + * @read, @write:
>> + *	The general (not panic) read/write operation. It's required unless you
>> + *	are block device and supply valid @bdev. In this case, blkzone will
>> + *	replace it as a general read/write interface.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, the number of bytes read/write should be returned.
>> + *	On error, negative number should be returned.
>> + * @panic_write:
>> + *	The write operation only used for panic. It's optional if you do not
>> + *	care panic record. If panic occur but blkzone do not recover yet, the
>> + *	first zone of dmesg is used.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, the number of bytes write should be returned.
>> + *	On error, negative number should be returned.
>> + */
>> +typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
>> +typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
>> +struct blkz_info {
>> +	struct module *owner;
>> +	const char *name;
>> +
>> +	unsigned long total_size;
>> +	unsigned long dmesg_size;
>> +	int dump_oops;
>> +	blkz_read_op read;
>> +	blkz_write_op write;
>> +	blkz_write_op panic_write;
>> +};
>> +
>> +extern int blkz_register(struct blkz_info *info);
>> +extern void blkz_unregister(struct blkz_info *info);
>> +
>> +#endif
>> -- 
>> 1.9.1
>>
> 

-- 
liaoweixiong


* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-02-26  0:52   ` Kees Cook
  2020-02-27  8:21     ` liaoweixiong
@ 2020-03-09  0:52     ` WeiXiong Liao
  1 sibling, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-09  0:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

Did my e-mail get lost? I am resending it.
I am still waiting for your reply and for suggestions on the other patches.

On 2020/2/26 AM 8:52, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
>> pstore/blk is similar to pstore/ram, but dump log to block devices
>> rather than persistent ram.
> 
> Thanks for waiting for me to get to this review! Notes below...
> 
>>
>> Why do we need pstore/blk?
>> 1. Most embedded intelligent equipment have no persistent ram, which
>> increases costs. We perfer to cheaper solutions, like block devices.
>> 2. Do not any equipment have battery, which means that it lost all data
>> on general ram if power failure. Pstore has little to do for these
>> equipments.
>>
>> pstore/blk is one of series patches, and provides the zones management
>> of partition of block device or non-block device likes mtd devices. It
>> only supports dmesg recorder right now.
>>
>> To make pstore/blk work, the block/non-block driver should calls
>> blkz_register() and call blkz_unregister() when exits. On other patches
>> of series, a better wrapper for pstore/blk, named blkoops, will be
>> there.
>>
>> It's different with pstore/ram, pstore/blk relies on read/write APIs
>> from device driver, especially, write operation for panic record.
>>
>> Recommend that, the block/non-block driver should register to pstore/blk
>> only after devices have registered to Linux and ready to work.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  fs/pstore/Kconfig          |  10 +
>>  fs/pstore/Makefile         |   3 +
>>  fs/pstore/blkzone.c        | 948 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pstore_blk.h |  62 +++
>>  4 files changed, 1023 insertions(+)
>>  create mode 100644 fs/pstore/blkzone.c
>>  create mode 100644 include/linux/pstore_blk.h
>>
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 8f0369aad22a..536fde9e13e8 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -153,3 +153,13 @@ config PSTORE_RAM
>>  	  "ramoops.ko".
>>  
>>  	  For more information, see Documentation/admin-guide/ramoops.rst.
>> +
>> +config PSTORE_BLK
>> +	tristate "Log panic/oops to a block device"
>> +	depends on PSTORE
>> +	depends on BLOCK
>> +	help
>> +	  This enables panic and oops message to be logged to a block dev
>> +	  where it can be read back at some later point.
> 
> I think more accurate would be:
> "... read back on the next boot via pstorefs."
> 

I will fix it later.

>> +
>> +	  If unsure, say N.
>> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
>> index 967b5891f325..0ee2fc8d1bfb 100644
>> --- a/fs/pstore/Makefile
>> +++ b/fs/pstore/Makefile
>> @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
>>  
>>  ramoops-objs += ram.o ram_core.o
>>  obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
>> +
>> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>> +pstore_blk-y += blkzone.o
> 
> Why this dance with files? I would just expect:
> 
> obj-$(CONFIG_PSTORE_BLK)     += blkzone.o
> 

That would name the built module blkzone.ko rather than
pstore_blk.ko.

> (Regardless, please keep tabs lined up in this file)
>

OK.

>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> new file mode 100644
>> index 000000000000..f77f612b50ba
>> --- /dev/null
>> +++ b/fs/pstore/blkzone.c
>> @@ -0,0 +1,948 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#define MODNAME "pstore-blk"
>> +#define pr_fmt(fmt) MODNAME ": " fmt
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/slab.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/pstore.h>
>> +#include <linux/mount.h>
>> +#include <linux/printk.h>
>> +#include <linux/fs.h>
>> +#include <linux/pstore_blk.h>
>> +#include <linux/kdev_t.h>
>> +#include <linux/device.h>
>> +#include <linux/namei.h>
>> +#include <linux/fcntl.h>
>> +#include <linux/uio.h>
>> +#include <linux/writeback.h>
>> +
>> +/**
>> + * struct blkz_head - head of zone to flush to storage
>> + *
>> + * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
>> + * @datalen: length of data in @data
>> + * @data: zone data.
>> + */
>> +struct blkz_buffer {
>> +#define BLK_SIG (0x43474244) /* DBGC */
> 
> I was going to suggest extracting PERSISTENT_RAM_SIG, renaming it and
> using it in here and in ram_core.c, but then I realize they're not
> marking the same structure. How about choosing a new magic sig for the
> blkzone data header?
> 

That's OK with me. Is there a rule for choosing a new magic?
In addition, all members of this structure are the same as those of
struct persistent_ram_buffer after patch 2. It may be a good idea to
extract it if you want to merge ramoops and pstore/blk.

>> +	uint32_t sig;
>> +	atomic_t datalen;
>> +	uint8_t data[];
>> +};
>> +
>> +/**
>> + * struct blkz_dmesg_header: dmesg information
> 
> This is the on-disk structure also?
> 
Yes. The structure blkz_buffer is a generic header for all recorder
zones, and the structure blkz_dmesg_header is a header for dmesg, saved
in blkz_buffer->data. The dmesg recorder uses it to save its specific
attributes.
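
To make the nesting concrete, here is a rough userspace sketch of that layout. The types are simplified stand-ins (plain integers instead of atomic_t, struct timespec64 and enum kmsg_dump_reason), and the struct and helper names are made up for illustration; only the nesting of the two headers follows the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic header at the start of every zone (stand-in for blkz_buffer). */
struct zone_header {
	uint32_t sig;
	int32_t datalen;	/* atomic_t in the patch */
	uint8_t data[];		/* recorder-specific payload */
};

/* dmesg-specific header, stored at the start of zone_header.data. */
struct dmesg_header {
	uint32_t magic;
	int64_t tv_sec;		/* struct timespec64 in the patch */
	int64_t tv_nsec;
	uint8_t compressed;
	uint32_t counter;
	int32_t reason;		/* enum kmsg_dump_reason in the patch */
	uint8_t data[];		/* the log text itself */
};

/* Offset of the dmesg header from the start of the zone. */
static size_t dmesg_header_offset(void)
{
	return sizeof(struct zone_header);
}

/* Offset of the log text from the start of the zone. */
static size_t dmesg_text_offset(void)
{
	return sizeof(struct zone_header) + sizeof(struct dmesg_header);
}

/* Bytes left for log text in a zone of zone_size bytes. */
static size_t dmesg_text_capacity(size_t zone_size)
{
	size_t hdrs = dmesg_text_offset();

	return zone_size > hdrs ? zone_size - hdrs : 0;
}
```

The capacity helper mirrors how the patch derives pstore.bufsize: the zone size minus both headers.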

>> + *
>> + * @magic: magic num for dmesg header
>> + * @time: trigger time
>> + * @compressed: whether conpressed
>> + * @count: oops/panic counter
>> + * @reason: identify oops or panic
>> + */
>> +struct blkz_dmesg_header {
>> +#define DMESG_HEADER_MAGIC 0x4dfc3ae5
> 
> How was this magic chosen?
> 

It's a random number. Should I choose a meaningful magic instead?

>> +	uint32_t magic;
>> +	struct timespec64 time;
>> +	bool compressed;
>> +	uint32_t counter;
>> +	enum kmsg_dump_reason reason;
>> +	uint8_t data[0];
> 
> Please use [] instead of [0].
> 

OK, I will fix it later.
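
For reference, a minimal userspace sketch of the C99 flexible array member form. struct record and record_new() are hypothetical names used only for this illustration, and malloc() stands in for the kernel allocators.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type with a trailing variable-length payload. */
struct record {
	uint32_t len;
	uint8_t data[];	/* C99 flexible array member, not data[0] */
};

/* Allocate a record with room for len payload bytes and copy src in. */
static struct record *record_new(const void *src, uint32_t len)
{
	struct record *r = malloc(sizeof(*r) + len);

	if (!r)
		return NULL;
	r->len = len;
	if (len)
		memcpy(r->data, src, len);
	return r;
}
```

The caller owns the returned record and releases it with free().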

>> +};
>> +
>> +/**
>> + * struct blkz_zone - zone information
>> + * @off:
>> + *	zone offset of block device
>> + * @type:
>> + *	frontent type for this zone
>> + * @name:
>> + *	frontent name for this zone
>> + * @buffer:
>> + *	pointer to data buffer managed by this zone
>> + * @oldbuf:
>> + *	pointer to old data buffer.
>> + * @buffer_size:
>> + *	bytes in @buffer->data
>> + * @should_recover:
>> + *	should recover from storage
>> + * @dirty:
>> + *	mark whether the data in @buffer are dirty (not flush to storage yet)
>> + */
> 
> Thank you for the kerndoc! :) Is it linked to from any .rst files?
> 

I am not sure I understand. There is a document in the 6th patch. Is
that what you want?

>> +struct blkz_zone {
>> +	unsigned long off;
> 
> Should this be loff_t?
> 

Yes. I will fix it and the related code.
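
For context, one general reason to prefer loff_t for offsets: unsigned long is only 32 bits wide on 32-bit builds, while loff_t is 64 bits everywhere. The sketch below models that with fixed-width types; zone_off_32() and zone_off_64() are made-up helpers, not functions from the patch.

```c
#include <stdint.h>

/* Offset of zone idx when the arithmetic is done in 32 bits. */
static uint32_t zone_off_32(uint32_t idx, uint32_t zone_size)
{
	return idx * zone_size;	/* wraps modulo 2^32 */
}

/* Same computation widened to 64 bits first, as with loff_t. */
static int64_t zone_off_64(uint32_t idx, uint32_t zone_size)
{
	return (int64_t)idx * zone_size;
}
```

Once idx * zone_size reaches 4 GiB, the 32-bit version wraps around and points back into the start of the device.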

>> +	const char *name;
>> +	enum pstore_type_id type;
>> +
>> +	struct blkz_buffer *buffer;
>> +	struct blkz_buffer *oldbuf;
>> +	size_t buffer_size;
>> +	bool should_recover;
>> +	atomic_t dirty;
>> +};
>> +
>> +struct blkz_context {
>> +	struct blkz_zone **dbzs;	/* dmesg block zones */
>> +	unsigned int dmesg_max_cnt;
>> +	unsigned int dmesg_read_cnt;
>> +	unsigned int dmesg_write_cnt;
>> +	/*
>> +	 * the counter should be recovered when recover.
>> +	 * It records the oops/panic times after burning rather than booting.
>> +	 */
>> +	unsigned int oops_counter;
>> +	unsigned int panic_counter;
>> +	atomic_t recovered;
>> +	atomic_t on_panic;
>> +
>> +	/*
>> +	 * bzinfo_lock just protects "bzinfo" during calls to
>> +	 * blkz_register/blkz_unregister
>> +	 */
>> +	spinlock_t bzinfo_lock;
>> +	struct blkz_info *bzinfo;
>> +	struct pstore_info pstore;
>> +};
>> +static struct blkz_context blkz_cxt;
>> +
>> +enum blkz_flush_mode {
>> +	FLUSH_NONE = 0,
>> +	FLUSH_PART,
>> +	FLUSH_META,
>> +	FLUSH_ALL,
>> +};
>> +
>> +static inline int buffer_datalen(struct blkz_zone *zone)
>> +{
>> +	return atomic_read(&zone->buffer->datalen);
>> +}
>> +
>> +static inline bool is_on_panic(void)
>> +{
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +
>> +	return atomic_read(&cxt->on_panic);
>> +}
>> +
>> +static int blkz_zone_read(struct blkz_zone *zone, char *buf,
>> +		size_t len, unsigned long off)
>> +{
>> +	if (!buf || !zone->buffer)
>> +		return -EINVAL;
>> +	if (off > zone->buffer_size)
>> +		return -EINVAL;
>> +	len = min_t(size_t, len, zone->buffer_size - off);
>> +	memcpy(buf, zone->buffer->data + off, len);
> 
> Should the remainder of the buffer be zeroed if
> 	len > zone->buffer_size - off
> ? If not, I was expecting this to return how much was copied.
> 

You are right. It should return how much was copied.
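
A userspace sketch of a read helper that reports how much was copied. struct zone and zone_read() are simplified stand-ins for illustration, and -1 is used where the patch returns -EINVAL.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Simplified stand-in for a zone's data area. */
struct zone {
	const unsigned char *data;	/* stand-in for zone->buffer->data */
	size_t size;			/* stand-in for zone->buffer_size */
};

/*
 * Copy up to len bytes starting at off. Returns the number of bytes
 * actually copied, or -1 on invalid arguments.
 */
static ssize_t zone_read(const struct zone *z, char *buf, size_t len,
			 size_t off)
{
	if (!z || !z->data || !buf)
		return -1;
	if (off > z->size)
		return -1;
	if (len > z->size - off)
		len = z->size - off;	/* clamp to what is available */
	memcpy(buf, z->data + off, len);
	return (ssize_t)len;
}
```

A caller can then compare the return value with the requested length to detect a short read instead of assuming the whole buffer was filled.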

>> +	return 0;
>> +}
>> +
>> +static int blkz_zone_write(struct blkz_zone *zone,
>> +		enum blkz_flush_mode flush_mode, const char *buf,
>> +		size_t len, unsigned long off)
>> +{
>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>> +	ssize_t wcnt = 0;
>> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
>> +	size_t wlen;
>> +
>> +	if (off > zone->buffer_size)
>> +		return -EINVAL;
>> +	wlen = min_t(size_t, len, zone->buffer_size - off);
>> +	if (buf && wlen) {
>> +		memcpy(zone->buffer->data + off, buf, wlen);
>> +		atomic_set(&zone->buffer->datalen, wlen + off);
>> +	}
> 
> If you're expecting concurrent writers (use of atomic_set(), I would
> expect the whole write to be locked instead. (i.e. what happens if
> multiple callers call blkz_zone_write()?)
> 

I don't agree. The datalen is updated in many places, so locking only
here would not help.

One more thing. During the analysis, I found another problem.
Removing old files will cause new logs to be lost. Take the console
recorder as an example. After a reboot, new logs are saved to buf while
old logs are saved to old_buf. If we remove the old file at that time,
not only is old_buf freed, but the length of buf for new data is also
reset to zero. ramoops may have the same problem.
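
A toy model of the erase problem described above, in userspace C. It is not the code from either driver: struct zone, erase_resetting_live() and erase_old_only() are made-up names, and the second function is only a hypothetical alternative for comparison.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy zone: an old copy from the previous boot plus a live buffer. */
struct zone {
	char *oldbuf;	/* previous boot's log, exposed as a file */
	size_t old_len;
	size_t new_len;	/* bytes of this boot's log in the live buffer */
};

/* Erase as described above: the live buffer's length is reset too. */
static void erase_resetting_live(struct zone *z)
{
	free(z->oldbuf);
	z->oldbuf = NULL;
	z->old_len = 0;
	z->new_len = 0;	/* this boot's log is lost here */
}

/* Hypothetical alternative: drop only the old copy. */
static void erase_old_only(struct zone *z)
{
	free(z->oldbuf);
	z->oldbuf = NULL;
	z->old_len = 0;
}
```

In the first variant, removing the old file discards whatever was logged since boot; in the second, new_len is left alone.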

>> +
>> +	/* avoid to damage old records */
>> +	if (!is_on_panic() && !atomic_read(&blkz_cxt.recovered))
>> +		goto set_dirty;
>> +
>> +	writeop = is_on_panic() ? info->panic_write : info->write;
>> +	if (!writeop)
>> +		goto set_dirty;
>> +
>> +	switch (flush_mode) {
>> +	case FLUSH_NONE:
>> +		if (unlikely(buf && wlen))
>> +			goto set_dirty;
>> +		return 0;
>> +	case FLUSH_PART:
>> +		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
>> +				zone->off + sizeof(*zone->buffer) + off);
>> +		if (wcnt != wlen)
>> +			goto set_dirty;
>> +		/* fallthrough */
>> +	case FLUSH_META:
>> +		wlen = sizeof(struct blkz_buffer);
>> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
>> +		if (wcnt != wlen)
>> +			goto set_dirty;
>> +		break;
>> +	case FLUSH_ALL:
>> +		wlen = zone->buffer_size + sizeof(*zone->buffer);
>> +		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
>> +		if (wcnt != wlen)
>> +			goto set_dirty;
>> +		break;
>> +	}
>> +
>> +	return 0;
>> +set_dirty:
>> +	atomic_set(&zone->dirty, true);
>> +	return -EBUSY;
>> +}
>> +
>> +static int blkz_flush_dirty_zone(struct blkz_zone *zone)
>> +{
>> +	int ret;
>> +
>> +	if (!zone)
>> +		return -EINVAL;
>> +
>> +	if (!atomic_read(&zone->dirty))
>> +		return 0;
>> +
>> +	if (!atomic_read(&blkz_cxt.recovered))
>> +		return -EBUSY;
>> +
>> +	ret = blkz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
>> +	if (!ret)
>> +		atomic_set(&zone->dirty, false);
>> +	return ret;
>> +}
>> +
>> +static int blkz_flush_dirty_zones(struct blkz_zone **zones, unsigned int cnt)
>> +{
>> +	int i, ret;
>> +	struct blkz_zone *zone;
>> +
>> +	if (!zones)
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < cnt; i++) {
>> +		zone = zones[i];
>> +		if (!zone)
>> +			return -EINVAL;
>> +		ret = blkz_flush_dirty_zone(zone);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * blkz_move_zone: move data from a old zone to a new zone
>> + *
>> + * @old: the old zone
>> + * @new: the new zone
>> + *
>> + * NOTE:
>> + *	Call blkz_zone_write to copy and flush data. If it failed, we
>> + *	should reset new->dirty, because the new zone not really dirty.
>> + */
>> +static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
>> +{
>> +	const char *data = (const char *)old->buffer->data;
>> +	int ret;
>> +
>> +	ret = blkz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
>> +	if (ret) {
>> +		atomic_set(&new->buffer->datalen, 0);
>> +		atomic_set(&new->dirty, false);
>> +		return ret;
>> +	}
>> +	atomic_set(&old->buffer->datalen, 0);
>> +	return 0;
>> +}
>> +
>> +static int blkz_recover_dmesg_data(struct blkz_context *cxt)
> 
> What does "recover" mean in this context? Is this "read from storage"?

Yes. "recover" means reading data back from storage.

> 
>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	struct blkz_zone *zone = NULL;
>> +	struct blkz_buffer *buf;
>> +	unsigned long i;
>> +	ssize_t rcnt;
>> +
>> +	if (!info->read)
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
>> +		zone = cxt->dbzs[i];
>> +		if (unlikely(!zone))
>> +			return -EINVAL;
>> +		if (atomic_read(&zone->dirty)) {
>> +			unsigned int wcnt = cxt->dmesg_write_cnt;
>> +			struct blkz_zone *new = cxt->dbzs[wcnt];
>> +			int ret;
>> +
>> +			ret = blkz_move_zone(zone, new);
>> +			if (ret) {
>> +				pr_err("move zone from %lu to %d failed\n",
>> +						i, wcnt);
>> +				return ret;
>> +			}
>> +			cxt->dmesg_write_cnt = (wcnt + 1) % cxt->dmesg_max_cnt;
>> +		}
>> +		if (!zone->should_recover)
>> +			continue;
>> +		buf = zone->buffer;
>> +		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
>> +				zone->off);
>> +		if (rcnt != zone->buffer_size + sizeof(*buf))
>> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
>> +	}
>> +	return 0;
>> +}
>> +
>> +/*
>> + * blkz_recover_dmesg_meta: recover metadata of dmesg
>> + *
>> + * Recover metadata as follow:
>> + * @cxt->dmesg_write_cnt
>> + * @cxt->oops_counter
>> + * @cxt->panic_counter
>> + */
>> +static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	struct blkz_zone *zone;
>> +	size_t rcnt, len;
>> +	struct blkz_buffer *buf;
>> +	struct blkz_dmesg_header *hdr;
>> +	struct timespec64 time = {0};
>> +	unsigned long i;
>> +	/*
>> +	 * Recover may on panic, we can't allocate any memory by kmalloc.
>> +	 * So, we use local array instead.
>> +	 */
>> +	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
>> +
>> +	if (!info->read)
>> +		return -EINVAL;
>> +
>> +	len = sizeof(*buf) + sizeof(*hdr);
>> +	buf = (struct blkz_buffer *)buffer_header;
>> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
>> +		zone = cxt->dbzs[i];
>> +		if (unlikely(!zone))
>> +			return -EINVAL;
>> +
>> +		rcnt = info->read((char *)buf, len, zone->off);
>> +		if (rcnt != len) {
>> +			pr_err("read %s with id %lu failed\n", zone->name, i);
>> +			return (int)rcnt < 0 ? (int)rcnt : -EIO;
>> +		}
>> +
>> +		if (buf->sig != zone->buffer->sig) {
>> +			pr_debug("no valid data in dmesg zone %lu\n", i);
>> +			continue;
>> +		}
>> +
>> +		if (zone->buffer_size < atomic_read(&buf->datalen)) {
>> +			pr_info("found overtop zone: %s: id %lu, off %lu, size %zu\n",
>> +					zone->name, i, zone->off,
>> +					zone->buffer_size);
>> +			continue;
>> +		}
>> +
>> +		hdr = (struct blkz_dmesg_header *)buf->data;
>> +		if (hdr->magic != DMESG_HEADER_MAGIC) {
>> +			pr_info("found invalid zone: %s: id %lu, off %lu, size %zu\n",
>> +					zone->name, i, zone->off,
>> +					zone->buffer_size);
>> +			continue;
>> +		}
>> +
>> +		/*
>> +		 * we get the newest zone, and the next one must be the oldest
>> +		 * or unused zone, because we do write one by one like a circle.
>> +		 */
>> +		if (hdr->time.tv_sec >= time.tv_sec) {
>> +			time.tv_sec = hdr->time.tv_sec;
>> +			cxt->dmesg_write_cnt = (i + 1) % cxt->dmesg_max_cnt;
>> +		}
>> +
>> +		if (hdr->reason == KMSG_DUMP_OOPS)
>> +			cxt->oops_counter =
>> +				max(cxt->oops_counter, hdr->counter);
>> +		else
>> +			cxt->panic_counter =
>> +				max(cxt->panic_counter, hdr->counter);
>> +
>> +		if (!atomic_read(&buf->datalen)) {
>> +			pr_debug("found erased zone: %s: id %ld, off %lu, size %zu, datalen %d\n",
>> +					zone->name, i, zone->off,
>> +					zone->buffer_size,
>> +					atomic_read(&buf->datalen));
>> +			continue;
>> +		}
>> +
>> +		if (!is_on_panic())
>> +			zone->should_recover = true;
>> +		pr_debug("found nice zone: %s: id %ld, off %lu, size %zu, datalen %d\n",
>> +				zone->name, i, zone->off,
>> +				zone->buffer_size, atomic_read(&buf->datalen));
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int blkz_recover_dmesg(struct blkz_context *cxt)
>> +{
>> +	int ret;
>> +
>> +	if (!cxt->dbzs)
>> +		return 0;
>> +
>> +	ret = blkz_recover_dmesg_meta(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	ret = blkz_recover_dmesg_data(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	return 0;
>> +recover_fail:
>> +	pr_debug("recover dmesg failed\n");
>> +	return ret;
>> +}
>> +
>> +static inline int blkz_recovery(struct blkz_context *cxt)
>> +{
>> +	int ret = -EBUSY;
>> +
>> +	if (atomic_read(&cxt->recovered))
>> +		return 0;
>> +
>> +	ret = blkz_recover_dmesg(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	pr_debug("recover end!\n");
>> +	atomic_set(&cxt->recovered, 1);
>> +	return 0;
>> +
>> +recover_fail:
>> +	pr_err("recover failed\n");
>> +	return ret;
>> +}
>> +
>> +static int blkz_pstore_open(struct pstore_info *psi)
>> +{
>> +	struct blkz_context *cxt = psi->data;
>> +
>> +	cxt->dmesg_read_cnt = 0;
>> +	return 0;
>> +}
>> +
>> +static inline bool blkz_ok(struct blkz_zone *zone)
>> +{
>> +	if (zone && zone->buffer && buffer_datalen(zone))
>> +		return true;
>> +	return false;
>> +}
>> +
>> +static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>> +		struct blkz_zone *zone)
>> +{
>> +	if (unlikely(!blkz_ok(zone)))
>> +		return 0;
>> +
>> +	atomic_set(&zone->buffer->datalen, 0);
>> +	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>> +}
> 
> cxt is unused?
> 

I am keeping it for future use; it is used in patch 9.

>> +
>> +static int blkz_pstore_erase(struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +
> 
> Please sanity-check the record->id is in bounds before using it.
> 

I will fix it later.
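
A minimal sketch of such a check, assuming the record id is an unsigned 64-bit value and the zone count is an unsigned int. struct context and zone_by_id() are made-up names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a zone. */
struct zone {
	int datalen;
};

/* Simplified stand-in for the driver context. */
struct context {
	struct zone **zones;	/* stand-in for cxt->dbzs */
	unsigned int max_cnt;	/* stand-in for cxt->dmesg_max_cnt */
};

/* Return the zone for id, or NULL if id does not name a valid slot. */
static struct zone *zone_by_id(const struct context *cxt, uint64_t id)
{
	if (!cxt || !cxt->zones)
		return NULL;
	if (id >= cxt->max_cnt)
		return NULL;
	return cxt->zones[id];
}
```

The erase path would then return an error when the lookup yields NULL instead of indexing the array directly.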

>> +	switch (record->type) {
>> +	case PSTORE_TYPE_DMESG:
>> +		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +	struct blkz_buffer *buffer = zone->buffer;
>> +	struct blkz_dmesg_header *hdr =
>> +		(struct blkz_dmesg_header *)buffer->data;
>> +
>> +	hdr->magic = DMESG_HEADER_MAGIC;
>> +	hdr->compressed = record->compressed;
>> +	hdr->time.tv_sec = record->time.tv_sec;
>> +	hdr->time.tv_nsec = record->time.tv_nsec;
>> +	hdr->reason = record->reason;
>> +	if (hdr->reason == KMSG_DUMP_OOPS)
>> +		hdr->counter = ++cxt->oops_counter;
>> +	else
>> +		hdr->counter = ++cxt->panic_counter;
>> +}
>> +
>> +static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
>> +		struct pstore_record *record)
> 
> Instead of "..._do" maybe name this "..._record", since it writes one
> record?
> 

That's a good idea.

>> +{
>> +	size_t size, hlen;
>> +	struct blkz_zone *zone;
>> +	unsigned int zonenum;
>> +
>> +	zonenum = cxt->dmesg_write_cnt;
>> +	zone = cxt->dbzs[zonenum];
>> +	if (unlikely(!zone))
>> +		return -ENOSPC;
>> +	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
>> +
>> +	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
>> +	blkz_write_kmsg_hdr(zone, record);
>> +	hlen = sizeof(struct blkz_dmesg_header);
>> +	size = min_t(size_t, record->size, zone->buffer_size - hlen);
>> +	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
>> +}
>> +
>> +static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>> +		struct pstore_record *record)
>> +{
>> +	int ret;
>> +	struct blkz_info *info = cxt->bzinfo;
>> +
>> +	/*
>> +	 * Out of the various dmesg dump types, pstore/blk is currently designed
>> +	 * to only store crash logs, rather than storing general kernel logs.
>> +	 */
>> +	if (record->reason != KMSG_DUMP_OOPS &&
>> +			record->reason != KMSG_DUMP_PANIC)
>> +		return -EINVAL;
>> +
>> +	/* Skip Oopes when configured to do so. */
>> +	if (record->reason == KMSG_DUMP_OOPS && !info->dump_oops)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * Explicitly only take the first part of any new crash.
>> +	 * If our buffer is larger than kmsg_bytes, this can never happen,
>> +	 * and if our buffer is smaller than kmsg_bytes, we don't want the
>> +	 * report split across multiple records.
>> +	 */
>> +	if (record->part != 1)
>> +		return -ENOSPC;
>> +
>> +	if (!cxt->dbzs)
>> +		return -ENOSPC;
>> +
>> +	ret = blkz_dmesg_write_do(cxt, record);
>> +	if (!ret) {
>> +		pr_debug("try to flush other dirty dmesg zones\n");
>> +		blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
>> +	}
>> +
>> +	/* alway return 0 as we had handled it on buffer */
>> +	return 0;
>> +}
>> +
>> +static int notrace blkz_pstore_write(struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +
>> +	if (record->type == PSTORE_TYPE_DMESG &&
>> +			record->reason == KMSG_DUMP_PANIC)
>> +		atomic_set(&cxt->on_panic, 1);
>> +
>> +	switch (record->type) {
>> +	case PSTORE_TYPE_DMESG:
>> +		return blkz_dmesg_write(cxt, record);
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +#define READ_NEXT_ZONE ((ssize_t)(-1024))
>> +static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>> +{
>> +	struct blkz_zone *zone = NULL;
>> +
>> +	while (cxt->dmesg_read_cnt < cxt->dmesg_max_cnt) {
>> +		zone = cxt->dbzs[cxt->dmesg_read_cnt++];
>> +		if (blkz_ok(zone))
>> +			return zone;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int blkz_read_dmesg_hdr(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	struct blkz_buffer *buffer = zone->buffer;
>> +	struct blkz_dmesg_header *hdr =
>> +		(struct blkz_dmesg_header *)buffer->data;
>> +
>> +	if (hdr->magic != DMESG_HEADER_MAGIC)
>> +		return -EINVAL;
>> +	record->compressed = hdr->compressed;
>> +	record->time.tv_sec = hdr->time.tv_sec;
>> +	record->time.tv_nsec = hdr->time.tv_nsec;
>> +	record->reason = hdr->reason;
>> +	record->count = hdr->counter;
>> +	return 0;
>> +}
>> +
>> +static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	size_t size, hlen = 0;
>> +
>> +	size = buffer_datalen(zone);
>> +	/* Clear and skip this DMESG record if it has no valid header */
>> +	if (blkz_read_dmesg_hdr(zone, record)) {
>> +		atomic_set(&zone->buffer->datalen, 0);
>> +		atomic_set(&zone->dirty, 0);
>> +		return READ_NEXT_ZONE;
>> +	}
>> +	size -= sizeof(struct blkz_dmesg_header);
>> +
>> +	if (!record->compressed) {
>> +		char *buf = kasprintf(GFP_KERNEL,
>> +				"%s: Total %d times\n",
>> +				record->reason == KMSG_DUMP_OOPS ? "Oops" :
>> +				"Panic", record->count);
>> +		hlen = strlen(buf);
>> +		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
>> +		if (!record->buf) {
>> +			kfree(buf);
>> +			return -ENOMEM;
>> +		}
>> +	} else {
>> +		record->buf = kmalloc(size, GFP_KERNEL);
>> +		if (!record->buf)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
>> +				sizeof(struct blkz_dmesg_header)) < 0)) {
>> +		kfree(record->buf);
>> +		return READ_NEXT_ZONE;
>> +	}
>> +
>> +	return size + hlen;
>> +}
>> +
>> +static ssize_t blkz_pstore_read(struct pstore_record *record)
>> +{
>> +	struct blkz_context *cxt = record->psi->data;
>> +	ssize_t (*blkz_read)(struct blkz_zone *zone,
>> +			struct pstore_record *record);
>> +	struct blkz_zone *zone;
>> +	ssize_t ret;
>> +
>> +	/* before read, we must recover from storage */
>> +	ret = blkz_recovery(cxt);
>> +	if (ret)
>> +		return ret;
>> +
>> +next_zone:
>> +	zone = blkz_read_next_zone(cxt);
>> +	if (!zone)
>> +		return 0;
>> +
>> +	record->type = zone->type;
>> +	switch (record->type) {
>> +	case PSTORE_TYPE_DMESG:
>> +		blkz_read = blkz_dmesg_read;
>> +		record->id = cxt->dmesg_read_cnt - 1;
>> +		break;
>> +	default:
>> +		goto next_zone;
>> +	}
>> +
>> +	ret = blkz_read(zone, record);
>> +	if (ret == READ_NEXT_ZONE)
>> +		goto next_zone;
>> +	return ret;
>> +}
>> +
>> +static struct blkz_context blkz_cxt = {
>> +	.bzinfo_lock = __SPIN_LOCK_UNLOCKED(blkz_cxt.bzinfo_lock),
>> +	.recovered = ATOMIC_INIT(0),
>> +	.on_panic = ATOMIC_INIT(0),
>> +	.pstore = {
>> +		.owner = THIS_MODULE,
>> +		.name = MODNAME,
>> +		.open = blkz_pstore_open,
>> +		.read = blkz_pstore_read,
>> +		.write = blkz_pstore_write,
>> +		.erase = blkz_pstore_erase,
>> +	},
>> +};
>> +
>> +static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
>> +		unsigned long *off, size_t size)
>> +{
>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>> +	struct blkz_zone *zone;
>> +	const char *name = pstore_type_to_name(type);
>> +
>> +	if (!size)
>> +		return NULL;
>> +
>> +	if (*off + size > info->total_size) {
>> +		pr_err("no room for %s (0x%zx@0x%lx over 0x%lx)\n",
>> +			name, size, *off, info->total_size);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	zone = kzalloc(sizeof(struct blkz_zone), GFP_KERNEL);
>> +	if (!zone)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	zone->buffer = kmalloc(size, GFP_KERNEL);
>> +	if (!zone->buffer) {
>> +		kfree(zone);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +	memset(zone->buffer, 0xFF, size);
>> +	zone->off = *off;
>> +	zone->name = name;
>> +	zone->type = type;
>> +	zone->buffer_size = size - sizeof(struct blkz_buffer);
>> +	zone->buffer->sig = type ^ BLK_SIG;
>> +	atomic_set(&zone->dirty, 0);
>> +	atomic_set(&zone->buffer->datalen, 0);
>> +
>> +	*off += size;
>> +
>> +	pr_debug("blkzone %s: off 0x%lx, %zu header, %zu data\n", zone->name,
>> +			zone->off, sizeof(*zone->buffer), zone->buffer_size);
>> +	return zone;
>> +}
>> +
>> +static struct blkz_zone **blkz_init_zones(enum pstore_type_id type,
>> +	unsigned long *off, size_t total_size, ssize_t record_size,
>> +	unsigned int *cnt)
>> +{
>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>> +	struct blkz_zone **zones, *zone;
>> +	const char *name = pstore_type_to_name(type);
>> +	int c, i;
>> +
>> +	if (!total_size || !record_size)
>> +		return NULL;
>> +
>> +	if (*off + total_size > info->total_size) {
>> +		pr_err("no room for zones %s (0x%zx@0x%lx over 0x%lx)\n",
>> +			name, total_size, *off, info->total_size);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	c = total_size / record_size;
>> +	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
>> +	if (!zones) {
>> +		pr_err("allocate for zones %s failed\n", name);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +	memset(zones, 0, c * sizeof(*zones));
>> +
>> +	for (i = 0; i < c; i++) {
>> +		zone = blkz_init_zone(type, off, record_size);
>> +		if (!zone || IS_ERR(zone)) {
>> +			pr_err("initialize zones %s failed\n", name);
>> +			while (--i >= 0) {
>> +				kfree(zones[i]->buffer);
>> +				kfree(zones[i]);
>> +			}
>> +			kfree(zones);
>> +			return (void *)zone;
>> +		}
>> +		zones[i] = zone;
>> +	}
>> +
>> +	*cnt = c;
>> +	return zones;
>> +}
>> +
>> +static void blkz_free_zone(struct blkz_zone **blkzone)
>> +{
>> +	struct blkz_zone *zone = *blkzone;
>> +
>> +	if (!zone)
>> +		return;
>> +
>> +	kfree(zone->buffer);
>> +	kfree(zone);
>> +	*blkzone = NULL;
>> +}
>> +
>> +static void blkz_free_zones(struct blkz_zone ***blkzones, unsigned int *cnt)
>> +{
>> +	struct blkz_zone **zones = *blkzones;
>> +
>> +	if (!zones)
>> +		return;
>> +
>> +	while (*cnt > 0) {
>> +		blkz_free_zone(&zones[*cnt]);
>> +		(*cnt)--;
>> +	}
>> +	kfree(zones);
>> +	*blkzones = NULL;
>> +}
>> +
>> +static int blkz_cut_zones(struct blkz_context *cxt)
> 
> What does "cut" mean here? Maybe "alloc" instead?
> 

It is like cutting a cake. Renaming it to "alloc" is fine.

>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	unsigned long off = 0;
>> +	int err;
>> +	size_t size;
>> +
>> +	size = info->total_size;
>> +	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
>> +			info->dmesg_size, &cxt->dmesg_max_cnt);
>> +	if (IS_ERR(cxt->dbzs)) {
>> +		err = PTR_ERR(cxt->dbzs);
>> +		goto fail_out;
>> +	}
>> +
>> +	return 0;
>> +fail_out:
>> +	return err;
>> +}
>> +
>> +int blkz_register(struct blkz_info *info)
>> +{
>> +	int err = -EINVAL;
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +	struct module *owner = info->owner;
>> +
>> +	if (!info->total_size) {
>> +		pr_warn("the total size must be non-zero\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!info->dmesg_size) {
>> +		pr_warn("at least one of the records be non-zero\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!info->name || !info->name[0])
>> +		return -EINVAL;
>> +
>> +	if (info->total_size < 4096) {
>> +		pr_err("total size must be greater than 4096 bytes\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +#define check_size(name, size) {					\
>> +		if (info->name > 0 && info->name < (size)) {		\
>> +			pr_err(#name " must be over %d\n", (size));	\
>> +			return -EINVAL;					\
>> +		}							\
>> +		if (info->name & (size - 1)) {				\
>> +			pr_err(#name " must be a multiple of %d\n",	\
>> +					(size));			\
>> +			return -EINVAL;					\
>> +		}							\
>> +	}
>> +
>> +	check_size(total_size, 4096);
>> +	check_size(dmesg_size, SECTOR_SIZE);
>> +
>> +#undef check_size
>> +
>> +	/*
>> +	 * the @read and @write must be applied.
>> +	 * if no @read, pstore may mount failed.
>> +	 * if no @write, pstore do not support to remove record file.
>> +	 */
>> +	if (!info->read || !info->write) {
>> +		pr_err("no valid general read/write interface\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	spin_lock(&cxt->bzinfo_lock);
>> +	if (cxt->bzinfo) {
>> +		pr_warn("blk '%s' already loaded: ignoring '%s'\n",
>> +				cxt->bzinfo->name, info->name);
>> +		spin_unlock(&cxt->bzinfo_lock);
>> +		return -EBUSY;
>> +	}
>> +	cxt->bzinfo = info;
>> +	spin_unlock(&cxt->bzinfo_lock);
>> +
>> +	if (owner && !try_module_get(owner)) {
>> +		err = -EBUSY;
>> +		goto fail_out;
>> +	}
>> +
>> +	pr_debug("register %s with properties:\n", info->name);
>> +	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>> +	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>> +
>> +	err = blkz_cut_zones(cxt);
>> +	if (err) {
>> +		pr_err("cut zones fialed\n");
> 
> typo: "failed"
> 

I will fix it later.

>> +		goto put_module;
>> +	}
>> +
>> +	if (info->dmesg_size) {
>> +		cxt->pstore.bufsize = cxt->dbzs[0]->buffer_size -
>> +			sizeof(struct blkz_dmesg_header);
>> +		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
>> +		if (!cxt->pstore.buf) {
>> +			err = -ENOMEM;
> 
> I think the allocated zones need to be freed here.
> 

You are right. I will fix it later.

>> +			goto put_module;
>> +		}
>> +	}
>> +	cxt->pstore.data = cxt;
>> +	if (info->dmesg_size)
>> +		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
>> +
>> +	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
>> +			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>> +			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
>> +
>> +	err = pstore_register(&cxt->pstore);
>> +	if (err) {
>> +		pr_err("registering with pstore failed\n");
> 
> Also here?
> 

You are right. I will fix it later.
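
As a rough illustration of the cleanup ordering being asked for at both failure points (the buffer allocation and the pstore registration), here is a userspace sketch, not the actual fix. The struct, the demo_* names and the fail_at knob are made up for the example; malloc()/free() stand in for the kernel allocators and for freeing the zones.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical context; the fields only loosely mirror the patch. */
struct demo_ctx {
	void *zones;		/* stands in for cxt->dbzs */
	void *buf;		/* stands in for cxt->pstore.buf */
	int registered;
};

/* @fail_at selects which step fails (0 = none) to exercise each path. */
static int demo_register(struct demo_ctx *ctx, int fail_at)
{
	int err;

	assert(ctx);

	ctx->zones = (fail_at == 1) ? NULL : malloc(64);
	if (!ctx->zones) {
		err = -ENOMEM;
		goto fail_out;
	}

	ctx->buf = (fail_at == 2) ? NULL : malloc(32);
	if (!ctx->buf) {
		err = -ENOMEM;
		goto free_zones;	/* zones exist by now, release them */
	}

	if (fail_at == 3) {		/* pretend the registration failed */
		err = -EBUSY;
		goto free_buf;
	}

	ctx->registered = 1;
	return 0;

	/* Labels unwind in reverse order of acquisition. */
free_buf:
	free(ctx->buf);
	ctx->buf = NULL;
free_zones:
	free(ctx->zones);
	ctx->zones = NULL;
fail_out:
	return err;
}
```

Each label releases exactly what was acquired before the jump, so a later failure falls through the earlier cleanups.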

>> +		goto free_pstore_buf;
>> +	}
>> +
>> +	module_put(owner);
>> +	return 0;
>> +
>> +free_pstore_buf:
>> +	kfree(cxt->pstore.buf);
>> +put_module:
>> +	module_put(owner);
>> +fail_out:
>> +	spin_lock(&blkz_cxt.bzinfo_lock);
>> +	blkz_cxt.bzinfo = NULL;
>> +	spin_unlock(&blkz_cxt.bzinfo_lock);
>> +	return err;
>> +}
>> +EXPORT_SYMBOL_GPL(blkz_register);
>> +
>> +void blkz_unregister(struct blkz_info *info)
>> +{
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +
>> +	pstore_unregister(&cxt->pstore);
>> +	kfree(cxt->pstore.buf);
>> +	cxt->pstore.bufsize = 0;
>> +
>> +	spin_lock(&cxt->bzinfo_lock);
>> +	blkz_cxt.bzinfo = NULL;
>> +	spin_unlock(&cxt->bzinfo_lock);
>> +
>> +	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
>> +}
>> +EXPORT_SYMBOL_GPL(blkz_unregister);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>> +MODULE_DESCRIPTION("Block device Oops/Panic logger");
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> new file mode 100644
>> index 000000000000..589d276fa4e4
>> --- /dev/null
>> +++ b/include/linux/pstore_blk.h
>> @@ -0,0 +1,62 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __PSTORE_BLK_H_
>> +#define __PSTORE_BLK_H_
>> +
>> +#include <linux/types.h>
>> +#include <linux/blkdev.h>
>> +
>> +/**
>> + * struct blkz_info - backend blkzone driver structure
>> + *
>> + * @owner:
>> + *	Module which is responsible for this backend driver.
>> + * @name:
>> + *	Name of the backend driver.
>> + * @total_size:
>> + *	The total size in bytes pstore/blk can use. It must be greater than
>> + *	4096 and be multiple of 4096.
>> + * @dmesg_size:
>> + *	The size of each zones for dmesg (oops & panic). Zero means disabled,
>> + *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
>> + * @dump_oops:
>> + *	Dump oops and panic log or only panic.
>> + * @read, @write:
>> + *	The general (not panic) read/write operation. It's required unless you
>> + *	are block device and supply valid @bdev. In this case, blkzone will
>> + *	replace it as a general read/write interface.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, the number of bytes read/write should be returned.
>> + *	On error, negative number should be returned.
>> + * @panic_write:
>> + *	The write operation only used for panic. It's optional if you do not
>> + *	care panic record. If panic occur but blkzone do not recover yet, the
>> + *	first zone of dmesg is used.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, the number of bytes write should be returned.
>> + *	On error, negative number should be returned.
>> + */
>> +typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
>> +typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
>> +struct blkz_info {
>> +	struct module *owner;
>> +	const char *name;
>> +
>> +	unsigned long total_size;
>> +	unsigned long dmesg_size;
>> +	int dump_oops;
>> +	blkz_read_op read;
>> +	blkz_write_op write;
>> +	blkz_write_op panic_write;
>> +};
>> +
>> +extern int blkz_register(struct blkz_info *info);
>> +extern void blkz_unregister(struct blkz_info *info);
>> +
>> +#endif
>> -- 
>> 1.9.1
>>
> 

-- 
liaoweixiong

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-02-27  8:21     ` liaoweixiong
@ 2020-03-18 17:23       ` Kees Cook
  2020-03-20  1:50         ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 17:23 UTC (permalink / raw)
  To: liaoweixiong
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Thu, Feb 27, 2020 at 04:21:51PM +0800, liaoweixiong wrote:
> On 2020/2/26 AM 8:52, Kees Cook wrote:
> > On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
> >> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
> >> +pstore_blk-y += blkzone.o
> > 
> > Why this dance with files? I would just expect:
> > 
> > obj-$(CONFIG_PSTORE_BLK)     += blkzone.o
> > 
> 
> This makes the built module named blkzone.ko rather than
> pstore_blk.ko.

You can just do a regular build rule:

obj-$(CONFIG_PSTORE_BLK) += blkzone.o

> >> +#define BLK_SIG (0x43474244) /* DBGC */
> > 
> > I was going to suggest extracting PERSISTENT_RAM_SIG, renaming it and
> > using it in here and in ram_core.c, but then I realize they're not
> > marking the same structure. How about choosing a new magic sig for the
> > blkzone data header?
> > 
> 
> That's OK with me. Is there a rule for choosing a new magic?
> In addition, all members of this structure are the same as
> struct persistent_ram_buffer after patch 2. Maybe it's a good idea to
> extract it if you want to merge ramoops and pstore/blk.

Okay, let's leave it as-is for now.

> >> +	uint32_t sig;
> >> +	atomic_t datalen;
> >> +	uint8_t data[];
> >> +};
> >> +
> >> +/**
> >> + * struct blkz_dmesg_header: dmesg information
> > 
> > This is the on-disk structure also?
> > 
> Yes. The structure blkz_buffer is a generic header for all recorder
> zones, and the structure blkz_dmesg_header is a header for dmesg, saved
> in blkz_buffer->data. The dmesg recorder uses it to save its specific
> attributes.

Okay, can you add comments to distinguish the on-disk structures from
the in-memory, etc?

> >> +#define DMESG_HEADER_MAGIC 0x4dfc3ae5
> > 
> > How was this magic chosen?
> 
> It's a random number. Should I choose a meaningful magic instead?

That's fine; just add a comment to say so.

> >> + * @dirty:
> >> + *	mark whether the data in @buffer are dirty (not flush to storage yet)
> >> + */
> > 
> > Thank you for the kernel-doc! :) Is it linked to from any .rst files?
> > 
> 
> I don't quite follow. There is a document in the 6th patch. Is that what
> you are asking for?

Patch 6 is excellent; I think you might want to add references back to
these kern-doc structures using the ".. kernel-doc::
fs/pstore/blkzone.c" syntax:
https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#including-kernel-doc-comments

> >> +static int blkz_zone_write(struct blkz_zone *zone,
> >> +		enum blkz_flush_mode flush_mode, const char *buf,
> >> +		size_t len, unsigned long off)
> >> +{
> >> +	struct blkz_info *info = blkz_cxt.bzinfo;
> >> +	ssize_t wcnt = 0;
> >> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
> >> +	size_t wlen;
> >> +
> >> +	if (off > zone->buffer_size)
> >> +		return -EINVAL;
> >> +	wlen = min_t(size_t, len, zone->buffer_size - off);
> >> +	if (buf && wlen) {
> >> +		memcpy(zone->buffer->data + off, buf, wlen);
> >> +		atomic_set(&zone->buffer->datalen, wlen + off);
> >> +	}
> > 
> > If you're expecting concurrent writers (use of atomic_set()), I would
> > expect the whole write to be locked instead. (i.e. what happens if
> > multiple callers call blkz_zone_write()?)
> > 
> 
> I don't agree with it. The datalen will be updated everywhere. It's useless
> to lock here.

But there could be multiple writers; locking should be needed.
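
To make the concern concrete, here is a userspace sketch, not the patch's code, where the copy and the length update sit in one critical section. The demo_* names and the struct layout are invented for the example, and the C11 atomic_flag spinlock only stands in for whichever kernel lock would actually fit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ZONE_SIZE 64

/* Hypothetical stand-in for a zone; not the layout used by the patch. */
struct demo_zone {
	atomic_flag lock;	/* serializes the whole write */
	size_t datalen;		/* plain field, only touched under lock */
	char data[DEMO_ZONE_SIZE];
};

static struct demo_zone demo_zone = { .lock = ATOMIC_FLAG_INIT };

static void demo_lock(struct demo_zone *zone)
{
	while (atomic_flag_test_and_set_explicit(&zone->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void demo_unlock(struct demo_zone *zone)
{
	atomic_flag_clear_explicit(&zone->lock, memory_order_release);
}

/* Returns the number of bytes copied, or -1 if @off is out of range. */
static long demo_zone_write(struct demo_zone *zone, const char *buf,
			    size_t len, size_t off)
{
	size_t wlen;

	assert(zone);
	if (off > DEMO_ZONE_SIZE)
		return -1;

	demo_lock(zone);
	wlen = len < DEMO_ZONE_SIZE - off ? len : DEMO_ZONE_SIZE - off;
	if (buf && wlen) {
		/* Data and length change together, so no other writer can
		 * observe one without the other. */
		memcpy(zone->data + off, buf, wlen);
		zone->datalen = off + wlen;
	}
	demo_unlock(zone);
	return (long)wlen;
}
```

With only an atomic length, two writers could interleave their memcpy() calls and leave a length that matches neither payload. Holding the lock across both steps rules that out.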

> One more thing. During the analysis, I found another problem.
> Removing old files will cause new logs to be lost. Take the console
> recorder as an example. After rebooting, new logs are saved to buf while
> old logs are saved to old_buf. If we remove the old file at that time, not
> only is old_buf freed, but the length of buf for new data is also reset to
> zero. ramoops may also have this problem.

Hmm. I'll need to double-check this. It's possible the call to
persistent_ram_zap() in ramoops_pstore_erase() is not needed.

> >> +static int blkz_recover_dmesg_data(struct blkz_context *cxt)
> > 
> > What does "recover" mean in this context? Is this "read from storage"?
> 
> Yes. "recover" means reading data back from storage.

Okay. Please add some comments here. I would think of it more as "read"
or "load". When I think of "recover" I think of "finding something that
was lost". But the name isn't important as long as there is a comment
somewhere about what it's doing.

-Kees

-- 
Kees Cook


* Re: [PATCH v2 02/11] blkoops: add blkoops, a warpper for pstore/blk
  2020-02-07 12:25 ` [PATCH v2 02/11] blkoops: add blkoops, a warpper for pstore/blk WeiXiong Liao
@ 2020-03-18 18:06   ` Kees Cook
  2020-03-22 10:00     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:06 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:46PM +0800, WeiXiong Liao wrote:
> blkoops is a better wrapper for pstore/blk, which provides efficient
> configuration mothod. It divides all configurations of pstore/blk into

typo: method

> 2 parts, configurations for user and configurations for driver.
> 
> Configurations for user detemine how pstore/blk work, such as
> dump_oops and dmesg_size. They can be set by Kconfig and module
> parameters.

I'd like to keep blkoops as close to ramoops as possible on the user
configuration side. Notes below...

> Configurations for driver are all about block/non-block device, such as
> total_size of device and read/write operations. They should be provided
> by device drivers, calling blkoops_register_device() for non-block
> device and blkoops_register_blkdev() for block device.

By non-block do you mean nvme etc? What is the right term for spinning
disk and nvme collectively? (I always considered them all to be "block"
devices.)

> If device driver support for panic records, @panic_write must be valid.
> If panic occurs and pstore/blk does not recover yet, the first zone
> of dmesg will be used.

I'd like to maintain pstore terminology here: there is the "front end"
(dmesg, console, pmsg, etc) and there is the "back end" (ramoops,
blkoops, efi, etc). Since the block layer sits behind blkoops, I'd like
to come up with a term for this since "device driver" is, I think, too
general. You call it later "block device driver", so let's use that
everywhere you say "device driver".

Then we have the layers: pstore front end, pstore core, pstore back end,
and block device driver.

> Besides, Block device driver has no need to verify which partition is
> used and provides generic read/write operations. Because blkoops has
> done it. It also means that if users do not care panic records but
> records for oops/console/pmsg/ftrace, block device driver should do
> nothing.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  MAINTAINERS             |   2 +-
>  fs/pstore/Kconfig       |  61 ++++++++
>  fs/pstore/Makefile      |   2 +
>  fs/pstore/blkoops.c     | 402 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkoops.h |  58 +++++++
>  5 files changed, 524 insertions(+), 1 deletion(-)
>  create mode 100644 fs/pstore/blkoops.c
>  create mode 100644 include/linux/blkoops.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cc0a4a8ae06a..e4ba97130560 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13381,7 +13381,7 @@ F:	drivers/firmware/efi/efi-pstore.c
>  F:	drivers/acpi/apei/erst.c
>  F:	Documentation/admin-guide/ramoops.rst
>  F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
> -K:	\b(pstore|ramoops)
> +K:	\b(pstore|ramoops|blkoops)
>  
>  PTP HARDWARE CLOCK SUPPORT
>  M:	Richard Cochran <richardcochran@gmail.com>
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 536fde9e13e8..7a57a8edb612 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -163,3 +163,64 @@ config PSTORE_BLK
>  	  where it can be read back at some later point.
>  
>  	  If unsure, say N.
> +
> +config PSTORE_BLKOOPS
> +	tristate "pstore block with oops logger"
> +	depends on PSTORE_BLK
> +	help
> +	  This is a wrapper for pstore/blk.

Is there a reason to keep this separate from PSTORE_BLK? (i.e. why a
separate Kconfig?)

> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> +	  but module parameters have priority over kconfig.
> +
> +	  If unsure, say N.
> +
> +config PSTORE_BLKOOPS_DMESG_SIZE
> +	int "dmesg size in kbytes for blkoops"

How about "Size in Kbytes of dmesg to store"? (It will already show up
under the parent config, so no need to repeat "blkoops" here.)

> +	depends on PSTORE_BLKOOPS
> +	default 64
> +	help
> +	  This just sets size of dmesg (dmesg_size) for pstore/blk. The size is
> +	  in KB and must be a multiple of 4.
> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,

nit: "Kconfig" instead of "kconfig"

> +	  but module parameters have priority over kconfig.
>
> +config PSTORE_BLKOOPS_BLKDEV
> +	string "block device for blkoops"

Maybe clarify this as "block device identifier for blkoops"? Also, I'd
put this before the DMESG_SIZE.

> +	depends on PSTORE_BLKOOPS
> +	default ""
> +	help
> +	  Which block device should be used for pstore/blk.
> +
> +	  It accept the following variants:
> +	  1) <hex_major><hex_minor> device number in hexadecimal represents
> +	     itself no leading 0x, for example b302.
> +	  2) /dev/<disk_name> represents the device number of disk
> +	  3) /dev/<disk_name><decimal> represents the device number
> +	     of partition - device number of disk plus the partition number
> +	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
> +	     used when disk name of partitioned disk ends with a digit.
> +	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
> +	     unique id of a partition if the partition table provides it.
> +	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
> +	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
> +	     filled hex representation of the 32-bit "NT disk signature", and PP
> +	     is a zero-filled hex representation of the 1-based partition number.
> +	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
> +	     to a partition with a known unique id.
> +	  7) <major>:<minor> major and minor number of the device separated by
> +	     a colon.
> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> +	  but module parameters have priority over kconfig.
> +
> +config PSTORE_BLKOOPS_DUMP_OOPS
> +	bool "dump oops"

Why is this a Kconfig at all? Isn't the whole point to always catch
oopses? :) Let's leave this default to 1 (as ramoops does).

> +	depends on PSTORE_BLKOOPS
> +	default y
> +	help
> +	  Whether blkoops dumps oops or not.
> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> +	  but module parameters have priority over kconfig.
> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
> index 0ee2fc8d1bfb..24b3d488d2f0 100644
> --- a/fs/pstore/Makefile
> +++ b/fs/pstore/Makefile
> @@ -15,3 +15,5 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
>  
>  obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>  pstore_blk-y += blkzone.o
> +
> +obj-$(CONFIG_PSTORE_BLKOOPS) += blkoops.o
> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
> new file mode 100644
> index 000000000000..8027c3af8c8d
> --- /dev/null
> +++ b/fs/pstore/blkoops.c
> @@ -0,0 +1,402 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define pr_fmt(fmt) "blkoops : " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/string.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/platform_device.h>
> +#include <linux/blkoops.h>
> +#include <linux/mount.h>
> +#include <linux/uio.h>
> +
> +static long dmesg_size = -1;
> +module_param(dmesg_size, long, 0400);
> +MODULE_PARM_DESC(dmesg_size, "demsg size in kbytes");

Can this be named "record_size" to match ramoops?

> +static int dump_oops = -1;

I'd default this to 1 as mentioned in the Kconfig.

> +module_param(dump_oops, int, 0400);
> +MODULE_PARM_DESC(total_size, "whether dump oops");
> +
> +/**
> + * The block device to use. Most of the time, it is a partition of block
> + * device. It's fine to ignore it if you are not block device and register
> + * to blkoops by blkoops_register_device(). In this case, @blkdev is
> + * useless and @read, @write and @total_size must be supplied.
> + *
> + * @blkdev accepts the following variants:
> + * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
> + *    no leading 0x, for example b302.
> + * 2) /dev/<disk_name> represents the device number of disk
> + * 3) /dev/<disk_name><decimal> represents the device number
> + *    of partition - device number of disk plus the partition number
> + * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
> + *    used when disk name of partitioned disk ends on a digit.
> + * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
> + *    unique id of a partition if the partition table provides it.
> + *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
> + *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
> + *    filled hex representation of the 32-bit "NT disk signature", and PP
> + *    is a zero-filled hex representation of the 1-based partition number.
> + * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
> + *    a partition with a known unique id.
> + * 7) <major>:<minor> major and minor number of the device separated by
> + *    a colon.
> + */
> +static char blkdev[80];

static char blkdev[80] = CONFIG_PSTORE_BLKOOPS_BLKDEV;

> +module_param_string(blkdev, blkdev, 80, 0400);
> +MODULE_PARM_DESC(blkdev, "the block device for general read/write");
> +
> +static DEFINE_MUTEX(blkz_lock);
> +static struct block_device *blkoops_bdev;
> +static struct blkz_info *bzinfo;
> +static blkoops_blk_panic_write_op blkdev_panic_write;
> +
> +#ifdef CONFIG_PSTORE_BLKOOPS_DMESG_SIZE

This (and all the others below) will always be defined, so no need to
test it -- just use it as needed below.

> +#define DEFAULT_DMESG_SIZE CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
> +#else
> +#define DEFAULT_DMESG_SIZE 0
> +#endif
> +
> +#ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
> +#define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
> +#else
> +#define DEFAULT_DUMP_OOPS 1
> +#endif
> +
> +#ifdef CONFIG_PSTORE_BLKOOPS_BLKDEV
> +#define DEFAULT_BLKDEV CONFIG_PSTORE_BLKOOPS_BLKDEV
> +#else
> +#define DEFAULT_BLKDEV ""
> +#endif
> +
> +/**
> + * register device to blkoops
> + *
> + * Drivers, not only block drivers but also non-block drivers can call this
> + * function to register to blkoops. It will pack for blkzone and pstore.
> + */
> +int blkoops_register_device(struct blkoops_device *bo_dev)
> +{
> +	int ret;
> +
> +	if (!bo_dev || !bo_dev->total_size || !bo_dev->read || !bo_dev->write)
> +		return -EINVAL;
> +
> +	mutex_lock(&blkz_lock);
> +
> +	/* someone already registered before */
> +	if (bzinfo) {
> +		mutex_unlock(&blkz_lock);
> +		return -EBUSY;
> +	}
> +	bzinfo = kzalloc(sizeof(struct blkz_info), GFP_KERNEL);
> +	if (!bzinfo) {
> +		mutex_unlock(&blkz_lock);
> +		return -ENOMEM;
> +	}
> +
> +#define verify_size(name, defsize, alignsize) {				\
> +		long _##name_ = (name);					\
> +		if (_##name_ < 0)					\
> +			_##name_ = (defsize);				\
> +		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
> +		if (_##name_ & ((alignsize) - 1)) {			\
> +			pr_info(#name " must align to %d\n",		\
> +					(alignsize));			\
> +			_##name_ = ALIGN(name, (alignsize));		\
> +		}							\
> +		name = _##name_ / 1024;					\
> +		bzinfo->name = _##name_;				\
> +	}
> +
> +	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
> +#undef verify_size

As mentioned, can this be named "record_size"?

> +	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
> +
> +	bzinfo->total_size = bo_dev->total_size;
> +	bzinfo->dump_oops = dump_oops;
> +	bzinfo->read = bo_dev->read;
> +	bzinfo->write = bo_dev->write;

Why copy these separate functions? Shouldn't bzinfo just keep a pointer
to bo_dev?

> +	bzinfo->panic_write = bo_dev->panic_write;
> +	bzinfo->name = "blkoops";
> +	bzinfo->owner = THIS_MODULE;
> +
> +	ret = blkz_register(bzinfo);
> +	if (ret) {
> +		kfree(bzinfo);
> +		bzinfo = NULL;
> +	}
> +	mutex_unlock(&blkz_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkoops_register_device);
> +
> +void blkoops_unregister_device(struct blkoops_device *bo_dev)
> +{
> +	mutex_lock(&blkz_lock);
> +	if (bzinfo && bzinfo->read == bo_dev->read) {

Why this read equality test?

> +		blkz_unregister(bzinfo);
> +		kfree(bzinfo);
> +		bzinfo = NULL;
> +	}
> +	mutex_unlock(&blkz_lock);
> +}
> +EXPORT_SYMBOL_GPL(blkoops_unregister_device);
> +
> +/**
> + * get block_device of @blkdev
> + * @holder: exclusive holder identifier
> + *
> + * On success, @blkoops_bdev will save the block_device and the returned
> + * block_device has reference count of one.
> + */
> +static struct block_device *blkoops_get_bdev(void *holder)
> +{
> +	struct block_device *bdev = ERR_PTR(-ENODEV);
> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
> +
> +	if (!blkdev[0] && strlen(DEFAULT_BLKDEV))
> +		snprintf(blkdev, 80, "%s", DEFAULT_BLKDEV);
> +	if (!blkdev[0])
> +		return ERR_PTR(-ENODEV);

I'd drop these tests -- and the snprintf isn't needed with the change
above on initialization.

> +
> +	mutex_lock(&blkz_lock);
> +	if (bzinfo)
> +		goto out;
> +	if (holder)
> +		mode |= FMODE_EXCL;
> +	bdev = blkdev_get_by_path(blkdev, mode, holder);
> +	if (IS_ERR(bdev)) {
> +		dev_t devt;
> +
> +		devt = name_to_dev_t(blkdev);
> +		if (devt == 0) {
> +			bdev = ERR_PTR(-ENODEV);
> +			goto out;
> +		}
> +		bdev = blkdev_get_by_dev(devt, mode, holder);
> +	}
> +out:
> +	mutex_unlock(&blkz_lock);
> +	return bdev;
> +}
> +
> +static void blkoops_put_bdev(struct block_device *bdev, void *holder)
> +{
> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
> +
> +	if (!bdev)
> +		return;
> +
> +	mutex_lock(&blkz_lock);
> +	if (holder)
> +		mode |= FMODE_EXCL;
> +	blkdev_put(bdev, mode);
> +	mutex_unlock(&blkz_lock);
> +}
> +
> +static ssize_t blkoops_generic_blk_read(char *buf, size_t bytes, loff_t pos)
> +{
> +	ssize_t ret;
> +	struct block_device *bdev = blkoops_bdev;
> +	struct file filp;
> +	mm_segment_t ofs;
> +	struct kiocb kiocb;
> +	struct iov_iter iter;
> +	struct iovec iov = {
> +		.iov_base = (void __user *)buf,
> +		.iov_len = bytes
> +	};
> +
> +	if (!bdev)
> +		return -ENODEV;
> +
> +	memset(&filp, 0, sizeof(struct file));
> +	filp.f_mapping = bdev->bd_inode->i_mapping;
> +	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
> +	filp.f_inode = bdev->bd_inode;
> +
> +	init_sync_kiocb(&kiocb, &filp);
> +	kiocb.ki_pos = pos;
> +	iov_iter_init(&iter, READ, &iov, 1, bytes);
> +
> +	ofs = get_fs();
> +	set_fs(KERNEL_DS);
> +	ret = generic_file_read_iter(&kiocb, &iter);
> +	set_fs(ofs);

Please don't use "set_fs". I think you want ITER_KVEC and to use
vfs_iter_read()? A lot of work went into removing set_fs() uses; we
should not add more. :)
https://lwn.net/Articles/722267/

> +	return ret;
> +}
> +
> +static ssize_t blkoops_generic_blk_write(const char *buf, size_t bytes,
> +		loff_t pos)
> +{
> +	struct block_device *bdev = blkoops_bdev;
> +	struct iov_iter iter;
> +	struct kiocb kiocb;
> +	struct file filp;
> +	mm_segment_t ofs;
> +	ssize_t ret;
> +	struct iovec iov = {
> +		.iov_base = (void __user *)buf,
> +		.iov_len = bytes
> +	};
> +
> +	if (!bdev)
> +		return -ENODEV;
> +
> +	/* Console/Ftrace recorder may handle buffer until flush dirty zones */
> +	if (in_interrupt() || irqs_disabled())
> +		return -EBUSY;
> +
> +	memset(&filp, 0, sizeof(struct file));
> +	filp.f_mapping = bdev->bd_inode->i_mapping;
> +	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
> +	filp.f_inode = bdev->bd_inode;
> +
> +	init_sync_kiocb(&kiocb, &filp);
> +	kiocb.ki_pos = pos;
> +	iov_iter_init(&iter, WRITE, &iov, 1, bytes);
> +
> +	ofs = get_fs();
> +	set_fs(KERNEL_DS);

Same.

> +
> +	inode_lock(bdev->bd_inode);
> +	ret = generic_write_checks(&kiocb, &iter);
> +	if (ret > 0)
> +		ret = generic_perform_write(&filp, &iter, pos);
> +	inode_unlock(bdev->bd_inode);
> +
> +	if (likely(ret > 0)) {
> +		const struct file_operations f_op = {.fsync = blkdev_fsync};
> +
> +		filp.f_op = &f_op;
> +		kiocb.ki_pos += ret;
> +		ret = generic_write_sync(&kiocb, ret);
> +	}
> +	set_fs(ofs);
> +	return ret;
> +}
> +
> +static inline unsigned long blkoops_bdev_size(struct block_device *bdev)
> +{
> +	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
> +}
> +
> +static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
> +		loff_t off)
> +{
> +	int ret;
> +
> +	if (!blkdev_panic_write)
> +		return -EOPNOTSUPP;
> +
> +	/* size and off must align to SECTOR_SIZE for block device */
> +	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
> +			size >> SECTOR_SHIFT);
> +	return ret ? -EIO : size;
> +}
> +
> +/**
> + * register block device to blkoops
> + * @major: the major device number of registering device
> + * @panic_write: the write interface for panic case.
> + *
> + * It is ONLY used for block device to register to blkoops. In this case,
> + * the module parameter @blkdev must be valid. Generic read/write interfaces
> + * will be used.
> + *
> + * Block driver has no need to verify which partition is used. Block driver
> + * should only tell me what major number is, so blkoops can get the matching
> + * driver for @blkdev.
> + *
> + * If block driver support for panic records, @panic_write must be valid. If
> + * panic occurs but pstore/blk does not recover yet, the first zone of dmesg
> + * will be used.
> + */
> +int blkoops_register_blkdev(unsigned int major,
> +		blkoops_blk_panic_write_op panic_write)
> +{
> +	struct block_device *bdev;
> +	struct blkoops_device bo_dev = {0};
> +	int ret = -ENODEV;
> +	void *holder = blkdev;
> +
> +	bdev = blkoops_get_bdev(holder);
> +	if (IS_ERR(bdev))
> +		return PTR_ERR(bdev);

This seems like a good place to report getting or failing to get the
named block device.

	bdev = blkoops_get_bdev(holder);
	if (IS_ERR(bdev)) {
		pr_err("failed to open '%s'!\n", blkdev);
		return PTR_ERR(bdev);
	}

> +
> +	blkoops_bdev = bdev;
> +	blkdev_panic_write = panic_write;
> +
> +	/* only allow driver matching the @blkdev */
> +	if (!bdev->bd_dev || MAJOR(bdev->bd_dev) != major)

And add similar error reports here.

> +		goto err_put_bdev;
> +
> +	bo_dev.total_size = blkoops_bdev_size(bdev);
> +	if (bo_dev.total_size == 0)
> +		goto err_put_bdev;

And here. We want to make failures as discoverable as possible.

> +	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
> +	bo_dev.read = blkoops_generic_blk_read;
> +	bo_dev.write = blkoops_generic_blk_write;
> +
> +	ret = blkoops_register_device(&bo_dev);
> +	if (ret)
> +		goto err_put_bdev;

	pr_info("using '%s'\n", blkdev);

> +	return 0;
> +
> +err_put_bdev:
> +	blkdev_panic_write = NULL;
> +	blkoops_bdev = NULL;
> +	blkoops_put_bdev(bdev, holder);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkoops_register_blkdev);
> +
> +void blkoops_unregister_blkdev(unsigned int major)
> +{
> +	struct blkoops_device bo_dev = {.read = blkoops_generic_blk_read};
> +	void *holder = blkdev;
> +
> +	if (blkoops_bdev && MAJOR(blkoops_bdev->bd_dev) == major) {
> +		blkoops_unregister_device(&bo_dev);
> +		blkoops_put_bdev(blkoops_bdev, holder);
> +		blkdev_panic_write = NULL;
> +		blkoops_bdev = NULL;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(blkoops_unregister_blkdev);
> +
> +/**
> + * get information of @blkdev
> + * @devt: the block device num of @blkdev
> + * @nr_sectors: the sector count of @blkdev
> + * @start_sect: the start sector of @blkdev
> + *
> + * Block driver needs the follow information for @panic_write.
> + */
> +int blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
> +{
> +	struct block_device *bdev;
> +
> +	bdev = blkoops_get_bdev(NULL);
> +	if (IS_ERR(bdev))
> +		return PTR_ERR(bdev);
> +
> +	if (devt)
> +		*devt = bdev->bd_dev;
> +	if (nr_sects)
> +		*nr_sects = part_nr_sects_read(bdev->bd_part);
> +	if (start_sect)
> +		*start_sect = get_start_sect(bdev);
> +
> +	blkoops_put_bdev(bdev, NULL);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(blkoops_blkdev_info);

I don't see this function getting used anywhere. Can it be removed? I
see the notes in the Documentation. Could these values just be cached at
open time instead of reopening the device?
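
Something along these lines is what I have in mind for caching, shown as a userspace sketch with invented names (struct demo_bdev and the demo_* functions are not from the patch; the real values would come from the opened block_device):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the opened device. */
struct demo_bdev {
	uint32_t devt;
	uint64_t start_sect;
	uint64_t nr_sects;
};

/* Values captured once, when the device is opened. */
static struct demo_bdev demo_cached;
static int demo_cache_valid;

static void demo_open(const struct demo_bdev *bdev)
{
	assert(bdev);
	demo_cached = *bdev;
	demo_cache_valid = 1;
}

static void demo_close(void)
{
	demo_cache_valid = 0;
}

/* Returns 0 and fills the requested fields, or -1 if nothing is open. */
static int demo_blkdev_info(uint32_t *devt, uint64_t *nr_sects,
			    uint64_t *start_sect)
{
	if (!demo_cache_valid)
		return -1;
	if (devt)
		*devt = demo_cached.devt;
	if (nr_sects)
		*nr_sects = demo_cached.nr_sects;
	if (start_sect)
		*start_sect = demo_cached.start_sect;
	return 0;
}
```

The query then never touches the device, which matters if it is ever called from a context where opening a block device is not safe.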

> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("Wrapper for Pstore BLK with Oops logger");
> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
> new file mode 100644
> index 000000000000..fe63739309aa
> --- /dev/null
> +++ b/include/linux/blkoops.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __BLKOOPS_H_
> +#define __BLKOOPS_H_
> +
> +#include <linux/types.h>
> +#include <linux/blkdev.h>
> +#include <linux/pstore_blk.h>
> +
> +/**
> + * struct blkoops_device - backend blkoops driver structure.
> + *
> + * This structure is ONLY used for non-block device by
> + * blkoops_register_device(). If block device, you are strongly recommended
> + * to use blkoops_register_blkdev().
> + *
> + * @total_size:
> + *	The total size in bytes pstore/blk can use. It must be greater than
> + *	4096 and be multiple of 4096.
> + * @read, @write:
> + *	The general (not panic) read/write operation.
> + *
> + *	Both of the @size and @offset parameters on this interface are
> + *	the relative size of the space provided, not the whole disk/flash.
> + *
> + *	On success, the number of bytes read should be returned.
> + *	On error, negative number should be returned.
> + * @panic_write:
> + *	The write operation only used for panic.
> + *
> + *	Both of the @size and @offset parameters on this interface are
> + *	the relative size of the space provided, not the whole disk/flash.
> + *
> + *	On success, the number of bytes read should be returned.
> + *	On error, negative number should be returned.
> + */
> +struct blkoops_device {
> +	unsigned long total_size;
> +	blkz_read_op read;
> +	blkz_write_op write;
> +	blkz_write_op panic_write;
> +};
> +
> +/*
> + * Panic write for block devices, which must write aligned to SECTOR_SIZE.
> + * On success, zero should be returned. Others mean error.
> + */
> +typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
> +		sector_t sects);
> +
> +int  blkoops_register_device(struct blkoops_device *bo_dev);
> +void blkoops_unregister_device(struct blkoops_device *bo_dev);
> +int  blkoops_register_blkdev(unsigned int major,
> +		blkoops_blk_panic_write_op panic_write);
> +void blkoops_unregister_blkdev(unsigned int major);
> +int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
> +
> +#endif
> -- 
> 1.9.1
> 

-- 
Kees Cook

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder
  2020-02-07 12:25 ` [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder WeiXiong Liao
@ 2020-03-18 18:13   ` Kees Cook
  2020-03-22 11:14     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:13 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:47PM +0800, WeiXiong Liao wrote:
> pmsg support recorder for userspace. To enable pmsg, just make pmsg_size
> be greater than 0 and a multiple of 4096.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  fs/pstore/Kconfig          |  12 +++
>  fs/pstore/blkoops.c        |  11 +++
>  fs/pstore/blkzone.c        | 229 +++++++++++++++++++++++++++++++++++++++++++--
>  include/linux/pstore_blk.h |   4 +
>  4 files changed, 246 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 7a57a8edb612..bbf1fdb5eaa7 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -186,6 +186,18 @@ config PSTORE_BLKOOPS_DMESG_SIZE
>  	  NOTE that, both kconfig and module parameters can configure blkoops,
>  	  but module parameters have priority over kconfig.
>  
> +config PSTORE_BLKOOPS_PMSG_SIZE
> +	int "pmsg size in kbytes for blkoops"
> +	depends on PSTORE_BLKOOPS
> +	depends on PSTORE_PMSG
> +	default 64

Instead of "depends on PSTORE_PMSG", you can do:

	default 64 if PSTORE_PMSG
	default 0

> +	help
> +	  This just sets size of pmsg (pmsg_size) for pstore/blk. The size is
> +	  in KB and must be a multiple of 4.
> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> +	  but module parameters have priority over kconfig.
> +
>  config PSTORE_BLKOOPS_BLKDEV
>  	string "block device for blkoops"
>  	depends on PSTORE_BLKOOPS
> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
> index 8027c3af8c8d..02e6e4c1f965 100644
> --- a/fs/pstore/blkoops.c
> +++ b/fs/pstore/blkoops.c
> @@ -16,6 +16,10 @@
>  module_param(dmesg_size, long, 0400);
>  MODULE_PARM_DESC(dmesg_size, "demsg size in kbytes");
>  
> +static long pmsg_size = -1;

With that change, CONFIG_PSTORE_BLKOOPS_PMSG_SIZE will always be defined,
so you can use it as the initializer here.

> +module_param(pmsg_size, long, 0400);
> +MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
> +
>  static int dump_oops = -1;
>  module_param(dump_oops, int, 0400);
>  MODULE_PARM_DESC(total_size, "whether dump oops");
> @@ -60,6 +64,12 @@
>  #define DEFAULT_DMESG_SIZE 0
>  #endif
>  
> +#ifdef CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
> +#define DEFAULT_PMSG_SIZE CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
> +#else
> +#define DEFAULT_PMSG_SIZE 0
> +#endif

And drop this.
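
Roughly what I have in mind, as a userspace sketch rather than the real
module code. The #define below only stands in for the value Kconfig
would generate (64 is an assumed example), and set_pmsg_size_param() is
a hypothetical model of the module parameter being set at load time.

```c
/*
 * Stand-in for the Kconfig-generated value (assumption). With
 * "default 64 if PSTORE_PMSG" followed by "default 0", the symbol is
 * defined in both cases, so no #ifdef/#else fallback is required.
 */
#define CONFIG_PSTORE_BLKOOPS_PMSG_SIZE 64

/* The parameter starts at the Kconfig value instead of a -1 sentinel. */
static long pmsg_size = CONFIG_PSTORE_BLKOOPS_PMSG_SIZE;

/* Hypothetical model of a module parameter overriding the default. */
void set_pmsg_size_param(long kbytes)
{
	pmsg_size = kbytes;
}

/* The size is configured in KB, so convert before handing it on. */
long pmsg_size_bytes(void)
{
	return pmsg_size * 1024;
}
```

The module parameter still wins over Kconfig, because it simply
overwrites the initial value.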

> +
>  #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>  #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>  #else
> @@ -113,6 +123,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>  	}
>  
>  	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
> +	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
>  #undef verify_size
>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>  
> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
> index f77f612b50ba..a3464252d52e 100644
> --- a/fs/pstore/blkzone.c
> +++ b/fs/pstore/blkzone.c
> @@ -24,12 +24,14 @@
>   *
>   * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
>   * @datalen: length of data in @data
> + * @start: offset into @data where the beginning of the stored bytes begin
>   * @data: zone data.
>   */
>  struct blkz_buffer {
>  #define BLK_SIG (0x43474244) /* DBGC */
>  	uint32_t sig;
>  	atomic_t datalen;
> +	atomic_t start;
>  	uint8_t data[];
>  };
>  
> @@ -85,8 +87,10 @@ struct blkz_zone {
>  
>  struct blkz_context {
>  	struct blkz_zone **dbzs;	/* dmesg block zones */
> +	struct blkz_zone *pbz;		/* Pmsg block zone */
>  	unsigned int dmesg_max_cnt;
>  	unsigned int dmesg_read_cnt;
> +	unsigned int pmsg_read_cnt;
>  	unsigned int dmesg_write_cnt;
>  	/*
>  	 * the counter should be recovered when recover.
> @@ -119,6 +123,11 @@ static inline int buffer_datalen(struct blkz_zone *zone)
>  	return atomic_read(&zone->buffer->datalen);
>  }
>  
> +static inline int buffer_start(struct blkz_zone *zone)
> +{
> +	return atomic_read(&zone->buffer->start);
> +}
> +
>  static inline bool is_on_panic(void)
>  {
>  	struct blkz_context *cxt = &blkz_cxt;
> @@ -410,6 +419,69 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
>  	return ret;
>  }
>  
> +static int blkz_recover_pmsg(struct blkz_context *cxt)
> +{
> +	struct blkz_info *info = cxt->bzinfo;
> +	struct blkz_buffer *oldbuf;
> +	struct blkz_zone *zone = NULL;
> +	int ret = 0;
> +	ssize_t rcnt, len;
> +
> +	zone = cxt->pbz;
> +	if (!zone || zone->oldbuf)
> +		return 0;
> +
> +	if (is_on_panic())
> +		goto out;
> +
> +	if (unlikely(!info->read))
> +		return -EINVAL;
> +
> +	len = zone->buffer_size + sizeof(*oldbuf);
> +	oldbuf = kzalloc(len, GFP_KERNEL);
> +	if (!oldbuf)
> +		return -ENOMEM;
> +
> +	rcnt = info->read((char *)oldbuf, len, zone->off);
> +	if (rcnt != len) {
> +		pr_debug("recover pmsg failed\n");
> +		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
> +		goto free_oldbuf;
> +	}
> +
> +	if (oldbuf->sig != zone->buffer->sig) {
> +		pr_debug("no valid data in zone %s\n", zone->name);
> +		goto free_oldbuf;
> +	}
> +
> +	if (zone->buffer_size < atomic_read(&oldbuf->datalen) ||
> +		zone->buffer_size < atomic_read(&oldbuf->start)) {
> +		pr_info("found overtop zone: %s: off %lu, size %zu\n",
> +				zone->name, zone->off, zone->buffer_size);
> +		goto free_oldbuf;
> +	}
> +
> +	if (!atomic_read(&oldbuf->datalen)) {
> +		pr_debug("found erased zone: %s: id 0, off %lu, size %zu, datalen %d\n",
> +				zone->name, zone->off, zone->buffer_size,
> +				atomic_read(&oldbuf->datalen));
> +		kfree(oldbuf);
> +		goto out;
> +	}
> +
> +	pr_debug("found nice zone: %s: id 0, off %lu, size %zu, datalen %d\n",
> +			zone->name, zone->off, zone->buffer_size,
> +			atomic_read(&oldbuf->datalen));
> +	zone->oldbuf = oldbuf;
> +out:
> +	blkz_flush_dirty_zone(zone);
> +	return 0;
> +
> +free_oldbuf:
> +	kfree(oldbuf);
> +	return ret;
> +}
> +
>  static inline int blkz_recovery(struct blkz_context *cxt)
>  {
>  	int ret = -EBUSY;
> @@ -421,6 +493,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
>  	if (ret)
>  		goto recover_fail;
>  
> +	ret = blkz_recover_pmsg(cxt);
> +	if (ret)
> +		goto recover_fail;
> +
>  	pr_debug("recover end!\n");
>  	atomic_set(&cxt->recovered, 1);
>  	return 0;
> @@ -435,9 +511,17 @@ static int blkz_pstore_open(struct pstore_info *psi)
>  	struct blkz_context *cxt = psi->data;
>  
>  	cxt->dmesg_read_cnt = 0;
> +	cxt->pmsg_read_cnt = 0;
>  	return 0;
>  }
>  
> +static inline bool blkz_old_ok(struct blkz_zone *zone)
> +{
> +	if (zone && zone->oldbuf && atomic_read(&zone->oldbuf->datalen))
> +		return true;
> +	return false;
> +}
> +
>  static inline bool blkz_ok(struct blkz_zone *zone)
>  {
>  	if (zone && zone->buffer && buffer_datalen(zone))
> @@ -455,6 +539,25 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>  	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>  }
>  
> +static inline int blkz_pmsg_erase(struct blkz_context *cxt,
> +		struct blkz_zone *zone)
> +{
> +	if (unlikely(!blkz_old_ok(zone)))
> +		return 0;
> +
> +	kfree(zone->oldbuf);
> +	zone->oldbuf = NULL;
> +	/*
> +	 * if there are new data in zone buffer, that means the old data
> +	 * are already invalid. It is no need to flush 0 (erase) to
> +	 * block device.
> +	 */
> +	if (!buffer_datalen(zone))
> +		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +	blkz_flush_dirty_zone(zone);
> +	return 0;
> +}
> +
>  static int blkz_pstore_erase(struct pstore_record *record)
>  {
>  	struct blkz_context *cxt = record->psi->data;
> @@ -462,6 +565,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
>  	switch (record->type) {
>  	case PSTORE_TYPE_DMESG:
>  		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
> +	case PSTORE_TYPE_PMSG:
> +		return blkz_pmsg_erase(cxt, cxt->pbz);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -482,8 +587,10 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
>  	hdr->reason = record->reason;
>  	if (hdr->reason == KMSG_DUMP_OOPS)
>  		hdr->counter = ++cxt->oops_counter;
> -	else
> +	else if (hdr->reason == KMSG_DUMP_PANIC)
>  		hdr->counter = ++cxt->panic_counter;
> +	else
> +		hdr->counter = 0;
>  }
>  
>  static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
> @@ -546,6 +653,55 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>  	return 0;
>  }
>  
> +static int notrace blkz_pmsg_write(struct blkz_context *cxt,
> +		struct pstore_record *record)
> +{
> +	struct blkz_zone *zone;
> +	size_t start, rem;
> +	int cnt = record->size;
> +	bool is_full_data = false;
> +	char *buf = record->buf;
> +
> +	zone = cxt->pbz;
> +	if (!zone)
> +		return -ENOSPC;
> +
> +	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
> +		is_full_data = true;
> +
> +	if (unlikely(cnt > zone->buffer_size)) {
> +		buf += cnt - zone->buffer_size;
> +		cnt = zone->buffer_size;
> +	}
> +
> +	start = buffer_start(zone);
> +	rem = zone->buffer_size - start;
> +	if (unlikely(rem < cnt)) {
> +		blkz_zone_write(zone, FLUSH_PART, buf, rem, start);
> +		buf += rem;
> +		cnt -= rem;
> +		start = 0;
> +		is_full_data = true;
> +	}
> +
> +	atomic_set(&zone->buffer->start, cnt + start);
> +	blkz_zone_write(zone, FLUSH_PART, buf, cnt, start);
> +
> +	/**
> +	 * blkz_zone_write will set datalen as start + cnt.
> +	 * It work if actual data length lesser than buffer size.
> +	 * If data length greater than buffer size, pmsg will rewrite to
> +	 * beginning of zone, which make buffer->datalen wrongly.
> +	 * So we should reset datalen as buffer size once actual data length
> +	 * greater than buffer size.
> +	 */
> +	if (is_full_data) {
> +		atomic_set(&zone->buffer->datalen, zone->buffer_size);
> +		blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +	}
> +	return 0;
> +}
> +
>  static int notrace blkz_pstore_write(struct pstore_record *record)
>  {
>  	struct blkz_context *cxt = record->psi->data;
> @@ -557,6 +713,8 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>  	switch (record->type) {
>  	case PSTORE_TYPE_DMESG:
>  		return blkz_dmesg_write(cxt, record);
> +	case PSTORE_TYPE_PMSG:
> +		return blkz_pmsg_write(cxt, record);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -573,6 +731,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>  			return zone;
>  	}
>  
> +	if (cxt->pmsg_read_cnt == 0) {
> +		cxt->pmsg_read_cnt++;
> +		zone = cxt->pbz;
> +		if (blkz_old_ok(zone))
> +			return zone;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -611,7 +776,8 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>  		char *buf = kasprintf(GFP_KERNEL,
>  				"%s: Total %d times\n",
>  				record->reason == KMSG_DUMP_OOPS ? "Oops" :
> -				"Panic", record->count);
> +				record->reason == KMSG_DUMP_PANIC ? "Panic" :
> +				"Unknown", record->count);

Please use get_reason_str() here.
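
The point of a helper is that the header line stops caring how many
reasons exist. A self-contained sketch of the shape follows. The enum
and reason_str() are stand-ins written for this example, not the
kernel's definitions, and the real helper may cover more reasons.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the kmsg dump reasons (assumption, not the kernel enum). */
enum dump_reason {
	DUMP_UNDEF,
	DUMP_PANIC,
	DUMP_OOPS,
};

/* One place that maps a reason to its printable name. */
const char *reason_str(enum dump_reason reason)
{
	switch (reason) {
	case DUMP_PANIC:
		return "Panic";
	case DUMP_OOPS:
		return "Oops";
	default:
		return "Unknown";
	}
}

/* Builds the same "<reason>: Total <n> times" header as the patch. */
int format_header(char *out, size_t len, enum dump_reason reason, int count)
{
	return snprintf(out, len, "%s: Total %d times\n",
			reason_str(reason), count);
}
```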

>  		hlen = strlen(buf);
>  		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
>  		if (!record->buf) {
> @@ -633,6 +799,29 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>  	return size + hlen;
>  }
>  
> +static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
> +		struct pstore_record *record)
> +{
> +	size_t size, start;
> +	struct blkz_buffer *buf;
> +
> +	buf = (struct blkz_buffer *)zone->oldbuf;
> +	if (!buf)
> +		return READ_NEXT_ZONE;
> +
> +	size = atomic_read(&buf->datalen);
> +	start = atomic_read(&buf->start);
> +
> +	record->buf = kmalloc(size, GFP_KERNEL);
> +	if (!record->buf)
> +		return -ENOMEM;
> +
> +	memcpy(record->buf, buf->data + start, size - start);
> +	memcpy(record->buf + size - start, buf->data, start);
> +
> +	return size;
> +}
> +
>  static ssize_t blkz_pstore_read(struct pstore_record *record)
>  {
>  	struct blkz_context *cxt = record->psi->data;
> @@ -657,6 +846,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>  		blkz_read = blkz_dmesg_read;
>  		record->id = cxt->dmesg_read_cnt - 1;
>  		break;
> +	case PSTORE_TYPE_PMSG:
> +		blkz_read = blkz_pmsg_read;
> +		break;
>  	default:
>  		goto next_zone;
>  	}
> @@ -712,8 +904,10 @@ static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
>  	zone->type = type;
>  	zone->buffer_size = size - sizeof(struct blkz_buffer);
>  	zone->buffer->sig = type ^ BLK_SIG;
> +	zone->oldbuf = NULL;
>  	atomic_set(&zone->dirty, 0);
>  	atomic_set(&zone->buffer->datalen, 0);
> +	atomic_set(&zone->buffer->start, 0);
>  
>  	*off += size;
>  
> @@ -798,17 +992,26 @@ static int blkz_cut_zones(struct blkz_context *cxt)
>  	struct blkz_info *info = cxt->bzinfo;
>  	unsigned long off = 0;
>  	int err;
> -	size_t size;
> +	size_t off_size = 0;
>  
> -	size = info->total_size;
> -	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
> +	off_size += info->pmsg_size;
> +	cxt->pbz = blkz_init_zone(PSTORE_TYPE_PMSG, &off, info->pmsg_size);
> +	if (IS_ERR(cxt->pbz)) {
> +		err = PTR_ERR(cxt->pbz);
> +		goto fail_out;
> +	}
> +
> +	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
> +			info->total_size - off_size,
>  			info->dmesg_size, &cxt->dmesg_max_cnt);
>  	if (IS_ERR(cxt->dbzs)) {
>  		err = PTR_ERR(cxt->dbzs);
> -		goto fail_out;
> +		goto free_pmsg;
>  	}
>  
>  	return 0;
> +free_pmsg:
> +	blkz_free_zone(&cxt->pbz);
>  fail_out:
>  	return err;
>  }
> @@ -824,7 +1027,7 @@ int blkz_register(struct blkz_info *info)
>  		return -EINVAL;
>  	}
>  
> -	if (!info->dmesg_size) {
> +	if (!info->dmesg_size && !info->pmsg_size) {
>  		pr_warn("at least one of the records be non-zero\n");
>  		return -EINVAL;
>  	}
> @@ -851,6 +1054,7 @@ int blkz_register(struct blkz_info *info)
>  
>  	check_size(total_size, 4096);
>  	check_size(dmesg_size, SECTOR_SIZE);
> +	check_size(pmsg_size, SECTOR_SIZE);
>  
>  #undef check_size
>  
> @@ -882,6 +1086,7 @@ int blkz_register(struct blkz_info *info)
>  	pr_debug("register %s with properties:\n", info->name);
>  	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>  	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
> +	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
>  
>  	err = blkz_cut_zones(cxt);
>  	if (err) {
> @@ -900,11 +1105,14 @@ int blkz_register(struct blkz_info *info)
>  	}
>  	cxt->pstore.data = cxt;
>  	if (info->dmesg_size)
> -		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
> +		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
> +	if (info->pmsg_size)
> +		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
>  
> -	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
> +	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
>  			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
> -			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
> +			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
> +			cxt->pbz ? "Pmsg" : "");

I'd switch to leading spaces so you can leave these strings unchanged as
you add more:

	for%s%s%s\n", info->name,
		cxt->dbzs && cxt->bzinfo->dump_oops ? " Oops" : "",
		cxt->dbzs && cxt->bzinfo->panic_write ? " Panic" : "",
		cxt->pbz ? " Pmsg" : "");

etc
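
A standalone sketch of the same pattern, to show that appending a new
recorder only adds a "%s" and one argument and never leaves a trailing
space. format_registered() and its boolean parameters are hypothetical
and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Each optional word carries its own leading space, so the fixed part
 * of the format ends in "for" and earlier strings never change when a
 * new recorder is appended.
 */
int format_registered(char *out, size_t len, const char *name,
		      bool oops, bool panic, bool pmsg, bool console)
{
	return snprintf(out, len,
			"Registered %s as blkzone backend for%s%s%s%s",
			name,
			oops ? " Oops" : "",
			panic ? " Panic" : "",
			pmsg ? " Pmsg" : "",
			console ? " Console" : "");
}
```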

>  
>  	err = pstore_register(&cxt->pstore);
>  	if (err) {
> @@ -940,6 +1148,7 @@ void blkz_unregister(struct blkz_info *info)
>  	spin_unlock(&cxt->bzinfo_lock);
>  
>  	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
> +	blkz_free_zone(&cxt->pbz);
>  }
>  EXPORT_SYMBOL_GPL(blkz_unregister);
>  
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> index 589d276fa4e4..af06be25bd01 100644
> --- a/include/linux/pstore_blk.h
> +++ b/include/linux/pstore_blk.h
> @@ -19,6 +19,9 @@
>   * @dmesg_size:
>   *	The size of each zones for dmesg (oops & panic). Zero means disabled,
>   *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
> + * @pmsg_size:
> + *	The size of zone for pmsg. Zero means disabled, otherwise, it must be
> + *	multiple of SECTOR_SIZE(512).
>   * @dump_oops:
>   *	Dump oops and panic log or only panic.
>   * @read, @write:
> @@ -50,6 +53,7 @@ struct blkz_info {
>  
>  	unsigned long total_size;
>  	unsigned long dmesg_size;
> +	unsigned long pmsg_size;
>  	int dump_oops;
>  	blkz_read_op read;
>  	blkz_write_op write;
> -- 
> 1.9.1
> 

-- 
Kees Cook


* Re: [PATCH v2 04/11] pstore/blk: blkoops: support console recorder
  2020-02-07 12:25 ` [PATCH v2 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
@ 2020-03-18 18:16   ` Kees Cook
  2020-03-22 11:35     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:16 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:48PM +0800, WeiXiong Liao wrote:
> Support recorder for console. To enable console recorder, just make
> console_size be greater than 0 and a multiple of 4096.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  fs/pstore/Kconfig          |  12 ++++++
>  fs/pstore/blkoops.c        |  11 +++++
>  fs/pstore/blkzone.c        | 101 ++++++++++++++++++++++++++++++++++-----------
>  include/linux/blkoops.h    |   6 ++-
>  include/linux/pstore_blk.h |   8 +++-
>  5 files changed, 112 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index bbf1fdb5eaa7..5f0a42823028 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -198,6 +198,18 @@ config PSTORE_BLKOOPS_PMSG_SIZE
>  	  NOTE that, both kconfig and module parameters can configure blkoops,
>  	  but module parameters have priority over kconfig.
>  
> +config PSTORE_BLKOOPS_CONSOLE_SIZE
> +	int "console size in kbytes for blkoops"
> +	depends on PSTORE_BLKOOPS
> +	depends on PSTORE_CONSOLE
> +	default 64

Same trick here as for PMSG.

> +	help
> +	  This just sets size of console (console_size) for pstore/blk. The
> +	  size is in KB and must be a multiple of 4.
> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> +	  but module parameters have priority over kconfig.
> +
>  config PSTORE_BLKOOPS_BLKDEV
>  	string "block device for blkoops"
>  	depends on PSTORE_BLKOOPS
> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
> index 02e6e4c1f965..05990bc3b168 100644
> --- a/fs/pstore/blkoops.c
> +++ b/fs/pstore/blkoops.c
> @@ -20,6 +20,10 @@
>  module_param(pmsg_size, long, 0400);
>  MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
>  
> +static long console_size = -1;
> +module_param(console_size, long, 0400);
> +MODULE_PARM_DESC(console_size, "console size in kbytes");
> +
>  static int dump_oops = -1;
>  module_param(dump_oops, int, 0400);
>  MODULE_PARM_DESC(total_size, "whether dump oops");
> @@ -70,6 +74,12 @@
>  #define DEFAULT_PMSG_SIZE 0
>  #endif
>  
> +#ifdef CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
> +#define DEFAULT_CONSOLE_SIZE CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
> +#else
> +#define DEFAULT_CONSOLE_SIZE 0
> +#endif
> +
>  #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>  #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>  #else
> @@ -124,6 +134,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>  
>  	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>  	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
> +	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
>  #undef verify_size
>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>  
> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
> index a3464252d52e..9a7e9b06ccf7 100644
> --- a/fs/pstore/blkzone.c
> +++ b/fs/pstore/blkzone.c
> @@ -88,9 +88,11 @@ struct blkz_zone {
>  struct blkz_context {
>  	struct blkz_zone **dbzs;	/* dmesg block zones */
>  	struct blkz_zone *pbz;		/* Pmsg block zone */
> +	struct blkz_zone *cbz;		/* console block zone */
>  	unsigned int dmesg_max_cnt;
>  	unsigned int dmesg_read_cnt;
>  	unsigned int pmsg_read_cnt;
> +	unsigned int console_read_cnt;
>  	unsigned int dmesg_write_cnt;
>  	/*
>  	 * the counter should be recovered when recover.
> @@ -111,6 +113,9 @@ struct blkz_context {
>  };
>  static struct blkz_context blkz_cxt;
>  
> +static void blkz_flush_all_dirty_zones(struct work_struct *);
> +static DECLARE_WORK(blkz_cleaner, blkz_flush_all_dirty_zones);
> +
>  enum blkz_flush_mode {
>  	FLUSH_NONE = 0,
>  	FLUSH_PART,
> @@ -200,6 +205,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
>  	return 0;
>  set_dirty:
>  	atomic_set(&zone->dirty, true);
> +	/* flush dirty zones nicely */
> +	if (wcnt == -EBUSY && !is_on_panic())
> +		schedule_work(&blkz_cleaner);
>  	return -EBUSY;
>  }
>  
> @@ -266,6 +274,15 @@ static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
>  	return 0;
>  }
>  
> +static void blkz_flush_all_dirty_zones(struct work_struct *work)
> +{
> +	struct blkz_context *cxt = &blkz_cxt;
> +
> +	blkz_flush_dirty_zone(cxt->pbz);
> +	blkz_flush_dirty_zone(cxt->cbz);
> +	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
> +}
> +
>  static int blkz_recover_dmesg_data(struct blkz_context *cxt)
>  {
>  	struct blkz_info *info = cxt->bzinfo;
> @@ -419,15 +436,13 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
>  	return ret;
>  }
>  
> -static int blkz_recover_pmsg(struct blkz_context *cxt)
> +static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
>  {
>  	struct blkz_info *info = cxt->bzinfo;
>  	struct blkz_buffer *oldbuf;
> -	struct blkz_zone *zone = NULL;
>  	int ret = 0;
>  	ssize_t rcnt, len;
>  
> -	zone = cxt->pbz;
>  	if (!zone || zone->oldbuf)
>  		return 0;
>  
> @@ -493,7 +508,11 @@ static inline int blkz_recovery(struct blkz_context *cxt)
>  	if (ret)
>  		goto recover_fail;
>  
> -	ret = blkz_recover_pmsg(cxt);
> +	ret = blkz_recover_zone(cxt, cxt->pbz);
> +	if (ret)
> +		goto recover_fail;
> +
> +	ret = blkz_recover_zone(cxt, cxt->cbz);
>  	if (ret)
>  		goto recover_fail;
>  
> @@ -512,6 +531,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
>  
>  	cxt->dmesg_read_cnt = 0;
>  	cxt->pmsg_read_cnt = 0;
> +	cxt->console_read_cnt = 0;
>  	return 0;
>  }
>  
> @@ -539,7 +559,7 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>  	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>  }
>  
> -static inline int blkz_pmsg_erase(struct blkz_context *cxt,
> +static inline int blkz_record_erase(struct blkz_context *cxt,
>  		struct blkz_zone *zone)
>  {
>  	if (unlikely(!blkz_old_ok(zone)))
> @@ -566,9 +586,10 @@ static int blkz_pstore_erase(struct pstore_record *record)
>  	case PSTORE_TYPE_DMESG:
>  		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
>  	case PSTORE_TYPE_PMSG:
> -		return blkz_pmsg_erase(cxt, cxt->pbz);
> -	default:
> -		return -EINVAL;
> +		return blkz_record_erase(cxt, cxt->pbz);
> +	case PSTORE_TYPE_CONSOLE:
> +		return blkz_record_erase(cxt, cxt->cbz);
> +	default: return -EINVAL;
>  	}
>  }
>  
> @@ -653,17 +674,15 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>  	return 0;
>  }
>  
> -static int notrace blkz_pmsg_write(struct blkz_context *cxt,
> -		struct pstore_record *record)
> +static int notrace blkz_record_write(struct blkz_context *cxt,
> +		struct blkz_zone *zone, struct pstore_record *record)

How about generalizing this earlier in the patch series instead of
mutating it here?
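
For example, if the write path takes the zone as an argument from the
start, pmsg and console can share it without any later renaming. The
sketch below is a simplified userspace model of the wrap-around logic,
with no flushing and no atomics. struct ring_zone and both functions
are hypothetical names, not the patch's API.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of one zone: a byte ring with a write cursor. */
struct ring_zone {
	unsigned char *data;
	size_t size;	/* capacity in bytes */
	size_t start;	/* offset of the next write */
	size_t datalen;	/* valid bytes, capped at size */
};

/* Generic write: the caller chooses the zone (pmsg, console, ...). */
void ring_zone_write(struct ring_zone *z, const unsigned char *buf, size_t cnt)
{
	size_t rem;

	if (cnt > z->size) {		/* keep only the newest bytes */
		buf += cnt - z->size;
		cnt = z->size;
	}

	rem = z->size - z->start;
	if (rem < cnt) {		/* fill the tail, then wrap to 0 */
		memcpy(z->data + z->start, buf, rem);
		buf += rem;
		cnt -= rem;
		z->start = 0;
		z->datalen = z->size;	/* the ring is full from now on */
	}

	memcpy(z->data + z->start, buf, cnt);
	z->start += cnt;
	if (z->datalen < z->size)	/* not wrapped yet: length tracks cursor */
		z->datalen = z->start;
}

/* Copies the contents oldest-first into out; returns the byte count. */
size_t ring_zone_read(const struct ring_zone *z, unsigned char *out)
{
	size_t head, first;

	if (!z->datalen)
		return 0;

	/* Before the first wrap the oldest byte is at 0, afterwards at start. */
	head = z->datalen < z->size ? 0 : z->start % z->size;
	first = z->datalen - head;
	memcpy(out, z->data + head, first);
	memcpy(out + first, z->data, head);
	return z->datalen;
}
```

Each recorder then reduces to a one-line call that passes its own zone.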

>  {
> -	struct blkz_zone *zone;
>  	size_t start, rem;
>  	int cnt = record->size;
>  	bool is_full_data = false;
>  	char *buf = record->buf;
>  
> -	zone = cxt->pbz;
> -	if (!zone)
> +	if (!zone || !record)
>  		return -ENOSPC;
>  
>  	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
> @@ -710,11 +729,20 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>  			record->reason == KMSG_DUMP_PANIC)
>  		atomic_set(&cxt->on_panic, 1);
>  
> +	/*
> +	 * if on panic, do not write except dmesg records
> +	 * Fix case that panic_write prints log which wakes up console recorder.
> +	 */
> +	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
> +		return -EBUSY;
> +
>  	switch (record->type) {
>  	case PSTORE_TYPE_DMESG:
>  		return blkz_dmesg_write(cxt, record);
> +	case PSTORE_TYPE_CONSOLE:
> +		return blkz_record_write(cxt, cxt->cbz, record);
>  	case PSTORE_TYPE_PMSG:
> -		return blkz_pmsg_write(cxt, record);
> +		return blkz_record_write(cxt, cxt->pbz, record);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -738,6 +766,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>  			return zone;
>  	}
>  
> +	if (cxt->console_read_cnt == 0) {
> +		cxt->console_read_cnt++;
> +		zone = cxt->cbz;
> +		if (blkz_old_ok(zone))
> +			return zone;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -799,7 +834,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>  	return size + hlen;
>  }
>  
> -static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
> +static ssize_t blkz_record_read(struct blkz_zone *zone,
>  		struct pstore_record *record)
>  {
>  	size_t size, start;
> @@ -825,7 +860,7 @@ static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
>  static ssize_t blkz_pstore_read(struct pstore_record *record)
>  {
>  	struct blkz_context *cxt = record->psi->data;
> -	ssize_t (*blkz_read)(struct blkz_zone *zone,
> +	ssize_t (*readop)(struct blkz_zone *zone,
>  			struct pstore_record *record);
>  	struct blkz_zone *zone;
>  	ssize_t ret;
> @@ -843,17 +878,19 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>  	record->type = zone->type;
>  	switch (record->type) {
>  	case PSTORE_TYPE_DMESG:
> -		blkz_read = blkz_dmesg_read;
> +		readop = blkz_dmesg_read;
>  		record->id = cxt->dmesg_read_cnt - 1;
>  		break;
> +	case PSTORE_TYPE_CONSOLE:
> +		/* fallthrough */

Since this case has no body, you can leave off the "fallthrough". (But
if you want to mark it anyway, please use "fallthrough;" instead of a
comment.)
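
A small standalone sketch of the two situations follows. The macro
below only approximates the kernel's fallthrough pseudo-keyword so the
example builds in userspace, and the enum, flag names, and
zone_flags() are hypothetical.

```c
/* Userspace approximation of the kernel's "fallthrough" (assumption). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

enum rec_type { REC_DMESG, REC_PMSG, REC_CONSOLE, REC_OTHER };

#define FLAG_READABLE	0x1
#define FLAG_RING	0x2

int zone_flags(enum rec_type type)
{
	int flags = 0;

	switch (type) {
	case REC_CONSOLE:	/* empty body: no annotation needed */
	case REC_PMSG:
		flags |= FLAG_RING;
		fallthrough;	/* non-empty body: mark the intent */
	case REC_DMESG:
		flags |= FLAG_READABLE;
		break;
	default:
		break;
	}
	return flags;
}
```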

>  	case PSTORE_TYPE_PMSG:
> -		blkz_read = blkz_pmsg_read;
> +		readop = blkz_record_read;
>  		break;
>  	default:
>  		goto next_zone;
>  	}
>  
> -	ret = blkz_read(zone, record);
> +	ret = readop(zone, record);
>  	if (ret == READ_NEXT_ZONE)
>  		goto next_zone;
>  	return ret;
> @@ -1001,15 +1038,25 @@ static int blkz_cut_zones(struct blkz_context *cxt)
>  		goto fail_out;
>  	}
>  
> +	off_size += info->console_size;
> +	cxt->cbz = blkz_init_zone(PSTORE_TYPE_CONSOLE, &off,
> +			info->console_size);
> +	if (IS_ERR(cxt->cbz)) {
> +		err = PTR_ERR(cxt->cbz);
> +		goto free_pmsg;
> +	}
> +
>  	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
>  			info->total_size - off_size,
>  			info->dmesg_size, &cxt->dmesg_max_cnt);
>  	if (IS_ERR(cxt->dbzs)) {
>  		err = PTR_ERR(cxt->dbzs);
> -		goto free_pmsg;
> +		goto free_console;
>  	}
>  
>  	return 0;
> +free_console:
> +	blkz_free_zone(&cxt->cbz);
>  free_pmsg:
>  	blkz_free_zone(&cxt->pbz);
>  fail_out:
> @@ -1027,7 +1074,7 @@ int blkz_register(struct blkz_info *info)
>  		return -EINVAL;
>  	}
>  
> -	if (!info->dmesg_size && !info->pmsg_size) {
> +	if (!info->dmesg_size && !info->pmsg_size && !info->console_size) {
>  		pr_warn("at least one of the records be non-zero\n");
>  		return -EINVAL;
>  	}
> @@ -1055,6 +1102,7 @@ int blkz_register(struct blkz_info *info)
>  	check_size(total_size, 4096);
>  	check_size(dmesg_size, SECTOR_SIZE);
>  	check_size(pmsg_size, SECTOR_SIZE);
> +	check_size(console_size, SECTOR_SIZE);
>  
>  #undef check_size
>  
> @@ -1087,6 +1135,7 @@ int blkz_register(struct blkz_info *info)
>  	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>  	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>  	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
> +	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
>  
>  	err = blkz_cut_zones(cxt);
>  	if (err) {
> @@ -1108,11 +1157,15 @@ int blkz_register(struct blkz_info *info)
>  		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
>  	if (info->pmsg_size)
>  		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
> +	if (info->console_size)
> +		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
>  
> -	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
> +	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
> +			info->name,
>  			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>  			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
> -			cxt->pbz ? "Pmsg" : "");
> +			cxt->pbz ? "Pmsg " : "",
> +			cxt->cbz ? "Console" : "");
>  
>  	err = pstore_register(&cxt->pstore);
>  	if (err) {
> @@ -1139,6 +1192,8 @@ void blkz_unregister(struct blkz_info *info)
>  {
>  	struct blkz_context *cxt = &blkz_cxt;
>  
> +	flush_work(&blkz_cleaner);
> +
>  	pstore_unregister(&cxt->pstore);
>  	kfree(cxt->pstore.buf);
>  	cxt->pstore.bufsize = 0;
> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
> index fe63739309aa..8f40f225545d 100644
> --- a/include/linux/blkoops.h
> +++ b/include/linux/blkoops.h
> @@ -23,8 +23,10 @@
>   *	Both of the @size and @offset parameters on this interface are
>   *	the relative size of the space provided, not the whole disk/flash.
>   *
> - *	On success, the number of bytes read should be returned.
> - *	On error, negative number should be returned.
> + *	On success, the number of bytes read/write should be returned.
> + *	On error, negative number should be returned. The following returning
> + *	number means more:
> + *	  -EBUSY: pstore/blk should try again later.
>   * @panic_write:
>   *	The write operation only used for panic.
>   *
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> index af06be25bd01..546375e04419 100644
> --- a/include/linux/pstore_blk.h
> +++ b/include/linux/pstore_blk.h
> @@ -22,6 +22,9 @@
>   * @pmsg_size:
>   *	The size of zone for pmsg. Zero means disabled, otherwise, it must be
>   *	multiple of SECTOR_SIZE(512).
> + * @console_size:
> + *	The size of zone for console. Zero means disabled, othewise, it must
> + *	be multiple of SECTOR_SIZE(512).
>   * @dump_oops:
>   *	Dump oops and panic log or only panic.
>   * @read, @write:
> @@ -33,7 +36,9 @@
>   *	the relative size of the space provided, not the whole disk/flash.
>   *
>   *	On success, the number of bytes read/write should be returned.
> - *	On error, negative number should be returned.
> + *	On error, a negative number should be returned. The following return
> + *	value has a special meaning:
> + *	  -EBUSY: pstore/blk should try again later.
>   * @panic_write:
>   *	The write operation only used for panic. It's optional if you do not
>   *	care panic record. If panic occur but blkzone do not recover yet, the
> @@ -54,6 +59,7 @@ struct blkz_info {
>  	unsigned long total_size;
>  	unsigned long dmesg_size;
>  	unsigned long pmsg_size;
> +	unsigned long console_size;
>  	int dump_oops;
>  	blkz_read_op read;
>  	blkz_write_op write;
> -- 
> 1.9.1
> 

-- 
Kees Cook

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder
  2020-02-07 12:25 ` [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
@ 2020-03-18 18:19   ` Kees Cook
  2020-03-22 11:42     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:19 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:49PM +0800, WeiXiong Liao wrote:
> Support the ftrace recorder. To enable it, set ftrace_size to a value
> greater than 0 that is a multiple of 4096.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  fs/pstore/Kconfig          | 12 ++++++++
>  fs/pstore/blkoops.c        | 11 +++++++
>  fs/pstore/blkzone.c        | 75 ++++++++++++++++++++++++++++++++++++++++++++--
>  include/linux/pstore_blk.h |  4 +++
>  4 files changed, 99 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 5f0a42823028..308a0a4c5ee5 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -210,6 +210,18 @@ config PSTORE_BLKOOPS_CONSOLE_SIZE
>  	  NOTE that, both kconfig and module parameters can configure blkoops,
>  	  but module parameters have priority over kconfig.
>  
> +config PSTORE_BLKOOPS_FTRACE_SIZE
> +	int "ftrace size in kbytes for blkoops"
> +	depends on PSTORE_BLKOOPS
> +	depends on PSTORE_FTRACE
> +	default 64

Same tricks. :)

> +	help
> +	  This just sets size of ftrace (ftrace_size) for pstore/blk. The
> +	  size is in KB and must be a multiple of 4.
> +
> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> +	  but module parameters have priority over kconfig.
> +
>  config PSTORE_BLKOOPS_BLKDEV
>  	string "block device for blkoops"
>  	depends on PSTORE_BLKOOPS
> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
> index 05990bc3b168..c76bab671b0b 100644
> --- a/fs/pstore/blkoops.c
> +++ b/fs/pstore/blkoops.c
> @@ -24,6 +24,10 @@
>  module_param(console_size, long, 0400);
>  MODULE_PARM_DESC(console_size, "console size in kbytes");
>  
> +static long ftrace_size = -1;
> +module_param(ftrace_size, long, 0400);
> +MODULE_PARM_DESC(ftrace_size, "ftrace size in kbytes");
> +
>  static int dump_oops = -1;
>  module_param(dump_oops, int, 0400);
>  MODULE_PARM_DESC(total_size, "whether dump oops");
> @@ -80,6 +84,12 @@
>  #define DEFAULT_CONSOLE_SIZE 0
>  #endif
>  
> +#ifdef CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
> +#define DEFAULT_FTRACE_SIZE CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
> +#else
> +#define DEFAULT_FTRACE_SIZE 0
> +#endif
> +
>  #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>  #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>  #else
> @@ -135,6 +145,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>  	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>  	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
>  	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
> +	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
>  #undef verify_size
>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>  
> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
> index 9a7e9b06ccf7..442e5a5bbfda 100644
> --- a/fs/pstore/blkzone.c
> +++ b/fs/pstore/blkzone.c
> @@ -89,10 +89,13 @@ struct blkz_context {
>  	struct blkz_zone **dbzs;	/* dmesg block zones */
>  	struct blkz_zone *pbz;		/* Pmsg block zone */
>  	struct blkz_zone *cbz;		/* console block zone */
> +	struct blkz_zone **fbzs;	/* Ftrace zones */
>  	unsigned int dmesg_max_cnt;
>  	unsigned int dmesg_read_cnt;
>  	unsigned int pmsg_read_cnt;
>  	unsigned int console_read_cnt;
> +	unsigned int ftrace_max_cnt;
> +	unsigned int ftrace_read_cnt;
>  	unsigned int dmesg_write_cnt;
>  	/*
>  	 * the counter should be recovered when recover.
> @@ -281,6 +284,7 @@ static void blkz_flush_all_dirty_zones(struct work_struct *work)
>  	blkz_flush_dirty_zone(cxt->pbz);
>  	blkz_flush_dirty_zone(cxt->cbz);
>  	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
> +	blkz_flush_dirty_zones(cxt->fbzs, cxt->ftrace_max_cnt);
>  }
>  
>  static int blkz_recover_dmesg_data(struct blkz_context *cxt)
> @@ -497,6 +501,31 @@ static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
>  	return ret;
>  }
>  
> +static int blkz_recover_zones(struct blkz_context *cxt,
> +		struct blkz_zone **zones, unsigned int cnt)
> +{
> +	int ret;
> +	unsigned int i;
> +	struct blkz_zone *zone;
> +
> +	if (!zones)
> +		return 0;
> +
> +	for (i = 0; i < cnt; i++) {
> +		zone = zones[i];
> +		if (unlikely(!zone))
> +			continue;
> +		ret = blkz_recover_zone(cxt, zone);
> +		if (ret)
> +			goto recover_fail;
> +	}
> +
> +	return 0;
> +recover_fail:
> +	pr_debug("recover %s[%u] failed\n", zone->name, i);
> +	return ret;
> +}

Why is this introduced here? Shouldn't this be earlier in the series?

> +
>  static inline int blkz_recovery(struct blkz_context *cxt)
>  {
>  	int ret = -EBUSY;
> @@ -516,6 +545,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
>  	if (ret)
>  		goto recover_fail;
>  
> +	ret = blkz_recover_zones(cxt, cxt->fbzs, cxt->ftrace_max_cnt);
> +	if (ret)
> +		goto recover_fail;
> +
>  	pr_debug("recover end!\n");
>  	atomic_set(&cxt->recovered, 1);
>  	return 0;
> @@ -532,6 +565,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
>  	cxt->dmesg_read_cnt = 0;
>  	cxt->pmsg_read_cnt = 0;
>  	cxt->console_read_cnt = 0;
> +	cxt->ftrace_read_cnt = 0;
>  	return 0;
>  }
>  
> @@ -589,6 +623,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
>  		return blkz_record_erase(cxt, cxt->pbz);
>  	case PSTORE_TYPE_CONSOLE:
>  		return blkz_record_erase(cxt, cxt->cbz);
> +	case PSTORE_TYPE_FTRACE:
> +		return blkz_record_erase(cxt, cxt->fbzs[record->id]);
>  	default: return -EINVAL;
>  	}
>  }
> @@ -743,6 +779,13 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>  		return blkz_record_write(cxt, cxt->cbz, record);
>  	case PSTORE_TYPE_PMSG:
>  		return blkz_record_write(cxt, cxt->pbz, record);
> +	case PSTORE_TYPE_FTRACE: {
> +		int zonenum = smp_processor_id();
> +
> +		if (!cxt->fbzs)
> +			return -ENOSPC;
> +		return blkz_record_write(cxt, cxt->fbzs[zonenum], record);
> +	}
>  	default:
>  		return -EINVAL;
>  	}
> @@ -759,6 +802,12 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>  			return zone;
>  	}
>  
> +	while (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt) {
> +		zone = cxt->fbzs[cxt->ftrace_read_cnt++];
> +		if (blkz_old_ok(zone))
> +			return zone;
> +	}
> +
>  	if (cxt->pmsg_read_cnt == 0) {
>  		cxt->pmsg_read_cnt++;
>  		zone = cxt->pbz;
> @@ -881,6 +930,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>  		readop = blkz_dmesg_read;
>  		record->id = cxt->dmesg_read_cnt - 1;
>  		break;
> +	case PSTORE_TYPE_FTRACE:
> +		record->id = cxt->ftrace_read_cnt - 1;
> +		/* fallthrough */

Please mark with "fallthrough;".
https://www.kernel.org/doc/html/latest/process/deprecated.html#implicit-switch-case-fall-through
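
For illustration only, here is a minimal userspace sketch of the pattern.
The local fallthrough macro and the REC_* names are assumptions made so the
example builds outside the kernel tree; in the kernel the pseudo-keyword
comes from include/linux/compiler_attributes.h. Adjacent case labels with
nothing between them need no annotation:

```c
/*
 * Userspace stand-in for the kernel's "fallthrough" pseudo-keyword.
 * This local definition is an assumption so the sketch builds alone.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* fallthrough */
#endif

/* Hypothetical record types, loosely mirroring PSTORE_TYPE_*. */
enum rec_type { REC_DMESG, REC_FTRACE, REC_CONSOLE, REC_PMSG };

/* Returns 1 when the type shares the generic read path, else 0. */
static int uses_generic_read(enum rec_type type, int *id)
{
	switch (type) {
	case REC_FTRACE:
		*id = 7;	/* ftrace sets an id before sharing the path */
		fallthrough;
	case REC_CONSOLE:
	case REC_PMSG:
		return 1;
	default:
		return 0;
	}
}
```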

>  	case PSTORE_TYPE_CONSOLE:
>  		/* fallthrough */
>  	case PSTORE_TYPE_PMSG:
> @@ -1046,15 +1098,27 @@ static int blkz_cut_zones(struct blkz_context *cxt)
>  		goto free_pmsg;
>  	}
>  
> +	off_size += info->ftrace_size;
> +	cxt->fbzs = blkz_init_zones(PSTORE_TYPE_FTRACE, &off,
> +			info->ftrace_size,
> +			info->ftrace_size / nr_cpu_ids,
> +			&cxt->ftrace_max_cnt);
> +	if (IS_ERR(cxt->fbzs)) {
> +		err = PTR_ERR(cxt->fbzs);
> +		goto free_console;
> +	}
> +
>  	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
>  			info->total_size - off_size,
>  			info->dmesg_size, &cxt->dmesg_max_cnt);
>  	if (IS_ERR(cxt->dbzs)) {
>  		err = PTR_ERR(cxt->dbzs);
> -		goto free_console;
> +		goto free_ftrace;
>  	}
>  
>  	return 0;
> +free_ftrace:
> +	blkz_free_zones(&cxt->fbzs, &cxt->ftrace_max_cnt);
>  free_console:
>  	blkz_free_zone(&cxt->cbz);
>  free_pmsg:
> @@ -1103,6 +1167,7 @@ int blkz_register(struct blkz_info *info)
>  	check_size(dmesg_size, SECTOR_SIZE);
>  	check_size(pmsg_size, SECTOR_SIZE);
>  	check_size(console_size, SECTOR_SIZE);
> +	check_size(ftrace_size, SECTOR_SIZE);
>  
>  #undef check_size
>  
> @@ -1136,6 +1201,7 @@ int blkz_register(struct blkz_info *info)
>  	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>  	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
>  	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
> +	pr_debug("\tftrace size : %ld Bytes\n", info->ftrace_size);
>  
>  	err = blkz_cut_zones(cxt);
>  	if (err) {
> @@ -1159,13 +1225,16 @@ int blkz_register(struct blkz_info *info)
>  		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
>  	if (info->console_size)
>  		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
> +	if (info->ftrace_size)
> +		cxt->pstore.flags |= PSTORE_FLAGS_FTRACE;
>  
> -	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
> +	pr_info("Registered %s as blkzone backend for %s%s%s%s%s\n",
>  			info->name,
>  			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>  			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
>  			cxt->pbz ? "Pmsg " : "",
> -			cxt->cbz ? "Console" : "");
> +			cxt->cbz ? "Console " : "",
> +			cxt->fbzs ? "Ftrace" : "");
>  
>  	err = pstore_register(&cxt->pstore);
>  	if (err) {
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> index 546375e04419..77704c1b404a 100644
> --- a/include/linux/pstore_blk.h
> +++ b/include/linux/pstore_blk.h
> @@ -25,6 +25,9 @@
>   * @console_size:
>   *	The size of zone for console. Zero means disabled, otherwise, it must
>   *	be multiple of SECTOR_SIZE(512).
> + * @ftrace_size:
> + *	The size of zone for ftrace. Zero means disabled, otherwise, it must
> + *	be multiple of SECTOR_SIZE(512).
>   * @dump_oops:
>   *	Dump oops and panic log or only panic.
>   * @read, @write:
> @@ -60,6 +63,7 @@ struct blkz_info {
>  	unsigned long dmesg_size;
>  	unsigned long pmsg_size;
>  	unsigned long console_size;
> +	unsigned long ftrace_size;
>  	int dump_oops;
>  	blkz_read_op read;
>  	blkz_write_op write;
> -- 
> 1.9.1
> 

-- 
Kees Cook


* Re: [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-02-07 12:25 ` [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
@ 2020-03-18 18:31   ` Kees Cook
  2020-03-22 12:20     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:31 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:50PM +0800, WeiXiong Liao wrote:
> The document, at Documentation/admin-guide/pstore-block.rst, tells us
> how to use pstore/blk and blkoops.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  Documentation/admin-guide/pstore-block.rst | 281 +++++++++++++++++++++++++++++
>  MAINTAINERS                                |   1 +
>  fs/pstore/Kconfig                          |   2 +
>  3 files changed, 284 insertions(+)
>  create mode 100644 Documentation/admin-guide/pstore-block.rst
> 
> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
> new file mode 100644
> index 000000000000..c8a5f68960c3
> --- /dev/null
> +++ b/Documentation/admin-guide/pstore-block.rst
> @@ -0,0 +1,281 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Pstore block oops/panic logger
> +==============================
> +
> +Introduction
> +------------
> +
> +Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
> +block device before the system crashes. It also supports non-block devices such
> +as mtd device.
> +
> +There is a trapper named blkoops for pstore/blk, which makes pstore/blk be
> +nicer to device drivers.

"trapper" is an odd term here (oh, maybe this was a typo of
"wrapper"?). Regardless, is there a need to separate blkzone from
blkoops? It seems everything would just use blkoops directly, even
mtdpstore?

> +
> +Pstore block concepts
> +---------------------
> +
> +Pstore/blk works as a zone manager as it cuts the block device or partition
> +into several zones and stores data for different recorders. What device drivers

s/recorders/pstore front-ends/

> +should do is to provide read/write APIs.

"A block device driver only needs to provide read/write APIs."

> +
> +Pstore/blk begins at function ``blkz_register``. Besides, blkoops, a wrapper of
> +pstore/blk, begins at function ``blkoops_register_blkdev`` for block device and
> +``blkoops_register_device`` for non-block device, which is recommended instead
> +of directly using pstore/blk.
> +
> +Blkoops provides efficient configuration method for pstore/blk, which divides
> +all configurations of pstore/blk into two parts, configurations for user and
> +configurations for driver.
> +
> +Configurations for user determine how pstore/blk works, such as pmsg_size,
> +dmesg_size and so on. All of them support both kconfig and module parameters,
> +but module parameters have priority over kconfig.
> +
> +Configurations for driver are all about block/non-block device, such as
> +total_size of device and read/write operations. Device driver transfers a
> +structure ``blkoops_device`` defined in *linux/blkoops.h*.
> +
> +All of the following are for blkoops.
> +
> +Configurations for user
> +-----------------------
> +
> +All of these configurations support both kconfig and module parameters, but
> +module parameters have priority over kconfig.
> +Here is an example for module parameters::
> +
> +        blkoops.blkdev=179:7 blkoops.dmesg_size=64 blkoops.dump_oops=1
> +
> +Each configuration is described in detail below.
> +
> +blkdev
> +~~~~~~
> +
> +The block device to use. Most of the time, it is a partition of block device.
> +It's fine to ignore it if you are not using a block device.
> +
> +It accepts the following variants:
> +
> +1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
> +   leading 0x, for example b302.
> +#. /dev/<disk_name> represents the device number of disk
> +#. /dev/<disk_name><decimal> represents the device number of partition - device
> +   number of disk plus the partition number
> +#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
> +   name of partitioned disk ends with a digit.
> +#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
> +   a partition if the partition table provides it. The UUID may be either an
> +   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
> +   where SSSSSSSS is a zero-filled hex representation of the 32-bit
> +   "NT disk signature", and PP is a zero-filled hex representation of the
> +   1-based partition number.
> +#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
> +   partition with a known unique id.
> +#. <major>:<minor> major and minor number of the device separated by a colon.
> +
> +dmesg_size
> +~~~~~~~~~~
> +
> +The chunk size in KB for dmesg(oops/panic). It **MUST** be a multiple of 4.
> +If you don't need it, safely set it to 0 or ignore it.
> +
> +NOTE that, the remaining space, except ``pmsg_size``, ``console_size`` and
> +others, belongs to dmesg. It means that there are multiple chunks for dmesg.
> +
> +Pstore/blk logs to the dmesg chunks one by one and overwrites the oldest
> +chunk when no free chunk is left.
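
To make the arithmetic concrete, here is a rough sketch of how the space
might be split, based on what blkz_cut_zones() does in the ftrace patch
(pmsg, console and ftrace are reserved first, ftrace is divided by the CPU
count, and dmesg takes the rest). The demo_* names are made up, and the
"remaining / dmesg_size" chunk count is an assumption, since
blkz_init_zones() is not shown in this part of the thread:

```c
/* Hypothetical result of carving a device into zones; sizes in bytes. */
struct demo_layout {
	unsigned long ftrace_zone_size;	/* bytes per CPU */
	unsigned long dmesg_zones;	/* number of dmesg chunks */
};

/* Returns 0 on success, -1 if the reserved zones do not fit. */
static int demo_cut_zones(unsigned long total, unsigned long dmesg,
			  unsigned long pmsg, unsigned long console,
			  unsigned long ftrace, unsigned int nr_cpus,
			  struct demo_layout *out)
{
	unsigned long reserved = pmsg + console + ftrace;

	if (!nr_cpus || reserved > total)
		return -1;

	/* ftrace space is shared evenly between the CPUs */
	out->ftrace_zone_size = ftrace / nr_cpus;
	/* whatever is left over is cut into dmesg chunks */
	out->dmesg_zones = dmesg ? (total - reserved) / dmesg : 0;
	return 0;
}
```

With a 1 MiB partition, 64 KiB each for dmesg_size, pmsg, console and
ftrace, and 4 CPUs, this gives 16 KiB of ftrace space per CPU and 13 dmesg
chunks.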
> +
> +pmsg_size
> +~~~~~~~~~
> +
> +The chunk size in KB for pmsg. It **MUST** be a multiple of 4. If you do not
> +need it, safely set it to 0 or ignore it.
> +
> +There is only one chunk for pmsg.
> +
> +Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
> +appended to the chunk. On reboot the contents are available in
> +/sys/fs/pstore/pmsg-pstore-blk-0.
> +
> +console_size
> +~~~~~~~~~~~~
> +
> +The chunk size in KB for console. It **MUST** be a multiple of 4. If you
> +do not need it, safely set it to 0 or ignore it.
> +
> +There is only one chunk for console.
> +
> +All log of console will be appended to the chunk. On reboot the contents are
> +available in /sys/fs/pstore/console-pstore-blk-0.
> +
> +ftrace_size
> +~~~~~~~~~~~
> +
> +The chunk size in KB for ftrace. It **MUST** be a multiple of 4. If you
> +do not need it, safely set it to 0 or ignore it.
> +
> +There is one chunk for ftrace per processor. Each chunk size is equal to
> +(ftrace_size / processors_count).
> +
> +Ftrace logs are appended to the chunk of the processor that produced them. On
> +reboot the contents are available in /sys/fs/pstore/ftrace-pstore-blk-[N],
> +where N is the processor number.
> +
> +Persistent function tracing might be useful for debugging software or hardware
> +related hangs. Here is an example of usage::
> +
> + # mount -t pstore pstore /sys/fs/pstore
> + # mount -t debugfs debugfs /sys/kernel/debug/
> + # echo 1 > /sys/kernel/debug/pstore/record_ftrace
> + # reboot -f
> + [...]
> + # mount -t pstore pstore /sys/fs/pstore
> + # tail /sys/fs/pstore/ftrace-pstore-blk-0
> + CPU:0 ts:109860 c03a4310  c0063ebc  cpuidle_select <- cpu_startup_entry+0x1a8/0x1e0
> + CPU:0 ts:109861 c03a5878  c03a4324  menu_select <- cpuidle_select+0x24/0x2c
> + CPU:0 ts:109862 c00670e8  c03a589c  pm_qos_request <- menu_select+0x38/0x4cc
> + CPU:0 ts:109863 c0092bbc  c03a5960  tick_nohz_get_sleep_length <- menu_select+0xfc/0x4cc
> + CPU:0 ts:109865 c004b2f4  c03a59d4  get_iowait_load <- menu_select+0x170/0x4cc
> + CPU:0 ts:109868 c0063b60  c0063ecc  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
> + CPU:0 ts:109869 c03a433c  c0063b94  cpuidle_enter <- call_cpuidle+0x44/0x48
> + CPU:0 ts:109871 c03a4000  c03a4350  cpuidle_enter_state <- cpuidle_enter+0x24/0x28
> + CPU:0 ts:109873 c0063ba8  c03a4090  sched_idle_set_state <- cpuidle_enter_state+0xa4/0x314
> + CPU:0 ts:109874 c03a605c  c03a40b4  arm_enter_idle_state <- cpuidle_enter_state+0xc8/0x314

It would be nice to extract ftrace_log_combine() from ram.c and make the
front-end and inode layers aware of this as a way to auto-merge the
records from all backends supporting ftrace.
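
As I understand it, the core of that helper is a merge of two
timestamp-sorted record streams. A minimal userspace sketch of the idea
follows; struct trace_rec and merge_by_ts() are hypothetical stand-ins, not
the pstore structures or the ram.c code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ftrace record. */
struct trace_rec {
	uint64_t ts;		/* timestamp */
	unsigned int cpu;	/* CPU that produced the record */
};

/*
 * Merge two arrays that are each sorted by timestamp into out[].
 * out must have room for na + nb entries. Returns the number written.
 */
static size_t merge_by_ts(const struct trace_rec *a, size_t na,
			  const struct trace_rec *b, size_t nb,
			  struct trace_rec *out)
{
	size_t i = 0, j = 0, n = 0;

	/* take the older record from whichever stream has it */
	while (i < na && j < nb) {
		if (a[i].ts <= b[j].ts)
			out[n++] = a[i++];
		else
			out[n++] = b[j++];
	}
	/* one stream is exhausted; copy the rest of the other */
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];
	return n;
}
```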

> +dump_oops
> +~~~~~~~~~
> +
> +Dumping both oopses and panics can be done by setting 1 (not zero) in the
> +``dump_oops`` member while setting 0 in that variable dumps only the panics.
> +
> +Configurations for driver
> +-------------------------
> +
> +Only a device driver cares about these configurations. A block device driver
> +uses ``blkoops_register_blkdev`` while a non-block device driver uses
> +``blkoops_register_device``

Given this clarification, I'd say there is no reason to discuss
blkzone.c at all.

> +
> +The parameters of these two APIs may be of interest to you.
> +
> +major
> +~~~~~
> +
> +It is only required by block device which is registered by
> +``blkoops_register_blkdev``.  It's the major device number of registered
> +devices, by which blkoops can get the matching driver for @blkdev.
> +
> +total_size
> +~~~~~~~~~~
> +
> +It is only required by non-block device which is registered by
> +``blkoops_register_device``.  It tells pstore/blk the total size
> +pstore/blk can use. It is in KB and **MUST** be greater than or equal to 4
> +and a multiple of 4.
> +
> +For block devices, blkoops can get size of block device/partition automatically.
> +
> +read/write
> +~~~~~~~~~~
> +
> +It's generic read/write APIs for pstore/blk, which are required by non-block
> +device. The generic APIs are used for almost all data except panic data,
> +such as pmsg, console, oops and ftrace.
> +
> +The parameter @offset of these interface is the relative position of the device.
> +
> +Normally the number of bytes read/written should be returned, while for error,
> +negative number will be returned. The following return values have a special
> +meaning:
> +
> +-EBUSY: pstore/blk should try again later.
> +
> +panic_write (for non-block device)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I still think some other term is needed for "non-block device", since it
_is_ a block device. i.e. we're using it with pstore/blk. ;) I find it
just odd language.

> +
> +It's an interface for panic recorder and will be used only when panic occurs.
> +Non-block device driver registers it by ``blkoops_register_device``. If panic
> +log is unnecessary, it's fine to ignore it.
> +
> +Note that pstore/blk will recover data from device while mounting pstore
> +filesystem by default. If panic occurs but pstore/blk does not recover yet, the
> +first zone of dmesg will be used.
> +
> +The parameter @offset of this interface is the relative position of the device.
> +
> +Normally the number of bytes written should be returned, while for error,
> +negative number should be returned.
> +
> +panic_write (for block device)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +It is very similar to panic_write for non-block devices, but the position and
> +data size of panic_write for block device must be aligned to SECTOR_SIZE,
> +that's why the parameters are @sects and @start_sect. Block device driver
> +should register it by ``blkoops_register_blkdev``.
> +
> +The parameter @start_sect is the relative position of the block device and
> +partition. If block driver requires absolute position for panic_write,
> +``blkoops_blkdev_info`` will be helpful, which can provide the absolute
> +position of the block device (or partition) on the whole disk/flash.
> +
> +Normally zero should be returned, otherwise it indicates an error.
> +
> +Compression and header
> +----------------------
> +
> +Block device is large enough for uncompressed dmesg data. Actually we do not
> +recommend data compression because pstore/blk will insert some information into
> +the first line of dmesg data. For example::
> +
> +        Panic: Total 16 times
> +
> +It means that this is the 16th oops or panic since the first boot. The
> +number of oopses and panics since the first boot can be important when
> +judging whether the system is stable.
> +
> +The following line is inserted by pstore filesystem. For example::
> +
> +        Oops#2 Part1
> +
> +It means that it's OOPS for the 2nd time on the last boot.
> +
> +Reading the data
> +----------------
> +
> +The dump data can be read from the pstore filesystem. The format for these
> +files is ``dmesg-pstore-blk-[N]`` for dmesg(oops|panic), ``pmsg-pstore-blk-0``
> +for pmsg and so on, where N is the record number. To delete a stored
> +record from block device, simply unlink the respective pstore file. The
> +timestamp of the dump file records the trigger time.
> +
> +Attentions in panic read/write APIs
> +-----------------------------------
> +
> +On panic, the kernel is not going to run for much longer: tasks will not be
> +scheduled and most kernel resources will be out of service. The system behaves
> +like a single-threaded program running on a single-core computer.
> +
> +The following points require special attention for panic read/write APIs:
> +
> +1. Can **NOT** allocate any memory.
> +   If you need memory, just allocate while the block driver is initializing
> +   rather than waiting until the panic.
> +#. Must be polled, **NOT** interrupt driven.
> +   Tasks are no longer scheduled. The block driver should delay to ensure the
> +   write succeeds, but NOT sleep.
> +#. Can **NOT** take any lock.
> +   There is no other task, nor any shared resource; you are safe to break all
> +   locks.
> +#. Just use CPU to transfer.
> +   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
> +#. Control registers directly.
> +   Please control registers directly rather than use Linux kernel resources.
> +   Do I/O map while initializing rather than wait until a panic occurs.
> +#. Reset your block device and controller if necessary.
> +   If you are not sure of the state of your block device and controller when
> +   a panic occurs, you are safe to stop and reset them.
> +
> +Blkoops supports blkoops_blkdev_info(), which is defined in *linux/blkoops.h*,
> +to get information of block device, such as the device number, sector count and
> +start sector of the whole disk.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e4ba97130560..a5122e3aaf76 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13380,6 +13380,7 @@ F:	include/linux/pstore*
>  F:	drivers/firmware/efi/efi-pstore.c
>  F:	drivers/acpi/apei/erst.c
>  F:	Documentation/admin-guide/ramoops.rst
> +F:	Documentation/admin-guide/pstore-block.rst
>  F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
>  K:	\b(pstore|ramoops|blkoops)
>  
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 308a0a4c5ee5..466908a242aa 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -162,6 +162,8 @@ config PSTORE_BLK
>  	  This enables panic and oops message to be logged to a block dev
>  	  where it can be read back at some later point.
>  
> +	  For more information, see Documentation/admin-guide/pstore-block.rst.
> +
>  	  If unsure, say N.
>  
>  config PSTORE_BLKOOPS
> -- 
> 1.9.1
> 

I love the docs; thank you for them! As mentioned in the other email,
perhaps add a section at the bottom like:

blkoops internals
-----------------

For developer reference, here are all the important structures and APIs:

.. kernel-doc:: fs/pstore/blkzone.c
   :internal:

.. kernel-doc:: fs/pstore/blkoops.c
   :export:

etc
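
Those directives only pick up comments written in kernel-doc format. A
minimal sketch of that comment layout, using a made-up demo_dev structure
and demo_read() helper (neither exists in the series; they only borrow the
read contract described in the header):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/**
 * struct demo_dev - hypothetical backing device
 * @data: backing storage
 * @size: size of @data in bytes
 * @busy: non-zero while the device cannot service requests
 */
struct demo_dev {
	const char *data;
	size_t size;
	int busy;
};

/**
 * demo_read() - read from the device
 * @dev: device to read from
 * @buf: destination buffer
 * @len: number of bytes requested
 * @off: offset relative to the start of the device
 *
 * Return: number of bytes read, or a negative errno. -EBUSY means the
 * caller should try again later.
 */
static long demo_read(const struct demo_dev *dev, char *buf, size_t len,
		      size_t off)
{
	if (dev->busy)
		return -EBUSY;
	if (off > dev->size || len > dev->size - off)
		return -EINVAL;
	memcpy(buf, dev->data + off, len);
	return (long)len;
}
```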

-- 
Kees Cook


* Re: [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device
  2020-02-07 12:25 ` [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
@ 2020-03-18 18:35   ` Kees Cook
  2020-03-22 12:27     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:35 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:51PM +0800, WeiXiong Liao wrote:
> This is one of a series of patches adapting pstore/blk to MTD devices.
> 
> An MTD device is not a block device. Because flash blocks on an MTD device
> can go bad, pstore/blk needs to skip the broken block (bad block).
> 
> If a device driver returns -ENEXT, pstore/blk tries the next dmesg zone.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  Documentation/admin-guide/pstore-block.rst |  3 +-
>  fs/pstore/blkzone.c                        | 74 +++++++++++++++++++++++-------
>  include/linux/blkoops.h                    |  4 +-
>  include/linux/pstore_blk.h                 |  4 ++
>  4 files changed, 66 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
> index c8a5f68960c3..be865dfc1a28 100644
> --- a/Documentation/admin-guide/pstore-block.rst
> +++ b/Documentation/admin-guide/pstore-block.rst
> @@ -188,7 +188,8 @@ The parameter @offset of these interface is the relative position of the device.
>  Normally the number of bytes read/written should be returned, while for error,
>  negative number will be returned. The following return numbers mean more:
>  
> --EBUSY: pstore/blk should try again later.
> +1. -EBUSY: pstore/blk should try again later.
> +#. -ENEXT: this zone is used or broken, pstore/blk should try next one.
>  
>  panic_write (for non-block device)
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
> index 442e5a5bbfda..205aeff28992 100644
> --- a/fs/pstore/blkzone.c
> +++ b/fs/pstore/blkzone.c
> @@ -207,6 +207,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
>  
>  	return 0;
>  set_dirty:
> +	/* no need to mark dirty if going to try next zone */
> +	if (wcnt == -ENEXT)
> +		return -ENEXT;
>  	atomic_set(&zone->dirty, true);
>  	/* flush dirty zones nicely */
>  	if (wcnt == -EBUSY && !is_on_panic())
> @@ -360,7 +363,11 @@ static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
>  			return -EINVAL;
>  
>  		rcnt = info->read((char *)buf, len, zone->off);
> -		if (rcnt != len) {
> +		if (rcnt == -ENEXT) {
> +			pr_debug("%s with id %lu may be broken, skip\n",
> +					zone->name, i);
> +			continue;
> +		} else if (rcnt != len) {
>  			pr_err("read %s with id %lu failed\n", zone->name, i);
>  			return (int)rcnt < 0 ? (int)rcnt : -EIO;
>  		}
> @@ -650,24 +657,58 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
>  		hdr->counter = 0;
>  }
>  
> +/*
> + * In case zone is broken, which may occur to MTD device, we try each zones,
> + * start at cxt->dmesg_write_cnt.
> + */
>  static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
>  		struct pstore_record *record)
>  {
> +	int ret = -EBUSY;
>  	size_t size, hlen;
>  	struct blkz_zone *zone;
> -	unsigned int zonenum;
> +	unsigned int i;
>  
> -	zonenum = cxt->dmesg_write_cnt;
> -	zone = cxt->dbzs[zonenum];
> -	if (unlikely(!zone))
> -		return -ENOSPC;
> -	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
> +		unsigned int zonenum, len;
> +
> +		zonenum = (cxt->dmesg_write_cnt + i) % cxt->dmesg_max_cnt;
> +		zone = cxt->dbzs[zonenum];
> +		if (unlikely(!zone))
> +			return -ENOSPC;
>  
> -	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
> -	blkz_write_kmsg_hdr(zone, record);
> -	hlen = sizeof(struct blkz_dmesg_header);
> -	size = min_t(size_t, record->size, zone->buffer_size - hlen);
> -	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
> +		/* avoid destroying old data, allocate a new one */
> +		len = zone->buffer_size + sizeof(*zone->buffer);
> +		zone->oldbuf = zone->buffer;
> +		zone->buffer = kzalloc(len, GFP_KERNEL);
> +		if (!zone->buffer) {
> +			zone->buffer = zone->oldbuf;
> +			return -ENOMEM;
> +		}
> +		zone->buffer->sig = zone->oldbuf->sig;
> +
> +		pr_debug("write %s to zone id %d\n", zone->name, zonenum);
> +		blkz_write_kmsg_hdr(zone, record);
> +		hlen = sizeof(struct blkz_dmesg_header);
> +		size = min_t(size_t, record->size, zone->buffer_size - hlen);
> +		ret = blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
> +		if (likely(!ret || ret != -ENEXT)) {
> +			cxt->dmesg_write_cnt = zonenum + 1;
> +			cxt->dmesg_write_cnt %= cxt->dmesg_max_cnt;
> +			/* no need to try next zone, free last zone buffer */
> +			kfree(zone->oldbuf);
> +			zone->oldbuf = NULL;
> +			return ret;
> +		}
> +
> +		pr_debug("zone %u may be broken, try next dmesg zone\n",
> +				zonenum);
> +		kfree(zone->buffer);
> +		zone->buffer = zone->oldbuf;
> +		zone->oldbuf = NULL;
> +	}
> +
> +	return -EBUSY;
>  }
>  
>  static int notrace blkz_dmesg_write(struct blkz_context *cxt,
> @@ -791,7 +832,6 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>  	}
>  }
>  
> -#define READ_NEXT_ZONE ((ssize_t)(-1024))
>  static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>  {
>  	struct blkz_zone *zone = NULL;
> @@ -852,7 +892,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>  	if (blkz_read_dmesg_hdr(zone, record)) {
>  		atomic_set(&zone->buffer->datalen, 0);
>  		atomic_set(&zone->dirty, 0);
> -		return READ_NEXT_ZONE;
> +		return -ENEXT;
>  	}
>  	size -= sizeof(struct blkz_dmesg_header);
>  
> @@ -877,7 +917,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>  	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
>  				sizeof(struct blkz_dmesg_header)) < 0)) {
>  		kfree(record->buf);
> -		return READ_NEXT_ZONE;
> +		return -ENEXT;
>  	}
>  
>  	return size + hlen;
> @@ -891,7 +931,7 @@ static ssize_t blkz_record_read(struct blkz_zone *zone,
>  
>  	buf = (struct blkz_buffer *)zone->oldbuf;
>  	if (!buf)
> -		return READ_NEXT_ZONE;
> +		return -ENEXT;
>  
>  	size = atomic_read(&buf->datalen);
>  	start = atomic_read(&buf->start);
> @@ -943,7 +983,7 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>  	}
>  
>  	ret = readop(zone, record);
> -	if (ret == READ_NEXT_ZONE)
> +	if (ret == -ENEXT)
>  		goto next_zone;
>  	return ret;
>  }
> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
> index 8f40f225545d..71c596fd4cc8 100644
> --- a/include/linux/blkoops.h
> +++ b/include/linux/blkoops.h
> @@ -27,6 +27,7 @@
>   *	On error, negative number should be returned. The following returning
>   *	number means more:
>   *	  -EBUSY: pstore/blk should try again later.
> + *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
>   * @panic_write:
>   *	The write operation only used for panic.
>   *
> @@ -45,7 +46,8 @@ struct blkoops_device {
>  
>  /*
>   * Panic write for block device who should write alignmemt to SECTOR_SIZE.
> - * On success, zero should be returned. Others mean error.
> + * On success, zero should be returned. Others mean error except that -ENEXT
> + * means the zone is used or broken, pstore/blk should try next one.
>   */
>  typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
>  		sector_t sects);
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> index 77704c1b404a..bbbe4fe37f7c 100644
> --- a/include/linux/pstore_blk.h
> +++ b/include/linux/pstore_blk.h
> @@ -6,6 +6,9 @@
>  #include <linux/types.h>
>  #include <linux/blkdev.h>
>  
> +/* read/write function return -ENEXT means try next zone */
> +#define ENEXT ((ssize_t)(1024))

I really don't like inventing errno numbers. Can you just reuse an
existing (but non-block) errno like ESRCH or ENOMSG or something?
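
For illustration, here is a minimal userspace sketch of the same
"try the next zone" loop with ENOMSG standing in for the invented code.
The names write_first_usable_zone(), zone_write_fn and demo_write_zone()
are hypothetical and exist only for this example; they are not part of
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for the backend write op: returns the number of
 * bytes written, -ENOMSG if the zone is used or broken, or another
 * negative errno on a hard failure.
 */
typedef ssize_t (*zone_write_fn)(unsigned int zonenum, const char *buf,
				 size_t len);

/*
 * Try each zone once, starting at *next_zone, and skip any zone that
 * reports -ENOMSG. On success or a hard error, advance *next_zone past
 * the zone that was tried, as the patch does for dmesg_write_cnt.
 */
static ssize_t write_first_usable_zone(zone_write_fn write_zone,
				       unsigned int *next_zone,
				       unsigned int max_cnt,
				       const char *buf, size_t len)
{
	unsigned int i;

	assert(write_zone && next_zone);

	for (i = 0; i < max_cnt; i++) {
		unsigned int zonenum = (*next_zone + i) % max_cnt;
		ssize_t ret = write_zone(zonenum, buf, len);

		if (ret == -ENOMSG)
			continue;	/* used or broken: try the next zone */

		/* success or a hard error: either way, stop here */
		*next_zone = (zonenum + 1) % max_cnt;
		return ret;
	}

	return -EBUSY;	/* every zone asked us to move on */
}

/* Demo backend for illustration: zones 0 and 1 always say "try next". */
static ssize_t demo_write_zone(unsigned int zonenum, const char *buf,
			       size_t len)
{
	(void)buf;
	return zonenum < 2 ? -ENOMSG : (ssize_t)len;
}
```

With four zones and the demo backend, the write lands in zone 2 and the
next write would start at zone 3. Since ENOMSG already lives in errno.h,
no private definition is needed in pstore_blk.h.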

> +
>  /**
>   * struct blkz_info - backend blkzone driver structure
>   *
> @@ -42,6 +45,7 @@
>   *	On error, negative number should be returned. The following returning
>   *	number means more:
>   *	  -EBUSY: pstore/blk should try again later.
> + *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
>   * @panic_write:
>   *	The write operation only used for panic. It's optional if you do not
>   *	care panic record. If panic occur but blkzone do not recover yet, the
> -- 
> 1.9.1
> 

-- 
Kees Cook

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: [PATCH v2 08/11] blkoops: respect for device to pick recorders
  2020-02-07 12:25 ` [PATCH v2 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
@ 2020-03-18 18:42   ` Kees Cook
  2020-03-22 13:06     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:42 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

In the subject and throughout:

s/recorders/pstore front-ends/

On Fri, Feb 07, 2020 at 08:25:52PM +0800, WeiXiong Liao wrote:
> It's one of a series of patches for adaptive to MTD device.

typo: adapting

> 
> MTD device is not block device. The sector of flash (MTD device) will be
> broken if erase over limited cycles. Avoid damaging block so fast, we
> can not write to a sector frequently. So, the recorders of pstore/blk
> like console and ftrace recorder should not be supported.
> 
> Besides, mtd device need aligned write/erase size. To avoid
> over-erasing/writing flash, we should keep a aligned cache and read old
> data to cache before write/erase, which make codes more complex. So,
> pmsg do not be supported now because it writes misaligned.
> 
> How about dmesg? Luckly, pstore/blk keeps several aligned chunks for
> dmesg and uses one by one for wear balance.
> 
> So, MTD device for pstore should pick recorders, that is why the patch
> here.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  Documentation/admin-guide/pstore-block.rst |  9 +++++++++
>  fs/pstore/blkoops.c                        | 29 +++++++++++++++++++++--------
>  include/linux/blkoops.h                    | 14 +++++++++++++-
>  3 files changed, 43 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
> index be865dfc1a28..299142b3d8e6 100644
> --- a/Documentation/admin-guide/pstore-block.rst
> +++ b/Documentation/admin-guide/pstore-block.rst
> @@ -166,6 +166,15 @@ It is only required by block device which is registered by
>  ``blkoops_register_blkdev``.  It's the major device number of registered
>  devices, by which blkoops can get the matching driver for @blkdev.
>  
> +flags
> +~~~~~
> +
> +Refer to macro starting with *BLKOOPS_DEV_SUPPORT_* which is defined in
> +*linux/blkoops.h*. They tell us that which pstore/blk recorders this device
> +supports. Default zero means all recorders for compatible, witch is the same

typo: witch -> which

> +as BLKOOPS_DEV_SUPPORT_ALL. Recorder works only when chunk size is not zero
> +and device support.

There are already flags for this, please see "Supported frontends"
in include/linux/pstore.h
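
As a rough sketch of what reusing those flags could look like, the
per-front-end size check might be written as below. This is userspace
code: the PSTORE_FLAGS_* values are a local mirror of the "Supported
frontends" bits as I recall them from include/linux/pstore.h (please
check the header), and frontend_size_bytes() is a hypothetical helper
that follows the verify_size() macro in this patch. Returning -EINVAL on
misalignment is an assumption, since that part of the macro is not
visible in the hunk.

```c
#include <assert.h>
#include <errno.h>

/* Local mirror of the pstore front-end bits, redefined here only so the
 * sketch builds outside the kernel tree. */
#define PSTORE_FLAGS_DMESG	(1U << 0)
#define PSTORE_FLAGS_CONSOLE	(1U << 1)
#define PSTORE_FLAGS_FTRACE	(1U << 2)
#define PSTORE_FLAGS_PMSG	(1U << 3)

/*
 * Hypothetical helper: return the size in bytes to reserve for one
 * front-end, 0 if the device does not support it, or -EINVAL if the
 * size is not a multiple of @align (which must be a power of two).
 */
static long frontend_size_bytes(unsigned int dev_flags, unsigned int frontend,
				long size_kb, long def_kb, long align)
{
	long bytes;

	assert(align > 0 && (align & (align - 1)) == 0);

	/* zero means "all front-ends", as in the patch */
	if (dev_flags == 0)
		dev_flags = ~0U;

	if (!(dev_flags & frontend))
		bytes = 0;		/* not supported by this device */
	else if (size_kb >= 0)
		bytes = size_kb;	/* explicit size from the user */
	else
		bytes = def_kb;		/* negative means "use the default" */

	bytes = bytes <= 0 ? 0 : bytes * 1024;
	if (bytes & (align - 1))
		return -EINVAL;		/* assumed error for misaligned sizes */

	return bytes;
}
```

A device that only advertises PSTORE_FLAGS_DMESG would then get zero
bytes for pmsg, console and ftrace regardless of the module parameters,
which is the behaviour the BLKOOPS_DEV_SUPPORT_* bits were added for.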

> +
>  total_size
>  ~~~~~~~~~~
>  
> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
> index c76bab671b0b..01170b344f00 100644
> --- a/fs/pstore/blkoops.c
> +++ b/fs/pstore/blkoops.c
> @@ -128,9 +128,16 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>  		return -ENOMEM;
>  	}
>  
> -#define verify_size(name, defsize, alignsize) {				\
> -		long _##name_ = (name);					\
> -		if (_##name_ < 0)					\
> +	/* zero means all recorders for compatible */
> +	if (bo_dev->flags == BLKOOPS_DEV_SUPPORT_DEFAULT)
> +		bo_dev->flags = BLKOOPS_DEV_SUPPORT_ALL;
> +#define verify_size(name, defsize, alignsize, enable) {			\
> +		long _##name_;						\
> +		if (!(enable))						\
> +			_##name_ = 0;					\
> +		else if ((name) >= 0)					\
> +			_##name_ = (name);				\
> +		else							\
>  			_##name_ = (defsize);				\
>  		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
>  		if (_##name_ & ((alignsize) - 1)) {			\
> @@ -142,10 +149,14 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>  		bzinfo->name = _##name_;				\
>  	}
>  
> -	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
> -	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
> -	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
> -	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
> +	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096,
> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_DMESG);
> +	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096,
> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_PMSG);
> +	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096,
> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_CONSOLE);
> +	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096,
> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_FTRACE);

I'd kind of prefer this patch be moved much earlier in the series so
that the later additions of front-end support don't have to be touched
twice, i.e. when PMSG support is added, it is added as a whole and
does the flag check in that same patch, etc.

>  #undef verify_size
>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>  
> @@ -336,6 +347,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
>   * register block device to blkoops
>   * @major: the major device number of registering device
>   * @panic_write: the write interface for panic case.
> + * @flags: Refer to macro starting with BLKOOPS_DEV_SUPPORT.
>   *
>   * It is ONLY used for block device to register to blkoops. In this case,
>   * the module parameter @blkdev must be valid. Generic read/write interfaces
> @@ -349,7 +361,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
>   * panic occurs but pstore/blk does not recover yet, the first zone of dmesg
>   * will be used.
>   */
> -int blkoops_register_blkdev(unsigned int major,
> +int blkoops_register_blkdev(unsigned int major, unsigned int flags,
>  		blkoops_blk_panic_write_op panic_write)
>  {
>  	struct block_device *bdev;
> @@ -372,6 +384,7 @@ int blkoops_register_blkdev(unsigned int major,
>  	if (bo_dev.total_size == 0)
>  		goto err_put_bdev;
>  	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
> +	bo_dev.flags = flags;
>  	bo_dev.read = blkoops_generic_blk_read;
>  	bo_dev.write = blkoops_generic_blk_write;
>  
> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
> index 71c596fd4cc8..bc7665d14a98 100644
> --- a/include/linux/blkoops.h
> +++ b/include/linux/blkoops.h
> @@ -6,6 +6,7 @@
>  #include <linux/types.h>
>  #include <linux/blkdev.h>
>  #include <linux/pstore_blk.h>
> +#include <linux/bitops.h>
>  
>  /**
>   * struct blkoops_device - backend blkoops driver structure.
> @@ -14,6 +15,10 @@
>   * blkoops_register_device(). If block device, you are strongly recommended
>   * to use blkoops_register_blkdev().
>   *
> + * @flags:
> + *	Refer to macro starting with BLKOOPS_DEV_SUPPORT_. These macros tell
> + *	us that which pstore/blk recorders this device supports. Zero means
> + *	all recorders for compatible.
>   * @total_size:
>   *	The total size in bytes pstore/blk can use. It must be greater than
>   *	4096 and be multiple of 4096.
> @@ -38,6 +43,13 @@
>   *	On error, negative number should be returned.
>   */
>  struct blkoops_device {
> +	unsigned int flags;
> +#define BLKOOPS_DEV_SUPPORT_ALL		UINT_MAX
> +#define BLKOOPS_DEV_SUPPORT_DEFAULT	(0)
> +#define BLKOOPS_DEV_SUPPORT_DMESG	BIT(0)
> +#define BLKOOPS_DEV_SUPPORT_PMSG	BIT(1)
> +#define BLKOOPS_DEV_SUPPORT_CONSOLE	BIT(2)
> +#define BLKOOPS_DEV_SUPPORT_FTRACE	BIT(3)
>  	unsigned long total_size;
>  	blkz_read_op read;
>  	blkz_write_op write;
> @@ -54,7 +66,7 @@ typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
>  
>  int  blkoops_register_device(struct blkoops_device *bo_dev);
>  void blkoops_unregister_device(struct blkoops_device *bo_dev);
> -int  blkoops_register_blkdev(unsigned int major,
> +int  blkoops_register_blkdev(unsigned int major, unsigned int flags,
>  		blkoops_blk_panic_write_op panic_write);
>  void blkoops_unregister_blkdev(unsigned int major);
>  int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
> -- 
> 1.9.1
> 

-- 
Kees Cook


* Re: [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg.
  2020-02-07 12:25 ` [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
@ 2020-03-18 18:47   ` Kees Cook
  2020-03-22 13:03     ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:47 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:53PM +0800, WeiXiong Liao wrote:
> It's one of a series of patches for adaptive to MTD device.
> 
> MTD device is not block device. To write to flash device on MTD, erase
> must to be done before. However, pstore/blk just set datalen as 0 when
> remove, which is not enough for mtd device. That's why this patch here,
> to support special jobs when removing pstore/blk record.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  Documentation/admin-guide/pstore-block.rst |  9 +++++++++
>  fs/pstore/blkoops.c                        |  4 +++-
>  fs/pstore/blkzone.c                        |  9 ++++++++-
>  include/linux/blkoops.h                    | 10 ++++++++++
>  include/linux/pstore_blk.h                 | 11 +++++++++++
>  5 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
> index 299142b3d8e6..1735476621df 100644
> --- a/Documentation/admin-guide/pstore-block.rst
> +++ b/Documentation/admin-guide/pstore-block.rst
> @@ -200,6 +200,15 @@ negative number will be returned. The following return numbers mean more:
>  1. -EBUSY: pstore/blk should try again later.
>  #. -ENEXT: this zone is used or broken, pstore/blk should try next one.
>  
> +erase
> +~~~~~
> +
> +It's generic erase API for pstore/blk, which is requested by non-block device.
> +It will be called while pstore record is removing. It's required only when the
> +device has special removing jobs. For example, MTD device tries to erase block.
> +
> +Normally zero should be returned, otherwise it indicates an error.
> +
>  panic_write (for non-block device)
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
> index 01170b344f00..7cf4731e52f7 100644
> --- a/fs/pstore/blkoops.c
> +++ b/fs/pstore/blkoops.c
> @@ -164,6 +164,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>  	bzinfo->dump_oops = dump_oops;
>  	bzinfo->read = bo_dev->read;
>  	bzinfo->write = bo_dev->write;
> +	bzinfo->erase = bo_dev->erase;
>  	bzinfo->panic_write = bo_dev->panic_write;
>  	bzinfo->name = "blkoops";
>  	bzinfo->owner = THIS_MODULE;
> @@ -383,10 +384,11 @@ int blkoops_register_blkdev(unsigned int major, unsigned int flags,
>  	bo_dev.total_size = blkoops_bdev_size(bdev);
>  	if (bo_dev.total_size == 0)
>  		goto err_put_bdev;
> -	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
>  	bo_dev.flags = flags;
>  	bo_dev.read = blkoops_generic_blk_read;
>  	bo_dev.write = blkoops_generic_blk_write;
> +	bo_dev.erase = NULL;
> +	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
>  
>  	ret = blkoops_register_device(&bo_dev);
>  	if (ret)

I think this patch, like the prior, needs to be reordered in the series.
How about adding

blkoops_register_device()

as a single patch, which is what provides support for the "non-block"
block devices? Then the blkoops_register_blkdev() can stand alone in the
first patch?

It just might be easier to review, since nothing uses
blkoops_register_device() until the mtd driver is added. So that
function and this patch would go together as a single "support non-block
devices" change.
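
To make the scope of such a "support non-block devices" change concrete,
the optional erase hook boils down to a small dispatch on record removal.
The following is only a userspace sketch under stated assumptions: the
struct and function names (backend_sketch, zone_sketch, remove_record,
the demo_* callbacks) are hypothetical, and flush_meta stands in for the
FLUSH_META write used when no erase hook is set.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback types, loosely modelled on blkz_erase_op. */
typedef ssize_t (*erase_fn)(size_t size, off_t off);
typedef ssize_t (*flush_meta_fn)(off_t off);

struct backend_sketch {
	erase_fn erase;			/* optional: NULL for block devices */
	flush_meta_fn flush_meta;	/* stands in for the FLUSH_META write */
};

struct zone_sketch {
	size_t datalen;		/* payload bytes currently stored */
	size_t hdr_size;	/* size of the per-zone buffer header */
	off_t off;		/* zone offset within the pstore area */
};

/* Remove one record: use the device's erase hook if it provides one. */
static ssize_t remove_record(const struct backend_sketch *be,
			     struct zone_sketch *zone)
{
	size_t size;

	assert(be && zone && be->flush_meta);

	/* Take the length before resetting it, so erase sees the record. */
	size = zone->datalen + zone->hdr_size;
	zone->datalen = 0;

	if (be->erase)
		return be->erase(size, zone->off);
	return be->flush_meta(zone->off);
}

/* Demo callbacks for illustration: remember what they were asked to do. */
static size_t demo_erased_size;
static int demo_meta_flushes;

static ssize_t demo_erase(size_t size, off_t off)
{
	(void)off;
	demo_erased_size = size;
	return 0;
}

static ssize_t demo_flush_meta(off_t off)
{
	(void)off;
	demo_meta_flushes++;
	return 0;
}
```

One detail the sketch chooses deliberately: it reads the data length
before clearing it. If buffer_datalen() reads the same field that
atomic_set() has just zeroed, the size passed to the erase hook would
only ever cover the buffer header, so the order may be worth a second
look.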

> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
> index 205aeff28992..a17fff77b875 100644
> --- a/fs/pstore/blkzone.c
> +++ b/fs/pstore/blkzone.c
> @@ -593,11 +593,18 @@ static inline bool blkz_ok(struct blkz_zone *zone)
>  static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>  		struct blkz_zone *zone)
>  {
> +	size_t size;
> +
>  	if (unlikely(!blkz_ok(zone)))
>  		return 0;
>  
>  	atomic_set(&zone->buffer->datalen, 0);
> -	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
> +
> +	size = buffer_datalen(zone) + sizeof(*zone->buffer);
> +	if (cxt->bzinfo->erase)
> +		return cxt->bzinfo->erase(size, zone->off);
> +	else
> +		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>  }
>  
>  static inline int blkz_record_erase(struct blkz_context *cxt,
> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
> index bc7665d14a98..11cb3036ad5f 100644
> --- a/include/linux/blkoops.h
> +++ b/include/linux/blkoops.h
> @@ -33,6 +33,15 @@
>   *	number means more:
>   *	  -EBUSY: pstore/blk should try again later.
>   *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
> + * @erase:
> + *	The general (not panic) erase operation. It will be call while pstore
> + *	record is removing. It's required only when device have special
> + *	removing jobs, for example, MTD device try to erase block.
> + *
> + *	Both of the @size and @offset parameters on this interface are
> + *	the relative size of the space provided, not the whole disk/flash.
> + *
> + *	On success, 0 should be returned. Others mean error.
>   * @panic_write:
>   *	The write operation only used for panic.
>   *
> @@ -53,6 +62,7 @@ struct blkoops_device {
>  	unsigned long total_size;
>  	blkz_read_op read;
>  	blkz_write_op write;
> +	blkz_erase_op erase;
>  	blkz_write_op panic_write;
>  };
>  
> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
> index bbbe4fe37f7c..9641969f888f 100644
> --- a/include/linux/pstore_blk.h
> +++ b/include/linux/pstore_blk.h
> @@ -46,6 +46,15 @@
>   *	number means more:
>   *	  -EBUSY: pstore/blk should try again later.
>   *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
> + * @erase:
> + *	The general (not panic) erase operation. It will be call while pstore
> + *	record is removing. It's required only when device have special
> + *	removing jobs, for example, MTD device try to erase block.
> + *
> + *	Both of the @size and @offset parameters on this interface are
> + *	the relative size of the space provided, not the whole disk/flash.
> + *
> + *	On success, 0 should be returned. Others mean error.
>   * @panic_write:
>   *	The write operation only used for panic. It's optional if you do not
>   *	care panic record. If panic occur but blkzone do not recover yet, the
> @@ -59,6 +68,7 @@
>   */
>  typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
>  typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
> +typedef ssize_t (*blkz_erase_op)(size_t, loff_t);
>  struct blkz_info {
>  	struct module *owner;
>  	const char *name;
> @@ -71,6 +81,7 @@ struct blkz_info {
>  	int dump_oops;
>  	blkz_read_op read;
>  	blkz_write_op write;
> +	blkz_erase_op erase;
>  	blkz_write_op panic_write;
>  };
>  
> -- 
> 1.9.1
> 

-- 
Kees Cook


* Re: [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-07 12:25 ` [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
  2020-02-18 10:34   ` Miquel Raynal
@ 2020-03-18 18:57   ` Kees Cook
  2020-03-22 13:51     ` WeiXiong Liao
  1 sibling, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-18 18:57 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Feb 07, 2020 at 08:25:55PM +0800, WeiXiong Liao wrote:
> It's the last one of a series of patches for adaptive to MTD device.
> 
> The mtdpstore is similar to mtdoops but more powerful. It bases on
> pstore/blk, aims to store panic and oops logs to a flash partition,
> where it can be read back as files after mounting pstore filesystem.
> 
> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
> block device at the very beginning, but now, compatible to not only
> block device. After this series of patches, pstore/blk can also work
> for MTD device. To make it work, 'blkdev' on kconfig or module
> parameter of blkoops should be set as mtd device name or mtd number.
> See more about pstore/blk and blkoops on:
>     Documentation/admin-guide/pstore-block.rst
> 
> Why do we need mtdpstore?
> 1. repetitive jobs between pstore and mtdoops
>    Both of pstore and mtdoops do the same jobs that store panic/oops log.
>    They have much similar logic that register to kmsg dumper and store
>    log to several chunks one by one.
> 2. do what a driver should do
>    To me, a driver should provide methods instead of policies. What MTD
>    should do is to provide read/write/erase operations, geting rid of codes
>    about chunk management, kmsg dumper and configuration.
> 3. enhanced feature
>    Not only store log, but also show it as files.
>    Not only log, but also trigger time and trigger count.
>    Not only panic/oops log, but also log recorder for pmsg, console and
>    ftrace in the future.

I wonder if it's possible to make this device driver "invisible", in the
sense that it could be entirely user-configured via blkoops. I don't
think that's needed right now, especially since it's MOSTLY configured
by blkoops.param, etc, but I'll keep thinking about it.

Modulo various naming convention adjustments outlined in the other
patches, this looks fine to me (I can't really speak to the mtd driver
bits itself, but the pstore and blkoops interaction looks good).

-Kees

> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  Documentation/admin-guide/pstore-block.rst |  10 +-
>  drivers/mtd/Kconfig                        |  10 +
>  drivers/mtd/Makefile                       |   1 +
>  drivers/mtd/mtdpstore.c                    | 564 +++++++++++++++++++++++++++++
>  4 files changed, 583 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/mtd/mtdpstore.c
> 
> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
> index 1735476621df..823fe2b4b84f 100644
> --- a/Documentation/admin-guide/pstore-block.rst
> +++ b/Documentation/admin-guide/pstore-block.rst
> @@ -54,9 +54,10 @@ blkdev
>  ~~~~~~
>  
>  The block device to use. Most of the time, it is a partition of block device.
> -It's fine to ignore it if you are not using a block device.
> +It is also used for MTD device. It's fine to ignore it if you are not using
> +a block device or a MTD device.
>  
> -It accepts the following variants:
> +It accepts the following variants for block device:
>  
>  1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
>     leading 0x, for example b302.
> @@ -75,6 +76,11 @@ It accepts the following variants:
>     partition with a known unique id.
>  #. <major>:<minor> major and minor number of the device separated by a colon.
>  
> +It accepts the following variants for MTD device:
> +
> +1. <device name> MTD device name. "pstore" is recommended.
> +#. <device number> MTD device number.
> +
>  dmesg_size
>  ~~~~~~~~~~
>  
> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
> index 42d401ea60ee..5d53d5cd2998 100644
> --- a/drivers/mtd/Kconfig
> +++ b/drivers/mtd/Kconfig
> @@ -170,6 +170,16 @@ config MTD_OOPS
>  	  buffer in a flash partition where it can be read back at some
>  	  later point.
>  
> +config MTD_PSTORE
> +	tristate "Log panic/oops to an MTD buffer based on pstore"
> +	depends on PSTORE_BLKOOPS
> +	help
> +	  This enables panic and oops messages to be logged to a circular
> +	  buffer in a flash partition where it can be read back as files after
> +	  mounting pstore filesystem.
> +
> +	  If unsure, say N.
> +
>  config MTD_SWAP
>  	tristate "Swap on MTD device support"
>  	depends on MTD && SWAP
> diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
> index 56cc60ccc477..593d0593a038 100644
> --- a/drivers/mtd/Makefile
> +++ b/drivers/mtd/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
>  obj-$(CONFIG_SSFDC)		+= ssfdc.o
>  obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
>  obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
> +obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
>  obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
>  
>  nftl-objs		:= nftlcore.o nftlmount.o
> diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
> new file mode 100644
> index 000000000000..58b9e10ef675
> --- /dev/null
> +++ b/drivers/mtd/mtdpstore.c
> @@ -0,0 +1,564 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define dev_fmt(fmt) "mtdoops-pstore: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/blkoops.h>
> +#include <linux/mtd/mtd.h>
> +#include <linux/bitops.h>
> +
> +static struct mtdpstore_context {
> +	int index;
> +	struct blkoops_info bo_info;
> +	struct blkoops_device bo_dev;
> +	struct mtd_info *mtd;
> +	unsigned long *rmmap;		/* removed bit map */
> +	unsigned long *usedmap;		/* used bit map */
> +	/*
> +	 * used for panic write
> +	 * As there are no block_isbad for panic case, we should keep this
> +	 * status before panic to ensure panic_write not failed.
> +	 */
> +	unsigned long *badmap;		/* bad block bit map */
> +} oops_cxt;
> +
> +static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	ret = mtd_block_isbad(mtd, off);
> +	if (ret < 0) {
> +		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
> +		return ret;
> +	} else if (ret > 0) {
> +		set_bit(blknum, cxt->badmap);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	return test_bit(blknum, cxt->badmap);
> +}
> +
> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
> +	set_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
> +	clear_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
> +		clear_bit(zonenum, cxt->usedmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	return test_bit(zonenum, cxt->usedmap);
> +}
> +
> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->usedmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
> +		size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t sz;
> +	int i;
> +
> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
> +	for (i = 0; i < sz; i++) {
> +		if (buf[i] != (char)0xFF)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
> +	set_bit(zonenum, cxt->rmmap);
> +}
> +
> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		clear_bit(zonenum, cxt->rmmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->rmmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	struct erase_info erase;
> +	int ret;
> +
> +	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
> +	erase.len = cxt->mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(cxt->mtd, &erase);
> +	if (!ret)
> +		mtdpstore_block_clear_removed(cxt, off);
> +	else
> +		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
> +		       (unsigned long long)erase.addr,
> +		       (unsigned long long)erase.len, cxt->bo_info.device);
> +	return ret;
> +}
> +
> +/*
> + * called while removing file
> + *
> + * Avoiding over erasing, do erase block only when the whole block is unused.
> + * If the block contains valid log, do erase lazily on flush_removed() when
> + * unregister.
> + */
> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -EIO;
> +
> +	mtdpstore_mark_unused(cxt, off);
> +
> +	/* If the block still has valid data, mtdpstore do erase lazily */
> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
> +		mtdpstore_mark_removed(cxt, off);
> +		return 0;
> +	}
> +
> +	/* all zones are unused, erase it */
> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
> +	return mtdpstore_erase_do(cxt, off);
> +}
> +
> +/*
> + * What is security for mtdpstore?
> + * As there is no erase for panic case, we should ensure at least one zone
> + * is writable. Otherwise, panic write will fail.
> + * If zone is used, write operation will return -ENEXT, which means that
> + * pstore/blk will try one by one until gets an empty zone. So, it is not
> + * needed to ensure the next zone is empty, but at least one.
> + */
> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret = 0, i;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
> +	u32 erasesize = cxt->mtd->erasesize;
> +
> +	for (i = 0; i < zonecnt; i++) {
> +		u32 num = (zonenum + i) % zonecnt;
> +
> +		/* found empty zone */
> +		if (!test_bit(num, cxt->usedmap))
> +			return 0;
> +	}
> +
> +	/* If there is no any empty zone, we have no way but to do erase */
> +	off = ALIGN_DOWN(off, erasesize);
> +	while (blkcnt--) {
> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
> +
> +		if (mtdpstore_block_isbad(cxt, off))
> +			continue;
> +
> +		ret = mtdpstore_erase_do(cxt, off);
> +		if (!ret) {
> +			mtdpstore_block_mark_unused(cxt, off);
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		dev_err(&mtd->dev, "all blocks bad!\n");
> +	dev_dbg(&mtd->dev, "end security\n");
> +	return ret;
> +}
> +
> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENEXT;
> +
> +	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || retlen != size) {
> +		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static inline bool mtdpstore_is_io_error(int ret)
> +{
> +	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
> +}
> +
> +/*
> + * All zones will be read as pstore/blk will read zone one by one when do
> + * recover.
> + */
> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen, done;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
> +	for (done = 0, retlen = 0; done < size; done += retlen) {
> +		retlen = 0;
> +
> +		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
> +				(u_char *)buf + done);
> +		if (mtdpstore_is_io_error(ret)) {
> +			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
> +					off + done, retlen, size - done, ret);
> +			/* the zone may be broken, try next one */
> +			return -ENEXT;
> +		}
> +
> +		/*
> +		 * ECC error. The impact on log data is so small. Maybe we can
> +		 * still read it and try to understand. So mtdpstore just hands
> +		 * over what it gets and user can judge whether the data is
> +		 * valid or not.
> +		 */
> +		if (mtd_is_eccerr(ret)) {
> +			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
> +					off + done, retlen, size - done, ret);
> +			/* driver may not set retlen when ecc error */
> +			retlen = retlen == 0 ? size - done : retlen;
> +		}
> +	}
> +
> +	if (mtdpstore_is_empty(cxt, buf, size))
> +		mtdpstore_mark_unused(cxt, off);
> +	else
> +		mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_panic_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENEXT;
> +
> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || size != retlen) {
> +		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	return retlen;
> +}
> +
> +static void mtdpstore_notify_add(struct mtd_info *mtd)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct blkoops_info *info = &cxt->bo_info;
> +	unsigned long longcnt;
> +
> +	if (!strcmp(mtd->name, info->device))
> +		cxt->index = mtd->index;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
> +
> +	if (mtd->size < info->dmesg_size * 2) {
> +		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
> +				mtd->index);
> +		return;
> +	}
> +	/*
> +	 * dmesg_size must be aligned to 4096 Bytes, which is limited by
> +	 * blkoops. The default value of dmesg_size is 64KB. If dmesg_size
> +	 * is larger than erasesize, some errors will occur since mtdpstore
> +	 * is designed on it.
> +	 */
> +	if (mtd->erasesize < info->dmesg_size) {
> +		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
> +				mtd->index);
> +		return;
> +	}
> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> +		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
> +				info->dmesg_size / 1024,
> +				mtd->writesize / 1024);
> +		return;
> +	}
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	cxt->bo_dev.total_size = mtd->size;
> +	/* just support dmesg right now */
> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
> +	cxt->bo_dev.read = mtdpstore_read;
> +	cxt->bo_dev.write = mtdpstore_write;
> +	cxt->bo_dev.erase = mtdpstore_erase;
> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
> +
> +	ret = blkoops_register_device(&cxt->bo_dev);
> +	if (ret) {
> +		dev_err(&mtd->dev, "mtd%d register to blkoops failed\n",
> +				mtd->index);
> +		return;
> +	}
> +	cxt->mtd = mtd;
> +	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
> +}
> +
> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
> +		loff_t off, size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u_char *buf;
> +	int ret;
> +	size_t retlen;
> +	struct erase_info erase;
> +
> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	/* 1st. read to cache */
> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
> +	if (mtdpstore_is_io_error(ret))
> +		goto free;
> +
> +	/* 2nd. erase block */
> +	erase.len = mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(mtd, &erase);
> +	if (ret)
> +		goto free;
> +
> +	/* 3rd. write back */
> +	while (size) {
> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
> +
> +		/* there is valid data on block, write back */
> +		if (mtdpstore_is_used(cxt, off)) {
> +			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
> +			if (ret)
> +				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
> +						off, retlen, zonesize, ret);
> +		}
> +
> +		off += zonesize;
> +		size -= min_t(unsigned int, zonesize, size);
> +	}
> +
> +free:
> +	kfree(buf);
> +	return ret;
> +}
> +
> +/*
> + * What does mtdpstore_flush_removed() do?
> + * When user remove any log file on pstore filesystem, mtdpstore should do
> + * something to ensure log file removed. If the whole block is no longer used,
> + * it's nice to erase the block. However if the block still contains valid log,
> + * what mtdpstore can do is to erase and write the valid log back.
> + */
> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	int ret;
> +	loff_t off;
> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
> +
> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
> +		ret = mtdpstore_block_isbad(cxt, off);
> +		if (ret)
> +			continue;
> +
> +		ret = mtdpstore_block_is_removed(cxt, off);
> +		if (!ret)
> +			continue;
> +
> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	mtdpstore_flush_removed(cxt);
> +
> +	blkoops_unregister_device(&cxt->bo_dev);
> +	kfree(cxt->badmap);
> +	kfree(cxt->usedmap);
> +	kfree(cxt->rmmap);
> +	cxt->mtd = NULL;
> +	cxt->index = -1;
> +}
> +
> +static struct mtd_notifier mtdpstore_notifier = {
> +	.add	= mtdpstore_notify_add,
> +	.remove	= mtdpstore_notify_remove,
> +};
> +
> +static int __init mtdpstore_init(void)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct mtd_info *mtd = cxt->mtd;
> +	struct blkoops_info *info = &cxt->bo_info;
> +
> +	ret = blkoops_info(info);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (strlen(info->device) == 0) {
> +		dev_err(&mtd->dev, "mtd device must be supplied\n");
> +		return -EINVAL;
> +	}
> +	if (!info->dmesg_size) {
> +		dev_err(&mtd->dev, "no recorder enabled\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Setup the MTD device to use */
> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
> +	if (ret)
> +		cxt->index = -1;
> +
> +	register_mtd_user(&mtdpstore_notifier);
> +	return 0;
> +}
> +module_init(mtdpstore_init);
> +
> +static void __exit mtdpstore_exit(void)
> +{
> +	unregister_mtd_user(&mtdpstore_notifier);
> +}
> +module_exit(mtdpstore_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
> -- 
> 1.9.1
> 

-- 
Kees Cook

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-03-18 17:23       ` Kees Cook
@ 2020-03-20  1:50         ` WeiXiong Liao
  2020-03-20 18:20           ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-20  1:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

Hi Kees Cook,

On 2020/3/19 AM 1:23, Kees Cook wrote:
> On Thu, Feb 27, 2020 at 04:21:51PM +0800, liaoweixiong wrote:
>> On 2020/2/26 AM 8:52, Kees Cook wrote:
>>> On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
>>>> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>>>> +pstore_blk-y += blkzone.o
>>>
>>> Why this dance with files? I would just expect:
>>>
>>> obj-$(CONFIG_PSTORE_BLK)     += blkzone.o
>>>
>>
>> This makes the built module named blkzone.ko rather than
>> pstore_blk.ko.
> 
> You can just do a regular build rule:
> 
> obj-$(CONFIG_PSTORE_BLK) += blkzone.o
> 

I don't follow. If I do as you suggest, the built module will be named
blkzone.ko. The module is called pstore/blk, yet the build would produce
blkzone.ko. I think that is confusing.

>>>> +#define BLK_SIG (0x43474244) /* DBGC */
>>>
>>> I was going to suggest extracting PERSISTENT_RAM_SIG, renaming it and
>>> using it in here and in ram_core.c, but then I realize they're not
>>> marking the same structure. How about choosing a new magic sig for the
>>> blkzone data header?
>>>
>>
>> That's OK to me. I don't know if there is a rule to get a new magic?
>> In addition, all members of this structure are the same as
>> struct persistent_ram_buffer after patch 2. Maybe it's a good idea to
>> extract it
>> if you want to merge ramoops and pstore/blk.
> 
> Okay, let's leave it as-is for now.
> 
>>>> +	uint32_t sig;
>>>> +	atomic_t datalen;
>>>> +	uint8_t data[];
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct blkz_dmesg_header: dmesg information
>>>
>>> This is the on-disk structure also?
>>>
>> Yes. The structure blkz_buffer is a generic header for all recorder
>> zone, and the
>> structure blkz_dmesg_header is a header for dmesg, saved in
>> blkz_buffer->data.
>> The dmesg recorder use it to save it's specific attributes.
> 
> Okay, can you add comments to distinguish the on-disk structures from
> the in-memory, etc?
> 

Sure. I will do it.

>>>> +#define DMESG_HEADER_MAGIC 0x4dfc3ae5
>>>
>>> How was this magic chosen?
>>
>> It's a random number. Maybe should I chose a meaningful magic?
> 
> That's fine; just add a comment to say so.
> 

OK.

>>>> + * @dirty:
>>>> + *	mark whether the data in @buffer are dirty (not flush to storage yet)
>>>> + */
>>>
>>> Thank you for the kerndoc! :) Is it linked to from any .rst files?
>>>
>>
>> I don't get your words. There is a document on the 6th patch. I don't know
>> whether it is what you want?
> 
> Patch 6 is excellent; I think you might want to add references back to
> these kern-doc structures using the ".. kernel-doc::
> fs/pstore/blkzone.c" syntax:
> https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#including-kernel-doc-comments
> 

Wow! I marvel at kernel-doc. Your link has helped me a lot.

I will improve all my comments and documentation later.

>>>> +static int blkz_zone_write(struct blkz_zone *zone,
>>>> +		enum blkz_flush_mode flush_mode, const char *buf,
>>>> +		size_t len, unsigned long off)
>>>> +{
>>>> +	struct blkz_info *info = blkz_cxt.bzinfo;
>>>> +	ssize_t wcnt = 0;
>>>> +	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
>>>> +	size_t wlen;
>>>> +
>>>> +	if (off > zone->buffer_size)
>>>> +		return -EINVAL;
>>>> +	wlen = min_t(size_t, len, zone->buffer_size - off);
>>>> +	if (buf && wlen) {
>>>> +		memcpy(zone->buffer->data + off, buf, wlen);
>>>> +		atomic_set(&zone->buffer->datalen, wlen + off);
>>>> +	}
>>>
>>> If you're expecting concurrent writers (use of atomic_set(), I would
>>> expect the whole write to be locked instead. (i.e. what happens if
>>> multiple callers call blkz_zone_write()?)
>>>
>>
>> I don't agree with it. The datalen will be updated everywhere. It's useless
>> to lock here.
> 
> But there could be multiple writers; locking should be needed.
> 

All the recorders, such as dmesg, pmsg, console and ftrace, are already
locked in pstore and the upper layers. So a recorder never writes in
parallel with itself, and different recorders operate on their own
private zones. They have no influence on each other.

The only parallel case I can think of is a recorder writing while the
dirty-flush thread is working. The dirty-flusher flushes the whole zone
rather than part of it, so it is OK for the two to run in parallel.

For these reasons, I don't think locking is needed.
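
To make the argument concrete, here is a minimal user-space sketch. It is
only a model under stated assumptions, not the blkzone code: zone_model,
zone_model_write(), zone_model_flush() and ZONE_SIZE are hypothetical
names, and C11 atomics stand in for the kernel's atomic_t. It does not
exercise real concurrency; it only shows that the writer publishes datalen
after the copy and that the flusher always copies the whole zone.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define ZONE_SIZE 16 /* hypothetical zone payload size for this sketch */

struct zone_model {
	atomic_size_t datalen;         /* number of valid bytes in data[] */
	unsigned char data[ZONE_SIZE]; /* in-memory copy of the zone */
};

/* Recorder side: copy into the buffer, then publish the new length. */
static int zone_model_write(struct zone_model *z, const void *buf,
			    size_t len, size_t off)
{
	size_t wlen;

	if (off > ZONE_SIZE)
		return -1;
	/* clamp the write so it never runs past the end of the zone */
	wlen = len < ZONE_SIZE - off ? len : ZONE_SIZE - off;
	if (buf && wlen) {
		memcpy(z->data + off, buf, wlen);
		atomic_store(&z->datalen, wlen + off);
	}
	return 0;
}

/* Flusher side: always copy the whole zone, never a partial range. */
static size_t zone_model_flush(struct zone_model *z, unsigned char *storage)
{
	memcpy(storage, z->data, ZONE_SIZE);
	return atomic_load(&z->datalen);
}
```

Because the flusher never writes a partial range, a later flush always
carries whatever the recorder has written so far.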

>> One more things. During the analysis, I found another problem.
>> Removing old files will cause new logs to be lost. Take console recorder as
>> am example. After new rebooting, new logs are saved to buf while old
>> logs are
>> saved to old_buf. If we remove old file at that time, not only old_buf
>> is freed, but
>> also length of buf for new data is reset to zero. The ramoops may also
>> has this
>> problem.
> 
> Hmm. I'll need to double-check this. It's possible the call to
> persistent_ram_zap() in ramoops_pstore_erase() is not needed.
> 
>>>> +static int blkz_recover_dmesg_data(struct blkz_context *cxt)
>>>
>>> What does "recover" mean in this context? Is this "read from storage"?
>>
>> Yes. "recover" means reading data back from storage.
> 
> Okay. Please add some comments here. I would think of it more as "read"
> or "load". When I think of "recover" I think of "finding something that
> was lost". But the name isn't important as long as there is a comment
> somewhere about what it's doing.
> 

OK. I will add some comments on the entry function blkz_recovery().

> -Kees
> 

-- 
WeiXiong Liao

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-03-20  1:50         ` WeiXiong Liao
@ 2020-03-20 18:20           ` Kees Cook
  2020-03-22 10:28             ` WeiXiong Liao
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-03-20 18:20 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Fri, Mar 20, 2020 at 09:50:36AM +0800, WeiXiong Liao wrote:
> On 2020/3/19 AM 1:23, Kees Cook wrote:
> > On Thu, Feb 27, 2020 at 04:21:51PM +0800, liaoweixiong wrote:
> >> On 2020/2/26 AM 8:52, Kees Cook wrote:
> >>> On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
> >>>> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
> >>>> +pstore_blk-y += blkzone.o
> >>>
> >>> Why this dance with files? I would just expect:
> >>>
> >>> obj-$(CONFIG_PSTORE_BLK)     += blkzone.o
> >>>
> >>
> >> This makes the built module named blkzone.ko rather than
> >> pstore_blk.ko.
> > 
> > You can just do a regular build rule:
> > 
> > obj-$(CONFIG_PSTORE_BLK) += blkzone.o
> > 
> 
> I don't get it. If make it as your words, the built module will be
> blkzone.ko.
> The module is named pstore/blk, however it built out blkzone.ko. I think
> it's confusing.

I mean just pick whatever filename you want it to be named. The Makefile
case for ramoops was that ramoops got renamed but we wanted to keep the
old API name.

So, if you want it named pstore-blk.ko, just rename blkzone.c to
pstore-blk.c.

> >>> If you're expecting concurrent writers (use of atomic_set(), I would
> >>> expect the whole write to be locked instead. (i.e. what happens if
> >>> multiple callers call blkz_zone_write()?)
> >>>
> >>
> >> I don't agree with it. The datalen will be updated everywhere. It's useless
> >> to lock here.
> > 
> > But there could be multiple writers; locking should be needed.
> > 
> 
> All the recorders such as dmesg, pmsg, console and ftrace have been
> locked on
> pstore and upper layers. So, a recorder will not write in parallel and
> different
> recorders operate privately zone. They don't have any influence on each
> other.

Yes, sorry, I was confusing myself about pmsg, and I forgot it had a
global lock. Each is locked or split by CPU.

> The only parallel case I think is that recorder writes while dirty-flush
> thread is
> working. And the dirty-flusher will flush the whole zone rather than
> part of it, so,
> it is OK to call in parallel.

Okay, thanks for clarifying.

> Based on these reasons, I don't think locking should be needed.

Agreed.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 02/11] blkoops: add blkoops, a warpper for pstore/blk
  2020-03-18 18:06   ` Kees Cook
@ 2020-03-22 10:00     ` WeiXiong Liao
  2020-03-22 15:44       ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 10:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

Hi Kees Cook,

On 2020/3/19 AM2:06, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:46PM +0800, WeiXiong Liao wrote:
>> blkoops is a better wrapper for pstore/blk, which provides efficient
>> configuration mothod. It divides all configurations of pstore/blk into
> 
> typo: method
> 

I will fix it.

>> 2 parts, configurations for user and configurations for driver.
>>
>> Configurations for user detemine how pstore/blk work, such as
>> dump_oops and dmesg_size. They can be set by Kconfig and module
>> parameters.
> 
> I'd like to keep blkoops as close to ramoops as possible on the user
> configuration side. Notes below...
> 

Are you asking why blkoops does not use device-tree for the user
configuration, as ramoops does? Here is my answer.

There is an important difference between blkoops and ramoops.
ramoops can be initialized at any time, since RAM is always ready.
blkoops, however, must wait for the block device to be registered.

If blkoops used device-tree as ramoops does, it would sometimes fail to
open the block device because the device is not ready yet, even if
blkoops is initialized late with late_initcall(). Take MMC as an example.
The MMC block devices are not registered until a few seconds after the
MMC driver initializes. While the MMC block device is still pending,
blkoops has already been called and has failed to initialize. Instead of
using device-tree and waiting several seconds for the block device, I
prefer not to initialize until the block driver calls blkoops.

How about reading the user configuration from device-tree but deferring
initialization until the block driver calls it? That seems illogical to me.

>> Configurations for driver are all about block/non-block device, such as
>> total_size of device and read/write operations. They should be provided
>> by device drivers, calling blkoops_register_device() for non-block
>> device and blkoops_register_blkdev() for block device.
> 
> By non-block do you mean nvme etc? What is the right term for spinning
> disk and nvme collectively? (I always considered them all to be "block"
> devices.)
> 

No, non-block here means devices such as MTD devices, which are not
block devices and do not use the generic block layer.

It seems too early to mention non-block devices here. I will make a
separate patch to support them.
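
For reference, the shape of such a registration can be modeled in user
space. The sketch below is an illustration under assumptions, not the
blkoops code: dev_model, dev_model_register(), flash_read() and
flash_write() are hypothetical names, and a plain array stands in for the
storage. Only the argument check and the single-registration rule follow
blkoops_register_device() as quoted further down in this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct blkoops_device. */
struct dev_model {
	size_t total_size;
	long (*read)(char *buf, size_t size, long long off);
	long (*write)(const char *buf, size_t size, long long off);
};

static char flash[64];                     /* pretend storage medium */
static const struct dev_model *registered; /* at most one backend */

static long flash_read(char *buf, size_t size, long long off)
{
	if (off < 0 || (size_t)off + size > sizeof(flash))
		return -EIO;
	memcpy(buf, flash + off, size);
	return (long)size;
}

static long flash_write(const char *buf, size_t size, long long off)
{
	if (off < 0 || (size_t)off + size > sizeof(flash))
		return -EIO;
	memcpy(flash + off, buf, size);
	return (long)size;
}

/* Reject incomplete devices and refuse a second registration. */
static int dev_model_register(const struct dev_model *dev)
{
	if (!dev || !dev->total_size || !dev->read || !dev->write)
		return -EINVAL;
	if (registered)
		return -EBUSY;
	registered = dev;
	return 0;
}
```

A device that supplies no write callback is rejected with -EINVAL, and a
second registration attempt gets -EBUSY.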

>> If device driver support for panic records, @panic_write must be valid.
>> If panic occurs and pstore/blk does not recover yet, the first zone
>> of dmesg will be used.
> 
> I'd like to maintain pstore terminology here: there is the "front end"
> (dmesg, console, pmsg, etc) and there is the "back end" (ramoops,
> blkoops, efi, etc). Since the block layer is a behind blkoops, I'd like
> to come up with a term for this since "device driver" is, I think, too
> general. You call it later "block device driver", so let's use that
> everywhere you say "device driver".
> 
> Then we have the layers: pstore front end, pstore core, pstore back end,
> and block device driver.
> 

The device driver here means both block device drivers and non-block
device drivers. Calling it just "block device driver" would be imprecise.

More about the layers below...

>> Besides, Block device driver has no need to verify which partition is
>> used and provides generic read/write operations. Because blkoops has
>> done it. It also means that if users do not care panic records but
>> records for oops/console/pmsg/ftrace, block device driver should do
>> nothing.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  MAINTAINERS             |   2 +-
>>  fs/pstore/Kconfig       |  61 ++++++++
>>  fs/pstore/Makefile      |   2 +
>>  fs/pstore/blkoops.c     | 402 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/blkoops.h |  58 +++++++
>>  5 files changed, 524 insertions(+), 1 deletion(-)
>>  create mode 100644 fs/pstore/blkoops.c
>>  create mode 100644 include/linux/blkoops.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index cc0a4a8ae06a..e4ba97130560 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -13381,7 +13381,7 @@ F:	drivers/firmware/efi/efi-pstore.c
>>  F:	drivers/acpi/apei/erst.c
>>  F:	Documentation/admin-guide/ramoops.rst
>>  F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
>> -K:	\b(pstore|ramoops)
>> +K:	\b(pstore|ramoops|blkoops)
>>  
>>  PTP HARDWARE CLOCK SUPPORT
>>  M:	Richard Cochran <richardcochran@gmail.com>
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 536fde9e13e8..7a57a8edb612 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -163,3 +163,64 @@ config PSTORE_BLK
>>  	  where it can be read back at some later point.
>>  
>>  	  If unsure, say N.
>> +
>> +config PSTORE_BLKOOPS
>> +	tristate "pstore block with oops logger"
>> +	depends on PSTORE_BLK
>> +	help
>> +	  This is a wrapper for pstore/blk.
> 
> Is there a reason to keep this separate from PSTORE_BLK? (i.e. why a
> separate Kconfig?)
> 

Well, I think it's time to explain my design ideas.

Before writing blkoops, I read through the ramoops code. blkoops is
very similar to ramoops, and their management of storage space is
completely duplicated. The only difference between them, I think, is
the storage medium.

So why not extract a common layer from ramoops and blkoops to allocate
and manage storage space? That is what pstore/blk (blkzone.c) does. With
it, ramoops and blkoops need not care how the storage is used.

I don't know whether the common layer is good enough for ramoops yet, or
whether this is a good time to rename the common layer from pstore/blk to
pstore/zone.

How about a Makefile and Kconfig as follows?

	<Kconfig>
	config PSTORE_ZONE
		# NOTE.
		# the configuration is hidden from users and selected by
		# pstore/blk.
		help
		  The common layer for pstore/blk (and pstore/ram in the future)
		  to manage storage as zones.
	config PSTORE_BLK
		tristate "Log panic/oops to a block device"
		select PSTORE_ZONE
		help
		  ......
	config PSTORE_BLK_DMESG_SIZE
		......

	<Makefile>
	#  Note: rename blkzone.c to pstore_zone.c
	obj-$(CONFIG_PSTORE_ZONE) += pstore_zone.o

	# Note: rename blkoops.c to pstore_blk.c
	obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o

>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
>> +	  but module parameters have priority over kconfig.
>> +
>> +	  If unsure, say N.
>> +
>> +config PSTORE_BLKOOPS_DMESG_SIZE
>> +	int "dmesg size in kbytes for blkoops"
> 
> How about "Size in Kbytes of dmesg to store"? (It will already show up
> under the parent config, so no need to repeat "blkoops" here.
> 

That's a good idea.

>> +	depends on PSTORE_BLKOOPS
>> +	default 64
>> +	help
>> +	  This just sets size of dmesg (dmesg_size) for pstore/blk. The size is
>> +	  in KB and must be a multiple of 4.
>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> 
> nit: "Kconfig" instead of "kconfig"
> 

Fixed.

>> +	  but module parameters have priority over kconfig.
>>
>> +config PSTORE_BLKOOPS_BLKDEV
>> +	string "block device for blkoops"
> 
> Maybe clarify with as "block device identifier for blkoops" ? Also, I'd
> put this before the DMESG_SIZE.
> 

OK.

>> +	depends on PSTORE_BLKOOPS
>> +	default ""
>> +	help
>> +	  Which block device should be used for pstore/blk.
>> +
>> +	  It accept the following variants:
>> +	  1) <hex_major><hex_minor> device number in hexadecimal represents
>> +	     itself no leading 0x, for example b302.
>> +	  2) /dev/<disk_name> represents the device number of disk
>> +	  3) /dev/<disk_name><decimal> represents the device number
>> +	     of partition - device number of disk plus the partition number
>> +	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
>> +	     used when disk name of partitioned disk ends with a digit.
>> +	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
>> +	     unique id of a partition if the partition table provides it.
>> +	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
>> +	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
>> +	     filled hex representation of the 32-bit "NT disk signature", and PP
>> +	     is a zero-filled hex representation of the 1-based partition number.
>> +	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
>> +	     to a partition with a known unique id.
>> +	  7) <major>:<minor> major and minor number of the device separated by
>> +	     a colon.
>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
>> +	  but module parameters have priority over kconfig.
>> +
>> +config PSTORE_BLKOOPS_DUMP_OOPS
>> +	bool "dump oops"
> 
> Why is this a Kconfig at all? Isn't the whole point to always catch
> oopses? :) Let's leave this default to 1 (as ramoops does).
> 

As you can see below, it defaults to 'y'.

>> +	depends on PSTORE_BLKOOPS
>> +	default y
>> +	help
>> +	  Whether blkoops dumps oops or not.
>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
>> +	  but module parameters have priority over kconfig.
>> diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
>> index 0ee2fc8d1bfb..24b3d488d2f0 100644
>> --- a/fs/pstore/Makefile
>> +++ b/fs/pstore/Makefile
>> @@ -15,3 +15,5 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
>>  
>>  obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>>  pstore_blk-y += blkzone.o
>> +
>> +obj-$(CONFIG_PSTORE_BLKOOPS) += blkoops.o
>> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
>> new file mode 100644
>> index 000000000000..8027c3af8c8d
>> --- /dev/null
>> +++ b/fs/pstore/blkoops.c
>> @@ -0,0 +1,402 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#define pr_fmt(fmt) "blkoops : " fmt
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/string.h>
>> +#include <linux/of.h>
>> +#include <linux/of_address.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/blkoops.h>
>> +#include <linux/mount.h>
>> +#include <linux/uio.h>
>> +
>> +static long dmesg_size = -1;
>> +module_param(dmesg_size, long, 0400);
>> +MODULE_PARM_DESC(dmesg_size, "demsg size in kbytes");
> 
> Can this be named "record_size" to match ramoops?
> 

To be honest, I named it dmesg_size on purpose, since I think
record_size is ambiguous. The parameter describes the size of the dmesg
recorder only, not the size of all recorders.

>> +static int dump_oops = -1;
> 
> I'd default this to 1 as mentioned in the Kconfig.
> 

dump_oops defaults to -1, which means "use the Kconfig setting", and
the Kconfig default is 1, so oopses are caught by default.

Module parameters have priority over Kconfig. If the module parameter
dump_oops defaulted to 1 here, the Kconfig setting would never take
effect.
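
A minimal sketch of that precedence rule, assuming a sentinel of -1 for
"not given on the module command line". PARAM_UNSET and
resolve_dump_oops() are hypothetical names for illustration, and the
Kconfig value is passed in as a plain argument rather than read from
CONFIG_PSTORE_BLKOOPS_DUMP_OOPS.

```c
/* Sentinel meaning "the module parameter was not set". */
#define PARAM_UNSET (-1)

/*
 * Return the effective dump_oops setting: the module parameter wins when
 * it was given, otherwise the Kconfig default is used.
 */
static int resolve_dump_oops(int param, int kconfig_default)
{
	if (param < 0) /* PARAM_UNSET: fall back to Kconfig */
		return kconfig_default ? 1 : 0;
	return param ? 1 : 0;
}
```

With the parameter left at PARAM_UNSET, a Kconfig choice of 'n' is still
honored; if the parameter itself defaulted to 1, that choice would be lost.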

>> +module_param(dump_oops, int, 0400);
>> +MODULE_PARM_DESC(total_size, "whether dump oops");
>> +
>> +/**
>> + * The block device to use. Most of the time, it is a partition of block
>> + * device. It's fine to ignore it if you are not block device and register
>> + * to blkoops by blkoops_register_device(). In this case, @blkdev is
>> + * useless and @read, @write and @total_size must be supplied.
>> + *
>> + * @blkdev accepts the following variants:
>> + * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
>> + *    no leading 0x, for example b302.
>> + * 2) /dev/<disk_name> represents the device number of disk
>> + * 3) /dev/<disk_name><decimal> represents the device number
>> + *    of partition - device number of disk plus the partition number
>> + * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
>> + *    used when disk name of partitioned disk ends on a digit.
>> + * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
>> + *    unique id of a partition if the partition table provides it.
>> + *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
>> + *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
>> + *    filled hex representation of the 32-bit "NT disk signature", and PP
>> + *    is a zero-filled hex representation of the 1-based partition number.
>> + * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
>> + *    a partition with a known unique id.
>> + * 7) <major>:<minor> major and minor number of the device separated by
>> + *    a colon.
>> + */
>> +static char blkdev[80];
> 
> static char blkdev[80] = CONFIG_PSTORE_BLKOOPS_BLKDEV;
> 

That's a good idea.

>> +module_param_string(blkdev, blkdev, 80, 0400);
>> +MODULE_PARM_DESC(blkdev, "the block device for general read/write");
>> +
>> +static DEFINE_MUTEX(blkz_lock);
>> +static struct block_device *blkoops_bdev;
>> +static struct blkz_info *bzinfo;
>> +static blkoops_blk_panic_write_op blkdev_panic_write;
>> +
>> +#ifdef CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
> 
> This (and all the others below) will always be defined, so no need to
> test it -- just use it as needed below.
> 

That is fine for dmesg_size and dump_oops, but not for pmsg_size,
ftrace_size and console_size, because those symbols are not always defined.

I would rather keep "depends on PSTORE_PMSG" than use
"default 64 if PSTORE_PMSG"; I explain why in my reply to patch 3.

>> +#define DEFAULT_DMESG_SIZE CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
>> +#else
>> +#define DEFAULT_DMESG_SIZE 0
>> +#endif
>> +
>> +#ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>> +#define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>> +#else
>> +#define DEFAULT_DUMP_OOPS 1
>> +#endif
>> +
>> +#ifdef CONFIG_PSTORE_BLKOOPS_BLKDEV
>> +#define DEFAULT_BLKDEV CONFIG_PSTORE_BLKOOPS_BLKDEV
>> +#else
>> +#define DEFAULT_BLKDEV ""
>> +#endif
>> +
>> +/**
>> + * register device to blkoops
>> + *
>> + * Drivers, not only block drivers but also non-block drivers can call this
>> + * function to register to blkoops. It will pack for blkzone and pstore.
>> + */
>> +int blkoops_register_device(struct blkoops_device *bo_dev)
>> +{
>> +	int ret;
>> +
>> +	if (!bo_dev || !bo_dev->total_size || !bo_dev->read || !bo_dev->write)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&blkz_lock);
>> +
>> +	/* someone already registered before */
>> +	if (bzinfo) {
>> +		mutex_unlock(&blkz_lock);
>> +		return -EBUSY;
>> +	}
>> +	bzinfo = kzalloc(sizeof(struct blkz_info), GFP_KERNEL);
>> +	if (!bzinfo) {
>> +		mutex_unlock(&blkz_lock);
>> +		return -ENOMEM;
>> +	}
>> +
>> +#define verify_size(name, defsize, alignsize) {				\
>> +		long _##name_ = (name);					\
>> +		if (_##name_ < 0)					\
>> +			_##name_ = (defsize);				\
>> +		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
>> +		if (_##name_ & ((alignsize) - 1)) {			\
>> +			pr_info(#name " must align to %d\n",		\
>> +					(alignsize));			\
>> +			_##name_ = ALIGN(name, (alignsize));		\
>> +		}							\
>> +		name = _##name_ / 1024;					\
>> +		bzinfo->name = _##name_;				\
>> +	}
>> +
>> +	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>> +#undef verify_size
> 
> As mentioned, can this be named "record_size"?
> 

See above. Thanks.

>> +	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>> +
>> +	bzinfo->total_size = bo_dev->total_size;
>> +	bzinfo->dump_oops = dump_oops;
>> +	bzinfo->read = bo_dev->read;
>> +	bzinfo->write = bo_dev->write;
> 
> Why copy these separate functions? Shouldn't bzinfo just keep a pointer
> to bo_dev?
> 

bo_dev is a structure defined in blkoops, and it is not visible to bzinfo.

From the very beginning of my design, pstore/blk has been a common layer
for blkoops and ramoops, so it is not appropriate for bzinfo to keep a
pointer to a blkoops structure.

>> +	bzinfo->panic_write = bo_dev->panic_write;
>> +	bzinfo->name = "blkoops";
>> +	bzinfo->owner = THIS_MODULE;
>> +
>> +	ret = blkz_register(bzinfo);
>> +	if (ret) {
>> +		kfree(bzinfo);
>> +		bzinfo = NULL;
>> +	}
>> +	mutex_unlock(&blkz_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(blkoops_register_device);
>> +
>> +void blkoops_unregister_device(struct blkoops_device *bo_dev)
>> +{
>> +	mutex_lock(&blkz_lock);
>> +	if (bzinfo && bzinfo->read == bo_dev->read) {
> 
> Why this read equality test?
> 

To identify the driver, so that a driver which did not register the device
cannot unregister it.
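
As a rough userspace model of that check (a sketch only; struct backend,
backend_register() and backend_unregister() are hypothetical names, not the
blkoops API), the single registration slot is released only when the caller
presents the same read callback that was registered:

```c
#include <errno.h>
#include <stddef.h>

typedef long (*read_op)(char *buf, size_t bytes, long pos);

/* Hypothetical stand-in for struct blkoops_device. */
struct backend {
	read_op read;
};

/* Single slot, like the global bzinfo pointer in the patch. */
static struct backend *current_backend;

static int backend_register(struct backend *b)
{
	if (!b || !b->read)
		return -EINVAL;
	if (current_backend)	/* someone already registered before */
		return -EBUSY;
	current_backend = b;
	return 0;
}

static void backend_unregister(const struct backend *b)
{
	/* Only the owner of the registered read callback may unregister. */
	if (current_backend && b && current_backend->read == b->read)
		current_backend = NULL;
}

/* Two dummy callbacks so the identity check can be exercised. */
static long read_a(char *buf, size_t bytes, long pos)
{
	(void)buf; (void)bytes; (void)pos;
	return 0;
}

static long read_b(char *buf, size_t bytes, long pos)
{
	(void)buf; (void)bytes; (void)pos;
	return -1;
}
```

A second driver that failed to register (and got -EBUSY) can therefore call
the unregister path without tearing down the first driver's registration.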

>> +		blkz_unregister(bzinfo);
>> +		kfree(bzinfo);
>> +		bzinfo = NULL;
>> +	}
>> +	mutex_unlock(&blkz_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(blkoops_unregister_device);
>> +
>> +/**
>> + * get block_device of @blkdev
>> + * @holder: exclusive holder identifier
>> + *
>> + * On success, @blkoops_bdev will save the block_device and the returned
>> + * block_device has reference count of one.
>> + */
>> +static struct block_device *blkoops_get_bdev(void *holder)
>> +{
>> +	struct block_device *bdev = ERR_PTR(-ENODEV);
>> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
>> +
>> +	if (!blkdev[0] && strlen(DEFAULT_BLKDEV))
>> +		snprintf(blkdev, 80, "%s", DEFAULT_BLKDEV);
>> +	if (!blkdev[0])
>> +		return ERR_PTR(-ENODEV);
> 
> I'd drop these tests -- and the snprintf isn't needed with the change
> above on initialization.
> 

OK.

>> +
>> +	mutex_lock(&blkz_lock);
>> +	if (bzinfo)
>> +		goto out;
>> +	if (holder)
>> +		mode |= FMODE_EXCL;
>> +	bdev = blkdev_get_by_path(blkdev, mode, holder);
>> +	if (IS_ERR(bdev)) {
>> +		dev_t devt;
>> +
>> +		devt = name_to_dev_t(blkdev);
>> +		if (devt == 0) {
>> +			bdev = ERR_PTR(-ENODEV);
>> +			goto out;
>> +		}
>> +		bdev = blkdev_get_by_dev(devt, mode, holder);
>> +	}
>> +out:
>> +	mutex_unlock(&blkz_lock);
>> +	return bdev;
>> +}
>> +
>> +static void blkoops_put_bdev(struct block_device *bdev, void *holder)
>> +{
>> +	fmode_t mode = FMODE_READ | FMODE_WRITE;
>> +
>> +	if (!bdev)
>> +		return;
>> +
>> +	mutex_lock(&blkz_lock);
>> +	if (holder)
>> +		mode |= FMODE_EXCL;
>> +	blkdev_put(bdev, mode);
>> +	mutex_unlock(&blkz_lock);
>> +}
>> +
>> +static ssize_t blkoops_generic_blk_read(char *buf, size_t bytes, loff_t pos)
>> +{
>> +	ssize_t ret;
>> +	struct block_device *bdev = blkoops_bdev;
>> +	struct file filp;
>> +	mm_segment_t ofs;
>> +	struct kiocb kiocb;
>> +	struct iov_iter iter;
>> +	struct iovec iov = {
>> +		.iov_base = (void __user *)buf,
>> +		.iov_len = bytes
>> +	};
>> +
>> +	if (!bdev)
>> +		return -ENODEV;
>> +
>> +	memset(&filp, 0, sizeof(struct file));
>> +	filp.f_mapping = bdev->bd_inode->i_mapping;
>> +	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
>> +	filp.f_inode = bdev->bd_inode;
>> +
>> +	init_sync_kiocb(&kiocb, &filp);
>> +	kiocb.ki_pos = pos;
>> +	iov_iter_init(&iter, READ, &iov, 1, bytes);
>> +
>> +	ofs = get_fs();
>> +	set_fs(KERNEL_DS);
>> +	ret = generic_file_read_iter(&kiocb, &iter);
>> +	set_fs(ofs);
> 
> Please don't use "set_fs". I think you want ITER_KVEC and to use
> vfs_iter_read()? A lot of work went into removing set_fs() uses; we
> should not add more. :)
> https://lwn.net/Articles/722267/
> 

You are right. I will switch to ITER_KVEC and remove set_fs().

I will keep generic_file_read_iter() rather than vfs_iter_read().
Blkoops cannot read/write with vfs_iter_read/write() because it does
not have a valid 'struct file' for the block device, and calling
vfs_open() on the block device just to get a 'struct file' is ugly.
Since we already have the struct block_device, why not use the
generic block layer interfaces directly?

Besides, SELinux reports an error if we open, read and write a
block device directly.

>> +	return ret;
>> +}
>> +
>> +static ssize_t blkoops_generic_blk_write(const char *buf, size_t bytes,
>> +		loff_t pos)
>> +{
>> +	struct block_device *bdev = blkoops_bdev;
>> +	struct iov_iter iter;
>> +	struct kiocb kiocb;
>> +	struct file filp;
>> +	mm_segment_t ofs;
>> +	ssize_t ret;
>> +	struct iovec iov = {
>> +		.iov_base = (void __user *)buf,
>> +		.iov_len = bytes
>> +	};
>> +
>> +	if (!bdev)
>> +		return -ENODEV;
>> +
>> +	/* Console/Ftrace recorder may handle buffer until flush dirty zones */
>> +	if (in_interrupt() || irqs_disabled())
>> +		return -EBUSY;
>> +
>> +	memset(&filp, 0, sizeof(struct file));
>> +	filp.f_mapping = bdev->bd_inode->i_mapping;
>> +	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
>> +	filp.f_inode = bdev->bd_inode;
>> +
>> +	init_sync_kiocb(&kiocb, &filp);
>> +	kiocb.ki_pos = pos;
>> +	iov_iter_init(&iter, WRITE, &iov, 1, bytes);
>> +
>> +	ofs = get_fs();
>> +	set_fs(KERNEL_DS);
> 
> Same.
> 

Done.

>> +
>> +	inode_lock(bdev->bd_inode);
>> +	ret = generic_write_checks(&kiocb, &iter);
>> +	if (ret > 0)
>> +		ret = generic_perform_write(&filp, &iter, pos);
>> +	inode_unlock(bdev->bd_inode);
>> +
>> +	if (likely(ret > 0)) {
>> +		const struct file_operations f_op = {.fsync = blkdev_fsync};
>> +
>> +		filp.f_op = &f_op;
>> +		kiocb.ki_pos += ret;
>> +		ret = generic_write_sync(&kiocb, ret);
>> +	}
>> +	set_fs(ofs);
>> +	return ret;
>> +}
>> +
>> +static inline unsigned long blkoops_bdev_size(struct block_device *bdev)
>> +{
>> +	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
>> +}
>> +
>> +static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
>> +		loff_t off)
>> +{
>> +	int ret;
>> +
>> +	if (!blkdev_panic_write)
>> +		return -EOPNOTSUPP;
>> +
>> +	/* size and off must align to SECTOR_SIZE for block device */
>> +	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
>> +			size >> SECTOR_SHIFT);
>> +	return ret ? -EIO : size;
>> +}
>> +
>> +/**
>> + * register block device to blkoops
>> + * @major: the major device number of registering device
>> + * @panic_write: the write interface for panic case.
>> + *
>> + * It is ONLY used for block device to register to blkoops. In this case,
>> + * the module parameter @blkdev must be valid. Generic read/write interfaces
>> + * will be used.
>> + *
>> + * Block driver has no need to verify which partition is used. Block driver
>> + * should only tell me what major number is, so blkoops can get the matching
>> + * driver for @blkdev.
>> + *
>> + * If block driver support for panic records, @panic_write must be valid. If
>> + * panic occurs but pstore/blk does not recover yet, the first zone of dmesg
>> + * will be used.
>> + */
>> +int blkoops_register_blkdev(unsigned int major,
>> +		blkoops_blk_panic_write_op panic_write)
>> +{
>> +	struct block_device *bdev;
>> +	struct blkoops_device bo_dev = {0};
>> +	int ret = -ENODEV;
>> +	void *holder = blkdev;
>> +
>> +	bdev = blkoops_get_bdev(holder);
>> +	if (IS_ERR(bdev))
>> +		return PTR_ERR(bdev);
> 
> This seems like a good place to report getting or failing to get the
> named block device.
> 
> 	bdev = blkoops_get_bdev(holder);
> 	if (IS_ERR(bdev)) {
> 		pr_err("failed to open '%s'!\n", blkdev);
> 		return PTR_ERR(bdev);
> 	}
> 

OK.

>> +
>> +	blkoops_bdev = bdev;
>> +	blkdev_panic_write = panic_write;
>> +
>> +	/* only allow driver matching the @blkdev */
>> +	if (!bdev->bd_dev || MAJOR(bdev->bd_dev) != major)
> 
> And add similar error reports here.
> 

I'd use pr_debug rather than pr_err, because we allow multiple
devices to attempt to register to blkoops. It's not an error.

pr_debug("invalid major %u (expect %u)\n", major, MAJOR(bdev->bd_dev));

>> +		goto err_put_bdev;
>> +
>> +	bo_dev.total_size = blkoops_bdev_size(bdev);
>> +	if (bo_dev.total_size == 0)
>> +		goto err_put_bdev;
> 
> And here. We want to make failures as discoverable as possible.
> 

You are right.

>> +	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
>> +	bo_dev.read = blkoops_generic_blk_read;
>> +	bo_dev.write = blkoops_generic_blk_write;
>> +
>> +	ret = blkoops_register_device(&bo_dev);
>> +	if (ret)
>> +		goto err_put_bdev;
> 
> 	pr_info("using '%s'\n", blkdev);
> 

OK.

>> +	return 0;
>> +
>> +err_put_bdev:
>> +	blkdev_panic_write = NULL;
>> +	blkoops_bdev = NULL;
>> +	blkoops_put_bdev(bdev, holder);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(blkoops_register_blkdev);
>> +
>> +void blkoops_unregister_blkdev(unsigned int major)
>> +{
>> +	struct blkoops_device bo_dev = {.read = blkoops_generic_blk_read};
>> +	void *holder = blkdev;
>> +
>> +	if (blkoops_bdev && MAJOR(blkoops_bdev->bd_dev) == major) {
>> +		blkoops_unregister_device(&bo_dev);
>> +		blkoops_put_bdev(blkoops_bdev, holder);
>> +		blkdev_panic_write = NULL;
>> +		blkoops_bdev = NULL;
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(blkoops_unregister_blkdev);
>> +
>> +/**
>> + * get information of @blkdev
>> + * @devt: the block device num of @blkdev
>> + * @nr_sectors: the sector count of @blkdev
>> + * @start_sect: the start sector of @blkdev
>> + *
>> + * Block driver needs the follow information for @panic_write.
>> + */
>> +int blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
>> +{
>> +	struct block_device *bdev;
>> +
>> +	bdev = blkoops_get_bdev(NULL);
>> +	if (IS_ERR(bdev))
>> +		return PTR_ERR(bdev);
>> +
>> +	if (devt)
>> +		*devt = bdev->bd_dev;
>> +	if (nr_sects)
>> +		*nr_sects = part_nr_sects_read(bdev->bd_part);
>> +	if (start_sect)
>> +		*start_sect = get_start_sect(bdev);
>> +
>> +	blkoops_put_bdev(bdev, NULL);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(blkoops_blkdev_info);
> 
> I don't see this function getting used anywhere. Can it be removed? I
> see the notes in the Documentation. Could these values just be cached at
> open time instead of reopening the device?
> 

This function is reserved for block drivers to get information about the
block device in use, so it can't be removed.

Sure, I will add a new structure to cache these values.

>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>> +MODULE_DESCRIPTION("Wrapper for Pstore BLK with Oops logger");
>> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
>> new file mode 100644
>> index 000000000000..fe63739309aa
>> --- /dev/null
>> +++ b/include/linux/blkoops.h
>> @@ -0,0 +1,58 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __BLKOOPS_H_
>> +#define __BLKOOPS_H_
>> +
>> +#include <linux/types.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/pstore_blk.h>
>> +
>> +/**
>> + * struct blkoops_device - backend blkoops driver structure.
>> + *
>> + * This structure is ONLY used for non-block device by
>> + * blkoops_register_device(). If block device, you are strongly recommended
>> + * to use blkoops_register_blkdev().
>> + *
>> + * @total_size:
>> + *	The total size in bytes pstore/blk can use. It must be greater than
>> + *	4096 and be multiple of 4096.
>> + * @read, @write:
>> + *	The general (not panic) read/write operation.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, the number of bytes read should be returned.
>> + *	On error, negative number should be returned.
>> + * @panic_write:
>> + *	The write operation only used for panic.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, the number of bytes read should be returned.
>> + *	On error, negative number should be returned.
>> + */
>> +struct blkoops_device {
>> +	unsigned long total_size;
>> +	blkz_read_op read;
>> +	blkz_write_op write;
>> +	blkz_write_op panic_write;
>> +};
>> +
>> +/*
>> + * Panic write for block device who should write alignmemt to SECTOR_SIZE.
>> + * On success, zero should be returned. Others mean error.
>> + */
>> +typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
>> +		sector_t sects);
>> +
>> +int  blkoops_register_device(struct blkoops_device *bo_dev);
>> +void blkoops_unregister_device(struct blkoops_device *bo_dev);
>> +int  blkoops_register_blkdev(unsigned int major,
>> +		blkoops_blk_panic_write_op panic_write);
>> +void blkoops_unregister_blkdev(unsigned int major);
>> +int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
>> +
>> +#endif
>> -- 
>> 1.9.1
>>
> 

-- 
WeiXiong Liao

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices
  2020-03-20 18:20           ` Kees Cook
@ 2020-03-22 10:28             ` WeiXiong Liao
  0 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 10:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/21 AM 2:20, Kees Cook wrote:
> On Fri, Mar 20, 2020 at 09:50:36AM +0800, WeiXiong Liao wrote:
>> On 2020/3/19 AM 1:23, Kees Cook wrote:
>>> On Thu, Feb 27, 2020 at 04:21:51PM +0800, liaoweixiong wrote:
>>>> On 2020/2/26 AM 8:52, Kees Cook wrote:
>>>>> On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
>>>>>> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>>>>>> +pstore_blk-y += blkzone.o
>>>>>
>>>>> Why this dance with files? I would just expect:
>>>>>
>>>>> obj-$(CONFIG_PSTORE_BLK)     += blkzone.o
>>>>>
>>>>
>>>> This makes the built module named blkzone.ko rather than
>>>> pstore_blk.ko.
>>>
>>> You can just do a regular build rule:
>>>
>>> obj-$(CONFIG_PSTORE_BLK) += blkzone.o
>>>
>>
>> I don't get it. If make it as your words, the built module will be
>> blkzone.ko. The module is named pstore/blk, however it built out
>> blkzone.ko. I think it's confusing.
> 
> I mean just pick whatever filename you want it to be named. The Makefile
> case for ramoops was that ramoops got renamed but we wanted to keep the
> old API name.
> 
> So, if you want it named pstore-blk.ko, just rename blkzone.c to
> pstore-blk.c.
> 

How about renaming blkzone.c to pstore_zone.c and blkoops.c to pstore_blk.c?

Please refer to my reply to patch 2.

>>>>> If you're expecting concurrent writers (use of atomic_set(), I would
>>>>> expect the whole write to be locked instead. (i.e. what happens if
>>>>> multiple callers call blkz_zone_write()?)
>>>>>
>>>>
>>>> I don't agree with it. The datalen will be updated everywhere. It's useless
>>>> to lock here.
>>>
>>> But there could be multiple writers; locking should be needed.
>>>
>>
>> All the recorders such as dmesg, pmsg, console and ftrace have been
>> locked on pstore and upper layers. So, a recorder will not write in
>> parallel and different recorders operate privately zone. They don't have
>> any influence on each other.
> 
> Yes, sorry, I was confusing myself about pmsg, and I forgot it had a
> global lock. Each are locked or split by CPU.
> 
>> The only parallel case I think is that recorder writes while dirty-flush
>> thread is working. And the dirty-flusher will flush the whole zone rather
>> than part of it, so, it is OK to call in parallel.
> 
> Okay, thanks for clarifying.
> 
>> Based on these reasons, I don't think locking should be needed.
> 
> Agreed.
> 

-- 
WeiXiong Liao

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder
  2020-03-18 18:13   ` Kees Cook
@ 2020-03-22 11:14     ` WeiXiong Liao
  2020-03-22 15:59       ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 11:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:13, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:47PM +0800, WeiXiong Liao wrote:
>> pmsg support recorder for userspace. To enable pmsg, just make pmsg_size
>> be greater than 0 and a multiple of 4096.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  fs/pstore/Kconfig          |  12 +++
>>  fs/pstore/blkoops.c        |  11 +++
>>  fs/pstore/blkzone.c        | 229 +++++++++++++++++++++++++++++++++++++++++++--
>>  include/linux/pstore_blk.h |   4 +
>>  4 files changed, 246 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 7a57a8edb612..bbf1fdb5eaa7 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -186,6 +186,18 @@ config PSTORE_BLKOOPS_DMESG_SIZE
>>  	  NOTE that, both kconfig and module parameters can configure blkoops,
>>  	  but module parameters have priority over kconfig.
>>  
>> +config PSTORE_BLKOOPS_PMSG_SIZE
>> +	int "pmsg size in kbytes for blkoops"
>> +	depends on PSTORE_BLKOOPS
>> +	depends on PSTORE_PMSG
>> +	default 64
> 
> Instead of "depends on PSTORE_PMSG", you can do:
> 
> 	default 64 if PSTORE_PMSG
> 	default 0
> 

What happens if PSTORE_BLKOOPS_PMSG_SIZE is non-zero while
PSTORE_PMSG is disabled? The pmsg recorder does not work, but pstore/blk
still allocates a zone for it because pmsg_size is non-zero.
That wastes storage space.

I think "depends on PSTORE_PMSG" is better than "default 64 if PSTORE_PMSG",
because PSTORE_BLKOOPS_PMSG_SIZE really depends on PSTORE_PMSG.
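
A small userspace sketch of the effect, assuming the #ifdef fallback from
blkoops.c is kept. The helper resolve_size_bytes() is hypothetical and only
models the first steps of verify_size(); CONFIG_PSTORE_BLKOOPS_PMSG_SIZE is
deliberately left undefined to stand for a build with PSTORE_PMSG disabled:

```c
/*
 * With "depends on PSTORE_PMSG", the symbol below does not exist at all
 * when PSTORE_PMSG is off, so the fallback of 0 is used.
 */
#ifdef CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
#define DEFAULT_PMSG_SIZE CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
#else
#define DEFAULT_PMSG_SIZE 0
#endif

/*
 * Hypothetical helper: a negative module parameter selects the default,
 * and the result is converted from KB to bytes. Zero means "no zone".
 */
static long resolve_size_bytes(long param_kb, long default_kb)
{
	long kb = param_kb < 0 ? default_kb : param_kb;

	return kb <= 0 ? 0 : kb * 1024;
}
```

With the default at 0, no pmsg zone is carved out of the device unless the
user explicitly asks for one through the module parameter.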

>> +	help
>> +	  This just sets size of pmsg (pmsg_size) for pstore/blk. The size is
>> +	  in KB and must be a multiple of 4.
>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
>> +	  but module parameters have priority over kconfig.
>> +
>>  config PSTORE_BLKOOPS_BLKDEV
>>  	string "block device for blkoops"
>>  	depends on PSTORE_BLKOOPS
>> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
>> index 8027c3af8c8d..02e6e4c1f965 100644
>> --- a/fs/pstore/blkoops.c
>> +++ b/fs/pstore/blkoops.c
>> @@ -16,6 +16,10 @@
>>  module_param(dmesg_size, long, 0400);
>>  MODULE_PARM_DESC(dmesg_size, "demsg size in kbytes");
>>  
>> +static long pmsg_size = -1;
> 
> Now PSTORE_BLKOOPS_PMSG_SIZE will always be available and you can set it
> here.
> 

See my note above.

>> +module_param(pmsg_size, long, 0400);
>> +MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
>> +
>>  static int dump_oops = -1;
>>  module_param(dump_oops, int, 0400);
>>  MODULE_PARM_DESC(total_size, "whether dump oops");
>> @@ -60,6 +64,12 @@
>>  #define DEFAULT_DMESG_SIZE 0
>>  #endif
>>  
>> +#ifdef CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
>> +#define DEFAULT_PMSG_SIZE CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
>> +#else
>> +#define DEFAULT_PMSG_SIZE 0
>> +#endif
> 
> And drop this.
> 

See my note above.

>> +
>>  #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>>  #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>>  #else
>> @@ -113,6 +123,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>>  	}
>>  
>>  	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>> +	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
>>  #undef verify_size
>>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>>  
>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> index f77f612b50ba..a3464252d52e 100644
>> --- a/fs/pstore/blkzone.c
>> +++ b/fs/pstore/blkzone.c
>> @@ -24,12 +24,14 @@
>>   *
>>   * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
>>   * @datalen: length of data in @data
>> + * @start: offset into @data where the beginning of the stored bytes begin
>>   * @data: zone data.
>>   */
>>  struct blkz_buffer {
>>  #define BLK_SIG (0x43474244) /* DBGC */
>>  	uint32_t sig;
>>  	atomic_t datalen;
>> +	atomic_t start;
>>  	uint8_t data[];
>>  };
>>  
>> @@ -85,8 +87,10 @@ struct blkz_zone {
>>  
>>  struct blkz_context {
>>  	struct blkz_zone **dbzs;	/* dmesg block zones */
>> +	struct blkz_zone *pbz;		/* Pmsg block zone */
>>  	unsigned int dmesg_max_cnt;
>>  	unsigned int dmesg_read_cnt;
>> +	unsigned int pmsg_read_cnt;
>>  	unsigned int dmesg_write_cnt;
>>  	/*
>>  	 * the counter should be recovered when recover.
>> @@ -119,6 +123,11 @@ static inline int buffer_datalen(struct blkz_zone *zone)
>>  	return atomic_read(&zone->buffer->datalen);
>>  }
>>  
>> +static inline int buffer_start(struct blkz_zone *zone)
>> +{
>> +	return atomic_read(&zone->buffer->start);
>> +}
>> +
>>  static inline bool is_on_panic(void)
>>  {
>>  	struct blkz_context *cxt = &blkz_cxt;
>> @@ -410,6 +419,69 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
>>  	return ret;
>>  }
>>  
>> +static int blkz_recover_pmsg(struct blkz_context *cxt)
>> +{
>> +	struct blkz_info *info = cxt->bzinfo;
>> +	struct blkz_buffer *oldbuf;
>> +	struct blkz_zone *zone = NULL;
>> +	int ret = 0;
>> +	ssize_t rcnt, len;
>> +
>> +	zone = cxt->pbz;
>> +	if (!zone || zone->oldbuf)
>> +		return 0;
>> +
>> +	if (is_on_panic())
>> +		goto out;
>> +
>> +	if (unlikely(!info->read))
>> +		return -EINVAL;
>> +
>> +	len = zone->buffer_size + sizeof(*oldbuf);
>> +	oldbuf = kzalloc(len, GFP_KERNEL);
>> +	if (!oldbuf)
>> +		return -ENOMEM;
>> +
>> +	rcnt = info->read((char *)oldbuf, len, zone->off);
>> +	if (rcnt != len) {
>> +		pr_debug("recover pmsg failed\n");
>> +		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
>> +		goto free_oldbuf;
>> +	}
>> +
>> +	if (oldbuf->sig != zone->buffer->sig) {
>> +		pr_debug("no valid data in zone %s\n", zone->name);
>> +		goto free_oldbuf;
>> +	}
>> +
>> +	if (zone->buffer_size < atomic_read(&oldbuf->datalen) ||
>> +		zone->buffer_size < atomic_read(&oldbuf->start)) {
>> +		pr_info("found overtop zone: %s: off %lu, size %zu\n",
>> +				zone->name, zone->off, zone->buffer_size);
>> +		goto free_oldbuf;
>> +	}
>> +
>> +	if (!atomic_read(&oldbuf->datalen)) {
>> +		pr_debug("found erased zone: %s: id 0, off %lu, size %zu, datalen %d\n",
>> +				zone->name, zone->off, zone->buffer_size,
>> +				atomic_read(&oldbuf->datalen));
>> +		kfree(oldbuf);
>> +		goto out;
>> +	}
>> +
>> +	pr_debug("found nice zone: %s: id 0, off %lu, size %zu, datalen %d\n",
>> +			zone->name, zone->off, zone->buffer_size,
>> +			atomic_read(&oldbuf->datalen));
>> +	zone->oldbuf = oldbuf;
>> +out:
>> +	blkz_flush_dirty_zone(zone);
>> +	return 0;
>> +
>> +free_oldbuf:
>> +	kfree(oldbuf);
>> +	return ret;
>> +}
>> +
>>  static inline int blkz_recovery(struct blkz_context *cxt)
>>  {
>>  	int ret = -EBUSY;
>> @@ -421,6 +493,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
>>  	if (ret)
>>  		goto recover_fail;
>>  
>> +	ret = blkz_recover_pmsg(cxt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>>  	pr_debug("recover end!\n");
>>  	atomic_set(&cxt->recovered, 1);
>>  	return 0;
>> @@ -435,9 +511,17 @@ static int blkz_pstore_open(struct pstore_info *psi)
>>  	struct blkz_context *cxt = psi->data;
>>  
>>  	cxt->dmesg_read_cnt = 0;
>> +	cxt->pmsg_read_cnt = 0;
>>  	return 0;
>>  }
>>  
>> +static inline bool blkz_old_ok(struct blkz_zone *zone)
>> +{
>> +	if (zone && zone->oldbuf && atomic_read(&zone->oldbuf->datalen))
>> +		return true;
>> +	return false;
>> +}
>> +
>>  static inline bool blkz_ok(struct blkz_zone *zone)
>>  {
>>  	if (zone && zone->buffer && buffer_datalen(zone))
>> @@ -455,6 +539,25 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>>  	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>>  }
>>  
>> +static inline int blkz_pmsg_erase(struct blkz_context *cxt,
>> +		struct blkz_zone *zone)
>> +{
>> +	if (unlikely(!blkz_old_ok(zone)))
>> +		return 0;
>> +
>> +	kfree(zone->oldbuf);
>> +	zone->oldbuf = NULL;
>> +	/*
>> +	 * if there are new data in zone buffer, that means the old data
>> +	 * are already invalid. It is no need to flush 0 (erase) to
>> +	 * block device.
>> +	 */
>> +	if (!buffer_datalen(zone))
>> +		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>> +	blkz_flush_dirty_zone(zone);
>> +	return 0;
>> +}
>> +
>>  static int blkz_pstore_erase(struct pstore_record *record)
>>  {
>>  	struct blkz_context *cxt = record->psi->data;
>> @@ -462,6 +565,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
>>  	switch (record->type) {
>>  	case PSTORE_TYPE_DMESG:
>>  		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
>> +	case PSTORE_TYPE_PMSG:
>> +		return blkz_pmsg_erase(cxt, cxt->pbz);
>>  	default:
>>  		return -EINVAL;
>>  	}
>> @@ -482,8 +587,10 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
>>  	hdr->reason = record->reason;
>>  	if (hdr->reason == KMSG_DUMP_OOPS)
>>  		hdr->counter = ++cxt->oops_counter;
>> -	else
>> +	else if (hdr->reason == KMSG_DUMP_PANIC)
>>  		hdr->counter = ++cxt->panic_counter;
>> +	else
>> +		hdr->counter = 0;
>>  }
>>  
>>  static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
>> @@ -546,6 +653,55 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>>  	return 0;
>>  }
>>  
>> +static int notrace blkz_pmsg_write(struct blkz_context *cxt,
>> +		struct pstore_record *record)
>> +{
>> +	struct blkz_zone *zone;
>> +	size_t start, rem;
>> +	int cnt = record->size;
>> +	bool is_full_data = false;
>> +	char *buf = record->buf;
>> +
>> +	zone = cxt->pbz;
>> +	if (!zone)
>> +		return -ENOSPC;
>> +
>> +	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
>> +		is_full_data = true;
>> +
>> +	if (unlikely(cnt > zone->buffer_size)) {
>> +		buf += cnt - zone->buffer_size;
>> +		cnt = zone->buffer_size;
>> +	}
>> +
>> +	start = buffer_start(zone);
>> +	rem = zone->buffer_size - start;
>> +	if (unlikely(rem < cnt)) {
>> +		blkz_zone_write(zone, FLUSH_PART, buf, rem, start);
>> +		buf += rem;
>> +		cnt -= rem;
>> +		start = 0;
>> +		is_full_data = true;
>> +	}
>> +
>> +	atomic_set(&zone->buffer->start, cnt + start);
>> +	blkz_zone_write(zone, FLUSH_PART, buf, cnt, start);
>> +
>> +	/**
>> +	 * blkz_zone_write will set datalen as start + cnt.
>> +	 * It work if actual data length lesser than buffer size.
>> +	 * If data length greater than buffer size, pmsg will rewrite to
>> +	 * beginning of zone, which make buffer->datalen wrongly.
>> +	 * So we should reset datalen as buffer size once actual data length
>> +	 * greater than buffer size.
>> +	 */
>> +	if (is_full_data) {
>> +		atomic_set(&zone->buffer->datalen, zone->buffer_size);
>> +		blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>> +	}
>> +	return 0;
>> +}
>> +
>>  static int notrace blkz_pstore_write(struct pstore_record *record)
>>  {
>>  	struct blkz_context *cxt = record->psi->data;
>> @@ -557,6 +713,8 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>>  	switch (record->type) {
>>  	case PSTORE_TYPE_DMESG:
>>  		return blkz_dmesg_write(cxt, record);
>> +	case PSTORE_TYPE_PMSG:
>> +		return blkz_pmsg_write(cxt, record);
>>  	default:
>>  		return -EINVAL;
>>  	}
>> @@ -573,6 +731,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>>  			return zone;
>>  	}
>>  
>> +	if (cxt->pmsg_read_cnt == 0) {
>> +		cxt->pmsg_read_cnt++;
>> +		zone = cxt->pbz;
>> +		if (blkz_old_ok(zone))
>> +			return zone;
>> +	}
>> +
>>  	return NULL;
>>  }
>>  
>> @@ -611,7 +776,8 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>>  		char *buf = kasprintf(GFP_KERNEL,
>>  				"%s: Total %d times\n",
>>  				record->reason == KMSG_DUMP_OOPS ? "Oops" :
>> -				"Panic", record->count);
>> +				record->reason == KMSG_DUMP_PANIC ? "Panic" :
>> +				"Unknown", record->count);
> 
> Please use get_reason_str() here.
> 

get_reason_str() is marked 'static' in platform.c, and pstore/blk only
supports oops and panic, so there is no need to check other reason numbers.

>>  		hlen = strlen(buf);
>>  		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
>>  		if (!record->buf) {
>> @@ -633,6 +799,29 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>>  	return size + hlen;
>>  }
>>  
>> +static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
>> +		struct pstore_record *record)
>> +{
>> +	size_t size, start;
>> +	struct blkz_buffer *buf;
>> +
>> +	buf = (struct blkz_buffer *)zone->oldbuf;
>> +	if (!buf)
>> +		return READ_NEXT_ZONE;
>> +
>> +	size = atomic_read(&buf->datalen);
>> +	start = atomic_read(&buf->start);
>> +
>> +	record->buf = kmalloc(size, GFP_KERNEL);
>> +	if (!record->buf)
>> +		return -ENOMEM;
>> +
>> +	memcpy(record->buf, buf->data + start, size - start);
>> +	memcpy(record->buf + size - start, buf->data, start);
>> +
>> +	return size;
>> +}
>> +
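
For readers following the wrap-around logic in blkz_pmsg_write() and
blkz_pmsg_read() above, here is a simplified userspace model. It is only a
sketch: struct ring, ring_write() and ring_read() are hypothetical names,
the buffer is plain memory instead of a zone flushed to a block device, and
the atomics are dropped. "start" is the next write offset, which is also the
oldest byte once the buffer has wrapped.

```c
#include <stddef.h>
#include <string.h>

#define ZONE_SIZE 8	/* tiny buffer so wrap-around is easy to see */

struct ring {
	size_t start;		/* next write offset; oldest byte once full */
	size_t datalen;		/* valid bytes, capped at ZONE_SIZE */
	char data[ZONE_SIZE];
};

static void ring_write(struct ring *r, const char *buf, size_t cnt)
{
	size_t rem;

	/* Keep only the newest ZONE_SIZE bytes of an oversized record. */
	if (cnt > ZONE_SIZE) {
		buf += cnt - ZONE_SIZE;
		cnt = ZONE_SIZE;
	}

	/* Fill the tail first, then wrap to the beginning of the buffer. */
	rem = ZONE_SIZE - r->start;
	if (rem < cnt) {
		memcpy(r->data + r->start, buf, rem);
		buf += rem;
		cnt -= rem;
		r->start = 0;
		r->datalen = ZONE_SIZE;	/* wrapped: the buffer is full */
	}

	memcpy(r->data + r->start, buf, cnt);
	r->start += cnt;
	if (r->datalen < ZONE_SIZE)
		r->datalen = r->start;	/* not wrapped yet */
}

/* Copy the contents out in chronological order; returns the length. */
static size_t ring_read(const struct ring *r, char *out)
{
	size_t size = r->datalen;
	size_t start = r->start;

	memcpy(out, r->data + start, size - start);	/* oldest part */
	memcpy(out + size - start, r->data, start);	/* newest part */
	return size;
}
```

Before the first wrap, datalen equals start, so the first memcpy() in
ring_read() copies nothing and the second returns the data as written.
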
>>  static ssize_t blkz_pstore_read(struct pstore_record *record)
>>  {
>>  	struct blkz_context *cxt = record->psi->data;
>> @@ -657,6 +846,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>>  		blkz_read = blkz_dmesg_read;
>>  		record->id = cxt->dmesg_read_cnt - 1;
>>  		break;
>> +	case PSTORE_TYPE_PMSG:
>> +		blkz_read = blkz_pmsg_read;
>> +		break;
>>  	default:
>>  		goto next_zone;
>>  	}
>> @@ -712,8 +904,10 @@ static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
>>  	zone->type = type;
>>  	zone->buffer_size = size - sizeof(struct blkz_buffer);
>>  	zone->buffer->sig = type ^ BLK_SIG;
>> +	zone->oldbuf = NULL;
>>  	atomic_set(&zone->dirty, 0);
>>  	atomic_set(&zone->buffer->datalen, 0);
>> +	atomic_set(&zone->buffer->start, 0);
>>  
>>  	*off += size;
>>  
>> @@ -798,17 +992,26 @@ static int blkz_cut_zones(struct blkz_context *cxt)
>>  	struct blkz_info *info = cxt->bzinfo;
>>  	unsigned long off = 0;
>>  	int err;
>> -	size_t size;
>> +	size_t off_size = 0;
>>  
>> -	size = info->total_size;
>> -	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
>> +	off_size += info->pmsg_size;
>> +	cxt->pbz = blkz_init_zone(PSTORE_TYPE_PMSG, &off, info->pmsg_size);
>> +	if (IS_ERR(cxt->pbz)) {
>> +		err = PTR_ERR(cxt->pbz);
>> +		goto fail_out;
>> +	}
>> +
>> +	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
>> +			info->total_size - off_size,
>>  			info->dmesg_size, &cxt->dmesg_max_cnt);
>>  	if (IS_ERR(cxt->dbzs)) {
>>  		err = PTR_ERR(cxt->dbzs);
>> -		goto fail_out;
>> +		goto free_pmsg;
>>  	}
>>  
>>  	return 0;
>> +free_pmsg:
>> +	blkz_free_zone(&cxt->pbz);
>>  fail_out:
>>  	return err;
>>  }
>> @@ -824,7 +1027,7 @@ int blkz_register(struct blkz_info *info)
>>  		return -EINVAL;
>>  	}
>>  
>> -	if (!info->dmesg_size) {
>> +	if (!info->dmesg_size && !info->pmsg_size) {
>>  		pr_warn("at least one of the records be non-zero\n");
>>  		return -EINVAL;
>>  	}
>> @@ -851,6 +1054,7 @@ int blkz_register(struct blkz_info *info)
>>  
>>  	check_size(total_size, 4096);
>>  	check_size(dmesg_size, SECTOR_SIZE);
>> +	check_size(pmsg_size, SECTOR_SIZE);
>>  
>>  #undef check_size
>>  
>> @@ -882,6 +1086,7 @@ int blkz_register(struct blkz_info *info)
>>  	pr_debug("register %s with properties:\n", info->name);
>>  	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>>  	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>> +	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
>>  
>>  	err = blkz_cut_zones(cxt);
>>  	if (err) {
>> @@ -900,11 +1105,14 @@ int blkz_register(struct blkz_info *info)
>>  	}
>>  	cxt->pstore.data = cxt;
>>  	if (info->dmesg_size)
>> -		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
>> +		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
>> +	if (info->pmsg_size)
>> +		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
>>  
>> -	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
>> +	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
>>  			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>> -			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
>> +			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
>> +			cxt->pbz ? "Pmsg" : "");
> 
> I'd switch to leading spaces so can leave these strings unchanged as you
> add them:
> 
> 	for%s%s%s\n", info->name,
> 		cxt->dbzs && cxt->bzinfo->dump_oops ? " Oops" : "",
> 		cxt->dbzs && cxt->bzinfo->panic_write ? " Panic" : "",
> 		cxt->pbz ? " Pmsg" : "");
> 
> etc

That's a good idea.
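
A small userspace sketch of the leading-space variant suggested above, to show
that the existing labels stay untouched when a front-end is added. The
function name demo_format_backends() and its boolean parameters are
hypothetical; in the driver the conditions come from cxt->dbzs, cxt->pbz and
the bzinfo flags.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Build the registration message with leading-space labels, so adding a
 * new front-end only appends one "%s" and one argument.
 */
static int demo_format_backends(char *buf, size_t len, const char *name,
		bool oops, bool panic, bool pmsg)
{
	return snprintf(buf, len,
			"Registered %s as blkzone backend for%s%s%s\n",
			name,
			oops ? " Oops" : "",
			panic ? " Panic" : "",
			pmsg ? " Pmsg" : "");
}
```

With leading spaces the message also never ends in a stray space, whichever
front-ends are enabled.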

> 
>>  
>>  	err = pstore_register(&cxt->pstore);
>>  	if (err) {
>> @@ -940,6 +1148,7 @@ void blkz_unregister(struct blkz_info *info)
>>  	spin_unlock(&cxt->bzinfo_lock);
>>  
>>  	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
>> +	blkz_free_zone(&cxt->pbz);
>>  }
>>  EXPORT_SYMBOL_GPL(blkz_unregister);
>>  
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> index 589d276fa4e4..af06be25bd01 100644
>> --- a/include/linux/pstore_blk.h
>> +++ b/include/linux/pstore_blk.h
>> @@ -19,6 +19,9 @@
>>   * @dmesg_size:
>>   *	The size of each zones for dmesg (oops & panic). Zero means disabled,
>>   *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
>> + * @pmsg_size:
>> + *	The size of zone for pmsg. Zero means disabled, othewise, it must be
>> + *	multiple of SECTOR_SIZE(512).
>>   * @dump_oops:
>>   *	Dump oops and panic log or only panic.
>>   * @read, @write:
>> @@ -50,6 +53,7 @@ struct blkz_info {
>>  
>>  	unsigned long total_size;
>>  	unsigned long dmesg_size;
>> +	unsigned long pmsg_size;
>>  	int dump_oops;
>>  	blkz_read_op read;
>>  	blkz_write_op write;
>> -- 
>> 1.9.1
>>
> 

-- 
WeiXiong Liao

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/11] pstore/blk: blkoops: support console recorder
  2020-03-18 18:16   ` Kees Cook
@ 2020-03-22 11:35     ` WeiXiong Liao
  0 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 11:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:16, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:48PM +0800, WeiXiong Liao wrote:
>> Support recorder for console. To enable console recorder, just make
>> console_size be greater than 0 and a multiple of 4096.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  fs/pstore/Kconfig          |  12 ++++++
>>  fs/pstore/blkoops.c        |  11 +++++
>>  fs/pstore/blkzone.c        | 101 ++++++++++++++++++++++++++++++++++-----------
>>  include/linux/blkoops.h    |   6 ++-
>>  include/linux/pstore_blk.h |   8 +++-
>>  5 files changed, 112 insertions(+), 26 deletions(-)
>>
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index bbf1fdb5eaa7..5f0a42823028 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -198,6 +198,18 @@ config PSTORE_BLKOOPS_PMSG_SIZE
>>  	  NOTE that, both kconfig and module parameters can configure blkoops,
>>  	  but module parameters have priority over kconfig.
>>  
>> +config PSTORE_BLKOOPS_CONSOLE_SIZE
>> +	int "console size in kbytes for blkoops"
>> +	depends on PSTORE_BLKOOPS
>> +	depends on PSTORE_CONSOLE
>> +	default 64
> 
> Same tricks here as for the PMSG.
> 

Same reply as for the PMSG.

>> +	help
>> +	  This just sets size of console (console_size) for pstore/blk. The
>> +	  size is in KB and must be a multiple of 4.
>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
>> +	  but module parameters have priority over kconfig.
>> +
>>  config PSTORE_BLKOOPS_BLKDEV
>>  	string "block device for blkoops"
>>  	depends on PSTORE_BLKOOPS
>> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
>> index 02e6e4c1f965..05990bc3b168 100644
>> --- a/fs/pstore/blkoops.c
>> +++ b/fs/pstore/blkoops.c
>> @@ -20,6 +20,10 @@
>>  module_param(pmsg_size, long, 0400);
>>  MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
>>  
>> +static long console_size = -1;
>> +module_param(console_size, long, 0400);
>> +MODULE_PARM_DESC(console_size, "console size in kbytes");
>> +
>>  static int dump_oops = -1;
>>  module_param(dump_oops, int, 0400);
>>  MODULE_PARM_DESC(total_size, "whether dump oops");
>> @@ -70,6 +74,12 @@
>>  #define DEFAULT_PMSG_SIZE 0
>>  #endif
>>  
>> +#ifdef CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
>> +#define DEFAULT_CONSOLE_SIZE CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
>> +#else
>> +#define DEFAULT_CONSOLE_SIZE 0
>> +#endif
>> +
>>  #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>>  #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>>  #else
>> @@ -124,6 +134,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>>  
>>  	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>>  	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
>> +	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
>>  #undef verify_size
>>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>>  
>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> index a3464252d52e..9a7e9b06ccf7 100644
>> --- a/fs/pstore/blkzone.c
>> +++ b/fs/pstore/blkzone.c
>> @@ -88,9 +88,11 @@ struct blkz_zone {
>>  struct blkz_context {
>>  	struct blkz_zone **dbzs;	/* dmesg block zones */
>>  	struct blkz_zone *pbz;		/* Pmsg block zone */
>> +	struct blkz_zone *cbz;		/* console block zone */
>>  	unsigned int dmesg_max_cnt;
>>  	unsigned int dmesg_read_cnt;
>>  	unsigned int pmsg_read_cnt;
>> +	unsigned int console_read_cnt;
>>  	unsigned int dmesg_write_cnt;
>>  	/*
>>  	 * the counter should be recovered when recover.
>> @@ -111,6 +113,9 @@ struct blkz_context {
>>  };
>>  static struct blkz_context blkz_cxt;
>>  
>> +static void blkz_flush_all_dirty_zones(struct work_struct *);
>> +static DECLARE_WORK(blkz_cleaner, blkz_flush_all_dirty_zones);
>> +
>>  enum blkz_flush_mode {
>>  	FLUSH_NONE = 0,
>>  	FLUSH_PART,
>> @@ -200,6 +205,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
>>  	return 0;
>>  set_dirty:
>>  	atomic_set(&zone->dirty, true);
>> +	/* flush dirty zones nicely */
>> +	if (wcnt == -EBUSY && !is_on_panic())
>> +		schedule_work(&blkz_cleaner);
>>  	return -EBUSY;
>>  }
>>  
>> @@ -266,6 +274,15 @@ static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
>>  	return 0;
>>  }
>>  
>> +static void blkz_flush_all_dirty_zones(struct work_struct *work)
>> +{
>> +	struct blkz_context *cxt = &blkz_cxt;
>> +
>> +	blkz_flush_dirty_zone(cxt->pbz);
>> +	blkz_flush_dirty_zone(cxt->cbz);
>> +	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
>> +}
>> +
>>  static int blkz_recover_dmesg_data(struct blkz_context *cxt)
>>  {
>>  	struct blkz_info *info = cxt->bzinfo;
>> @@ -419,15 +436,13 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
>>  	return ret;
>>  }
>>  
>> -static int blkz_recover_pmsg(struct blkz_context *cxt)
>> +static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
>>  {
>>  	struct blkz_info *info = cxt->bzinfo;
>>  	struct blkz_buffer *oldbuf;
>> -	struct blkz_zone *zone = NULL;
>>  	int ret = 0;
>>  	ssize_t rcnt, len;
>>  
>> -	zone = cxt->pbz;
>>  	if (!zone || zone->oldbuf)
>>  		return 0;
>>  
>> @@ -493,7 +508,11 @@ static inline int blkz_recovery(struct blkz_context *cxt)
>>  	if (ret)
>>  		goto recover_fail;
>>  
>> -	ret = blkz_recover_pmsg(cxt);
>> +	ret = blkz_recover_zone(cxt, cxt->pbz);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>> +	ret = blkz_recover_zone(cxt, cxt->cbz);
>>  	if (ret)
>>  		goto recover_fail;
>>  
>> @@ -512,6 +531,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
>>  
>>  	cxt->dmesg_read_cnt = 0;
>>  	cxt->pmsg_read_cnt = 0;
>> +	cxt->console_read_cnt = 0;
>>  	return 0;
>>  }
>>  
>> @@ -539,7 +559,7 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>>  	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>>  }
>>  
>> -static inline int blkz_pmsg_erase(struct blkz_context *cxt,
>> +static inline int blkz_record_erase(struct blkz_context *cxt,
>>  		struct blkz_zone *zone)
>>  {
>>  	if (unlikely(!blkz_old_ok(zone)))
>> @@ -566,9 +586,10 @@ static int blkz_pstore_erase(struct pstore_record *record)
>>  	case PSTORE_TYPE_DMESG:
>>  		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
>>  	case PSTORE_TYPE_PMSG:
>> -		return blkz_pmsg_erase(cxt, cxt->pbz);
>> -	default:
>> -		return -EINVAL;
>> +		return blkz_record_erase(cxt, cxt->pbz);
>> +	case PSTORE_TYPE_CONSOLE:
>> +		return blkz_record_erase(cxt, cxt->cbz);
>> +	default: return -EINVAL;
>>  	}
>>  }
>>  
>> @@ -653,17 +674,15 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>>  	return 0;
>>  }
>>  
>> -static int notrace blkz_pmsg_write(struct blkz_context *cxt,
>> -		struct pstore_record *record)
>> +static int notrace blkz_record_write(struct blkz_context *cxt,
>> +		struct blkz_zone *zone, struct pstore_record *record)
> 
> How about generalizing this earlier in the patch series instead of
> mutating it here?
> 

OK.

>>  {
>> -	struct blkz_zone *zone;
>>  	size_t start, rem;
>>  	int cnt = record->size;
>>  	bool is_full_data = false;
>>  	char *buf = record->buf;
>>  
>> -	zone = cxt->pbz;
>> -	if (!zone)
>> +	if (!zone || !record)
>>  		return -ENOSPC;
>>  
>>  	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
>> @@ -710,11 +729,20 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>>  			record->reason == KMSG_DUMP_PANIC)
>>  		atomic_set(&cxt->on_panic, 1);
>>  
>> +	/*
>> +	 * if on panic, do not write except dmesg records
>> +	 * Fix case that panic_write prints log which wakes up console recorder.
>> +	 */
>> +	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
>> +		return -EBUSY;
>> +
>>  	switch (record->type) {
>>  	case PSTORE_TYPE_DMESG:
>>  		return blkz_dmesg_write(cxt, record);
>> +	case PSTORE_TYPE_CONSOLE:
>> +		return blkz_record_write(cxt, cxt->cbz, record);
>>  	case PSTORE_TYPE_PMSG:
>> -		return blkz_pmsg_write(cxt, record);
>> +		return blkz_record_write(cxt, cxt->pbz, record);
>>  	default:
>>  		return -EINVAL;
>>  	}
>> @@ -738,6 +766,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>>  			return zone;
>>  	}
>>  
>> +	if (cxt->console_read_cnt == 0) {
>> +		cxt->console_read_cnt++;
>> +		zone = cxt->cbz;
>> +		if (blkz_old_ok(zone))
>> +			return zone;
>> +	}
>> +
>>  	return NULL;
>>  }
>>  
>> @@ -799,7 +834,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>>  	return size + hlen;
>>  }
>>  
>> -static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
>> +static ssize_t blkz_record_read(struct blkz_zone *zone,
>>  		struct pstore_record *record)
>>  {
>>  	size_t size, start;
>> @@ -825,7 +860,7 @@ static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
>>  static ssize_t blkz_pstore_read(struct pstore_record *record)
>>  {
>>  	struct blkz_context *cxt = record->psi->data;
>> -	ssize_t (*blkz_read)(struct blkz_zone *zone,
>> +	ssize_t (*readop)(struct blkz_zone *zone,
>>  			struct pstore_record *record);
>>  	struct blkz_zone *zone;
>>  	ssize_t ret;
>> @@ -843,17 +878,19 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>>  	record->type = zone->type;
>>  	switch (record->type) {
>>  	case PSTORE_TYPE_DMESG:
>> -		blkz_read = blkz_dmesg_read;
>> +		readop = blkz_dmesg_read;
>>  		record->id = cxt->dmesg_read_cnt - 1;
>>  		break;
>> +	case PSTORE_TYPE_CONSOLE:
>> +		/* fallthrough */
> 
> Since this case has no body, you can leave off the "fallthrough". (But
> if you want to mark it anyway, please use "fallthrough;" instead of a
> comment.)
> 

OK. I will fix it everywhere.

>>  	case PSTORE_TYPE_PMSG:
>> -		blkz_read = blkz_pmsg_read;
>> +		readop = blkz_record_read;
>>  		break;
>>  	default:
>>  		goto next_zone;
>>  	}
>>  
>> -	ret = blkz_read(zone, record);
>> +	ret = readop(zone, record);
>>  	if (ret == READ_NEXT_ZONE)
>>  		goto next_zone;
>>  	return ret;
>> @@ -1001,15 +1038,25 @@ static int blkz_cut_zones(struct blkz_context *cxt)
>>  		goto fail_out;
>>  	}
>>  
>> +	off_size += info->console_size;
>> +	cxt->cbz = blkz_init_zone(PSTORE_TYPE_CONSOLE, &off,
>> +			info->console_size);
>> +	if (IS_ERR(cxt->cbz)) {
>> +		err = PTR_ERR(cxt->cbz);
>> +		goto free_pmsg;
>> +	}
>> +
>>  	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
>>  			info->total_size - off_size,
>>  			info->dmesg_size, &cxt->dmesg_max_cnt);
>>  	if (IS_ERR(cxt->dbzs)) {
>>  		err = PTR_ERR(cxt->dbzs);
>> -		goto free_pmsg;
>> +		goto free_console;
>>  	}
>>  
>>  	return 0;
>> +free_console:
>> +	blkz_free_zone(&cxt->cbz);
>>  free_pmsg:
>>  	blkz_free_zone(&cxt->pbz);
>>  fail_out:
>> @@ -1027,7 +1074,7 @@ int blkz_register(struct blkz_info *info)
>>  		return -EINVAL;
>>  	}
>>  
>> -	if (!info->dmesg_size && !info->pmsg_size) {
>> +	if (!info->dmesg_size && !info->pmsg_size && !info->console_size) {
>>  		pr_warn("at least one of the records be non-zero\n");
>>  		return -EINVAL;
>>  	}
>> @@ -1055,6 +1102,7 @@ int blkz_register(struct blkz_info *info)
>>  	check_size(total_size, 4096);
>>  	check_size(dmesg_size, SECTOR_SIZE);
>>  	check_size(pmsg_size, SECTOR_SIZE);
>> +	check_size(console_size, SECTOR_SIZE);
>>  
>>  #undef check_size
>>  
>> @@ -1087,6 +1135,7 @@ int blkz_register(struct blkz_info *info)
>>  	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
>>  	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>>  	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
>> +	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
>>  
>>  	err = blkz_cut_zones(cxt);
>>  	if (err) {
>> @@ -1108,11 +1157,15 @@ int blkz_register(struct blkz_info *info)
>>  		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
>>  	if (info->pmsg_size)
>>  		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
>> +	if (info->console_size)
>> +		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
>>  
>> -	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
>> +	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
>> +			info->name,
>>  			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>>  			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
>> -			cxt->pbz ? "Pmsg" : "");
>> +			cxt->pbz ? "Pmsg " : "",
>> +			cxt->cbz ? "Console" : "");
>>  
>>  	err = pstore_register(&cxt->pstore);
>>  	if (err) {
>> @@ -1139,6 +1192,8 @@ void blkz_unregister(struct blkz_info *info)
>>  {
>>  	struct blkz_context *cxt = &blkz_cxt;
>>  
>> +	flush_work(&blkz_cleaner);
>> +
>>  	pstore_unregister(&cxt->pstore);
>>  	kfree(cxt->pstore.buf);
>>  	cxt->pstore.bufsize = 0;
>> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
>> index fe63739309aa..8f40f225545d 100644
>> --- a/include/linux/blkoops.h
>> +++ b/include/linux/blkoops.h
>> @@ -23,8 +23,10 @@
>>   *	Both of the @size and @offset parameters on this interface are
>>   *	the relative size of the space provided, not the whole disk/flash.
>>   *
>> - *	On success, the number of bytes read should be returned.
>> - *	On error, negative number should be returned.
>> + *	On success, the number of bytes read/write should be returned.
>> + *	On error, negative number should be returned. The following returning
>> + *	number means more:
>> + *	  -EBUSY: pstore/blk should try again later.
>>   * @panic_write:
>>   *	The write operation only used for panic.
>>   *
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> index af06be25bd01..546375e04419 100644
>> --- a/include/linux/pstore_blk.h
>> +++ b/include/linux/pstore_blk.h
>> @@ -22,6 +22,9 @@
>>   * @pmsg_size:
>>   *	The size of zone for pmsg. Zero means disabled, othewise, it must be
>>   *	multiple of SECTOR_SIZE(512).
>> + * @console_size:
>> + *	The size of zone for console. Zero means disabled, othewise, it must
>> + *	be multiple of SECTOR_SIZE(512).
>>   * @dump_oops:
>>   *	Dump oops and panic log or only panic.
>>   * @read, @write:
>> @@ -33,7 +36,9 @@
>>   *	the relative size of the space provided, not the whole disk/flash.
>>   *
>>   *	On success, the number of bytes read/write should be returned.
>> - *	On error, negative number should be returned.
>> + *	On error, negative number should be returned. The following returning
>> + *	number means more:
>> + *	  -EBUSY: pstore/blk should try again later.
>>   * @panic_write:
>>   *	The write operation only used for panic. It's optional if you do not
>>   *	care panic record. If panic occur but blkzone do not recover yet, the
>> @@ -54,6 +59,7 @@ struct blkz_info {
>>  	unsigned long total_size;
>>  	unsigned long dmesg_size;
>>  	unsigned long pmsg_size;
>> +	unsigned long console_size;
>>  	int dump_oops;
>>  	blkz_read_op read;
>>  	blkz_write_op write;
>> -- 
>> 1.9.1
>>
> 

-- 
WeiXiong Liao


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder
  2020-03-18 18:19   ` Kees Cook
@ 2020-03-22 11:42     ` WeiXiong Liao
  2020-03-22 15:16       ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 11:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:19, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:49PM +0800, WeiXiong Liao wrote:
>> Support recorder for ftrace. To enable ftrace recorder, just make
>> ftrace_size be greater than 0 and a multiple of 4096.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  fs/pstore/Kconfig          | 12 ++++++++
>>  fs/pstore/blkoops.c        | 11 +++++++
>>  fs/pstore/blkzone.c        | 75 ++++++++++++++++++++++++++++++++++++++++++++--
>>  include/linux/pstore_blk.h |  4 +++
>>  4 files changed, 99 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 5f0a42823028..308a0a4c5ee5 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -210,6 +210,18 @@ config PSTORE_BLKOOPS_CONSOLE_SIZE
>>  	  NOTE that, both kconfig and module parameters can configure blkoops,
>>  	  but module parameters have priority over kconfig.
>>  
>> +config PSTORE_BLKOOPS_FTRACE_SIZE
>> +	int "ftrace size in kbytes for blkoops"
>> +	depends on PSTORE_BLKOOPS
>> +	depends on PSTORE_FTRACE
>> +	default 64
> 
> Same tricks. :)
> 

Sure. We can discuss it on patch 2, and I will apply the fix across the whole
series.

>> +	help
>> +	  This just sets size of ftrace (ftrace_size) for pstore/blk. The
>> +	  size is in KB and must be a multiple of 4.
>> +
>> +	  NOTE that, both kconfig and module parameters can configure blkoops,
>> +	  but module parameters have priority over kconfig.
>> +
>>  config PSTORE_BLKOOPS_BLKDEV
>>  	string "block device for blkoops"
>>  	depends on PSTORE_BLKOOPS
>> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
>> index 05990bc3b168..c76bab671b0b 100644
>> --- a/fs/pstore/blkoops.c
>> +++ b/fs/pstore/blkoops.c
>> @@ -24,6 +24,10 @@
>>  module_param(console_size, long, 0400);
>>  MODULE_PARM_DESC(console_size, "console size in kbytes");
>>  
>> +static long ftrace_size = -1;
>> +module_param(ftrace_size, long, 0400);
>> +MODULE_PARM_DESC(ftrace_size, "ftrace size in kbytes");
>> +
>>  static int dump_oops = -1;
>>  module_param(dump_oops, int, 0400);
>>  MODULE_PARM_DESC(total_size, "whether dump oops");
>> @@ -80,6 +84,12 @@
>>  #define DEFAULT_CONSOLE_SIZE 0
>>  #endif
>>  
>> +#ifdef CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
>> +#define DEFAULT_FTRACE_SIZE CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
>> +#else
>> +#define DEFAULT_FTRACE_SIZE 0
>> +#endif
>> +
>>  #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>>  #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
>>  #else
>> @@ -135,6 +145,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>>  	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>>  	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
>>  	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
>> +	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
>>  #undef verify_size
>>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>>  
>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> index 9a7e9b06ccf7..442e5a5bbfda 100644
>> --- a/fs/pstore/blkzone.c
>> +++ b/fs/pstore/blkzone.c
>> @@ -89,10 +89,13 @@ struct blkz_context {
>>  	struct blkz_zone **dbzs;	/* dmesg block zones */
>>  	struct blkz_zone *pbz;		/* Pmsg block zone */
>>  	struct blkz_zone *cbz;		/* console block zone */
>> +	struct blkz_zone **fbzs;	/* Ftrace zones */
>>  	unsigned int dmesg_max_cnt;
>>  	unsigned int dmesg_read_cnt;
>>  	unsigned int pmsg_read_cnt;
>>  	unsigned int console_read_cnt;
>> +	unsigned int ftrace_max_cnt;
>> +	unsigned int ftrace_read_cnt;
>>  	unsigned int dmesg_write_cnt;
>>  	/*
>>  	 * the counter should be recovered when recover.
>> @@ -281,6 +284,7 @@ static void blkz_flush_all_dirty_zones(struct work_struct *work)
>>  	blkz_flush_dirty_zone(cxt->pbz);
>>  	blkz_flush_dirty_zone(cxt->cbz);
>>  	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
>> +	blkz_flush_dirty_zones(cxt->fbzs, cxt->ftrace_max_cnt);
>>  }
>>  
>>  static int blkz_recover_dmesg_data(struct blkz_context *cxt)
>> @@ -497,6 +501,31 @@ static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
>>  	return ret;
>>  }
>>  
>> +static int blkz_recover_zones(struct blkz_context *cxt,
>> +		struct blkz_zone **zones, unsigned int cnt)
>> +{
>> +	int ret;
>> +	unsigned int i;
>> +	struct blkz_zone *zone;
>> +
>> +	if (!zones)
>> +		return 0;
>> +
>> +	for (i = 0; i < cnt; i++) {
>> +		zone = zones[i];
>> +		if (unlikely(!zone))
>> +			continue;
>> +		ret = blkz_recover_zone(cxt, zone);
>> +		if (ret)
>> +			goto recover_fail;
>> +	}
>> +
>> +	return 0;
>> +recover_fail:
>> +	pr_debug("recover %s[%u] failed\n", zone->name, i);
>> +	return ret;
>> +}
> 
> Why is this introduced here? Shouldn't this be earlier in the series?
> 

blkz_recover_zones() recovers an array of zones. Only the ftrace recorder
needs it, so it is introduced here.
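
As a rough userspace sketch of the pattern (not the kernel code itself): the
array helper walks the zones, skips empty slots and stops at the first zone
that fails to recover. struct demo_zone and both function names are
hypothetical stand-ins for struct blkz_zone, blkz_recover_zone() and
blkz_recover_zones().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct blkz_zone. */
struct demo_zone {
	int fail;      /* non-zero makes recovery of this zone fail */
	int recovered; /* set once the zone has been recovered */
};

/* Recover one zone; a NULL zone is treated as "nothing to do". */
static int demo_recover_zone(struct demo_zone *zone)
{
	if (!zone)
		return 0;
	if (zone->fail)
		return -1;
	zone->recovered = 1;
	return 0;
}

/* Recover every zone in the array, stopping at the first failure. */
static int demo_recover_zones(struct demo_zone **zones, unsigned int cnt)
{
	unsigned int i;
	int ret;

	if (!zones)
		return 0;

	for (i = 0; i < cnt; i++) {
		if (!zones[i])
			continue;
		ret = demo_recover_zone(zones[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

The single-zone recorders (pmsg, console) can call the per-zone helper
directly, which is why only the per-CPU ftrace zones need the array form.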

>> +
>>  static inline int blkz_recovery(struct blkz_context *cxt)
>>  {
>>  	int ret = -EBUSY;
>> @@ -516,6 +545,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
>>  	if (ret)
>>  		goto recover_fail;
>>  
>> +	ret = blkz_recover_zones(cxt, cxt->fbzs, cxt->ftrace_max_cnt);
>> +	if (ret)
>> +		goto recover_fail;
>> +
>>  	pr_debug("recover end!\n");
>>  	atomic_set(&cxt->recovered, 1);
>>  	return 0;
>> @@ -532,6 +565,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
>>  	cxt->dmesg_read_cnt = 0;
>>  	cxt->pmsg_read_cnt = 0;
>>  	cxt->console_read_cnt = 0;
>> +	cxt->ftrace_read_cnt = 0;
>>  	return 0;
>>  }
>>  
>> @@ -589,6 +623,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
>>  		return blkz_record_erase(cxt, cxt->pbz);
>>  	case PSTORE_TYPE_CONSOLE:
>>  		return blkz_record_erase(cxt, cxt->cbz);
>> +	case PSTORE_TYPE_FTRACE:
>> +		return blkz_record_erase(cxt, cxt->fbzs[record->id]);
>>  	default: return -EINVAL;
>>  	}
>>  }
>> @@ -743,6 +779,13 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>>  		return blkz_record_write(cxt, cxt->cbz, record);
>>  	case PSTORE_TYPE_PMSG:
>>  		return blkz_record_write(cxt, cxt->pbz, record);
>> +	case PSTORE_TYPE_FTRACE: {
>> +		int zonenum = smp_processor_id();
>> +
>> +		if (!cxt->fbzs)
>> +			return -ENOSPC;
>> +		return blkz_record_write(cxt, cxt->fbzs[zonenum], record);
>> +	}
>>  	default:
>>  		return -EINVAL;
>>  	}
>> @@ -759,6 +802,12 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>>  			return zone;
>>  	}
>>  
>> +	while (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt) {
>> +		zone = cxt->fbzs[cxt->ftrace_read_cnt++];
>> +		if (blkz_old_ok(zone))
>> +			return zone;
>> +	}
>> +
>>  	if (cxt->pmsg_read_cnt == 0) {
>>  		cxt->pmsg_read_cnt++;
>>  		zone = cxt->pbz;
>> @@ -881,6 +930,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>>  		readop = blkz_dmesg_read;
>>  		record->id = cxt->dmesg_read_cnt - 1;
>>  		break;
>> +	case PSTORE_TYPE_FTRACE:
>> +		record->id = cxt->ftrace_read_cnt - 1;
>> +		/* fallthrough */
> 
> Please mark with "fallthrough;".
> https://www.kernel.org/doc/html/latest/process/deprecated.html#implicit-switch-case-fall-through
> 

Fixed.
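
For reference, a userspace sketch of the two cases raised in review: a case
with a body that intentionally falls through is annotated with "fallthrough;",
while stacked empty case labels need no annotation. The macro below is only a
stand-in for the kernel's pseudo-keyword so the snippet builds outside the
kernel; the enum, struct and function names are hypothetical.

```c
/* Userspace stand-in for the kernel's "fallthrough" pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

enum demo_type { DEMO_DMESG, DEMO_FTRACE, DEMO_CONSOLE, DEMO_PMSG, DEMO_OTHER };

struct demo_record {
	int id;
	int generic_read; /* set when the generic record reader is chosen */
};

/* Choose the reader for a record type, mirroring the switch in the hunk. */
static int demo_pick_reader(enum demo_type type, int read_cnt,
		struct demo_record *rec)
{
	switch (type) {
	case DEMO_FTRACE:
		rec->id = read_cnt - 1;
		fallthrough;	/* body above, so the fall-through is annotated */
	case DEMO_CONSOLE:	/* empty label: no annotation needed */
	case DEMO_PMSG:
		rec->generic_read = 1;
		return 0;
	default:
		return -1;
	}
}
```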

>>  	case PSTORE_TYPE_CONSOLE:
>>  		/* fallthrough */
>>  	case PSTORE_TYPE_PMSG:
>> @@ -1046,15 +1098,27 @@ static int blkz_cut_zones(struct blkz_context *cxt)
>>  		goto free_pmsg;
>>  	}
>>  
>> +	off_size += info->ftrace_size;
>> +	cxt->fbzs = blkz_init_zones(PSTORE_TYPE_FTRACE, &off,
>> +			info->ftrace_size,
>> +			info->ftrace_size / nr_cpu_ids,
>> +			&cxt->ftrace_max_cnt);
>> +	if (IS_ERR(cxt->fbzs)) {
>> +		err = PTR_ERR(cxt->fbzs);
>> +		goto free_console;
>> +	}
>> +
>>  	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
>>  			info->total_size - off_size,
>>  			info->dmesg_size, &cxt->dmesg_max_cnt);
>>  	if (IS_ERR(cxt->dbzs)) {
>>  		err = PTR_ERR(cxt->dbzs);
>> -		goto free_console;
>> +		goto free_ftrace;
>>  	}
>>  
>>  	return 0;
>> +free_ftrace:
>> +	blkz_free_zones(&cxt->fbzs, &cxt->ftrace_max_cnt);
>>  free_console:
>>  	blkz_free_zone(&cxt->cbz);
>>  free_pmsg:
>> @@ -1103,6 +1167,7 @@ int blkz_register(struct blkz_info *info)
>>  	check_size(dmesg_size, SECTOR_SIZE);
>>  	check_size(pmsg_size, SECTOR_SIZE);
>>  	check_size(console_size, SECTOR_SIZE);
>> +	check_size(ftrace_size, SECTOR_SIZE);
>>  
>>  #undef check_size
>>  
>> @@ -1136,6 +1201,7 @@ int blkz_register(struct blkz_info *info)
>>  	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
>>  	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
>>  	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
>> +	pr_debug("\tftrace size : %ld Bytes\n", info->ftrace_size);
>>  
>>  	err = blkz_cut_zones(cxt);
>>  	if (err) {
>> @@ -1159,13 +1225,16 @@ int blkz_register(struct blkz_info *info)
>>  		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
>>  	if (info->console_size)
>>  		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
>> +	if (info->ftrace_size)
>> +		cxt->pstore.flags |= PSTORE_FLAGS_FTRACE;
>>  
>> -	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
>> +	pr_info("Registered %s as blkzone backend for %s%s%s%s%s\n",
>>  			info->name,
>>  			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
>>  			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
>>  			cxt->pbz ? "Pmsg " : "",
>> -			cxt->cbz ? "Console" : "");
>> +			cxt->cbz ? "Console " : "",
>> +			cxt->fbzs ? "Ftrace" : "");
>>  
>>  	err = pstore_register(&cxt->pstore);
>>  	if (err) {
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> index 546375e04419..77704c1b404a 100644
>> --- a/include/linux/pstore_blk.h
>> +++ b/include/linux/pstore_blk.h
>> @@ -25,6 +25,9 @@
>>   * @console_size:
>>   *	The size of zone for console. Zero means disabled, othewise, it must
>>   *	be multiple of SECTOR_SIZE(512).
>> + * @ftrace_size:
>> + *	The size of zone for ftrace. Zero means disabled, othewise, it must
>> + *	be multiple of SECTOR_SIZE(512).
>>   * @dump_oops:
>>   *	Dump oops and panic log or only panic.
>>   * @read, @write:
>> @@ -60,6 +63,7 @@ struct blkz_info {
>>  	unsigned long dmesg_size;
>>  	unsigned long pmsg_size;
>>  	unsigned long console_size;
>> +	unsigned long ftrace_size;
>>  	int dump_oops;
>>  	blkz_read_op read;
>>  	blkz_write_op write;
>> -- 
>> 1.9.1
>>
> 

-- 
WeiXiong Liao


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-03-18 18:31   ` Kees Cook
@ 2020-03-22 12:20     ` WeiXiong Liao
  0 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 12:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:31, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:50PM +0800, WeiXiong Liao wrote:
>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>> how to use pstore/blk and blkoops.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  Documentation/admin-guide/pstore-block.rst | 281 +++++++++++++++++++++++++++++
>>  MAINTAINERS                                |   1 +
>>  fs/pstore/Kconfig                          |   2 +
>>  3 files changed, 284 insertions(+)
>>  create mode 100644 Documentation/admin-guide/pstore-block.rst
>>
>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>> new file mode 100644
>> index 000000000000..c8a5f68960c3
>> --- /dev/null
>> +++ b/Documentation/admin-guide/pstore-block.rst
>> @@ -0,0 +1,281 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +Pstore block oops/panic logger
>> +==============================
>> +
>> +Introduction
>> +------------
>> +
>> +Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
>> +block device before the system crashes. It also supports non-block devices such
>> +as mtd device.
>> +
>> +There is a trapper named blkoops for pstore/blk, which makes pstore/blk be
>> +nicer to device drivers.
> 
> "trapper" is an odd term here (oh, maybe this was a typo of
> "wrapper"?). Regardless, is there a need to separate blkzone from
> blkoops? It seems everything would just use blkoops directly, even
> mtdpstore?
> 

It is a typo...

Please refer to my reply on patch 2 for the reason why I separate blkzone
from blkoops.

>> +
>> +Pstore block concepts
>> +---------------------
>> +
>> +Pstore/blk works as a zone manager as it cuts the block device or partition
>> +into several zones and stores data for different recorders. What device drivers
> 
> s/recorders/pstore front-ends/
> 

Done.
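
To make the zone-manager idea concrete, here is a rough userspace sketch of
how the space could be cut up. It is not code from this series:
``zone_layout`` and ``plan_zones`` are made-up names, all sizes are in KB, and
it assumes (as the document says further down) that pmsg, console and ftrace
each take one fixed area and that the remaining space is split into dmesg
chunks. Any metadata overhead is ignored.

```c
/* Hypothetical helper type; none of these names come from the patch. */
struct zone_layout {
	unsigned long pmsg_kb;
	unsigned long console_kb;
	unsigned long ftrace_kb;
	unsigned long dmesg_kb;     /* size of one dmesg chunk */
	unsigned long dmesg_chunks; /* filled in: how many dmesg chunks fit */
};

/*
 * Returns 0 on success, -1 if a size is not a multiple of 4 KB or the
 * fixed areas do not fit into total_kb.
 */
int plan_zones(unsigned long total_kb, struct zone_layout *l)
{
	unsigned long fixed = l->pmsg_kb + l->console_kb + l->ftrace_kb;

	if (total_kb < 4 || total_kb % 4 || l->pmsg_kb % 4 ||
	    l->console_kb % 4 || l->ftrace_kb % 4 || l->dmesg_kb % 4)
		return -1;
	if (fixed > total_kb)
		return -1;

	/* Whatever is left after the fixed areas goes to dmesg chunks. */
	l->dmesg_chunks = l->dmesg_kb ? (total_kb - fixed) / l->dmesg_kb : 0;
	return 0;
}
```

With a 1024 KB partition, pmsg_size=8, console_size=16 and dmesg_size=64,
this sketch leaves room for 15 dmesg chunks.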

>> +should do is to provide read/write APIs.
> 
> "A block device driver only needs to provide read/write APIs."
> 

OK.

>> +
>> +Pstore/blk begins at function ``blkz_register``. Besides, blkoops, a wrapper of
>> +pstore/blk, begins at function ``blkoops_register_blkdev`` for block device and
>> +``blkoops_register_device`` for non-block device, which is recommended instead
>> +of directly using pstore/blk.
>> +
>> +Blkoops provides efficient configuration method for pstore/blk, which divides
>> +all configurations of pstore/blk into two parts, configurations for user and
>> +configurations for driver.
>> +
>> +Configurations for user determine how pstore/blk works, such as pmsg_size,
>> +dmesg_size and so on. All of them support both kconfig and module parameters,
>> +but module parameters have priority over kconfig.
>> +
>> +Configurations for driver are all about block/non-block device, such as
>> +total_size of device and read/write operations. Device driver transfers a
>> +structure ``blkoops_device`` defined in *linux/blkoops.h*.
>> +
>> +All of the following are for blkoops.
>> +
>> +Configurations for user
>> +-----------------------
>> +
>> +All of these configurations support both kconfig and module parameters, but
>> +module parameters have priority over kconfig.
>> +Here is an example for module parameters::
>> +
>> +        blkoops.blkdev=179:7 blkoops.dmesg_size=64 blkoops.dump_oops=1
>> +
>> +The details of each configuration may be of interest to you.
>> +
>> +blkdev
>> +~~~~~~
>> +
>> +The block device to use. Most of the time, it is a partition of block device.
>> +It's fine to ignore it if you are not using a block device.
>> +
>> +It accepts the following variants:
>> +
>> +1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
>> +   leading 0x, for example b302.
>> +#. /dev/<disk_name> represents the device number of disk
>> +#. /dev/<disk_name><decimal> represents the device number of partition - device
>> +   number of disk plus the partition number
>> +#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
>> +   name of partitioned disk ends with a digit.
>> +#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
>> +   a partition if the partition table provides it. The UUID may be either an
>> +   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
>> +   where SSSSSSSS is a zero-filled hex representation of the 32-bit
>> +   "NT disk signature", and PP is a zero-filled hex representation of the
>> +   1-based partition number.
>> +#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
>> +   partition with a known unique id.
>> +#. <major>:<minor> major and minor number of the device separated by a colon.
>> +
>> +dmesg_size
>> +~~~~~~~~~~
>> +
>> +The chunk size in KB for dmesg(oops/panic). It **MUST** be a multiple of 4.
>> +If you don't need it, safely set it to 0 or ignore it.
>> +
>> +NOTE that, the remaining space, except ``pmsg_size``, ``console_size`` and
>> +others, belongs to dmesg. It means that there are multiple chunks for dmesg.
>> +
>> +Pstore/blk will log to dmesg chunks one by one, and always overwrite the oldest
>> +chunk if there are no more free chunks.
>> +
>> +pmsg_size
>> +~~~~~~~~~
>> +
>> +The chunk size in KB for pmsg. It **MUST** be a multiple of 4. If you do not
>> +need it, safely set it to 0 or ignore it.
>> +
>> +There is only one chunk for pmsg.
>> +
>> +Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
>> +appended to the chunk. On reboot the contents are available in
>> +/sys/fs/pstore/pmsg-pstore-blk-0.
>> +
>> +console_size
>> +~~~~~~~~~~~~
>> +
>> +The chunk size in KB for console. It **MUST** be a multiple of 4. If you
>> +do not need it, safely set it to 0 or ignore it.
>> +
>> +There is only one chunk for console.
>> +
>> +All log of console will be appended to the chunk. On reboot the contents are
>> +available in /sys/fs/pstore/console-pstore-blk-0.
>> +
>> +ftrace_size
>> +~~~~~~~~~~~
>> +
>> +The chunk size in KB for ftrace. It **MUST** be a multiple of 4. If you
>> +do not need it, safely set it to 0 or ignore it.
>> +
>> +There may be several chunks for ftrace, according to how many processors on
>> +your CPU. Each chunk size is equal to (ftrace_size / processors_count).
>> +
>> +All log of ftrace will be appended to the chunk. On reboot the contents are
>> +available in /sys/fs/pstore/ftrace-pstore-blk-[N], where N is the processor
>> +number.
>> +
>> +Persistent function tracing might be useful for debugging software or hardware
>> +related hangs. Here is an example of usage::
>> +
>> + # mount -t pstore pstore /sys/fs/pstore
>> + # mount -t debugfs debugfs /sys/kernel/debug/
>> + # echo 1 > /sys/kernel/debug/pstore/record_ftrace
>> + # reboot -f
>> + [...]
>> + # mount -t pstore pstore /sys/fs/pstore
>> + # tail /sys/fs/pstore/ftrace-pstore-blk-0
>> + CPU:0 ts:109860 c03a4310  c0063ebc  cpuidle_select <- cpu_startup_entry+0x1a8/0x1e0
>> + CPU:0 ts:109861 c03a5878  c03a4324  menu_select <- cpuidle_select+0x24/0x2c
>> + CPU:0 ts:109862 c00670e8  c03a589c  pm_qos_request <- menu_select+0x38/0x4cc
>> + CPU:0 ts:109863 c0092bbc  c03a5960  tick_nohz_get_sleep_length <- menu_select+0xfc/0x4cc
>> + CPU:0 ts:109865 c004b2f4  c03a59d4  get_iowait_load <- menu_select+0x170/0x4cc
>> + CPU:0 ts:109868 c0063b60  c0063ecc  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
>> + CPU:0 ts:109869 c03a433c  c0063b94  cpuidle_enter <- call_cpuidle+0x44/0x48
>> + CPU:0 ts:109871 c03a4000  c03a4350  cpuidle_enter_state <- cpuidle_enter+0x24/0x28
>> + CPU:0 ts:109873 c0063ba8  c03a4090  sched_idle_set_state <- cpuidle_enter_state+0xa4/0x314
>> + CPU:0 ts:109874 c03a605c  c03a40b4  arm_enter_idle_state <- cpuidle_enter_state+0xc8/0x314
> 
> It would be nice to extract ftrace_log_combine() from ram.c and make the
> front-end and inode layers aware of this as a way to auto-merge the
> records from all backends supporting ftrace.
> 

Sure. I will try to do so.
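
As I understand it, the auto-merge comes down to merging the per-CPU record
streams in timestamp order. A minimal userspace sketch of that idea, assuming
each per-CPU zone is already sorted by timestamp; ``trace_rec`` and
``merge_by_ts`` are made-up names and this is not ftrace_log_combine() itself:

```c
#include <stddef.h>

/* Hypothetical record type; the real pstore ftrace record differs. */
struct trace_rec {
	unsigned long long ts;
	unsigned int cpu;
};

/*
 * Merge two arrays that are each sorted by ts into out[], which must
 * have room for na + nb entries. Returns the number of records written.
 * On equal timestamps the record from a[] comes first.
 */
size_t merge_by_ts(const struct trace_rec *a, size_t na,
		   const struct trace_rec *b, size_t nb,
		   struct trace_rec *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < na && j < nb)
		out[k++] = (a[i].ts <= b[j].ts) ? a[i++] : b[j++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}
```

Merging more than two CPUs could be done by folding the zones in pairwise.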

>> +dump_oops
>> +~~~~~~~~~
>> +
>> +Dumping both oopses and panics can be done by setting 1 (not zero) in the
>> +``dump_oops`` member while setting 0 in that variable dumps only the panics.
>> +
>> +Configurations for driver
>> +-------------------------
>> +
>> +Only a device driver cares about these configurations. A block device driver
>> +uses ``blkoops_register_blkdev`` while a non-block device driver uses
>> +``blkoops_register_device``
> 
> Given this clarification, I'd say there is no reason to discuss
> blkzone.c at all.
> 

That's not about blkzone.c. Or do you want to get rid of the configurations
for the driver?

>> +
>> +The parameters of these two APIs may be of interest to you.
>> +
>> +major
>> +~~~~~
>> +
>> +It is only required by block device which is registered by
>> +``blkoops_register_blkdev``.  It's the major device number of registered
>> +devices, by which blkoops can get the matching driver for @blkdev.
>> +
>> +total_size
>> +~~~~~~~~~~
>> +
>> +It is only required by non-block device which is registered by
>> +``blkoops_register_device``.  It tells pstore/blk the total size
>> +pstore/blk can use. It is in KB and **MUST** be greater than or equal to 4
>> +and a multiple of 4.
>> +
>> +For block devices, blkoops can get size of block device/partition automatically.
>> +
>> +read/write
>> +~~~~~~~~~~
>> +
>> +It's generic read/write APIs for pstore/blk, which are required by non-block
>> +device. The generic APIs are used for almost all data except panic data,
>> +such as pmsg, console, oops and ftrace.
>> +
>> +The parameter @offset of these interface is the relative position of the device.
>> +
>> +Normally the number of bytes read/written should be returned, while for error,
>> +negative number will be returned. The following return numbers mean more:
>> +
>> +-EBUSY: pstore/blk should try again later.
>> +
>> +panic_write (for non-block device)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> I still think some other term is needed for "non-block device", since it
> _is_ a block device. i.e. we're using it with pstore/blk. ;) I find it
> just odd language.
> 

I just want to use "non-block" to mean devices that are _not_ block devices,
such as an mtd device. Maybe I am using "non-block" wrong?

>> +
>> +It's an interface for panic recorder and will be used only when panic occurs.
>> +Non-block device driver registers it by ``blkoops_register_device``. If panic
>> +log is unnecessary, it's fine to ignore it.
>> +
>> +Note that pstore/blk will recover data from device while mounting pstore
>> +filesystem by default. If panic occurs but pstore/blk does not recover yet, the
>> +first zone of dmesg will be used.
>> +
>> +The parameter @offset of this interface is the relative position of the device.
>> +
>> +Normally the number of bytes written should be returned, while for error,
>> +negative number should be returned.
>> +
>> +panic_write (for block device)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +It's much similar to panic_write for non-block device, but the position and
>> +data size of panic_write for block device must be aligned to SECTOR_SIZE,
>> +that's why the parameters are @sects and @start_sect. Block device driver
>> +should register it by ``blkoops_register_blkdev``.
>> +
>> +The parameter @start_sect is the relative position of the block device and
>> +partition. If block driver requires absolute position for panic_write,
>> +``blkoops_blkdev_info`` will be helpful, which can provide the absolute
>> +position of the block device (or partition) on the whole disk/flash.
>> +
>> +Normally zero should be returned, otherwise it indicates an error.
>> +
>> +Compression and header
>> +----------------------
>> +
>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>> +recommend data compression because pstore/blk will insert some information into
>> +the first line of dmesg data. For example::
>> +
>> +        Panic: Total 16 times
>> +
>> +It means that it's OOPS|Panic for the 16th time since the first booting.
>> +Sometimes the number of occurrences of oops|panic since the first booting is
>> +important to judge whether the system is stable.
>> +
>> +The following line is inserted by pstore filesystem. For example::
>> +
>> +        Oops#2 Part1
>> +
>> +It means that it's OOPS for the 2nd time on the last boot.
>> +
>> +Reading the data
>> +----------------
>> +
>> +The dump data can be read from the pstore filesystem. The format for these
>> +files is ``dmesg-pstore-blk-[N]`` for dmesg(oops|panic), ``pmsg-pstore-blk-0``
>> +for pmsg and so on, where N is the record number. To delete a stored
>> +record from block device, simply unlink the respective pstore file. The
>> +timestamp of the dump file records the trigger time.
>> +
>> +Attentions in panic read/write APIs
>> +-----------------------------------
>> +
>> +If on panic, the kernel is not going to run for much longer, the tasks will not
>> +be scheduled and most kernel resources will be out of service. It
>> +looks like a single-threaded program running on a single-core computer.
>> +
>> +The following points require special attention for panic read/write APIs:
>> +
>> +1. Can **NOT** allocate any memory.
>> +   If you need memory, just allocate while the block driver is initializing
>> +   rather than waiting until the panic.
>> +#. Must be polled, **NOT** interrupt driven.
>> +   No task schedule any more. The block driver should delay to ensure the write
>> +   succeeds, but NOT sleep.
>> +#. Can **NOT** take any lock.
>> +   There is no other task, nor any shared resource; you are safe to break all
>> +   locks.
>> +#. Just use CPU to transfer.
>> +   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
>> +#. Control registers directly.
>> +   Please control registers directly rather than use Linux kernel resources.
>> +   Do I/O map while initializing rather than wait until a panic occurs.
>> +#. Reset your block device and controller if necessary.
>> +   If you are not sure of the state of your block device and controller when
>> +   a panic occurs, you are safe to stop and reset them.
>> +
>> +Blkoops supports blkoops_blkdev_info(), which is defined in *linux/blkoops.h*,
>> +to get information of block device, such as the device number, sector count and
>> +start sector of the whole disk.
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index e4ba97130560..a5122e3aaf76 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -13380,6 +13380,7 @@ F:	include/linux/pstore*
>>  F:	drivers/firmware/efi/efi-pstore.c
>>  F:	drivers/acpi/apei/erst.c
>>  F:	Documentation/admin-guide/ramoops.rst
>> +F:	Documentation/admin-guide/pstore-block.rst
>>  F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
>>  K:	\b(pstore|ramoops|blkoops)
>>  
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 308a0a4c5ee5..466908a242aa 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -162,6 +162,8 @@ config PSTORE_BLK
>>  	  This enables panic and oops message to be logged to a block dev
>>  	  where it can be read back at some later point.
>>  
>> +	  For more information, see Documentation/admin-guide/pstore-block.rst.
>> +
>>  	  If unsure, say N.
>>  
>>  config PSTORE_BLKOOPS
>> -- 
>> 1.9.1
>>
> 
> I love the docs; thank you for them! As mentioned in the other email,
> perhaps add a section at the bottom like:
> 
> blkoops internals
> -----------------
> 
> For developer reference, here are all the important structures and APIs:
> 
> .. kernel-doc:: fs/pstore/blkzone.c
>    :internal:
> 
> .. kernel-doc:: fs/pstore/blkoops.c
>    :export:
> 

OK.

> etc
> 

-- 
WeiXiong Liao

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device
  2020-03-18 18:35   ` Kees Cook
@ 2020-03-22 12:27     ` WeiXiong Liao
  0 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 12:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:35, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:51PM +0800, WeiXiong Liao wrote:
>> It's one of a series of patches for adaptive to MTD device.
>>
>> MTD device is not block device. As the block of flash (MTD device) will
>> be broken, it's necessary for pstore/blk to skip the broken block
>> (bad block).
>>
>> If device drivers return -ENEXT, pstore/blk will try next zone of dmesg.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  Documentation/admin-guide/pstore-block.rst |  3 +-
>>  fs/pstore/blkzone.c                        | 74 +++++++++++++++++++++++-------
>>  include/linux/blkoops.h                    |  4 +-
>>  include/linux/pstore_blk.h                 |  4 ++
>>  4 files changed, 66 insertions(+), 19 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>> index c8a5f68960c3..be865dfc1a28 100644
>> --- a/Documentation/admin-guide/pstore-block.rst
>> +++ b/Documentation/admin-guide/pstore-block.rst
>> @@ -188,7 +188,8 @@ The parameter @offset of these interface is the relative position of the device.
>>  Normally the number of bytes read/written should be returned, while for error,
>>  negative number will be returned. The following return numbers mean more:
>>  
>> --EBUSY: pstore/blk should try again later.
>> +1. -EBUSY: pstore/blk should try again later.
>> +#. -ENEXT: this zone is used or broken, pstore/blk should try next one.
>>  
>>  panic_write (for non-block device)
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> index 442e5a5bbfda..205aeff28992 100644
>> --- a/fs/pstore/blkzone.c
>> +++ b/fs/pstore/blkzone.c
>> @@ -207,6 +207,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
>>  
>>  	return 0;
>>  set_dirty:
>> +	/* no need to mark dirty if going to try next zone */
>> +	if (wcnt == -ENEXT)
>> +		return -ENEXT;
>>  	atomic_set(&zone->dirty, true);
>>  	/* flush dirty zones nicely */
>>  	if (wcnt == -EBUSY && !is_on_panic())
>> @@ -360,7 +363,11 @@ static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
>>  			return -EINVAL;
>>  
>>  		rcnt = info->read((char *)buf, len, zone->off);
>> -		if (rcnt != len) {
>> +		if (rcnt == -ENEXT) {
>> +			pr_debug("%s with id %lu may be broken, skip\n",
>> +					zone->name, i);
>> +			continue;
>> +		} else if (rcnt != len) {
>>  			pr_err("read %s with id %lu failed\n", zone->name, i);
>>  			return (int)rcnt < 0 ? (int)rcnt : -EIO;
>>  		}
>> @@ -650,24 +657,58 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
>>  		hdr->counter = 0;
>>  }
>>  
>> +/*
>> + * In case zone is broken, which may occur to MTD device, we try each zones,
>> + * start at cxt->dmesg_write_cnt.
>> + */
>>  static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
>>  		struct pstore_record *record)
>>  {
>> +	int ret = -EBUSY;
>>  	size_t size, hlen;
>>  	struct blkz_zone *zone;
>> -	unsigned int zonenum;
>> +	unsigned int i;
>>  
>> -	zonenum = cxt->dmesg_write_cnt;
>> -	zone = cxt->dbzs[zonenum];
>> -	if (unlikely(!zone))
>> -		return -ENOSPC;
>> -	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
>> +	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
>> +		unsigned int zonenum, len;
>> +
>> +		zonenum = (cxt->dmesg_write_cnt + i) % cxt->dmesg_max_cnt;
>> +		zone = cxt->dbzs[zonenum];
>> +		if (unlikely(!zone))
>> +			return -ENOSPC;
>>  
>> -	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
>> -	blkz_write_kmsg_hdr(zone, record);
>> -	hlen = sizeof(struct blkz_dmesg_header);
>> -	size = min_t(size_t, record->size, zone->buffer_size - hlen);
>> -	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
>> +		/* avoid destroying old data, allocate a new one */
>> +		len = zone->buffer_size + sizeof(*zone->buffer);
>> +		zone->oldbuf = zone->buffer;
>> +		zone->buffer = kzalloc(len, GFP_KERNEL);
>> +		if (!zone->buffer) {
>> +			zone->buffer = zone->oldbuf;
>> +			return -ENOMEM;
>> +		}
>> +		zone->buffer->sig = zone->oldbuf->sig;
>> +
>> +		pr_debug("write %s to zone id %d\n", zone->name, zonenum);
>> +		blkz_write_kmsg_hdr(zone, record);
>> +		hlen = sizeof(struct blkz_dmesg_header);
>> +		size = min_t(size_t, record->size, zone->buffer_size - hlen);
>> +		ret = blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
>> +		if (likely(!ret || ret != -ENEXT)) {
>> +			cxt->dmesg_write_cnt = zonenum + 1;
>> +			cxt->dmesg_write_cnt %= cxt->dmesg_max_cnt;
>> +			/* no need to try next zone, free last zone buffer */
>> +			kfree(zone->oldbuf);
>> +			zone->oldbuf = NULL;
>> +			return ret;
>> +		}
>> +
>> +		pr_debug("zone %u may be broken, try next dmesg zone\n",
>> +				zonenum);
>> +		kfree(zone->buffer);
>> +		zone->buffer = zone->oldbuf;
>> +		zone->oldbuf = NULL;
>> +	}
>> +
>> +	return -EBUSY;
>>  }
>>  
>>  static int notrace blkz_dmesg_write(struct blkz_context *cxt,
>> @@ -791,7 +832,6 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
>>  	}
>>  }
>>  
>> -#define READ_NEXT_ZONE ((ssize_t)(-1024))
>>  static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
>>  {
>>  	struct blkz_zone *zone = NULL;
>> @@ -852,7 +892,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>>  	if (blkz_read_dmesg_hdr(zone, record)) {
>>  		atomic_set(&zone->buffer->datalen, 0);
>>  		atomic_set(&zone->dirty, 0);
>> -		return READ_NEXT_ZONE;
>> +		return -ENEXT;
>>  	}
>>  	size -= sizeof(struct blkz_dmesg_header);
>>  
>> @@ -877,7 +917,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
>>  	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
>>  				sizeof(struct blkz_dmesg_header)) < 0)) {
>>  		kfree(record->buf);
>> -		return READ_NEXT_ZONE;
>> +		return -ENEXT;
>>  	}
>>  
>>  	return size + hlen;
>> @@ -891,7 +931,7 @@ static ssize_t blkz_record_read(struct blkz_zone *zone,
>>  
>>  	buf = (struct blkz_buffer *)zone->oldbuf;
>>  	if (!buf)
>> -		return READ_NEXT_ZONE;
>> +		return -ENEXT;
>>  
>>  	size = atomic_read(&buf->datalen);
>>  	start = atomic_read(&buf->start);
>> @@ -943,7 +983,7 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
>>  	}
>>  
>>  	ret = readop(zone, record);
>> -	if (ret == READ_NEXT_ZONE)
>> +	if (ret == -ENEXT)
>>  		goto next_zone;
>>  	return ret;
>>  }
>> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
>> index 8f40f225545d..71c596fd4cc8 100644
>> --- a/include/linux/blkoops.h
>> +++ b/include/linux/blkoops.h
>> @@ -27,6 +27,7 @@
>>   *	On error, negative number should be returned. The following returning
>>   *	number means more:
>>   *	  -EBUSY: pstore/blk should try again later.
>> + *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
>>   * @panic_write:
>>   *	The write operation only used for panic.
>>   *
>> @@ -45,7 +46,8 @@ struct blkoops_device {
>>  
>>  /*
>>   * Panic write for block device who should write alignmemt to SECTOR_SIZE.
>> - * On success, zero should be returned. Others mean error.
>> + * On success, zero should be returned. Others mean error except that -ENEXT
>> + * means the zone is used or broken, pstore/blk should try next one.
>>   */
>>  typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
>>  		sector_t sects);
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> index 77704c1b404a..bbbe4fe37f7c 100644
>> --- a/include/linux/pstore_blk.h
>> +++ b/include/linux/pstore_blk.h
>> @@ -6,6 +6,9 @@
>>  #include <linux/types.h>
>>  #include <linux/blkdev.h>
>>  
>> +/* read/write function return -ENEXT means try next zone */
>> +#define ENEXT ((ssize_t)(1024))
> 
> I really don't like inventing errno numbers. Can you just reuse an
> existing (but non-block) errno like ESRCH or ENOMSG or something?
> 

ENOMSG is OK.
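
With that change, the skip logic would look roughly like the userspace sketch
below. It only illustrates the "try the next zone" contract from this patch
with -ENOMSG in place of -ENEXT; ``zone_write_fn``, ``write_skipping_bad`` and
``demo_write`` are made-up names and the buffer handling of the real code is
left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend write; returns 0 or a negative errno. */
typedef int (*zone_write_fn)(unsigned int zonenum, const char *buf, size_t len);

/*
 * Try each zone once, starting at *next. -ENOMSG from the backend means
 * "this zone is used or broken, try the next one"; any other result ends
 * the search. On return *next points past the zone that was last tried.
 */
int write_skipping_bad(zone_write_fn wr, unsigned int max_cnt,
		       unsigned int *next, const char *buf, size_t len)
{
	unsigned int i;

	for (i = 0; i < max_cnt; i++) {
		unsigned int zonenum = (*next + i) % max_cnt;
		int ret = wr(zonenum, buf, len);

		if (ret != -ENOMSG) {
			*next = (zonenum + 1) % max_cnt;
			return ret;
		}
	}
	return -EBUSY; /* every zone asked to be skipped */
}

/* Example backend: pretends zone 1 is a bad block. */
int demo_write(unsigned int zonenum, const char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return zonenum == 1 ? -ENOMSG : 0;
}
```

Starting at zone 1 with four zones, the example backend makes the loop skip
zone 1, write to zone 2 and leave the next write position at zone 3.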

>> +
>>  /**
>>   * struct blkz_info - backend blkzone driver structure
>>   *
>> @@ -42,6 +45,7 @@
>>   *	On error, negative number should be returned. The following returning
>>   *	number means more:
>>   *	  -EBUSY: pstore/blk should try again later.
>> + *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
>>   * @panic_write:
>>   *	The write operation only used for panic. It's optional if you do not
>>   *	care panic record. If panic occur but blkzone do not recover yet, the
>> -- 
>> 1.9.1
>>
> 

-- 
WeiXiong Liao

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg.
  2020-03-18 18:47   ` Kees Cook
@ 2020-03-22 13:03     ` WeiXiong Liao
  0 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 13:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:47, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:53PM +0800, WeiXiong Liao wrote:
>> It's one of a series of patches for adaptive to MTD device.
>>
>> An MTD device is not a block device. To write to a flash device on MTD, an
>> erase must be done first. However, pstore/blk just sets datalen to 0 on
>> removal, which is not enough for an mtd device. That is why this patch adds
>> support for special jobs when removing a pstore/blk record.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  Documentation/admin-guide/pstore-block.rst |  9 +++++++++
>>  fs/pstore/blkoops.c                        |  4 +++-
>>  fs/pstore/blkzone.c                        |  9 ++++++++-
>>  include/linux/blkoops.h                    | 10 ++++++++++
>>  include/linux/pstore_blk.h                 | 11 +++++++++++
>>  5 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>> index 299142b3d8e6..1735476621df 100644
>> --- a/Documentation/admin-guide/pstore-block.rst
>> +++ b/Documentation/admin-guide/pstore-block.rst
>> @@ -200,6 +200,15 @@ negative number will be returned. The following return numbers mean more:
>>  1. -EBUSY: pstore/blk should try again later.
>>  #. -ENEXT: this zone is used or broken, pstore/blk should try next one.
>>  
>> +erase
>> +~~~~~
>> +
>> +It's generic erase API for pstore/blk, which is requested by non-block device.
>> +It will be called while pstore record is removing. It's required only when the
>> +device has special removing jobs. For example, MTD device tries to erase block.
>> +
>> +Normally zero should be returned, otherwise it indicates an error.
>> +
>>  panic_write (for non-block device)
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>  
>> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
>> index 01170b344f00..7cf4731e52f7 100644
>> --- a/fs/pstore/blkoops.c
>> +++ b/fs/pstore/blkoops.c
>> @@ -164,6 +164,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>>  	bzinfo->dump_oops = dump_oops;
>>  	bzinfo->read = bo_dev->read;
>>  	bzinfo->write = bo_dev->write;
>> +	bzinfo->erase = bo_dev->erase;
>>  	bzinfo->panic_write = bo_dev->panic_write;
>>  	bzinfo->name = "blkoops";
>>  	bzinfo->owner = THIS_MODULE;
>> @@ -383,10 +384,11 @@ int blkoops_register_blkdev(unsigned int major, unsigned int flags,
>>  	bo_dev.total_size = blkoops_bdev_size(bdev);
>>  	if (bo_dev.total_size == 0)
>>  		goto err_put_bdev;
>> -	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
>>  	bo_dev.flags = flags;
>>  	bo_dev.read = blkoops_generic_blk_read;
>>  	bo_dev.write = blkoops_generic_blk_write;
>> +	bo_dev.erase = NULL;
>> +	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
>>  
>>  	ret = blkoops_register_device(&bo_dev);
>>  	if (ret)
> 
> I think this patch, like the prior, needs to be reordered in the series.
> How about adding
> 
> blkoops_register_device()
> 
> as a single patch, which is what provides support for the "non-block"
> block devices? Then the blkoops_register_blkdev() can stand alone in the
> first patch?
> 
> It just might be easier to review, since nothing uses
> blkoops_register_device() until the mtd driver is added. So that
> function and this patch would go together as a single "support non-block
> devices" change.
> 

That's OK. I will do it.

>> diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
>> index 205aeff28992..a17fff77b875 100644
>> --- a/fs/pstore/blkzone.c
>> +++ b/fs/pstore/blkzone.c
>> @@ -593,11 +593,18 @@ static inline bool blkz_ok(struct blkz_zone *zone)
>>  static inline int blkz_dmesg_erase(struct blkz_context *cxt,
>>  		struct blkz_zone *zone)
>>  {
>> +	size_t size;
>> +
>>  	if (unlikely(!blkz_ok(zone)))
>>  		return 0;
>>  
>>  	atomic_set(&zone->buffer->datalen, 0);
>> -	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>> +
>> +	size = buffer_datalen(zone) + sizeof(*zone->buffer);
>> +	if (cxt->bzinfo->erase)
>> +		return cxt->bzinfo->erase(size, zone->off);
>> +	else
>> +		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
>>  }
>>  
>>  static inline int blkz_record_erase(struct blkz_context *cxt,
>> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
>> index bc7665d14a98..11cb3036ad5f 100644
>> --- a/include/linux/blkoops.h
>> +++ b/include/linux/blkoops.h
>> @@ -33,6 +33,15 @@
>>   *	number means more:
>>   *	  -EBUSY: pstore/blk should try again later.
>>   *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
>> + * @erase:
>> + *	The general (not panic) erase operation. It will be called while
>> + *	a pstore record is being removed. It's required only when the device
>> + *	has special removing jobs, for example, an MTD device erasing a block.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, 0 should be returned. Others mean error.
>>   * @panic_write:
>>   *	The write operation only used for panic.
>>   *
>> @@ -53,6 +62,7 @@ struct blkoops_device {
>>  	unsigned long total_size;
>>  	blkz_read_op read;
>>  	blkz_write_op write;
>> +	blkz_erase_op erase;
>>  	blkz_write_op panic_write;
>>  };
>>  
>> diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
>> index bbbe4fe37f7c..9641969f888f 100644
>> --- a/include/linux/pstore_blk.h
>> +++ b/include/linux/pstore_blk.h
>> @@ -46,6 +46,15 @@
>>   *	number means more:
>>   *	  -EBUSY: pstore/blk should try again later.
>>   *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
>> + * @erase:
>> + *	The general (not panic) erase operation. It will be called while
>> + *	a pstore record is being removed. It's required only when the device
>> + *	has special removing jobs, for example, an MTD device erasing a block.
>> + *
>> + *	Both of the @size and @offset parameters on this interface are
>> + *	the relative size of the space provided, not the whole disk/flash.
>> + *
>> + *	On success, 0 should be returned. Others mean error.
>>   * @panic_write:
>>   *	The write operation only used for panic. It's optional if you do not
>>   *	care panic record. If panic occur but blkzone do not recover yet, the
>> @@ -59,6 +68,7 @@
>>   */
>>  typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
>>  typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
>> +typedef ssize_t (*blkz_erase_op)(size_t, loff_t);
>>  struct blkz_info {
>>  	struct module *owner;
>>  	const char *name;
>> @@ -71,6 +81,7 @@ struct blkz_info {
>>  	int dump_oops;
>>  	blkz_read_op read;
>>  	blkz_write_op write;
>> +	blkz_erase_op erase;
>>  	blkz_write_op panic_write;
>>  };
>>  
>> -- 
>> 1.9.1
>>
> 
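
For reference, the removal path in this patch boils down to "use the device's
erase hook if there is one, otherwise just rewrite the header". A small
userspace sketch of that dispatch; ``backend_ops``, ``remove_record`` and the
two demo callbacks are made-up names that only mirror the shape of the
callbacks, not the kernel types:

```c
#include <stddef.h>

/* Hypothetical mirror of the callbacks discussed here. */
struct backend_ops {
	long (*write)(const char *buf, size_t len, long off);
	long (*erase)(size_t len, long off); /* optional, may be NULL */
};

/*
 * Remove a record: prefer the device-specific erase when the backend
 * provides one (e.g. flash that must be erased before it is rewritten),
 * otherwise fall back to rewriting only the header so that the record
 * length reads back as 0.
 */
long remove_record(const struct backend_ops *ops, size_t zone_len,
		   size_t hdr_len, const char *zeroed_hdr, long off)
{
	if (ops->erase)
		return ops->erase(zone_len, off);
	return ops->write(zeroed_hdr, hdr_len, off);
}

/* Demo callbacks that only count how often they are called. */
int erase_calls, write_calls;

long demo_erase(size_t len, long off)
{
	(void)len;
	(void)off;
	erase_calls++;
	return 0;
}

long demo_meta_write(const char *buf, size_t len, long off)
{
	(void)buf;
	(void)off;
	write_calls++;
	return (long)len;
}
```

A block device would leave ``erase`` as NULL and take the header-rewrite
path, while an MTD backend would set it and get its erase called instead.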

-- 
WeiXiong Liao

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 08/11] blkoops: respect for device to pick recorders
  2020-03-18 18:42   ` Kees Cook
@ 2020-03-22 13:06     ` WeiXiong Liao
  0 siblings, 0 replies; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 13:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:42, Kees Cook wrote:
> In the subject and through-out:
> 
> s/recorders/pstore front-ends/
> 

OK.

> On Fri, Feb 07, 2020 at 08:25:52PM +0800, WeiXiong Liao wrote:
>> It's one of a series of patches for adaptive to MTD device.
> 
> typo: adapting
> 

Fixed.

>>
>> An MTD device is not a block device. A flash sector (MTD device) wears
>> out once it has been erased more than a limited number of cycles. To
>> avoid wearing blocks out so quickly, we cannot write to a sector
>> frequently. So the pstore/blk recorders that write often, such as the
>> console and ftrace recorders, should not be supported.
>>
>> Besides, an MTD device needs writes and erases aligned to its write/erase
>> size. To avoid over-erasing/writing flash, we would have to keep an
>> aligned cache and read old data into it before each write/erase, which
>> makes the code more complex. So pmsg is not supported for now, because
>> its writes are misaligned.
>>
>> What about dmesg? Luckily, pstore/blk keeps several aligned chunks for
>> dmesg and uses them one by one, which balances wear.
>>
>> So an MTD device for pstore should pick its recorders, which is why this
>> patch is here.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  Documentation/admin-guide/pstore-block.rst |  9 +++++++++
>>  fs/pstore/blkoops.c                        | 29 +++++++++++++++++++++--------
>>  include/linux/blkoops.h                    | 14 +++++++++++++-
>>  3 files changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>> index be865dfc1a28..299142b3d8e6 100644
>> --- a/Documentation/admin-guide/pstore-block.rst
>> +++ b/Documentation/admin-guide/pstore-block.rst
>> @@ -166,6 +166,15 @@ It is only required by block device which is registered by
>>  ``blkoops_register_blkdev``.  It's the major device number of registered
>>  devices, by which blkoops can get the matching driver for @blkdev.
>>  
>> +flags
>> +~~~~~
>> +
>> +Refer to macro starting with *BLKOOPS_DEV_SUPPORT_* which is defined in
>> +*linux/blkoops.h*. They tell us that which pstore/blk recorders this device
>> +supports. Default zero means all recorders for compatible, witch is the same
> 
> typo: witch -> which
> 

Fixed.

>> +as BLKOOPS_DEV_SUPPORT_ALL. Recorder works only when chunk size is not zero
>> +and device support.
> 
> There are already flags for this, please see "Supported frontends"
> in include/linux/pstore.h
> 

Yes, you are right. I will change it.

>> +
>>  total_size
>>  ~~~~~~~~~~
>>  
>> diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
>> index c76bab671b0b..01170b344f00 100644
>> --- a/fs/pstore/blkoops.c
>> +++ b/fs/pstore/blkoops.c
>> @@ -128,9 +128,16 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>>  		return -ENOMEM;
>>  	}
>>  
>> -#define verify_size(name, defsize, alignsize) {				\
>> -		long _##name_ = (name);					\
>> -		if (_##name_ < 0)					\
>> +	/* zero means all recorders, for compatibility */
>> +	if (bo_dev->flags == BLKOOPS_DEV_SUPPORT_DEFAULT)
>> +		bo_dev->flags = BLKOOPS_DEV_SUPPORT_ALL;
>> +#define verify_size(name, defsize, alignsize, enable) {			\
>> +		long _##name_;						\
>> +		if (!(enable))						\
>> +			_##name_ = 0;					\
>> +		else if ((name) >= 0)					\
>> +			_##name_ = (name);				\
>> +		else							\
>>  			_##name_ = (defsize);				\
>>  		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
>>  		if (_##name_ & ((alignsize) - 1)) {			\
>> @@ -142,10 +149,14 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
>>  		bzinfo->name = _##name_;				\
>>  	}
>>  
>> -	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
>> -	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
>> -	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
>> -	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
>> +	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096,
>> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_DMESG);
>> +	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096,
>> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_PMSG);
>> +	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096,
>> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_CONSOLE);
>> +	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096,
>> +			bo_dev->flags & BLKOOPS_DEV_SUPPORT_FTRACE);
> 
> I'd kind of prefer this patch be moved much earlier in the series so
> that the later additions of front-end support don't have to be touched
> twice. i.e. when PMSG support is added, it is added as a whole here and
> does the flag check in that patch, etc.
> 

OK.
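
For reference, the flag gating that verify_size() gains in this patch can be
sketched in plain userspace C as below. This is only an illustration, not the
kernel code: the SUPPORT_* macros and both function names are hypothetical
stand-ins for the BLKOOPS_DEV_SUPPORT_* flags, and returning -1 on
misalignment is an assumption, since the hunk does not show what the macro
does in that branch.

```c
/* Hypothetical stand-ins for the BLKOOPS_DEV_SUPPORT_* flags in the patch. */
#define SUPPORT_DMESG   (1u << 0)
#define SUPPORT_PMSG    (1u << 1)
#define SUPPORT_CONSOLE (1u << 2)
#define SUPPORT_FTRACE  (1u << 3)
#define SUPPORT_ALL     (~0u)

/* Zero flags means "all recorders", kept for compatibility. */
static unsigned int normalize_flags(unsigned int flags)
{
	return flags == 0 ? SUPPORT_ALL : flags;
}

/*
 * Resolve one recorder size in bytes.
 *  - a recorder the device does not support is forced to 0 (disabled)
 *  - size_kb >= 0 is taken as the user's choice, in KB
 *  - size_kb <  0 selects the default, def_kb
 * align must be a power of two. Returns -1 if the result is misaligned
 * (an assumption of this sketch).
 */
static long resolve_size(long size_kb, long def_kb, long align, int enabled)
{
	long kb, bytes;

	if (!enabled)
		kb = 0;
	else if (size_kb >= 0)
		kb = size_kb;
	else
		kb = def_kb;

	bytes = kb <= 0 ? 0 : kb * 1024;
	if (bytes & (align - 1))
		return -1;
	return bytes;
}
```

With flags limited to SUPPORT_DMESG, a call such as
resolve_size(pmsg_kb, 64, 4096, flags & SUPPORT_PMSG) yields 0 whatever the
user asked for, which is how a device opts out of a recorder.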

>>  #undef verify_size
>>  	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
>>  
>> @@ -336,6 +347,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
>>   * register block device to blkoops
>>   * @major: the major device number of registering device
>>   * @panic_write: the write interface for panic case.
>> + * @flags: Refer to macro starting with BLKOOPS_DEV_SUPPORT.
>>   *
>>   * It is ONLY used for block device to register to blkoops. In this case,
>>   * the module parameter @blkdev must be valid. Generic read/write interfaces
>> @@ -349,7 +361,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
>>   * panic occurs but pstore/blk does not recover yet, the first zone of dmesg
>>   * will be used.
>>   */
>> -int blkoops_register_blkdev(unsigned int major,
>> +int blkoops_register_blkdev(unsigned int major, unsigned int flags,
>>  		blkoops_blk_panic_write_op panic_write)
>>  {
>>  	struct block_device *bdev;
>> @@ -372,6 +384,7 @@ int blkoops_register_blkdev(unsigned int major,
>>  	if (bo_dev.total_size == 0)
>>  		goto err_put_bdev;
>>  	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
>> +	bo_dev.flags = flags;
>>  	bo_dev.read = blkoops_generic_blk_read;
>>  	bo_dev.write = blkoops_generic_blk_write;
>>  
>> diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
>> index 71c596fd4cc8..bc7665d14a98 100644
>> --- a/include/linux/blkoops.h
>> +++ b/include/linux/blkoops.h
>> @@ -6,6 +6,7 @@
>>  #include <linux/types.h>
>>  #include <linux/blkdev.h>
>>  #include <linux/pstore_blk.h>
>> +#include <linux/bitops.h>
>>  
>>  /**
>>   * struct blkoops_device - backend blkoops driver structure.
>> @@ -14,6 +15,10 @@
>>   * blkoops_register_device(). If block device, you are strongly recommended
>>   * to use blkoops_register_blkdev().
>>   *
>> + * @flags:
>> + *	Refer to the macros starting with BLKOOPS_DEV_SUPPORT_. These macros
>> + *	tell us which pstore/blk recorders this device supports. Zero means
>> + *	all recorders, for compatibility.
>>   * @total_size:
>>   *	The total size in bytes pstore/blk can use. It must be greater than
>>   *	4096 and be multiple of 4096.
>> @@ -38,6 +43,13 @@
>>   *	On error, negative number should be returned.
>>   */
>>  struct blkoops_device {
>> +	unsigned int flags;
>> +#define BLKOOPS_DEV_SUPPORT_ALL		UINT_MAX
>> +#define BLKOOPS_DEV_SUPPORT_DEFAULT	(0)
>> +#define BLKOOPS_DEV_SUPPORT_DMESG	BIT(0)
>> +#define BLKOOPS_DEV_SUPPORT_PMSG	BIT(1)
>> +#define BLKOOPS_DEV_SUPPORT_CONSOLE	BIT(2)
>> +#define BLKOOPS_DEV_SUPPORT_FTRACE	BIT(3)
>>  	unsigned long total_size;
>>  	blkz_read_op read;
>>  	blkz_write_op write;
>> @@ -54,7 +66,7 @@ typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
>>  
>>  int  blkoops_register_device(struct blkoops_device *bo_dev);
>>  void blkoops_unregister_device(struct blkoops_device *bo_dev);
>> -int  blkoops_register_blkdev(unsigned int major,
>> +int  blkoops_register_blkdev(unsigned int major, unsigned int flags,
>>  		blkoops_blk_panic_write_op panic_write);
>>  void blkoops_unregister_blkdev(unsigned int major);
>>  int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
>> -- 
>> 1.9.1
>>
> 

-- 
WeiXiong Liao


* Re: [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk
  2020-03-18 18:57   ` Kees Cook
@ 2020-03-22 13:51     ` WeiXiong Liao
  2020-03-22 15:13       ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: WeiXiong Liao @ 2020-03-22 13:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

hi Kees Cook,

On 2020/3/19 AM 2:57, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:25:55PM +0800, WeiXiong Liao wrote:
>> This is the last in a series of patches adapting pstore/blk to MTD devices.
>>
>> mtdpstore is similar to mtdoops but more powerful. It is based on
>> pstore/blk and aims to store panic and oops logs in a flash partition,
>> where they can be read back as files after mounting the pstore filesystem.
>>
>> pstore/blk and blkoops, a wrapper for pstore/blk, were originally designed
>> for block devices, but they are no longer limited to them. After this
>> series of patches, pstore/blk also works for MTD devices. To make it
>> work, 'blkdev' in Kconfig or the blkoops module parameter should be set
>> to the MTD device name or MTD device number.
>> See more about pstore/blk and blkoops in:
>>     Documentation/admin-guide/pstore-block.rst
>>
>> Why do we need mtdpstore?
>> 1. repetitive jobs between pstore and mtdoops
>>    Both pstore and mtdoops do the same job of storing panic/oops logs.
>>    They share much of the same logic: registering a kmsg dumper and
>>    storing logs in several chunks, one by one.
>> 2. do what a driver should do
>>    To me, a driver should provide methods instead of policies. What MTD
>>    should do is provide read/write/erase operations and get rid of the
>>    code for chunk management, kmsg dumping and configuration.
>> 3. enhanced features
>>    Not only store the log, but also show it as files.
>>    Not only the log, but also the trigger time and trigger count.
>>    Not only panic/oops logs, but also recorders for pmsg, console and
>>    ftrace in the future.
> 
> I wonder if it's possible to make this device driver "invisible", in the
> sense that it could be entirely user-configured via blkoops. I don't
> think that's needed right now, especially since it's MOSTLY configured
> by blkoops.param, etc, but I'll keep thinking about it.
> 

The physical characteristics of an MTD device impose requirements on the
user configuration. For example, the record size must be a multiple of the
page size of the MTD flash. This is really different from a block device.
If we make this device driver "invisible", we need some other way to
constrain the user configuration. The dmesg pstore front-end is the easiest
one to adapt. There is still much work to do to support the other front-ends.
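
To make these constraints concrete, here is a minimal userspace sketch of the
three geometry checks that mtdpstore_notify_add() applies to dmesg_size. The
struct, enum and function names are hypothetical and exist only for this
illustration; only the three comparisons are taken from the patch, and the
zero guard is an addition of the sketch.

```c
#include <stdint.h>

/* Hypothetical description of a flash partition, for illustration only. */
struct flash_geometry {
	uint64_t size;       /* partition size in bytes */
	uint32_t erasesize;  /* erase block size in bytes */
	uint32_t writesize;  /* page (minimum write) size in bytes */
};

enum cfg_status {
	CFG_OK = 0,
	CFG_INVALID,                    /* zero sizes, guard added by this sketch */
	CFG_PARTITION_TOO_SMALL,        /* fewer than two records fit */
	CFG_RECORD_EXCEEDS_ERASEBLOCK,  /* a record would span erase blocks */
	CFG_RECORD_NOT_PAGE_ALIGNED     /* record is not a multiple of a page */
};

/* Check a record (dmesg zone) size against the flash geometry. */
static enum cfg_status check_record_size(const struct flash_geometry *g,
					 uint32_t record_size)
{
	if (record_size == 0 || g->writesize == 0)
		return CFG_INVALID;
	if (g->size < (uint64_t)record_size * 2)
		return CFG_PARTITION_TOO_SMALL;
	if (g->erasesize < record_size)
		return CFG_RECORD_EXCEEDS_ERASEBLOCK;
	if (record_size % g->writesize)
		return CFG_RECORD_NOT_PAGE_ALIGNED;
	return CFG_OK;
}
```

On a 1 MiB partition with 128 KiB erase blocks and 2 KiB pages, a 64 KiB
record passes, while a 256 KiB record is rejected because it would no longer
fit inside one erase block. A block device has no equivalent of the last two
checks.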

> Modulo various naming convention adjustments outlined in the other
> patches, this looks fine to me (I can't really speak to the mtd driver
> bits itself, but the pstore and blkoops interaction looks good).
> 
> -Kees
> 
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  Documentation/admin-guide/pstore-block.rst |  10 +-
>>  drivers/mtd/Kconfig                        |  10 +
>>  drivers/mtd/Makefile                       |   1 +
>>  drivers/mtd/mtdpstore.c                    | 564 +++++++++++++++++++++++++++++
>>  4 files changed, 583 insertions(+), 2 deletions(-)
>>  create mode 100644 drivers/mtd/mtdpstore.c
>>
>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>> index 1735476621df..823fe2b4b84f 100644
>> --- a/Documentation/admin-guide/pstore-block.rst
>> +++ b/Documentation/admin-guide/pstore-block.rst
>> @@ -54,9 +54,10 @@ blkdev
>>  ~~~~~~
>>  
>>  The block device to use. Most of the time, it is a partition of block device.
>> -It's fine to ignore it if you are not using a block device.
>> +It is also used for MTD devices. It's fine to ignore it if you are not using
>> +a block device or an MTD device.
>>  
>> -It accepts the following variants:
>> +It accepts the following variants for block device:
>>  
>>  1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
>>     leading 0x, for example b302.
>> @@ -75,6 +76,11 @@ It accepts the following variants:
>>     partition with a known unique id.
>>  #. <major>:<minor> major and minor number of the device separated by a colon.
>>  
>> +It accepts the following variants for MTD device:
>> +
>> +1. <device name> MTD device name. "pstore" is recommended.
>> +#. <device number> MTD device number.
>> +
>>  dmesg_size
>>  ~~~~~~~~~~
>>  
>> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
>> index 42d401ea60ee..5d53d5cd2998 100644
>> --- a/drivers/mtd/Kconfig
>> +++ b/drivers/mtd/Kconfig
>> @@ -170,6 +170,16 @@ config MTD_OOPS
>>  	  buffer in a flash partition where it can be read back at some
>>  	  later point.
>>  
>> +config MTD_PSTORE
>> +	tristate "Log panic/oops to an MTD buffer based on pstore"
>> +	depends on PSTORE_BLKOOPS
>> +	help
>> +	  This enables panic and oops messages to be logged to a circular
>> +	  buffer in a flash partition where it can be read back as files after
>> +	  mounting pstore filesystem.
>> +
>> +	  If unsure, say N.
>> +
>>  config MTD_SWAP
>>  	tristate "Swap on MTD device support"
>>  	depends on MTD && SWAP
>> diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
>> index 56cc60ccc477..593d0593a038 100644
>> --- a/drivers/mtd/Makefile
>> +++ b/drivers/mtd/Makefile
>> @@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
>>  obj-$(CONFIG_SSFDC)		+= ssfdc.o
>>  obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
>>  obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
>> +obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
>>  obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
>>  
>>  nftl-objs		:= nftlcore.o nftlmount.o
>> diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
>> new file mode 100644
>> index 000000000000..58b9e10ef675
>> --- /dev/null
>> +++ b/drivers/mtd/mtdpstore.c
>> @@ -0,0 +1,564 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#define dev_fmt(fmt) "mtdoops-pstore: " fmt
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/blkoops.h>
>> +#include <linux/mtd/mtd.h>
>> +#include <linux/bitops.h>
>> +
>> +static struct mtdpstore_context {
>> +	int index;
>> +	struct blkoops_info bo_info;
>> +	struct blkoops_device bo_dev;
>> +	struct mtd_info *mtd;
>> +	unsigned long *rmmap;		/* removed bit map */
>> +	unsigned long *usedmap;		/* used bit map */
>> +	/*
>> +	 * Used for panic write.
>> +	 * As there is no block_isbad for the panic case, we keep this status
>> +	 * from before the panic to ensure panic_write does not fail.
>> +	 */
>> +	unsigned long *badmap;		/* bad block bit map */
>> +} oops_cxt;
>> +
>> +static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	int ret;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 blknum = div_u64(off, mtd->erasesize);
>> +
>> +	if (test_bit(blknum, cxt->badmap))
>> +		return true;
>> +	ret = mtd_block_isbad(mtd, off);
>> +	if (ret < 0) {
>> +		dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
>> +		return ret;
>> +	} else if (ret > 0) {
>> +		set_bit(blknum, cxt->badmap);
>> +		return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 blknum = div_u64(off, mtd->erasesize);
>> +
>> +	return test_bit(blknum, cxt->badmap);
>> +}
>> +
>> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
>> +	set_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
>> +	clear_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
>> +		clear_bit(zonenum, cxt->usedmap);
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +}
>> +
>> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
>> +
>> +	if (test_bit(blknum, cxt->badmap))
>> +		return true;
>> +	return test_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		if (test_bit(zonenum, cxt->usedmap))
>> +			return true;
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +	return false;
>> +}
>> +
>> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
>> +		size_t size)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	size_t sz;
>> +	int i;
>> +
>> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
>> +	for (i = 0; i < sz; i++) {
>> +		if (buf[i] != (char)0xFF)
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
>> +	set_bit(zonenum, cxt->rmmap);
>> +}
>> +
>> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		clear_bit(zonenum, cxt->rmmap);
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +}
>> +
>> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		if (test_bit(zonenum, cxt->rmmap))
>> +			return true;
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +	return false;
>> +}
>> +
>> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	struct erase_info erase;
>> +	int ret;
>> +
>> +	dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
>> +	erase.len = cxt->mtd->erasesize;
>> +	erase.addr = off;
>> +	ret = mtd_erase(cxt->mtd, &erase);
>> +	if (!ret)
>> +		mtdpstore_block_clear_removed(cxt, off);
>> +	else
>> +		dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
>> +		       (unsigned long long)erase.addr,
>> +		       (unsigned long long)erase.len, cxt->bo_info.device);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Called while removing a file.
>> + *
>> + * To avoid over-erasing, erase the block only when the whole block is unused.
>> + * If the block still contains a valid log, erase lazily in flush_removed()
>> + * at unregister time.
>> + */
>> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -EIO;
>> +
>> +	mtdpstore_mark_unused(cxt, off);
>> +
>> +	/* If the block still has valid data, mtdpstore erases it lazily */
>> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
>> +		mtdpstore_mark_removed(cxt, off);
>> +		return 0;
>> +	}
>> +
>> +	/* all zones are unused, erase it */
>> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
>> +	return mtdpstore_erase_do(cxt, off);
>> +}
>> +
>> +/*
>> + * What is security for mtdpstore?
>> + * As there is no erase in the panic case, we should ensure that at least
>> + * one zone is writable. Otherwise, the panic write will fail.
>> + * If a zone is used, the write operation returns -ENEXT, which means that
>> + * pstore/blk will try zones one by one until it gets an empty one. So there
>> + * is no need to ensure that the next zone is empty, only that one is.
>> + */
>> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	int ret = 0, i;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
>> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
>> +	u32 erasesize = cxt->mtd->erasesize;
>> +
>> +	for (i = 0; i < zonecnt; i++) {
>> +		u32 num = (zonenum + i) % zonecnt;
>> +
>> +		/* found empty zone */
>> +		if (!test_bit(num, cxt->usedmap))
>> +			return 0;
>> +	}
>> +
>> +	/* If there is no empty zone at all, we have no choice but to erase */
>> +	off = ALIGN_DOWN(off, erasesize);
>> +	while (blkcnt--) {
>> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
>> +
>> +		if (mtdpstore_block_isbad(cxt, off))
>> +			continue;
>> +
>> +		ret = mtdpstore_erase_do(cxt, off);
>> +		if (!ret) {
>> +			mtdpstore_block_mark_unused(cxt, off);
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (ret)
>> +		dev_err(&mtd->dev, "all blocks bad!\n");
>> +	dev_dbg(&mtd->dev, "end security\n");
>> +	return ret;
>> +}
>> +
>> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	/* zone is used, please try next one */
>> +	if (mtdpstore_is_used(cxt, off))
>> +		return -ENEXT;
>> +
>> +	dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
>> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if (ret < 0 || retlen != size) {
>> +		dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +	mtdpstore_mark_used(cxt, off);
>> +
>> +	mtdpstore_security(cxt, off);
>> +	return retlen;
>> +}
>> +
>> +static inline bool mtdpstore_is_io_error(int ret)
>> +{
>> +	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
>> +}
>> +
>> +/*
>> + * All zones will be read, as pstore/blk reads zones one by one during
>> + * recovery.
>> + */
>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	size_t retlen, done;
>> +	int ret;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
>> +	for (done = 0, retlen = 0; done < size; done += retlen) {
>> +		retlen = 0;
>> +
>> +		ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
>> +				(u_char *)buf + done);
>> +		if (mtdpstore_is_io_error(ret)) {
>> +			dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
>> +					off + done, retlen, size - done, ret);
>> +			/* the zone may be broken, try next one */
>> +			return -ENEXT;
>> +		}
>> +
>> +		/*
>> +		 * ECC error. The impact on log data is so small. Maybe we can
>> +		 * still read it and try to understand. So mtdpstore just hands
>> +		 * over what it gets and user can judge whether the data is
>> +		 * valid or not.
>> +		 */
>> +		if (mtd_is_eccerr(ret)) {
>> +			dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
>> +					off + done, retlen, size - done, ret);
>> +			/* driver may not set retlen when ecc error */
>> +			retlen = retlen == 0 ? size - done : retlen;
>> +		}
>> +	}
>> +
>> +	if (mtdpstore_is_empty(cxt, buf, size))
>> +		mtdpstore_mark_unused(cxt, off);
>> +	else
>> +		mtdpstore_mark_used(cxt, off);
>> +
>> +	mtdpstore_security(cxt, off);
>> +	return retlen;
>> +}
>> +
>> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_panic_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	/* zone is used, please try next one */
>> +	if (mtdpstore_is_used(cxt, off))
>> +		return -ENEXT;
>> +
>> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if (ret < 0 || size != retlen) {
>> +		dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu written), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +	mtdpstore_mark_used(cxt, off);
>> +
>> +	return retlen;
>> +}
>> +
>> +static void mtdpstore_notify_add(struct mtd_info *mtd)
>> +{
>> +	int ret;
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct blkoops_info *info = &cxt->bo_info;
>> +	unsigned long longcnt;
>> +
>> +	if (!strcmp(mtd->name, info->device))
>> +		cxt->index = mtd->index;
>> +
>> +	if (mtd->index != cxt->index || cxt->index < 0)
>> +		return;
>> +
>> +	dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
>> +
>> +	if (mtd->size < info->dmesg_size * 2) {
>> +		dev_err(&mtd->dev, "MTD partition %d not big enough\n",
>> +				mtd->index);
>> +		return;
>> +	}
>> +	/*
>> +	 * dmesg_size must be aligned to 4096 bytes, which is a limit imposed
>> +	 * by blkoops. The default value of dmesg_size is 64KB. If dmesg_size
>> +	 * is larger than erasesize, errors will occur, since mtdpstore is
>> +	 * designed around that assumption.
>> +	 */
>> +	if (mtd->erasesize < info->dmesg_size) {
>> +		dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
>> +				mtd->index);
>> +		return;
>> +	}
>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
>> +		dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
>> +				info->dmesg_size / 1024,
>> +				mtd->writesize / 1024);
>> +		return;
>> +	}
>> +
>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
>> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +
>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
>> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +
>> +	cxt->bo_dev.total_size = mtd->size;
>> +	/* just support dmesg right now */
>> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
>> +	cxt->bo_dev.read = mtdpstore_read;
>> +	cxt->bo_dev.write = mtdpstore_write;
>> +	cxt->bo_dev.erase = mtdpstore_erase;
>> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
>> +
>> +	ret = blkoops_register_device(&cxt->bo_dev);
>> +	if (ret) {
>> +		dev_err(&mtd->dev, "mtd%d register to blkoops failed\n",
>> +				mtd->index);
>> +		return;
>> +	}
>> +	cxt->mtd = mtd;
>> +	dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
>> +}
>> +
>> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
>> +		loff_t off, size_t size)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u_char *buf;
>> +	int ret;
>> +	size_t retlen;
>> +	struct erase_info erase;
>> +
>> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
>> +	if (!buf)
>> +		return -ENOMEM;
>> +
>> +	/* 1st. read to cache */
>> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
>> +	if (mtdpstore_is_io_error(ret))
>> +		goto free;
>> +
>> +	/* 2nd. erase block */
>> +	erase.len = mtd->erasesize;
>> +	erase.addr = off;
>> +	ret = mtd_erase(mtd, &erase);
>> +	if (ret)
>> +		goto free;
>> +
>> +	/* 3rd. write back */
>> +	while (size) {
>> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
>> +
>> +		/* there is valid data on block, write back */
>> +		if (mtdpstore_is_used(cxt, off)) {
>> +			ret = mtd_write(mtd, off, zonesize, &retlen, buf);
>> +			if (ret)
>> +				dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
>> +						off, retlen, zonesize, ret);
>> +		}
>> +
>> +		off += zonesize;
>> +		size -= min_t(unsigned int, zonesize, size);
>> +	}
>> +
>> +free:
>> +	kfree(buf);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * What does mtdpstore_flush_removed() do?
>> + * When the user removes a log file on the pstore filesystem, mtdpstore
>> + * should make sure the log is really removed. If the whole block is no longer
>> + * used, it simply erases the block. However, if the block still contains a
>> + * valid log, all mtdpstore can do is erase it and write the valid log back.
>> + */
>> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	int ret;
>> +	loff_t off;
>> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
>> +
>> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
>> +		ret = mtdpstore_block_isbad(cxt, off);
>> +		if (ret)
>> +			continue;
>> +
>> +		ret = mtdpstore_block_is_removed(cxt, off);
>> +		if (!ret)
>> +			continue;
>> +
>> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +
>> +	if (mtd->index != cxt->index || cxt->index < 0)
>> +		return;
>> +
>> +	mtdpstore_flush_removed(cxt);
>> +
>> +	blkoops_unregister_device(&cxt->bo_dev);
>> +	kfree(cxt->badmap);
>> +	kfree(cxt->usedmap);
>> +	kfree(cxt->rmmap);
>> +	cxt->mtd = NULL;
>> +	cxt->index = -1;
>> +}
>> +
>> +static struct mtd_notifier mtdpstore_notifier = {
>> +	.add	= mtdpstore_notify_add,
>> +	.remove	= mtdpstore_notify_remove,
>> +};
>> +
>> +static int __init mtdpstore_init(void)
>> +{
>> +	int ret;
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	struct blkoops_info *info = &cxt->bo_info;
>> +
>> +	ret = blkoops_info(info);
>> +	if (unlikely(ret))
>> +		return ret;
>> +
>> +	if (strlen(info->device) == 0) {
>> +		dev_err(&mtd->dev, "mtd device must be supplied\n");
>> +		return -EINVAL;
>> +	}
>> +	if (!info->dmesg_size) {
>> +		dev_err(&mtd->dev, "no recorder enabled\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Setup the MTD device to use */
>> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
>> +	if (ret)
>> +		cxt->index = -1;
>> +
>> +	register_mtd_user(&mtdpstore_notifier);
>> +	return 0;
>> +}
>> +module_init(mtdpstore_init);
>> +
>> +static void __exit mtdpstore_exit(void)
>> +{
>> +	unregister_mtd_user(&mtdpstore_notifier);
>> +}
>> +module_exit(mtdpstore_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>> +MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
>> -- 
>> 1.9.1
>>
> 
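
As a reading aid for the used/removed bookkeeping above, the lazy-erase
decision made when a record is removed can be modelled in a few lines of
userspace C. This is a simplified sketch, not the driver code: the struct and
function names are hypothetical, the bitmaps are single 64-bit words (so at
most 64 zones), and the offset is aligned down to the start of its erase
block before the zones are scanned, which is a choice made in this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-zone state kept by the driver. */
struct zone_map {
	uint32_t zone_size;   /* bytes per record zone */
	uint32_t erase_size;  /* bytes per erase block, a multiple of zone_size */
	uint64_t used;        /* bit n set: zone n holds a valid record */
	uint64_t removed;     /* bit n set: zone n was deleted, erase pending */
};

/* Bit mask covering every zone of the erase block that contains off. */
static uint64_t block_mask(const struct zone_map *m, uint64_t off)
{
	uint32_t per_block = m->erase_size / m->zone_size;
	uint64_t first = (off / m->erase_size) * per_block;
	uint64_t bits = per_block >= 64 ? ~0ULL : (1ULL << per_block) - 1;

	return bits << first;
}

/*
 * Remove the record at off. Returns true when the caller should erase
 * the block right away, false when the erase has to be deferred because
 * neighbouring zones in the same block still hold valid records.
 */
static bool remove_record(struct zone_map *m, uint64_t off)
{
	uint64_t zone = off / m->zone_size;
	uint64_t mask = block_mask(m, off);

	m->used &= ~(1ULL << zone);
	if (m->used & mask) {
		m->removed |= 1ULL << zone;  /* defer: erase on flush */
		return false;
	}
	m->removed &= ~mask;  /* whole block unused: nothing left pending */
	return true;
}
```

With four zones per block and zones 0 and 1 in use, removing zone 0 only
marks it as removed; removing zone 1 afterwards leaves the block empty and
reports that it can be erased immediately.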

-- 
WeiXiong Liao


* Re: [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk
  2020-03-22 13:51     ` WeiXiong Liao
@ 2020-03-22 15:13       ` Kees Cook
  0 siblings, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-03-22 15:13 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Sun, Mar 22, 2020 at 09:51:19PM +0800, WeiXiong Liao wrote:
> The physical characteristics of an MTD device impose requirements on the
> user configuration. For example, the record size must be a multiple of the
> page size of the MTD flash. This is really different from a block device.
> If we make this device driver "invisible", we need some other way to
> constrain the user configuration. The dmesg pstore front-end is the easiest
> one to adapt. There is still much work to do to support the other front-ends.

I finally understand this now -- I was still thinking of things like
nvme which ultimately expose a block layer. MTD appears to genuinely be a
"non-block" device. But it is still considered a "storage" device, yes?

So perhaps "block storage device" and "non-block storage device"?
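
The page-size constraint quoted above can be sketched as a simple
validation step. This is only a user-space illustration: struct
flash_geometry and check_record_size() are hypothetical names, not
taken from the patch, and the only rule encoded is the one stated in
the quoted text (record size must be a multiple of the flash page size).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the geometry a flash driver would report. */
struct flash_geometry {
	size_t page_size;	/* smallest writable unit, in bytes */
};

/*
 * Return 0 if record_size can be used on this flash, -EINVAL otherwise.
 * A zero page size or a zero record size is rejected as a bad setup.
 */
int check_record_size(const struct flash_geometry *geo, size_t record_size)
{
	if (!geo || !geo->page_size || !record_size)
		return -EINVAL;
	/* Records must start and end on page boundaries. */
	if (record_size % geo->page_size)
		return -EINVAL;
	return 0;
}
```

A block device would not need this check, which is the asymmetry the
"block" versus "non-block" naming is trying to capture.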

-- 
Kees Cook


* Re: [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder
  2020-03-22 11:42     ` WeiXiong Liao
@ 2020-03-22 15:16       ` Kees Cook
  0 siblings, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-03-22 15:16 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Sun, Mar 22, 2020 at 07:42:07PM +0800, WeiXiong Liao wrote:
> On 2020/3/19 AM 2:19, Kees Cook wrote:
> > On Fri, Feb 07, 2020 at 08:25:49PM +0800, WeiXiong Liao wrote:
> >> +static int blkz_recover_zones(struct blkz_context *cxt,
> >> +		struct blkz_zone **zones, unsigned int cnt)
> >> +{
> >> +	int ret;
> >> +	unsigned int i;
> >> +	struct blkz_zone *zone;
> >> +
> >> +	if (!zones)
> >> +		return 0;
> >> +
> >> +	for (i = 0; i < cnt; i++) {
> >> +		zone = zones[i];
> >> +		if (unlikely(!zone))
> >> +			continue;
> >> +		ret = blkz_recover_zone(cxt, zone);
> >> +		if (ret)
> >> +			goto recover_fail;
> >> +	}
> >> +
> >> +	return 0;
> >> +recover_fail:
> >> +	pr_debug("recover %s[%u] failed\n", zone->name, i);
> >> +	return ret;
> >> +}
> > 
> > Why is this introduced here? Shouldn't this be earlier in the series?
> 
> blkz_recover_zones() is used to recover an array of zones. Only the ftrace
> recorder needs it, so it's introduced here.

Okay, that's fine. I thought maybe the dmesg front-end could use it too?
Anyway, I can look at it again in v3. :)

-- 
Kees Cook


* Re: [PATCH v2 02/11] blkoops: add blkoops, a warpper for pstore/blk
  2020-03-22 10:00     ` WeiXiong Liao
@ 2020-03-22 15:44       ` Kees Cook
  0 siblings, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-03-22 15:44 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Sun, Mar 22, 2020 at 06:00:34PM +0800, WeiXiong Liao wrote:
> On 2020/3/19 AM2:06, Kees Cook wrote:
> > On Fri, Feb 07, 2020 at 08:25:46PM +0800, WeiXiong Liao wrote:
> >> blkoops is a better wrapper for pstore/blk, which provides efficient
> >> configuration mothod. It divides all configurations of pstore/blk into
> > 
> > typo: method
> > 
> 
> I will fix it.
> 
> >> 2 parts, configurations for user and configurations for driver.
> >>
> >> Configurations for user determine how pstore/blk works, such as
> >> dump_oops and dmesg_size. They can be set by Kconfig and module
> >> parameters.
> > 
> > I'd like to keep blkoops as close to ramoops as possible on the user
> > configuration side. Notes below...
> > 
> 
> Is your question why I don't use device-tree on the user configuration
> side? Here is my answer.
> 
> There is an important difference between blkoops and ramoops.
> ramoops can be initialized at any time, since RAM is always
> ready. However, blkoops must wait for the block device to be registered.

Right, that's true and looks fine as you have it. I meant I wondered if
there was a way to teach blkoops about mtd device naming (in the same
way that it already supports many ways to find matching block devices by
path, by UUID, etc). That way when blkoops sees a matching MTD device,
it'll load the mtd module, etc. For now, let's leave this as-is, and
revisit this idea after v3.

> No, non-block here means devices such as an MTD device, which is not a
> block device and does not use the generic block layer.

How are filesystems implemented on top of MTD devices? Are they
MTD-specific, or is there a block layer driver that goes on top of MTD?

> So, why not extract a common layer from ramoops and blkoops to allocate
> and manage storage space? That is what pstore/blk (blkzone.c) does.
> ramoops and blkoops do not care about how the storage is used.
> 
> I don't know whether the common layer is good enough for ramoops, or
> whether this is a good time to rename the common layer from pstore/blk to
> pstore/zone?

Yeah, I'm still looking through that. I'd love to be able to merge the
pstore/zone with much of ram.c. That way we could even get ECC support
on non-RAM storage devices. :)

But let's not worry about that for v3. I'd like to get our
configurations matched up, though. To that end, yes, let's keep your
"dmesg_size" (or should we maybe call this "oops_size" to distinguish
oops dmesg from console dmesg) and I will add an alias to ramoops to
support "oops_size". Then we can have a single place to configure
settings for the pstore/zone layer. I'll keep thinking about how best
to do that.

> How about Makefile and Kconfig as follow?
> 
> 	<Kconfig>
> 	config PSOTRE_ZONE
> 		# NOTE.
> 		# the configuration is hidden from users and selected by
> 		# pstore/blk.
> 		help
> 		  The common layer for pstore/blk (and pstore/ram in the future)
> 		  to manage storage as zones.
> 	config PSTORE_BLK
> 		tristate "Log panic/oops to a block device"
> 		select PSOTRE_ZONE
> 		help
> 		  ......
> 	config PSTORE_BLK_DMESG_SIZE
> 		......
> 
> 	<Makefile>
> 	#  Note: rename blkzone.c to pstore_zone.c
> 	obj-$(CONFIG_PSTORE_ZONE) += pstore_zone.o
> 
> 	# Note: rename blkoops.c to pstore_blk.c
> 	obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o

Yeah, this works, though with the "psotre" typos fixed. ;) The comments
in the Makefile aren't needed, since there's no renaming actually
happening. They're just named that from the first time they appear
upstream.

> 
> >> +
> >> +	  NOTE that, both kconfig and module parameters can configure blkoops,
> >> +	  but module parameters have priority over kconfig.
> >> +
> >> +	  If unsure, say N.
> >> +
> >> +config PSTORE_BLKOOPS_DMESG_SIZE
> >> +	int "dmesg size in kbytes for blkoops"
> > 
> > How about "Size in Kbytes of dmesg to store"? (It will already show up
> > under the parent config, so no need to repeat "blkoops" here.)
> 
> That's a good idea.

Or, based on above, "Size in Kbytes of oops log to store"?

> >> +#ifdef CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
> > 
> > This (and all the others below) will always be defined, so no need to
> > test it -- just use it as needed below.
> > 
> 
> That's fine for dmesg_size and dump_oops, but not for pmsg_size,
> ftrace_size and console_size, because they are not always available.

Yeah, this has bothered me for a while but mostly only ramoops cared
(almost all the other backends only support the oops frontend :P).
I have some ideas about this, but I'm not quite ready to implement it
(basically, the backend would tell the core what it could support,
and the core would examine available frontends and then report back to
the backend what it needed). But that's not something we need for v3.
I'll keep thinking about it.
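
To make the idea a bit more concrete, here is a rough user-space sketch
of that negotiation. Nothing here exists in pstore today: the flag
names and negotiate_frontends() are made up for illustration. The
backend advertises what it could store, the core knows which frontends
are available, and the intersection is what the backend needs to
allocate zones for.

```c
#include <stdint.h>

/* Hypothetical frontend flags; the names are illustrative only. */
enum {
	FRONTEND_DMESG   = 1 << 0,
	FRONTEND_CONSOLE = 1 << 1,
	FRONTEND_FTRACE  = 1 << 2,
	FRONTEND_PMSG    = 1 << 3,
};

/*
 * backend_supports: frontends the backend is able to store.
 * frontends_built:  frontends that are actually available in the core.
 * Returns the set the backend really needs to set up.
 */
uint32_t negotiate_frontends(uint32_t backend_supports,
			     uint32_t frontends_built)
{
	return backend_supports & frontends_built;
}
```

With something like this, a backend would never reserve space for a
frontend that is compiled out, and the per-frontend #ifdefs could go
away.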

> >> +	bzinfo->total_size = bo_dev->total_size;
> >> +	bzinfo->dump_oops = dump_oops;
> >> +	bzinfo->read = bo_dev->read;
> >> +	bzinfo->write = bo_dev->write;
> > 
> > Why copy these separate functions? Shouldn't bzinfo just keep a pointer
> > to bo_dev?
> > 
> 
> bo_dev is a structure defined in blkoops and not available to bzinfo.
> 
> At the very beginning of my design, pstore/blk was intended as a common
> layer for blkoops and ramoops. So it's not suitable for bzinfo to keep a
> pointer to a blkoops structure.

We may need to revisit this in the future in order to keep the module
loading sane: we can't have the function body get unloaded while
something holding a pointer to it is active. But this would be a small
change at a later time. Let's leave this as-is for v3.

> I will keep generic_file_read_iter() rather than vfs_iter_read().

Absolutely. :)

> >> +
> >> +	blkoops_bdev = bdev;
> >> +	blkdev_panic_write = panic_write;
> >> +
> >> +	/* only allow driver matching the @blkdev */
> >> +	if (!bdev->bd_dev || MAJOR(bdev->bd_dev) != major)
> > 
> > And add similar error reports here.
> > 
> 
> I'd use pr_debug rather than pr_err, because we allow multiple
> devices to attempt to register with blkoops. It's not an error.
> 
> pr_debug("invalid major %u (expect %u)\n", major, MAJOR(bdev->bd_dev));

Ah! Right. Then it should separate "non matching" with pr_debug() and
"the matching one failed" with pr_err() (i.e. it's the right device, but
something about it is bad: bad size, can't register, etc).
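
Roughly, the split could look like the following user-space sketch.
struct candidate, try_register() and the min_size check are
hypothetical stand-ins, and fprintf() stands in for pr_debug() and
pr_err(); the point is only which case gets which log level and which
return code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified view of a device trying to register. */
struct candidate {
	unsigned int major;
	unsigned long total_size;
};

int try_register(const struct candidate *dev, unsigned int want_major,
		 unsigned long min_size)
{
	if (dev->major != want_major) {
		/* Not the configured device: expected, so debug level only. */
		fprintf(stderr, "debug: skipping major %u (expect %u)\n",
			dev->major, want_major);
		return -ENODEV;
	}
	if (dev->total_size < min_size) {
		/* Right device, but unusable: this is a real error. */
		fprintf(stderr, "error: major %u too small (%lu < %lu)\n",
			dev->major, dev->total_size, min_size);
		return -EINVAL;
	}
	return 0;
}
```

Distinct return codes also let the caller tell "not mine" apart from
"mine, but broken".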

> > I don't see this function getting used anywhere. Can it be removed? I
> > see the notes in the Documentation. Could these values just be cached at
> > open time instead of reopening the device?
> > 
> 
> This function is reserved for block drivers to get information about the
> block device in use. So it can't be removed.
> 
> Sure, a new structure will be created to cache these values.

Okay.

-- 
Kees Cook


* Re: [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder
  2020-03-22 11:14     ` WeiXiong Liao
@ 2020-03-22 15:59       ` Kees Cook
  0 siblings, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-03-22 15:59 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Rob Herring, Tony Luck, Vignesh Raghavendra, Jonathan Corbet,
	Richard Weinberger, Anton Vorontsov, linux-doc,
	Greg Kroah-Hartman, linux-kernel, Colin Cross, linux-mtd,
	Jonathan Cameron, Miquel Raynal, Mauro Carvalho Chehab,
	David S. Miller

On Sun, Mar 22, 2020 at 07:14:37PM +0800, WeiXiong Liao wrote:
> hi Kees Cook,
> 
> On 2020/3/19 AM 2:13, Kees Cook wrote:
> > On Fri, Feb 07, 2020 at 08:25:47PM +0800, WeiXiong Liao wrote:
> >> +config PSTORE_BLKOOPS_PMSG_SIZE
> >> +	int "pmsg size in kbytes for blkoops"
> >> +	depends on PSTORE_BLKOOPS
> >> +	depends on PSTORE_PMSG
> >> +	default 64
> > 
> > Instead of "depends on PSTORE_PMSG", you can do:
> > 
> > 	default 64 if PSTORE_PMSG
> > 	default 0
> > 
> 
> What happens if PSTORE_BLKOOPS_PMSG_SIZE is non-zero while
> PSTORE_PMSG is disabled? The pmsg recorder does not work, but pstore/blk
> will still allocate a zone for the pmsg recorder since pmsg_size is
> non-zero. That wastes storage space.

Yeah, true. This gets back to my wanting to have this be more dynamic in
the pstore core. But, yes, for now, you're right.

You can still do this for initialization:

static long pmsg_size = IS_ENABLED(CONFIG_PSTORE_PMSG)
				?  CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
				: -1;

But that'll require logic changes to verify_size(). We can revisit this
after v3.
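
As a sketch of the kind of logic change I mean (user-space only, and
not the real verify_size()): a negative size would mean "frontend not
available" and turn into a zero-byte zone, while positive sizes are
still converted from Kbytes and checked for alignment. PMSG_ENABLED and
PMSG_SIZE_KB below are stand-ins for IS_ENABLED(CONFIG_PSTORE_PMSG) and
CONFIG_PSTORE_BLKOOPS_PMSG_SIZE, and verify_size_kb() is a hypothetical
name.

```c
#include <errno.h>

#define PMSG_ENABLED	0	/* stand-in for IS_ENABLED(CONFIG_PSTORE_PMSG) */
#define PMSG_SIZE_KB	64	/* stand-in for the Kconfig default */

long pmsg_size = PMSG_ENABLED ? PMSG_SIZE_KB : -1;

/*
 * Convert a size in Kbytes to bytes.
 * size_kb <= 0 means the frontend is disabled: no zone is allocated.
 * Returns -EINVAL if the resulting size is not a multiple of align.
 */
int verify_size_kb(long size_kb, unsigned long align, unsigned long *bytes)
{
	unsigned long b;

	if (size_kb <= 0) {
		*bytes = 0;
		return 0;
	}
	b = (unsigned long)size_kb * 1024;
	if (align && b % align)
		return -EINVAL;
	*bytes = b;
	return 0;
}
```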

> >> @@ -611,7 +776,8 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
> >>  		char *buf = kasprintf(GFP_KERNEL,
> >>  				"%s: Total %d times\n",
> >>  				record->reason == KMSG_DUMP_OOPS ? "Oops" :
> >> -				"Panic", record->count);
> >> +				record->reason == KMSG_DUMP_PANIC ? "Panic" :
> >> +				"Unknown", record->count);
> > 
> > Please use get_reason_str() here.
> > 
> 
> get_reason_str() is marked 'static' in platform.c, and pstore/blk only
> supports oops and panic, so there is no need to check for more reasons.

I'd still rather identical strings not be scattered around pstore. :) Go
ahead and make get_reason_str() non-static and rename it
pstore_get_reason_str(), EXPORT_SYMBOL(), add to pstore.h etc.
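
Something along these lines, shown here as a user-space sketch. The
enum and both function names are hypothetical stand-ins (the real
helper would take enum kmsg_dump_reason and live in platform.c); the
format string is the one from the hunk quoted above.

```c
#include <stdio.h>

/* Hypothetical mirror of the dump reasons discussed in this thread. */
enum dump_reason {
	DUMP_REASON_UNKNOWN = 0,
	DUMP_REASON_PANIC,
	DUMP_REASON_OOPS,
};

/* Single place that maps a reason to its display string. */
const char *reason_str(enum dump_reason reason)
{
	switch (reason) {
	case DUMP_REASON_PANIC:
		return "Panic";
	case DUMP_REASON_OOPS:
		return "Oops";
	default:
		return "Unknown";
	}
}

/* Build the "<reason>: Total <n> times" header using the shared helper. */
int format_header(char *buf, size_t len, enum dump_reason reason, int count)
{
	return snprintf(buf, len, "%s: Total %d times\n",
			reason_str(reason), count);
}
```

That keeps the "Oops"/"Panic" strings in one spot, so a new reason only
has to be added once.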

-- 
Kees Cook


Thread overview: 43+ messages
2020-02-07 12:25 [PATCH v2 00/11] pstore: mtd: support crash log to block and mtd device WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
2020-02-26  0:52   ` Kees Cook
2020-02-27  8:21     ` liaoweixiong
2020-03-18 17:23       ` Kees Cook
2020-03-20  1:50         ` WeiXiong Liao
2020-03-20 18:20           ` Kees Cook
2020-03-22 10:28             ` WeiXiong Liao
2020-03-09  0:52     ` WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 02/11] blkoops: add blkoops, a warpper for pstore/blk WeiXiong Liao
2020-03-18 18:06   ` Kees Cook
2020-03-22 10:00     ` WeiXiong Liao
2020-03-22 15:44       ` Kees Cook
2020-02-07 12:25 ` [PATCH v2 03/11] pstore/blk: blkoops: support pmsg recorder WeiXiong Liao
2020-03-18 18:13   ` Kees Cook
2020-03-22 11:14     ` WeiXiong Liao
2020-03-22 15:59       ` Kees Cook
2020-02-07 12:25 ` [PATCH v2 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
2020-03-18 18:16   ` Kees Cook
2020-03-22 11:35     ` WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
2020-03-18 18:19   ` Kees Cook
2020-03-22 11:42     ` WeiXiong Liao
2020-03-22 15:16       ` Kees Cook
2020-02-07 12:25 ` [PATCH v2 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
2020-03-18 18:31   ` Kees Cook
2020-03-22 12:20     ` WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
2020-03-18 18:35   ` Kees Cook
2020-03-22 12:27     ` WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
2020-03-18 18:42   ` Kees Cook
2020-03-22 13:06     ` WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
2020-03-18 18:47   ` Kees Cook
2020-03-22 13:03     ` WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 10/11] blkoops: add interface for dirver to get information of blkoops WeiXiong Liao
2020-02-07 12:25 ` [PATCH v2 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
2020-02-18 10:34   ` Miquel Raynal
2020-02-19  1:13     ` liaoweixiong
2020-03-18 18:57   ` Kees Cook
2020-03-22 13:51     ` WeiXiong Liao
2020-03-22 15:13       ` Kees Cook
