[PATCH v1 00/11] pstore: support crash log to block and mtd device

* [PATCH v1 00/11] pstore: support crash log to block and mtd device
@ 2020-01-20  1:03 WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

Why do we need to log to a block (mtd) device?
1. Most embedded intelligent equipment has no persistent ram, because it
   increases costs. We prefer cheaper solutions, like block devices.
2. Not all equipment has a battery, which means that it loses all data
   in general ram on power failure. Pstore can do little for such
   equipment.

Why do we need mtdpstore instead of mtdoops?
1. Duplicated work between pstore and mtdoops
   Both pstore and mtdoops do the same job: storing the panic/oops log.
2. Do what a driver should do
   To me, a driver should provide methods instead of policies. What MTD
   should do is provide read/write/erase operations, getting rid of the
   code for chunk management, kmsg dumper and configuration.
3. Enhanced features
   Not only store the log, but also show it as files.
   Not only the log, but also the trigger time and trigger count.
   Not only the panic/oops log, but also recorders for pmsg, console and
   ftrace in the future.

Before upstream submission, pstore/blk was tested on ARM and x86_64,
on block devices and mtd devices, built both as modules and into the
kernel. Here are the details:

	https://github.com/gmpy/articles/blob/master/pstore/Test-Pstore-Block.md

[PATCH v1]:
1. fix errors and warnings reported by kbuild test robot.

WeiXiong Liao (11):
  pstore/blk: new support logger for block devices
  blkoops: add blkoops, a warpper for pstore/blk
  pstore/blk: support pmsg recorder
  pstore/blk: blkoops: support console recorder
  pstore/blk: blkoops: support ftrace recorder
  Documentation: pstore/blk: blkoops: create document for pstore_blk
  pstore/blk: skip broken zone for mtd device
  blkoops: respect for device to pick recorders
  pstore/blk: blkoops: support special removing jobs for dmesg.
  blkoops: add interface for dirver to get information of blkoops
  mtd: new support oops logger based on pstore/blk

 Documentation/admin-guide/pstore-block.rst |  297 ++++++
 MAINTAINERS                                |    3 +-
 drivers/mtd/Kconfig                        |   10 +
 drivers/mtd/Makefile                       |    1 +
 drivers/mtd/mtdpstore.c                    |  530 +++++++++++
 fs/pstore/Kconfig                          |  109 +++
 fs/pstore/Makefile                         |    5 +
 fs/pstore/blkoops.c                        |  490 ++++++++++
 fs/pstore/blkzone.c                        | 1344 ++++++++++++++++++++++++++++
 include/linux/blkoops.h                    |   94 ++
 include/linux/pstore_blk.h                 |   91 ++
 11 files changed, 2973 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/pstore-block.rst
 create mode 100644 drivers/mtd/mtdpstore.c
 create mode 100644 fs/pstore/blkoops.c
 create mode 100644 fs/pstore/blkzone.c
 create mode 100644 include/linux/blkoops.h
 create mode 100644 include/linux/pstore_blk.h

-- 
1.9.1

* [PATCH v1 01/11] pstore/blk: new support logger for block devices
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 02/11] blkoops: add blkoops, a warpper for pstore/blk WeiXiong Liao
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

pstore/blk is similar to pstore/ram, but dumps logs to block devices
rather than persistent ram.

Why do we need pstore/blk?
1. Most embedded intelligent equipment has no persistent ram, because it
increases costs. We prefer cheaper solutions, like block devices.
2. Not all equipment has a battery, which means that it loses all data
in general ram on power failure. Pstore can do little for such
equipment.

pstore/blk is one patch of a series. It provides zone management for a
partition of a block device or of a non-block device such as an mtd
device. It only supports the dmesg recorder right now.
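
As a rough illustration of the zone management described above, here is
a userspace sketch (not the code in this patch). It assumes the space is
cut into equally sized dmesg zones and that records are written to the
zones one after another in a circle. The names zone_layout, zone_count,
zone_offset and next_zone are hypothetical and exist only for this
example.

```c
#include <stddef.h>

/* Hypothetical model of the space handed to pstore/blk. */
struct zone_layout {
	size_t total_size;	/* bytes available to pstore/blk */
	size_t dmesg_size;	/* bytes per dmesg zone */
};

/* Number of whole zones that fit; a partial tail is left unused. */
static unsigned int zone_count(const struct zone_layout *l)
{
	if (!l->total_size || !l->dmesg_size)
		return 0;
	return (unsigned int)(l->total_size / l->dmesg_size);
}

/* Byte offset of zone @idx, relative to the start of the space. */
static size_t zone_offset(const struct zone_layout *l, unsigned int idx)
{
	return (size_t)idx * l->dmesg_size;
}

/* Zones are reused in a circle: after the last zone comes zone 0. */
static unsigned int next_zone(unsigned int cur, unsigned int count)
{
	return (cur + 1) % count;
}
```

With total_size = 65536 and dmesg_size = 16384 this gives four zones,
and the record written after zone 3 lands in zone 0 again.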

To make pstore/blk work, the block/non-block driver should call
blkz_register(), and call blkz_unregister() when it exits. Later patches
in this series add blkoops, a better wrapper for pstore/blk.

Unlike pstore/ram, pstore/blk relies on read/write APIs from the device
driver, especially the write operation for panic records.
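
To make the driver-side contract more concrete, here is a userspace
sketch (an illustration under assumptions, not kernel code). An
in-memory array stands in for the device, and fake_blkz_check() mirrors
the size and callback checks that blkz_register() does in this patch.
Everything prefixed with fake_ or FAKE_ is hypothetical; the real
callbacks take loff_t and return ssize_t.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FAKE_TOTAL_SIZE 8192UL	/* assumed size of the space for the demo */
#define FAKE_SECTOR_SIZE 512UL

static char fake_storage[FAKE_TOTAL_SIZE];	/* stands in for the device */

/* Offsets are relative to the space given to pstore/blk, not the disk. */
static long fake_read(char *buf, size_t bytes, long long pos)
{
	if (pos < 0 || bytes > FAKE_TOTAL_SIZE ||
	    (unsigned long long)pos > FAKE_TOTAL_SIZE - bytes)
		return -EINVAL;
	memcpy(buf, fake_storage + pos, bytes);
	return (long)bytes;	/* number of bytes read on success */
}

static long fake_write(const char *buf, size_t bytes, long long pos)
{
	if (pos < 0 || bytes > FAKE_TOTAL_SIZE ||
	    (unsigned long long)pos > FAKE_TOTAL_SIZE - bytes)
		return -EINVAL;
	memcpy(fake_storage + pos, buf, bytes);
	return (long)bytes;	/* number of bytes written on success */
}

/* Hypothetical userspace stand-in for struct blkz_info. */
struct fake_blkz_info {
	const char *name;
	unsigned long total_size;
	unsigned long dmesg_size;
	long (*read)(char *buf, size_t bytes, long long pos);
	long (*write)(const char *buf, size_t bytes, long long pos);
};

/* Mirrors the checks done before a backend is accepted. */
static int fake_blkz_check(const struct fake_blkz_info *info)
{
	if (!info->name || !info->name[0])
		return -EINVAL;
	/* total size: at least 4096 and a multiple of 4096 */
	if (info->total_size < 4096 || (info->total_size & 4095))
		return -EINVAL;
	/* dmesg size: non-zero and a multiple of the sector size */
	if (!info->dmesg_size || (info->dmesg_size & (FAKE_SECTOR_SIZE - 1)))
		return -EINVAL;
	/* the general read and write operations are both required */
	if (!info->read || !info->write)
		return -EINVAL;
	return 0;
}
```

A real driver would fill in struct blkz_info the same way and pass it
to blkz_register(); panic_write is left out here because it is optional.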

It is recommended that the block/non-block driver register with
pstore/blk only after the device has been registered with Linux and is
ready to work.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Reported-by: kbuild test robot <lkp@intel.com>
---
 fs/pstore/Kconfig          |  10 +
 fs/pstore/Makefile         |   3 +
 fs/pstore/blkzone.c        | 964 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pstore_blk.h |  62 +++
 4 files changed, 1039 insertions(+)
 create mode 100644 fs/pstore/blkzone.c
 create mode 100644 include/linux/pstore_blk.h

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 8f0369aad22a..536fde9e13e8 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -153,3 +153,13 @@ config PSTORE_RAM
 	  "ramoops.ko".
 
 	  For more information, see Documentation/admin-guide/ramoops.rst.
+
+config PSTORE_BLK
+	tristate "Log panic/oops to a block device"
+	depends on PSTORE
+	depends on BLOCK
+	help
+	  This enables panic and oops messages to be logged to a block
+	  device, where they can be read back at some later point.
+
+	  If unsure, say N.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 967b5891f325..0ee2fc8d1bfb 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)	+= pmsg.o
 
 ramoops-objs += ram.o ram_core.o
 obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
+
+obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
+pstore_blk-y += blkzone.o
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
new file mode 100644
index 000000000000..89ad07cdde85
--- /dev/null
+++ b/fs/pstore/blkzone.c
@@ -0,0 +1,964 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * blkzone.c: Block device Oops/Panic logger
+ *
+ * Copyright (C) 2019 WeiXiong Liao <liaoweixiong@allwinnertech.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define MODNAME "pstore-blk"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/pstore.h>
+#include <linux/mount.h>
+#include <linux/printk.h>
+#include <linux/fs.h>
+#include <linux/pstore_blk.h>
+#include <linux/kdev_t.h>
+#include <linux/device.h>
+#include <linux/namei.h>
+#include <linux/fcntl.h>
+#include <linux/uio.h>
+#include <linux/writeback.h>
+
+/**
+ * struct blkz_buffer - header of zone to flush to storage
+ *
+ * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
+ * @datalen: length of data in @data
+ * @data: zone data.
+ */
+struct blkz_buffer {
+#define BLK_SIG (0x43474244) /* DBGC */
+	uint32_t sig;
+	atomic_t datalen;
+	uint8_t data[];
+};
+
+/**
+ * struct blkz_dmesg_header: dmesg information
+ *
+ * @magic: magic num for dmesg header
+ * @time: trigger time
+ * @compressed: whether compressed
+ * @counter: oops/panic counter
+ * @reason: identify oops or panic
+ */
+struct blkz_dmesg_header {
+#define DMESG_HEADER_MAGIC 0x4dfc3ae5
+	uint32_t magic;
+	struct timespec64 time;
+	bool compressed;
+	uint32_t counter;
+	enum kmsg_dump_reason reason;
+	uint8_t data[0];
+};
+
+/**
+ * struct blkz_zone - zone information
+ * @off:
+ *	zone offset of block device
+ * @type:
+ *	frontend type for this zone
+ * @name:
+ *	frontend name for this zone
+ * @buffer:
+ *	pointer to data buffer managed by this zone
+ * @oldbuf:
+ *	pointer to old data buffer.
+ * @buffer_size:
+ *	bytes in @buffer->data
+ * @should_recover:
+ *	should recover from storage
+ * @dirty:
+ *	mark whether the data in @buffer are dirty (not flush to storage yet)
+ */
+struct blkz_zone {
+	unsigned long off;
+	const char *name;
+	enum pstore_type_id type;
+
+	struct blkz_buffer *buffer;
+	struct blkz_buffer *oldbuf;
+	size_t buffer_size;
+	bool should_recover;
+	atomic_t dirty;
+};
+
+struct blkz_context {
+	struct blkz_zone **dbzs;	/* dmesg block zones */
+	unsigned int dmesg_max_cnt;
+	unsigned int dmesg_read_cnt;
+	unsigned int dmesg_write_cnt;
+	/*
+	 * These counters are restored during recovery.
+	 * They count oops/panic events since flashing, not since booting.
+	 */
+	unsigned int oops_counter;
+	unsigned int panic_counter;
+	atomic_t recovered;
+	atomic_t on_panic;
+
+	/*
+	 * bzinfo_lock just protects "bzinfo" during calls to
+	 * blkz_register/blkz_unregister
+	 */
+	spinlock_t bzinfo_lock;
+	struct blkz_info *bzinfo;
+	struct pstore_info pstore;
+};
+static struct blkz_context blkz_cxt;
+
+enum blkz_flush_mode {
+	FLUSH_NONE = 0,
+	FLUSH_PART,
+	FLUSH_META,
+	FLUSH_ALL,
+};
+
+static inline int buffer_datalen(struct blkz_zone *zone)
+{
+	return atomic_read(&zone->buffer->datalen);
+}
+
+static inline bool is_on_panic(void)
+{
+	struct blkz_context *cxt = &blkz_cxt;
+
+	return atomic_read(&cxt->on_panic);
+}
+
+static int blkz_zone_read(struct blkz_zone *zone, char *buf,
+		size_t len, unsigned long off)
+{
+	if (!buf || !zone->buffer)
+		return -EINVAL;
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	len = min_t(size_t, len, zone->buffer_size - off);
+	memcpy(buf, zone->buffer->data + off, len);
+	return 0;
+}
+
+static int blkz_zone_write(struct blkz_zone *zone,
+		enum blkz_flush_mode flush_mode, const char *buf,
+		size_t len, unsigned long off)
+{
+	struct blkz_info *info = blkz_cxt.bzinfo;
+	ssize_t wcnt = 0;
+	ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
+	size_t wlen;
+
+	if (off > zone->buffer_size)
+		return -EINVAL;
+	wlen = min_t(size_t, len, zone->buffer_size - off);
+	if (buf && wlen) {
+		memcpy(zone->buffer->data + off, buf, wlen);
+		atomic_set(&zone->buffer->datalen, wlen + off);
+	}
+
+	/* avoid damaging old records */
+	if (!is_on_panic() && !atomic_read(&blkz_cxt.recovered))
+		goto set_dirty;
+
+	writeop = is_on_panic() ? info->panic_write : info->write;
+	if (!writeop)
+		goto set_dirty;
+
+	switch (flush_mode) {
+	case FLUSH_NONE:
+		if (unlikely(buf && wlen))
+			goto set_dirty;
+		return 0;
+	case FLUSH_PART:
+		wcnt = writeop((const char *)zone->buffer->data + off, wlen,
+				zone->off + sizeof(*zone->buffer) + off);
+		if (wcnt != wlen)
+			goto set_dirty;
+		/* fallthrough */
+	case FLUSH_META:
+		wlen = sizeof(struct blkz_buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto set_dirty;
+		break;
+	case FLUSH_ALL:
+		wlen = zone->buffer_size + sizeof(*zone->buffer);
+		wcnt = writeop((const char *)zone->buffer, wlen, zone->off);
+		if (wcnt != wlen)
+			goto set_dirty;
+		break;
+	}
+
+	return 0;
+set_dirty:
+	atomic_set(&zone->dirty, true);
+	return -EBUSY;
+}
+
+static int blkz_flush_dirty_zone(struct blkz_zone *zone)
+{
+	int ret;
+
+	if (!zone)
+		return -EINVAL;
+
+	if (!atomic_read(&zone->dirty))
+		return 0;
+
+	if (!atomic_read(&blkz_cxt.recovered))
+		return -EBUSY;
+
+	ret = blkz_zone_write(zone, FLUSH_ALL, NULL, 0, 0);
+	if (!ret)
+		atomic_set(&zone->dirty, false);
+	return ret;
+}
+
+static int blkz_flush_dirty_zones(struct blkz_zone **zones, unsigned int cnt)
+{
+	int i, ret;
+	struct blkz_zone *zone;
+
+	if (!zones)
+		return -EINVAL;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (!zone)
+			return -EINVAL;
+		ret = blkz_flush_dirty_zone(zone);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/**
+ * blkz_move_zone: move data from a old zone to a new zone
+ *
+ * @old: the old zone
+ * @new: the new zone
+ *
+ * NOTE:
+ *	Call blkz_zone_write to copy and flush data. If it fails, we
+ *	should reset new->dirty, because the new zone is not really dirty.
+ */
+static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
+{
+	const char *data = (const char *)old->buffer->data;
+	int ret;
+
+	ret = blkz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0);
+	if (ret) {
+		atomic_set(&new->buffer->datalen, 0);
+		atomic_set(&new->dirty, false);
+		return ret;
+	}
+	atomic_set(&old->buffer->datalen, 0);
+	return 0;
+}
+
+static int blkz_recover_dmesg_data(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	struct blkz_zone *zone = NULL;
+	struct blkz_buffer *buf;
+	unsigned long i;
+	ssize_t rcnt;
+
+	if (!info->read)
+		return -EINVAL;
+
+	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
+		zone = cxt->dbzs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+		if (atomic_read(&zone->dirty)) {
+			unsigned int wcnt = cxt->dmesg_write_cnt;
+			struct blkz_zone *new = cxt->dbzs[wcnt];
+			int ret;
+
+			ret = blkz_move_zone(zone, new);
+			if (ret) {
+				pr_err("move zone from %lu to %u failed\n",
+						i, wcnt);
+				return ret;
+			}
+			cxt->dmesg_write_cnt = (wcnt + 1) % cxt->dmesg_max_cnt;
+		}
+		if (!zone->should_recover)
+			continue;
+		buf = zone->buffer;
+		rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf),
+				zone->off);
+		if (rcnt != zone->buffer_size + sizeof(*buf))
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+	}
+	return 0;
+}
+
+/*
+ * blkz_recover_dmesg_meta: recover metadata of dmesg
+ *
+ * Recover metadata as follow:
+ * @cxt->dmesg_write_cnt
+ * @cxt->oops_counter
+ * @cxt->panic_counter
+ */
+static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	struct blkz_zone *zone;
+	size_t rcnt, len;
+	struct blkz_buffer *buf;
+	struct blkz_dmesg_header *hdr;
+	struct timespec64 time = {0};
+	unsigned long i;
+	/*
+	 * Recovery may run on panic, when we can't allocate memory with
+	 * kmalloc. So, we use a local array instead.
+	 */
+	char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0};
+
+	if (!info->read)
+		return -EINVAL;
+
+	len = sizeof(*buf) + sizeof(*hdr);
+	buf = (struct blkz_buffer *)buffer_header;
+	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
+		zone = cxt->dbzs[i];
+		if (unlikely(!zone))
+			return -EINVAL;
+
+		rcnt = info->read((char *)buf, len, zone->off);
+		if (rcnt != len) {
+			pr_err("read %s with id %lu failed\n", zone->name, i);
+			return (int)rcnt < 0 ? (int)rcnt : -EIO;
+		}
+
+		if (buf->sig != zone->buffer->sig) {
+			pr_debug("no valid data in dmesg zone %lu\n", i);
+			continue;
+		}
+
+		if (zone->buffer_size < atomic_read(&buf->datalen)) {
+			pr_info("found overtop zone: %s: id %lu, off %lu, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		hdr = (struct blkz_dmesg_header *)buf->data;
+		if (hdr->magic != DMESG_HEADER_MAGIC) {
+			pr_info("found invalid zone: %s: id %lu, off %lu, size %zu\n",
+					zone->name, i, zone->off,
+					zone->buffer_size);
+			continue;
+		}
+
+		/*
+		 * we get the newest zone, and the next one must be the oldest
+		 * or unused zone, because we do write one by one like a circle.
+		 */
+		if (hdr->time.tv_sec >= time.tv_sec) {
+			time.tv_sec = hdr->time.tv_sec;
+			cxt->dmesg_write_cnt = (i + 1) % cxt->dmesg_max_cnt;
+		}
+
+		if (hdr->reason == KMSG_DUMP_OOPS)
+			cxt->oops_counter =
+				max(cxt->oops_counter, hdr->counter);
+		else
+			cxt->panic_counter =
+				max(cxt->panic_counter, hdr->counter);
+
+		if (!atomic_read(&buf->datalen)) {
+			pr_debug("found erased zone: %s: id %lu, off %lu, size %zu, datalen %d\n",
+					zone->name, i, zone->off,
+					zone->buffer_size,
+					atomic_read(&buf->datalen));
+			continue;
+		}
+
+		if (!is_on_panic())
+			zone->should_recover = true;
+		pr_debug("found nice zone: %s: id %lu, off %lu, size %zu, datalen %d\n",
+				zone->name, i, zone->off,
+				zone->buffer_size, atomic_read(&buf->datalen));
+	}
+
+	return 0;
+}
+
+static int blkz_recover_dmesg(struct blkz_context *cxt)
+{
+	int ret;
+
+	if (!cxt->dbzs)
+		return 0;
+
+	ret = blkz_recover_dmesg_meta(cxt);
+	if (ret)
+		goto recover_fail;
+
+	ret = blkz_recover_dmesg_data(cxt);
+	if (ret)
+		goto recover_fail;
+
+	return 0;
+recover_fail:
+	pr_debug("recover dmesg failed\n");
+	return ret;
+}
+
+static inline int blkz_recovery(struct blkz_context *cxt)
+{
+	int ret = -EBUSY;
+
+	if (atomic_read(&cxt->recovered))
+		return 0;
+
+	ret = blkz_recover_dmesg(cxt);
+	if (ret)
+		goto recover_fail;
+
+	pr_debug("recover end!\n");
+	atomic_set(&cxt->recovered, 1);
+	return 0;
+
+recover_fail:
+	pr_err("recover failed\n");
+	return ret;
+}
+
+static int blkz_pstore_open(struct pstore_info *psi)
+{
+	struct blkz_context *cxt = psi->data;
+
+	cxt->dmesg_read_cnt = 0;
+	return 0;
+}
+
+static inline bool blkz_ok(struct blkz_zone *zone)
+{
+	if (zone && zone->buffer && buffer_datalen(zone))
+		return true;
+	return false;
+}
+
+static inline int blkz_dmesg_erase(struct blkz_context *cxt,
+		struct blkz_zone *zone)
+{
+	if (unlikely(!blkz_ok(zone)))
+		return 0;
+
+	atomic_set(&zone->buffer->datalen, 0);
+	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+}
+
+static int blkz_pstore_erase(struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
+	default:
+		return -EINVAL;
+	}
+}
+
+static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+	struct blkz_buffer *buffer = zone->buffer;
+	struct blkz_dmesg_header *hdr =
+		(struct blkz_dmesg_header *)buffer->data;
+
+	hdr->magic = DMESG_HEADER_MAGIC;
+	hdr->compressed = record->compressed;
+	hdr->time.tv_sec = record->time.tv_sec;
+	hdr->time.tv_nsec = record->time.tv_nsec;
+	hdr->reason = record->reason;
+	if (hdr->reason == KMSG_DUMP_OOPS)
+		hdr->counter = ++cxt->oops_counter;
+	else
+		hdr->counter = ++cxt->panic_counter;
+}
+
+static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
+		struct pstore_record *record)
+{
+	size_t size, hlen;
+	struct blkz_zone *zone;
+	unsigned int zonenum;
+
+	zonenum = cxt->dmesg_write_cnt;
+	zone = cxt->dbzs[zonenum];
+	if (unlikely(!zone))
+		return -ENOSPC;
+	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
+
+	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+	blkz_write_kmsg_hdr(zone, record);
+	hlen = sizeof(struct blkz_dmesg_header);
+	size = min_t(size_t, record->size, zone->buffer_size - hlen);
+	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+}
+
+static int notrace blkz_dmesg_write(struct blkz_context *cxt,
+		struct pstore_record *record)
+{
+	int ret;
+	struct blkz_info *info = cxt->bzinfo;
+
+	/*
+	 * Out of the various dmesg dump types, pstore/blk is currently designed
+	 * to only store crash logs, rather than storing general kernel logs.
+	 */
+	if (record->reason != KMSG_DUMP_OOPS &&
+			record->reason != KMSG_DUMP_PANIC)
+		return -EINVAL;
+
+	/* Skip Oopses when configured to do so. */
+	if (record->reason == KMSG_DUMP_OOPS && !info->dump_oops)
+		return -EINVAL;
+
+	/*
+	 * Explicitly only take the first part of any new crash.
+	 * If our buffer is larger than kmsg_bytes, this can never happen,
+	 * and if our buffer is smaller than kmsg_bytes, we don't want the
+	 * report split across multiple records.
+	 */
+	if (record->part != 1)
+		return -ENOSPC;
+
+	if (!cxt->dbzs)
+		return -ENOSPC;
+
+	ret = blkz_dmesg_write_do(cxt, record);
+	if (!ret) {
+		pr_debug("try to flush other dirty dmesg zones\n");
+		blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
+	}
+
+	/* always return 0 as we have handled it in the buffer */
+	return 0;
+}
+
+static int notrace blkz_pstore_write(struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+
+	if (record->type == PSTORE_TYPE_DMESG &&
+			record->reason == KMSG_DUMP_PANIC)
+		atomic_set(&cxt->on_panic, 1);
+
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		return blkz_dmesg_write(cxt, record);
+	default:
+		return -EINVAL;
+	}
+}
+
+#define READ_NEXT_ZONE ((ssize_t)(-1024))
+static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
+{
+	struct blkz_zone *zone = NULL;
+
+	while (cxt->dmesg_read_cnt < cxt->dmesg_max_cnt) {
+		zone = cxt->dbzs[cxt->dmesg_read_cnt++];
+		if (blkz_ok(zone))
+			return zone;
+	}
+
+	return NULL;
+}
+
+static int blkz_read_dmesg_hdr(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	struct blkz_buffer *buffer = zone->buffer;
+	struct blkz_dmesg_header *hdr =
+		(struct blkz_dmesg_header *)buffer->data;
+
+	if (hdr->magic != DMESG_HEADER_MAGIC)
+		return -EINVAL;
+	record->compressed = hdr->compressed;
+	record->time.tv_sec = hdr->time.tv_sec;
+	record->time.tv_nsec = hdr->time.tv_nsec;
+	record->reason = hdr->reason;
+	record->count = hdr->counter;
+	return 0;
+}
+
+static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	size_t size, hlen = 0;
+
+	size = buffer_datalen(zone);
+	/* Clear and skip this DMESG record if it has no valid header */
+	if (blkz_read_dmesg_hdr(zone, record)) {
+		atomic_set(&zone->buffer->datalen, 0);
+		atomic_set(&zone->dirty, 0);
+		return READ_NEXT_ZONE;
+	}
+	size -= sizeof(struct blkz_dmesg_header);
+
+	if (!record->compressed) {
+		char *buf = kasprintf(GFP_KERNEL,
+				"%s: Total %d times\n",
+				record->reason == KMSG_DUMP_OOPS ? "Oops" :
+				"Panic", record->count);
+		hlen = strlen(buf);
+		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
+		if (!record->buf) {
+			kfree(buf);
+			return -ENOMEM;
+		}
+	} else {
+		record->buf = kmalloc(size, GFP_KERNEL);
+		if (!record->buf)
+			return -ENOMEM;
+	}
+
+	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
+				sizeof(struct blkz_dmesg_header)) < 0)) {
+		kfree(record->buf);
+		return READ_NEXT_ZONE;
+	}
+
+	return size + hlen;
+}
+
+static ssize_t blkz_pstore_read(struct pstore_record *record)
+{
+	struct blkz_context *cxt = record->psi->data;
+	ssize_t (*blkz_read)(struct blkz_zone *zone,
+			struct pstore_record *record);
+	struct blkz_zone *zone;
+	ssize_t ret;
+
+	/* before read, we must recover from storage */
+	ret = blkz_recovery(cxt);
+	if (ret)
+		return ret;
+
+next_zone:
+	zone = blkz_read_next_zone(cxt);
+	if (!zone)
+		return 0;
+
+	record->type = zone->type;
+	switch (record->type) {
+	case PSTORE_TYPE_DMESG:
+		blkz_read = blkz_dmesg_read;
+		record->id = cxt->dmesg_read_cnt - 1;
+		break;
+	default:
+		goto next_zone;
+	}
+
+	ret = blkz_read(zone, record);
+	if (ret == READ_NEXT_ZONE)
+		goto next_zone;
+	return ret;
+}
+
+static struct blkz_context blkz_cxt = {
+	.bzinfo_lock = __SPIN_LOCK_UNLOCKED(blkz_cxt.bzinfo_lock),
+	.recovered = ATOMIC_INIT(0),
+	.on_panic = ATOMIC_INIT(0),
+	.pstore = {
+		.owner = THIS_MODULE,
+		.name = MODNAME,
+		.open = blkz_pstore_open,
+		.read = blkz_pstore_read,
+		.write = blkz_pstore_write,
+		.erase = blkz_pstore_erase,
+	},
+};
+
+static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
+		unsigned long *off, size_t size)
+{
+	struct blkz_info *info = blkz_cxt.bzinfo;
+	struct blkz_zone *zone;
+	const char *name = pstore_type_to_name(type);
+
+	if (!size)
+		return NULL;
+
+	if (*off + size > info->total_size) {
+		pr_err("no room for %s (0x%zx@0x%lx over 0x%lx)\n",
+			name, size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	zone = kzalloc(sizeof(struct blkz_zone), GFP_KERNEL);
+	if (!zone)
+		return ERR_PTR(-ENOMEM);
+
+	zone->buffer = kmalloc(size, GFP_KERNEL);
+	if (!zone->buffer) {
+		kfree(zone);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zone->buffer, 0xFF, size);
+	zone->off = *off;
+	zone->name = name;
+	zone->type = type;
+	zone->buffer_size = size - sizeof(struct blkz_buffer);
+	zone->buffer->sig = type ^ BLK_SIG;
+	atomic_set(&zone->dirty, 0);
+	atomic_set(&zone->buffer->datalen, 0);
+
+	*off += size;
+
+	pr_debug("blkzone %s: off 0x%lx, %zu header, %zu data\n", zone->name,
+			zone->off, sizeof(*zone->buffer), zone->buffer_size);
+	return zone;
+}
+
+static struct blkz_zone **blkz_init_zones(enum pstore_type_id type,
+	unsigned long *off, size_t total_size, ssize_t record_size,
+	unsigned int *cnt)
+{
+	struct blkz_info *info = blkz_cxt.bzinfo;
+	struct blkz_zone **zones, *zone;
+	const char *name = pstore_type_to_name(type);
+	int c, i;
+
+	if (!total_size || !record_size)
+		return NULL;
+
+	if (*off + total_size > info->total_size) {
+		pr_err("no room for zones %s (0x%zx@0x%lx over 0x%lx)\n",
+			name, total_size, *off, info->total_size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	c = total_size / record_size;
+	zones = kcalloc(c, sizeof(*zones), GFP_KERNEL);
+	if (!zones) {
+		pr_err("allocate for zones %s failed\n", name);
+		return ERR_PTR(-ENOMEM);
+	}
+	memset(zones, 0, c * sizeof(*zones));
+
+	for (i = 0; i < c; i++) {
+		zone = blkz_init_zone(type, off, record_size);
+		if (!zone || IS_ERR(zone)) {
+			pr_err("initialize zones %s failed\n", name);
+			while (--i >= 0) {
+				kfree(zones[i]->buffer);
+				kfree(zones[i]);
+			}
+			kfree(zones);
+			return (void *)zone;
+		}
+		zones[i] = zone;
+	}
+
+	*cnt = c;
+	return zones;
+}
+
+static void blkz_free_zone(struct blkz_zone **blkzone)
+{
+	struct blkz_zone *zone = *blkzone;
+
+	if (!zone)
+		return;
+
+	kfree(zone->buffer);
+	kfree(zone);
+	*blkzone = NULL;
+}
+
+static void blkz_free_zones(struct blkz_zone ***blkzones, unsigned int *cnt)
+{
+	struct blkz_zone **zones = *blkzones;
+
+	if (!zones)
+		return;
+
+	while (*cnt > 0) {
+		(*cnt)--;
+		blkz_free_zone(&zones[*cnt]);
+	}
+	kfree(zones);
+	*blkzones = NULL;
+}
+
+static int blkz_cut_zones(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	unsigned long off = 0;
+	int err;
+	size_t size;
+
+	size = info->total_size;
+	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+			info->dmesg_size, &cxt->dmesg_max_cnt);
+	if (IS_ERR(cxt->dbzs)) {
+		err = PTR_ERR(cxt->dbzs);
+		goto fail_out;
+	}
+
+	return 0;
+fail_out:
+	return err;
+}
+
+int blkz_register(struct blkz_info *info)
+{
+	int err = -EINVAL;
+	struct blkz_context *cxt = &blkz_cxt;
+	struct module *owner = info->owner;
+
+	if (!info->total_size) {
+		pr_warn("the total size must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->dmesg_size) {
+		pr_warn("at least one of the records must be non-zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->name || !info->name[0])
+		return -EINVAL;
+
+	if (info->total_size < 4096) {
+		pr_err("total size must be at least 4096 bytes\n");
+		return -EINVAL;
+	}
+
+#define check_size(name, size) {					\
+		if (info->name > 0 && info->name < (size)) {		\
+			pr_err(#name " must be over %d\n", (size));	\
+			return -EINVAL;					\
+		}							\
+		if (info->name & ((size) - 1)) {			\
+			pr_err(#name " must be a multiple of %d\n",	\
+					(size));			\
+			return -EINVAL;					\
+		}							\
+	}
+
+	check_size(total_size, 4096);
+	check_size(dmesg_size, SECTOR_SIZE);
+
+#undef check_size
+
+	/*
+	 * @read and @write must be supplied.
+	 * Without @read, mounting pstore may fail.
+	 * Without @write, pstore cannot remove record files.
+	 */
+	if (!info->read || !info->write) {
+		pr_err("no valid general read/write interface\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&cxt->bzinfo_lock);
+	if (cxt->bzinfo) {
+		pr_warn("blk '%s' already loaded: ignoring '%s'\n",
+				cxt->bzinfo->name, info->name);
+		spin_unlock(&cxt->bzinfo_lock);
+		return -EBUSY;
+	}
+	cxt->bzinfo = info;
+	spin_unlock(&cxt->bzinfo_lock);
+
+	if (owner && !try_module_get(owner)) {
+		err = -EBUSY;
+		goto fail_out;
+	}
+
+	pr_debug("register %s with properties:\n", info->name);
+	pr_debug("\ttotal size : %lu Bytes\n", info->total_size);
+	pr_debug("\tdmesg size : %lu Bytes\n", info->dmesg_size);
+
+	err = blkz_cut_zones(cxt);
+	if (err) {
+		pr_err("cut zones failed\n");
+		goto put_module;
+	}
+
+	if (info->dmesg_size) {
+		cxt->pstore.bufsize = cxt->dbzs[0]->buffer_size -
+			sizeof(struct blkz_dmesg_header);
+		cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
+		if (!cxt->pstore.buf) {
+			err = -ENOMEM;
+			goto put_module;
+		}
+	}
+	cxt->pstore.data = cxt;
+	if (info->dmesg_size)
+		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
+
+	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
+			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
+			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
+
+	err = pstore_register(&cxt->pstore);
+	if (err) {
+		pr_err("registering with pstore failed\n");
+		goto free_pstore_buf;
+	}
+
+	module_put(owner);
+	return 0;
+
+free_pstore_buf:
+	kfree(cxt->pstore.buf);
+put_module:
+	module_put(owner);
+fail_out:
+	spin_lock(&blkz_cxt.bzinfo_lock);
+	blkz_cxt.bzinfo = NULL;
+	spin_unlock(&blkz_cxt.bzinfo_lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(blkz_register);
+
+void blkz_unregister(struct blkz_info *info)
+{
+	struct blkz_context *cxt = &blkz_cxt;
+
+	pstore_unregister(&cxt->pstore);
+	kfree(cxt->pstore.buf);
+	cxt->pstore.bufsize = 0;
+
+	spin_lock(&cxt->bzinfo_lock);
+	blkz_cxt.bzinfo = NULL;
+	spin_unlock(&cxt->bzinfo_lock);
+
+	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
+}
+EXPORT_SYMBOL_GPL(blkz_unregister);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("Block device Oops/Panic logger");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
new file mode 100644
index 000000000000..589d276fa4e4
--- /dev/null
+++ b/include/linux/pstore_blk.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_BLK_H_
+#define __PSTORE_BLK_H_
+
+#include <linux/types.h>
+#include <linux/blkdev.h>
+
+/**
+ * struct blkz_info - backend blkzone driver structure
+ *
+ * @owner:
+ *	Module which is responsible for this backend driver.
+ * @name:
+ *	Name of the backend driver.
+ * @total_size:
+ *	The total size in bytes pstore/blk can use. It must be at least
+ *	4096 and a multiple of 4096.
+ * @dmesg_size:
+ *	The size of each zone for dmesg (oops & panic). Zero means disabled;
+ *	otherwise, it must be a multiple of SECTOR_SIZE (512 bytes).
+ * @dump_oops:
+ *	Dump oops and panic log or only panic.
+ * @read, @write:
+ *	The general (not panic) read/write operation. It's required unless you
+ *	are block device and supply valid @bdev. In this case, blkzone will
+ *	replace it as a general read/write interface.
+ *
+ *	Both of the @size and @offset parameters on this interface are
+ *	the relative size of the space provided, not the whole disk/flash.
+ *
+ *	On success, the number of bytes read/written should be returned.
+ *	On error, negative number should be returned.
+ * @panic_write:
+ *	The write operation used only on panic. It's optional if you do not
+ *	care about panic records. If a panic occurs before blkzone has
+ *	recovered, the first dmesg zone is used.
+ *
+ *	Both of the @size and @offset parameters on this interface are
+ *	the relative size of the space provided, not the whole disk/flash.
+ *
+ *	On success, the number of bytes write should be returned.
+ *	On error, negative number should be returned.
+ */
+typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
+typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
+struct blkz_info {
+	struct module *owner;
+	const char *name;
+
+	unsigned long total_size;
+	unsigned long dmesg_size;
+	int dump_oops;
+	blkz_read_op read;
+	blkz_write_op write;
+	blkz_write_op panic_write;
+};
+
+extern int blkz_register(struct blkz_info *info);
+extern void blkz_unregister(struct blkz_info *info);
+
+#endif
-- 
1.9.1



* [PATCH v1 02/11] blkoops: add blkoops, a wrapper for pstore/blk
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 03/11] pstore/blk: support pmsg recorder WeiXiong Liao
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

blkoops is a wrapper for pstore/blk that provides an efficient
configuration method. It divides the configuration of pstore/blk into
two parts: configuration for the user and configuration for the driver.

User configuration determines how pstore/blk works, for example
dump_oops and dmesg_size. It can be set through Kconfig and module
parameters.
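
As a rough illustration of the precedence described above, here is a
userspace sketch (not kernel code). It assumes, as in blkoops.c below, that
a negative module parameter means "not set", that sizes are given in kbytes,
and that sizes are rounded up to 4096 bytes. The helper names
resolve_size_bytes() and resolve_dump_oops() are hypothetical and do not
appear in the patch.

```c
#include <assert.h>

#define ZONE_ALIGN 4096L	/* alignment blkoops applies to zone sizes */

/*
 * Hypothetical helper: resolve a size given in kbytes.
 * A negative parameter means "not set", so the Kconfig default is used.
 * Returns the size in bytes, rounded up to a multiple of ZONE_ALIGN.
 */
static long resolve_size_bytes(long param_kb, long kconfig_kb)
{
	long kb = param_kb < 0 ? kconfig_kb : param_kb;
	long bytes = kb <= 0 ? 0 : kb * 1024;

	/* the rounding trick below needs a power-of-two alignment */
	assert((ZONE_ALIGN & (ZONE_ALIGN - 1)) == 0);
	return (bytes + ZONE_ALIGN - 1) & ~(ZONE_ALIGN - 1);
}

/* Hypothetical helper: same precedence rule for the boolean dump_oops. */
static int resolve_dump_oops(int param, int kconfig_default)
{
	return !!(param < 0 ? kconfig_default : param);
}
```

With a Kconfig default of 64, an unset parameter resolves to 65536 bytes,
while an explicit parameter of 5 overrides the default and is rounded up
to 8192 bytes.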

Driver configuration is all about the block/non-block device, such as
the total_size of the device and its read/write operations. It should be
provided by the device driver, which calls blkoops_register_device() for
a non-block device and blkoops_register_blkdev() for a block device.

If the device driver supports panic records, @panic_write must be valid.
If a panic occurs before pstore/blk has recovered, the first dmesg zone
will be used.

Besides, a block device driver has no need to verify which partition is
used or to provide generic read/write operations, because blkoops has
done it. It also means that if users do not care about panic records but
only about records for oops/console/pmsg/ftrace, the block device driver
needs to do nothing.
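
For block devices, blkoops.c below wraps a sector-based panic hook in the
byte-based @panic_write interface. The following userspace sketch shows
that conversion under stated assumptions: demo_panic_write() is a
hypothetical stand-in for a driver hook, and the alignment check is an
addition of this sketch (the patch only notes in a comment that size and
offset must align to SECTOR_SIZE).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)

/* Sector-based hook: returns zero on success, anything else on error. */
typedef int (*panic_write_sectors_op)(const char *buf, uint64_t start_sect,
		uint64_t sects);

/* Records the last request so the conversion can be inspected. */
static uint64_t last_start, last_sects;

/* Hypothetical driver hook used only for this demonstration. */
static int demo_panic_write(const char *buf, uint64_t start_sect,
		uint64_t sects)
{
	(void)buf;
	last_start = start_sect;
	last_sects = sects;
	return 0;
}

/* Byte-based wrapper: returns @size on success, negative errno on error. */
static long panic_write_bytes(panic_write_sectors_op op, const char *buf,
		uint64_t size, uint64_t off)
{
	assert(buf != NULL);
	if (!op)
		return -EOPNOTSUPP;
	/* sketch-only check: reject requests not aligned to a sector */
	if ((size | off) & (SECTOR_SIZE - 1))
		return -EINVAL;
	return op(buf, off >> SECTOR_SHIFT, size >> SECTOR_SHIFT) ?
		-EIO : (long)size;
}
```

A request of 4096 bytes at byte offset 8192 reaches the hook as 8 sectors
starting at sector 16.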

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 MAINTAINERS             |   2 +-
 fs/pstore/Kconfig       |  61 +++++++
 fs/pstore/Makefile      |   2 +
 fs/pstore/blkoops.c     | 417 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkoops.h |  58 +++++++
 5 files changed, 539 insertions(+), 1 deletion(-)
 create mode 100644 fs/pstore/blkoops.c
 create mode 100644 include/linux/blkoops.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cc0a4a8ae06a..e4ba97130560 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13381,7 +13381,7 @@ F:	drivers/firmware/efi/efi-pstore.c
 F:	drivers/acpi/apei/erst.c
 F:	Documentation/admin-guide/ramoops.rst
 F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
-K:	\b(pstore|ramoops)
+K:	\b(pstore|ramoops|blkoops)
 
 PTP HARDWARE CLOCK SUPPORT
 M:	Richard Cochran <richardcochran@gmail.com>
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 536fde9e13e8..cd15f9322acd 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -163,3 +163,64 @@ config PSTORE_BLK
 	  where it can be read back at some later point.
 
 	  If unsure, say N.
+
+config PSTORE_BLKOOPS
+	tristate "pstore block with oops logger"
+	depends on PSTORE_BLK
+	help
+	  This is a wrapper for pstore/blk.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
+	  If unsure, say N.
+
+config PSTORE_BLKOOPS_DMESG_SIZE
+	int "dmesg size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	default 64
+	help
+	  This sets the size of each dmesg zone (dmesg_size) for pstore/blk, in
+	  kbytes. The resulting size in bytes must be a multiple of 4096.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
+config PSTORE_BLKOOPS_BLKDEV
+	string "block device for blkoops"
+	depends on PSTORE_BLKOOPS
+	default ""
+	help
+	  Which block device should be used for pstore/blk.
+
+	  It accepts the following variants:
+	  1) <hex_major><hex_minor> device number in hexadecimal represents
+	     itself no leading 0x, for example b302.
+	  2) /dev/<disk_name> represents the device number of disk
+	  3) /dev/<disk_name><decimal> represents the device number
+	     of partition - device number of disk plus the partition number
+	  4) /dev/<disk_name>p<decimal> - same as the above, this form is
+	     used when disk name of partitioned disk ends with a digit.
+	  5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+	     unique id of a partition if the partition table provides it.
+	     The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+	     partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+	     filled hex representation of the 32-bit "NT disk signature", and PP
+	     is a zero-filled hex representation of the 1-based partition number.
+	  6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
+	     to a partition with a known unique id.
+	  7) <major>:<minor> major and minor number of the device separated by
+	     a colon.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
+config PSTORE_BLKOOPS_DUMP_OOPS
+	bool "dump oops"
+	depends on PSTORE_BLKOOPS
+	default y
+	help
+	  Whether blkoops dumps oops or not.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 0ee2fc8d1bfb..24b3d488d2f0 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -15,3 +15,5 @@ obj-$(CONFIG_PSTORE_RAM)	+= ramoops.o
 
 obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
 pstore_blk-y += blkzone.o
+
+obj-$(CONFIG_PSTORE_BLKOOPS) += blkoops.o
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
new file mode 100644
index 000000000000..9e3fc3a46e0f
--- /dev/null
+++ b/fs/pstore/blkoops.c
@@ -0,0 +1,417 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * blkoops.c: Block device Oops logger
+ *
+ * Copyright (C) 2019 WeiXiong Liao <liaoweixiong@allwinnertech.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+#define pr_fmt(fmt) "blkoops : " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/blkoops.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+
+static long dmesg_size = -1;
+module_param(dmesg_size, long, 0400);
+MODULE_PARM_DESC(dmesg_size, "dmesg size in kbytes");
+
+static int dump_oops = -1;
+module_param(dump_oops, int, 0400);
+MODULE_PARM_DESC(dump_oops, "whether dump oops");
+
+/**
+ * The block device to use. Most of the time, it is a partition of a block
+ * device. It can be ignored if you are not a block device and register
+ * with blkoops through blkoops_register_device(). In that case, @blkdev is
+ * unused and @read, @write and @total_size must be supplied.
+ *
+ * @blkdev accepts the following variants:
+ * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
+ *    no leading 0x, for example b302.
+ * 2) /dev/<disk_name> represents the device number of disk
+ * 3) /dev/<disk_name><decimal> represents the device number
+ *    of partition - device number of disk plus the partition number
+ * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
+ *    used when disk name of partitioned disk ends on a digit.
+ * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ *    unique id of a partition if the partition table provides it.
+ *    The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ *    partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ *    filled hex representation of the 32-bit "NT disk signature", and PP
+ *    is a zero-filled hex representation of the 1-based partition number.
+ * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ *    a partition with a known unique id.
+ * 7) <major>:<minor> major and minor number of the device separated by
+ *    a colon.
+ */
+static char blkdev[80];
+module_param_string(blkdev, blkdev, 80, 0400);
+MODULE_PARM_DESC(blkdev, "the block device for general read/write");
+
+static DEFINE_MUTEX(blkz_lock);
+static struct block_device *blkoops_bdev;
+static struct blkz_info *bzinfo;
+static blkoops_blk_panic_write_op blkdev_panic_write;
+
+#ifdef CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
+#define DEFAULT_DMESG_SIZE CONFIG_PSTORE_BLKOOPS_DMESG_SIZE
+#else
+#define DEFAULT_DMESG_SIZE 0
+#endif
+
+#ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
+#define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
+#else
+#define DEFAULT_DUMP_OOPS 1
+#endif
+
+#ifdef CONFIG_PSTORE_BLKOOPS_BLKDEV
+#define DEFAULT_BLKDEV CONFIG_PSTORE_BLKOOPS_BLKDEV
+#else
+#define DEFAULT_BLKDEV ""
+#endif
+
+/**
+ * register device to blkoops
+ *
+ * Any driver, block or non-block, can call this function to register with
+ * blkoops, which packs the configuration for blkzone and pstore.
+ */
+int blkoops_register_device(struct blkoops_device *bo_dev)
+{
+	int ret;
+
+	if (!bo_dev || !bo_dev->total_size || !bo_dev->read || !bo_dev->write)
+		return -EINVAL;
+
+	mutex_lock(&blkz_lock);
+
+	/* someone already registered before */
+	if (bzinfo) {
+		mutex_unlock(&blkz_lock);
+		return -EBUSY;
+	}
+	bzinfo = kzalloc(sizeof(struct blkz_info), GFP_KERNEL);
+	if (!bzinfo) {
+		mutex_unlock(&blkz_lock);
+		return -ENOMEM;
+	}
+
+#define verify_size(name, defsize, alignsize) {				\
+		long _##name_ = (name);					\
+		if (_##name_ < 0)					\
+			_##name_ = (defsize);				\
+		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+		if (_##name_ & (alignsize - 1)) {			\
+			pr_info(#name " must align to %d\n",		\
+					(alignsize));			\
+			_##name_ = ALIGN(_##name_, alignsize);		\
+		}							\
+		name = _##name_ / 1024;					\
+		bzinfo->name = _##name_;				\
+	}
+
+	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
+#undef verify_size
+	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
+
+	bzinfo->total_size = bo_dev->total_size;
+	bzinfo->dump_oops = dump_oops;
+	bzinfo->read = bo_dev->read;
+	bzinfo->write = bo_dev->write;
+	bzinfo->panic_write = bo_dev->panic_write;
+	bzinfo->name = "blkoops";
+	bzinfo->owner = THIS_MODULE;
+
+	ret = blkz_register(bzinfo);
+	if (ret) {
+		kfree(bzinfo);
+		bzinfo = NULL;
+	}
+	mutex_unlock(&blkz_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkoops_register_device);
+
+void blkoops_unregister_device(struct blkoops_device *bo_dev)
+{
+	mutex_lock(&blkz_lock);
+	if (bzinfo && bzinfo->read == bo_dev->read) {
+		blkz_unregister(bzinfo);
+		kfree(bzinfo);
+		bzinfo = NULL;
+	}
+	mutex_unlock(&blkz_lock);
+}
+EXPORT_SYMBOL_GPL(blkoops_unregister_device);
+
+/**
+ * get block_device of @blkdev
+ * @holder: exclusive holder identifier
+ *
+ * On success, @blkoops_bdev will save the block_device and the returned
+ * block_device has reference count of one.
+ */
+static struct block_device *blkoops_get_bdev(void *holder)
+{
+	struct block_device *bdev = ERR_PTR(-ENODEV);
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!blkdev[0] && strlen(DEFAULT_BLKDEV))
+		snprintf(blkdev, 80, "%s", DEFAULT_BLKDEV);
+	if (!blkdev[0])
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&blkz_lock);
+	if (bzinfo)
+		goto out;
+	if (holder)
+		mode |= FMODE_EXCL;
+	bdev = blkdev_get_by_path(blkdev, mode, holder);
+	if (IS_ERR(bdev)) {
+		dev_t devt;
+
+		devt = name_to_dev_t(blkdev);
+		if (devt == 0) {
+			bdev = ERR_PTR(-ENODEV);
+			goto out;
+		}
+		bdev = blkdev_get_by_dev(devt, mode, holder);
+	}
+out:
+	mutex_unlock(&blkz_lock);
+	return bdev;
+}
+
+static void blkoops_put_bdev(struct block_device *bdev, void *holder)
+{
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	if (!bdev)
+		return;
+
+	mutex_lock(&blkz_lock);
+	if (holder)
+		mode |= FMODE_EXCL;
+	blkdev_put(bdev, mode);
+	mutex_unlock(&blkz_lock);
+}
+
+static ssize_t blkoops_generic_blk_read(char *buf, size_t bytes, loff_t pos)
+{
+	ssize_t ret;
+	struct block_device *bdev = blkoops_bdev;
+	struct file filp;
+	mm_segment_t ofs;
+	struct kiocb kiocb;
+	struct iov_iter iter;
+	struct iovec iov = {
+		.iov_base = (void __user *)buf,
+		.iov_len = bytes
+	};
+
+	if (!bdev)
+		return -ENODEV;
+
+	memset(&filp, 0, sizeof(struct file));
+	filp.f_mapping = bdev->bd_inode->i_mapping;
+	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	filp.f_inode = bdev->bd_inode;
+
+	init_sync_kiocb(&kiocb, &filp);
+	kiocb.ki_pos = pos;
+	iov_iter_init(&iter, READ, &iov, 1, bytes);
+
+	ofs = get_fs();
+	set_fs(KERNEL_DS);
+	ret = generic_file_read_iter(&kiocb, &iter);
+	set_fs(ofs);
+	return ret;
+}
+
+static ssize_t blkoops_generic_blk_write(const char *buf, size_t bytes,
+		loff_t pos)
+{
+	struct block_device *bdev = blkoops_bdev;
+	struct iov_iter iter;
+	struct kiocb kiocb;
+	struct file filp;
+	mm_segment_t ofs;
+	ssize_t ret;
+	struct iovec iov = {
+		.iov_base = (void __user *)buf,
+		.iov_len = bytes
+	};
+
+	if (!bdev)
+		return -ENODEV;
+
+	/* Atomic context (console/ftrace): leave data buffered for a later flush */
+	if (in_interrupt() || irqs_disabled())
+		return -EBUSY;
+
+	memset(&filp, 0, sizeof(struct file));
+	filp.f_mapping = bdev->bd_inode->i_mapping;
+	filp.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+	filp.f_inode = bdev->bd_inode;
+
+	init_sync_kiocb(&kiocb, &filp);
+	kiocb.ki_pos = pos;
+	iov_iter_init(&iter, WRITE, &iov, 1, bytes);
+
+	ofs = get_fs();
+	set_fs(KERNEL_DS);
+
+	inode_lock(bdev->bd_inode);
+	ret = generic_write_checks(&kiocb, &iter);
+	if (ret > 0)
+		ret = generic_perform_write(&filp, &iter, pos);
+	inode_unlock(bdev->bd_inode);
+
+	if (likely(ret > 0)) {
+		const struct file_operations f_op = {.fsync = blkdev_fsync};
+
+		filp.f_op = &f_op;
+		kiocb.ki_pos += ret;
+		ret = generic_write_sync(&kiocb, ret);
+	}
+	set_fs(ofs);
+	return ret;
+}
+
+static inline unsigned long blkoops_bdev_size(struct block_device *bdev)
+{
+	return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
+}
+
+static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
+		loff_t off)
+{
+	int ret;
+
+	if (!blkdev_panic_write)
+		return -EOPNOTSUPP;
+
+	/* size and off must align to SECTOR_SIZE for block device */
+	ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
+			size >> SECTOR_SHIFT);
+	return ret ? -EIO : size;
+}
+
+/**
+ * register block device to blkoops
+ * @major: the major device number of registering device
+ * @panic_write: the write interface for panic case.
+ *
+ * It is ONLY used by block devices to register with blkoops. In this case,
+ * the module parameter @blkdev must be valid. Generic read/write interfaces
+ * will be used.
+ *
+ * The block driver has no need to verify which partition is used. It only
+ * needs to pass its major number, so blkoops can match the driver against
+ * @blkdev.
+ *
+ * If the block driver supports panic records, @panic_write must be valid. If
+ * a panic occurs before pstore/blk has recovered, the first dmesg zone will
+ * be used.
+ */
+int blkoops_register_blkdev(unsigned int major,
+		blkoops_blk_panic_write_op panic_write)
+{
+	struct block_device *bdev;
+	struct blkoops_device bo_dev = {0};
+	int ret = -ENODEV;
+	void *holder = blkdev;
+
+	bdev = blkoops_get_bdev(holder);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	blkoops_bdev = bdev;
+	blkdev_panic_write = panic_write;
+
+	/* only allow driver matching the @blkdev */
+	if (!bdev->bd_dev || MAJOR(bdev->bd_dev) != major)
+		goto err_put_bdev;
+
+	bo_dev.total_size = blkoops_bdev_size(bdev);
+	if (bo_dev.total_size == 0)
+		goto err_put_bdev;
+	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
+	bo_dev.read = blkoops_generic_blk_read;
+	bo_dev.write = blkoops_generic_blk_write;
+
+	ret = blkoops_register_device(&bo_dev);
+	if (ret)
+		goto err_put_bdev;
+	return 0;
+
+err_put_bdev:
+	blkdev_panic_write = NULL;
+	blkoops_bdev = NULL;
+	blkoops_put_bdev(bdev, holder);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkoops_register_blkdev);
+
+void blkoops_unregister_blkdev(unsigned int major)
+{
+	struct blkoops_device bo_dev = {.read = blkoops_generic_blk_read};
+	void *holder = blkdev;
+
+	if (blkoops_bdev && MAJOR(blkoops_bdev->bd_dev) == major) {
+		blkoops_unregister_device(&bo_dev);
+		blkoops_put_bdev(blkoops_bdev, holder);
+		blkdev_panic_write = NULL;
+		blkoops_bdev = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(blkoops_unregister_blkdev);
+
+/**
+ * get information of @blkdev
+ * @devt: the block device num of @blkdev
+ * @nr_sects: the sector count of @blkdev
+ * @start_sect: the start sector of @blkdev
+ *
+ * Block driver needs the following information for @panic_write.
+ */
+int blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
+{
+	struct block_device *bdev;
+
+	bdev = blkoops_get_bdev(NULL);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	if (devt)
+		*devt = bdev->bd_dev;
+	if (nr_sects)
+		*nr_sects = part_nr_sects_read(bdev->bd_part);
+	if (start_sect)
+		*start_sect = get_start_sect(bdev);
+
+	blkoops_put_bdev(bdev, NULL);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkoops_blkdev_info);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("Wrapper for Pstore BLK with Oops logger");
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
new file mode 100644
index 000000000000..fe63739309aa
--- /dev/null
+++ b/include/linux/blkoops.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __BLKOOPS_H_
+#define __BLKOOPS_H_
+
+#include <linux/types.h>
+#include <linux/blkdev.h>
+#include <linux/pstore_blk.h>
+
+/**
+ * struct blkoops_device - backend blkoops driver structure.
+ *
+ * This structure is ONLY used for non-block devices by
+ * blkoops_register_device(). For a block device, you are strongly
+ * recommended to use blkoops_register_blkdev().
+ *
+ * @total_size:
+ *	The total size in bytes pstore/blk can use. It must be greater than
+ *	4096 and be a multiple of 4096.
+ * @read, @write:
+ *	The general (not panic) read/write operations.
+ *
+ *	Both the @size and @offset parameters of this interface are
+ *	relative to the space provided, not to the whole disk/flash.
+ *
+ *	On success, the number of bytes read/written should be returned.
+ *	On error, a negative number should be returned.
+ * @panic_write:
+ *	The write operation used only for panic.
+ *
+ *	Both the @size and @offset parameters of this interface are
+ *	relative to the space provided, not to the whole disk/flash.
+ *
+ *	On success, the number of bytes written should be returned.
+ *	On error, a negative number should be returned.
+ */
+struct blkoops_device {
+	unsigned long total_size;
+	blkz_read_op read;
+	blkz_write_op write;
+	blkz_write_op panic_write;
+};
+
+/*
+ * Panic write for block devices, which must write in units aligned to
+ * SECTOR_SIZE. On success, zero should be returned. Others mean error.
+ */
+typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
+		sector_t sects);
+
+int  blkoops_register_device(struct blkoops_device *bo_dev);
+void blkoops_unregister_device(struct blkoops_device *bo_dev);
+int  blkoops_register_blkdev(unsigned int major,
+		blkoops_blk_panic_write_op panic_write);
+void blkoops_unregister_blkdev(unsigned int major);
+int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
+
+#endif
-- 
1.9.1



* [PATCH v1 03/11] pstore/blk: support pmsg recorder
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 02/11] blkoops: add blkoops, a wrapper for pstore/blk WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

pmsg is a recorder for userspace messages. To enable it, set pmsg_size
to a value greater than 0 that is a multiple of 4096.
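
The pmsg zone added by this patch behaves like a ring buffer: @start is
the next write position and @datalen is capped at the buffer size once
the zone has wrapped. The userspace model below sketches that logic under
simplifying assumptions: struct ring, ring_write() and ring_read() are
hypothetical names, the zone is only 8 bytes so that wrap-around is easy
to follow, and nothing is flushed to a device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8	/* tiny zone so wrap-around is easy to see */

struct ring {
	size_t start;		/* next write position, like blkz_buffer.start */
	size_t datalen;		/* valid bytes, capped at RING_SIZE */
	char data[RING_SIZE];
};

static void ring_write(struct ring *r, const char *buf, size_t cnt)
{
	size_t rem;

	/* an oversized record keeps only its newest RING_SIZE bytes */
	if (cnt > RING_SIZE) {
		buf += cnt - RING_SIZE;
		cnt = RING_SIZE;
	}

	rem = RING_SIZE - r->start;
	if (rem < cnt) {
		/* fill the tail, then continue from the beginning */
		memcpy(r->data + r->start, buf, rem);
		buf += rem;
		cnt -= rem;
		r->start = 0;
		r->datalen = RING_SIZE;	/* the zone is full from now on */
	}
	memcpy(r->data + r->start, buf, cnt);
	r->start += cnt;
	if (r->datalen < RING_SIZE)
		r->datalen = r->start;	/* not wrapped yet */
}

/* Copy the stored bytes to @out, oldest first; returns their count. */
static size_t ring_read(const struct ring *r, char *out)
{
	size_t size = r->datalen;
	size_t start = r->start;

	assert(start <= size);
	memcpy(out, r->data + start, size - start);	/* oldest part */
	memcpy(out + size - start, r->data, start);	/* newest part */
	return size;
}
```

Writing "abcde" and then "fghij" into this 8-byte ring leaves "cdefghij"
readable: the three oldest bytes were overwritten by the wrap-around.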

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          |  12 +++
 fs/pstore/blkoops.c        |  11 +++
 fs/pstore/blkzone.c        | 229 +++++++++++++++++++++++++++++++++++++++++++--
 include/linux/pstore_blk.h |   4 +
 4 files changed, 246 insertions(+), 10 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index cd15f9322acd..656d63dc3f01 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -186,6 +186,18 @@ config PSTORE_BLKOOPS_DMESG_SIZE
 	  NOTE that, both kconfig and module parameters can configure blkoops,
 	  but module parameters have priority over kconfig.
 
+config PSTORE_BLKOOPS_PMSG_SIZE
+	int "pmsg size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	depends on PSTORE_PMSG
+	default 64
+	help
+	  This sets the size of the pmsg zone (pmsg_size) for pstore/blk, in
+	  kbytes. The resulting size in bytes must be a multiple of 4096.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
 config PSTORE_BLKOOPS_BLKDEV
 	string "block device for blkoops"
 	depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 9e3fc3a46e0f..b3bd004fad1a 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -31,6 +31,10 @@
 module_param(dmesg_size, long, 0400);
 MODULE_PARM_DESC(dmesg_size, "dmesg size in kbytes");
 
+static long pmsg_size = -1;
+module_param(pmsg_size, long, 0400);
+MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
+
 static int dump_oops = -1;
 module_param(dump_oops, int, 0400);
 MODULE_PARM_DESC(dump_oops, "whether dump oops");
@@ -75,6 +79,12 @@
 #define DEFAULT_DMESG_SIZE 0
 #endif
 
+#ifdef CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
+#define DEFAULT_PMSG_SIZE CONFIG_PSTORE_BLKOOPS_PMSG_SIZE
+#else
+#define DEFAULT_PMSG_SIZE 0
+#endif
+
 #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #else
@@ -128,6 +138,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 	}
 
 	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
+	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 89ad07cdde85..1c9a75c21e39 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -40,12 +40,14 @@
  *
  * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
  * @datalen: length of data in @data
+ * @start: offset into @data at which the stored bytes begin
  * @data: zone data.
  */
 struct blkz_buffer {
 #define BLK_SIG (0x43474244) /* DBGC */
 	uint32_t sig;
 	atomic_t datalen;
+	atomic_t start;
 	uint8_t data[];
 };
 
@@ -101,8 +103,10 @@ struct blkz_zone {
 
 struct blkz_context {
 	struct blkz_zone **dbzs;	/* dmesg block zones */
+	struct blkz_zone *pbz;		/* Pmsg block zone */
 	unsigned int dmesg_max_cnt;
 	unsigned int dmesg_read_cnt;
+	unsigned int pmsg_read_cnt;
 	unsigned int dmesg_write_cnt;
 	/*
 	 * the counter should be recovered when recover.
@@ -135,6 +139,11 @@ static inline int buffer_datalen(struct blkz_zone *zone)
 	return atomic_read(&zone->buffer->datalen);
 }
 
+static inline int buffer_start(struct blkz_zone *zone)
+{
+	return atomic_read(&zone->buffer->start);
+}
+
 static inline bool is_on_panic(void)
 {
 	struct blkz_context *cxt = &blkz_cxt;
@@ -426,6 +435,69 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
 	return ret;
 }
 
+static int blkz_recover_pmsg(struct blkz_context *cxt)
+{
+	struct blkz_info *info = cxt->bzinfo;
+	struct blkz_buffer *oldbuf;
+	struct blkz_zone *zone = NULL;
+	int ret = 0;
+	ssize_t rcnt, len;
+
+	zone = cxt->pbz;
+	if (!zone || zone->oldbuf)
+		return 0;
+
+	if (is_on_panic())
+		goto out;
+
+	if (unlikely(!info->read))
+		return -EINVAL;
+
+	len = zone->buffer_size + sizeof(*oldbuf);
+	oldbuf = kzalloc(len, GFP_KERNEL);
+	if (!oldbuf)
+		return -ENOMEM;
+
+	rcnt = info->read((char *)oldbuf, len, zone->off);
+	if (rcnt != len) {
+		pr_debug("recover pmsg failed\n");
+		ret = (int)rcnt < 0 ? (int)rcnt : -EIO;
+		goto free_oldbuf;
+	}
+
+	if (oldbuf->sig != zone->buffer->sig) {
+		pr_debug("no valid data in zone %s\n", zone->name);
+		goto free_oldbuf;
+	}
+
+	if (zone->buffer_size < atomic_read(&oldbuf->datalen) ||
+		zone->buffer_size < atomic_read(&oldbuf->start)) {
+		pr_info("found overtop zone: %s: off %lu, size %zu\n",
+				zone->name, zone->off, zone->buffer_size);
+		goto free_oldbuf;
+	}
+
+	if (!atomic_read(&oldbuf->datalen)) {
+		pr_debug("found erased zone: %s: id 0, off %lu, size %zu, datalen %d\n",
+				zone->name, zone->off, zone->buffer_size,
+				atomic_read(&oldbuf->datalen));
+		kfree(oldbuf);
+		goto out;
+	}
+
+	pr_debug("found nice zone: %s: id 0, off %lu, size %zu, datalen %d\n",
+			zone->name, zone->off, zone->buffer_size,
+			atomic_read(&oldbuf->datalen));
+	zone->oldbuf = oldbuf;
+out:
+	blkz_flush_dirty_zone(zone);
+	return 0;
+
+free_oldbuf:
+	kfree(oldbuf);
+	return ret;
+}
+
 static inline int blkz_recovery(struct blkz_context *cxt)
 {
 	int ret = -EBUSY;
@@ -437,6 +509,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = blkz_recover_pmsg(cxt);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -451,9 +527,17 @@ static int blkz_pstore_open(struct pstore_info *psi)
 	struct blkz_context *cxt = psi->data;
 
 	cxt->dmesg_read_cnt = 0;
+	cxt->pmsg_read_cnt = 0;
 	return 0;
 }
 
+static inline bool blkz_old_ok(struct blkz_zone *zone)
+{
+	if (zone && zone->oldbuf && atomic_read(&zone->oldbuf->datalen))
+		return true;
+	return false;
+}
+
 static inline bool blkz_ok(struct blkz_zone *zone)
 {
 	if (zone && zone->buffer && buffer_datalen(zone))
@@ -471,6 +555,25 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
 	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
+static inline int blkz_pmsg_erase(struct blkz_context *cxt,
+		struct blkz_zone *zone)
+{
+	if (unlikely(!blkz_old_ok(zone)))
+		return 0;
+
+	kfree(zone->oldbuf);
+	zone->oldbuf = NULL;
+	/*
+	 * If there is new data in the zone buffer, the old data is already
+	 * invalid, so there is no need to flush 0 (erase) to the block
+	 * device.
+	 */
+	if (!buffer_datalen(zone))
+		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	blkz_flush_dirty_zone(zone);
+	return 0;
+}
+
 static int blkz_pstore_erase(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
@@ -478,6 +581,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
+	case PSTORE_TYPE_PMSG:
+		return blkz_pmsg_erase(cxt, cxt->pbz);
 	default:
 		return -EINVAL;
 	}
@@ -498,8 +603,10 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
 	hdr->reason = record->reason;
 	if (hdr->reason == KMSG_DUMP_OOPS)
 		hdr->counter = ++cxt->oops_counter;
-	else
+	else if (hdr->reason == KMSG_DUMP_PANIC)
 		hdr->counter = ++cxt->panic_counter;
+	else
+		hdr->counter = 0;
 }
 
 static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
@@ -562,6 +669,55 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
 	return 0;
 }
 
+static int notrace blkz_pmsg_write(struct blkz_context *cxt,
+		struct pstore_record *record)
+{
+	struct blkz_zone *zone;
+	size_t start, rem;
+	int cnt = record->size;
+	bool is_full_data = false;
+	char *buf = record->buf;
+
+	zone = cxt->pbz;
+	if (!zone)
+		return -ENOSPC;
+
+	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
+		is_full_data = true;
+
+	if (unlikely(cnt > zone->buffer_size)) {
+		buf += cnt - zone->buffer_size;
+		cnt = zone->buffer_size;
+	}
+
+	start = buffer_start(zone);
+	rem = zone->buffer_size - start;
+	if (unlikely(rem < cnt)) {
+		blkz_zone_write(zone, FLUSH_PART, buf, rem, start);
+		buf += rem;
+		cnt -= rem;
+		start = 0;
+		is_full_data = true;
+	}
+
+	atomic_set(&zone->buffer->start, cnt + start);
+	blkz_zone_write(zone, FLUSH_PART, buf, cnt, start);
+
+	/*
+	 * blkz_zone_write() sets datalen to start + cnt, which is correct
+	 * while the actual data length is less than the buffer size.
+	 * Once the data length exceeds the buffer size, pmsg wraps around to
+	 * the beginning of the zone, which makes buffer->datalen wrong.
+	 * So reset datalen to the buffer size once the actual data length
+	 * exceeds the buffer size.
+	 */
+	if (is_full_data) {
+		atomic_set(&zone->buffer->datalen, zone->buffer_size);
+		blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+	}
+	return 0;
+}
+
 static int notrace blkz_pstore_write(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
@@ -573,6 +729,8 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_write(cxt, record);
+	case PSTORE_TYPE_PMSG:
+		return blkz_pmsg_write(cxt, record);
 	default:
 		return -EINVAL;
 	}
@@ -589,6 +747,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->pmsg_read_cnt == 0) {
+		cxt->pmsg_read_cnt++;
+		zone = cxt->pbz;
+		if (blkz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -627,7 +792,8 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 		char *buf = kasprintf(GFP_KERNEL,
 				"%s: Total %d times\n",
 				record->reason == KMSG_DUMP_OOPS ? "Oops" :
-				"Panic", record->count);
+				record->reason == KMSG_DUMP_PANIC ? "Panic" :
+				"Unknown", record->count);
 		hlen = strlen(buf);
 		record->buf = krealloc(buf, hlen + size, GFP_KERNEL);
 		if (!record->buf) {
@@ -649,6 +815,29 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	return size + hlen;
 }
 
+static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
+		struct pstore_record *record)
+{
+	size_t size, start;
+	struct blkz_buffer *buf;
+
+	buf = (struct blkz_buffer *)zone->oldbuf;
+	if (!buf)
+		return READ_NEXT_ZONE;
+
+	size = atomic_read(&buf->datalen);
+	start = atomic_read(&buf->start);
+
+	record->buf = kmalloc(size, GFP_KERNEL);
+	if (!record->buf)
+		return -ENOMEM;
+
+	memcpy(record->buf, buf->data + start, size - start);
+	memcpy(record->buf + size - start, buf->data, start);
+
+	return size;
+}
+
 static ssize_t blkz_pstore_read(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
@@ -673,6 +862,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 		blkz_read = blkz_dmesg_read;
 		record->id = cxt->dmesg_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_PMSG:
+		blkz_read = blkz_pmsg_read;
+		break;
 	default:
 		goto next_zone;
 	}
@@ -728,8 +920,10 @@ static struct blkz_zone *blkz_init_zone(enum pstore_type_id type,
 	zone->type = type;
 	zone->buffer_size = size - sizeof(struct blkz_buffer);
 	zone->buffer->sig = type ^ BLK_SIG;
+	zone->oldbuf = NULL;
 	atomic_set(&zone->dirty, 0);
 	atomic_set(&zone->buffer->datalen, 0);
+	atomic_set(&zone->buffer->start, 0);
 
 	*off += size;
 
@@ -814,17 +1008,26 @@ static int blkz_cut_zones(struct blkz_context *cxt)
 	struct blkz_info *info = cxt->bzinfo;
 	unsigned long off = 0;
 	int err;
-	size_t size;
+	size_t off_size = 0;
 
-	size = info->total_size;
-	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size,
+	off_size += info->pmsg_size;
+	cxt->pbz = blkz_init_zone(PSTORE_TYPE_PMSG, &off, info->pmsg_size);
+	if (IS_ERR(cxt->pbz)) {
+		err = PTR_ERR(cxt->pbz);
+		goto fail_out;
+	}
+
+	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
+			info->total_size - off_size,
 			info->dmesg_size, &cxt->dmesg_max_cnt);
 	if (IS_ERR(cxt->dbzs)) {
 		err = PTR_ERR(cxt->dbzs);
-		goto fail_out;
+		goto free_pmsg;
 	}
 
 	return 0;
+free_pmsg:
+	blkz_free_zone(&cxt->pbz);
 fail_out:
 	return err;
 }
@@ -840,7 +1043,7 @@ int blkz_register(struct blkz_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->dmesg_size) {
+	if (!info->dmesg_size && !info->pmsg_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -867,6 +1070,7 @@ int blkz_register(struct blkz_info *info)
 
 	check_size(total_size, 4096);
 	check_size(dmesg_size, SECTOR_SIZE);
+	check_size(pmsg_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -898,6 +1102,7 @@ int blkz_register(struct blkz_info *info)
 	pr_debug("register %s with properties:\n", info->name);
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
+	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
 
 	err = blkz_cut_zones(cxt);
 	if (err) {
@@ -916,11 +1121,14 @@ int blkz_register(struct blkz_info *info)
 	}
 	cxt->pstore.data = cxt;
 	if (info->dmesg_size)
-		cxt->pstore.flags = PSTORE_FLAGS_DMESG;
+		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
+	if (info->pmsg_size)
+		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
 
-	pr_info("Registered %s as blkzone backend for %s%s\n", info->name,
+	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
 			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
-			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "");
+			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
+			cxt->pbz ? "Pmsg" : "");
 
 	err = pstore_register(&cxt->pstore);
 	if (err) {
@@ -956,6 +1164,7 @@ void blkz_unregister(struct blkz_info *info)
 	spin_unlock(&cxt->bzinfo_lock);
 
 	blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt);
+	blkz_free_zone(&cxt->pbz);
 }
 EXPORT_SYMBOL_GPL(blkz_unregister);
 
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 589d276fa4e4..af06be25bd01 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -19,6 +19,9 @@
  * @dmesg_size:
  *	The size of each zones for dmesg (oops & panic). Zero means disabled,
  *	otherwise, it must be multiple of SECTOR_SIZE(512 Bytes).
+ * @pmsg_size:
+ *	The size of zone for pmsg. Zero means disabled, otherwise, it must be
+ *	multiple of SECTOR_SIZE(512).
  * @dump_oops:
  *	Dump oops and panic log or only panic.
  * @read, @write:
@@ -50,6 +53,7 @@ struct blkz_info {
 
 	unsigned long total_size;
 	unsigned long dmesg_size;
+	unsigned long pmsg_size;
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
-- 
1.9.1



* [PATCH v1 04/11] pstore/blk: blkoops: support console recorder
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (2 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 03/11] pstore/blk: support pmsg recorder WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

Add a console recorder. To enable it, set console_size to a value greater
than 0 that is a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          |  12 ++++++
 fs/pstore/blkoops.c        |  11 +++++
 fs/pstore/blkzone.c        | 101 ++++++++++++++++++++++++++++++++++-----------
 include/linux/blkoops.h    |   6 ++-
 include/linux/pstore_blk.h |   8 +++-
 5 files changed, 112 insertions(+), 26 deletions(-)
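
For reference, the read path that this patch shares between pmsg and console
(blkz_record_read) turns a circular zone buffer into a linear record with two
copies. A minimal user-space sketch of that copy follows. It assumes the same
convention as the read function in the diff: "size" is the amount of valid
data and "start" is the offset where the oldest byte begins. The name
ring_linearize and its parameters are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of the two memcpy() calls in the read path.
 * "data" holds "size" valid bytes of a circular buffer and "start" is the
 * offset of the oldest byte. "dst" must have room for "size" bytes.
 */
static void ring_linearize(char *dst, const char *data, size_t size,
		size_t start)
{
	assert(start <= size);
	/* Oldest bytes first: from start up to the end of the valid data. */
	memcpy(dst, data + start, size - start);
	/* Then the newest bytes, which wrapped around to the front. */
	memcpy(dst + (size - start), data, start);
}
```

With a 6-byte buffer holding "DEFABC" and start = 3, the linear result is
"ABCDEF". With start = 0 the first copy already moves everything and the
second copy is empty.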

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 656d63dc3f01..af83ae59f31a 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -198,6 +198,18 @@ config PSTORE_BLKOOPS_PMSG_SIZE
 	  NOTE that, both kconfig and module parameters can configure blkoops,
 	  but module parameters have priority over kconfig.
 
+config PSTORE_BLKOOPS_CONSOLE_SIZE
+	int "console size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	depends on PSTORE_CONSOLE
+	default 64
+	help
+	  This sets the size of the console zone (console_size) for pstore/blk.
+	  The value must be a multiple of 4096.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
 config PSTORE_BLKOOPS_BLKDEV
 	string "block device for blkoops"
 	depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index b3bd004fad1a..91754795d612 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -35,6 +35,10 @@
 module_param(pmsg_size, long, 0400);
 MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
 
+static long console_size = -1;
+module_param(console_size, long, 0400);
+MODULE_PARM_DESC(console_size, "console size in kbytes");
+
 static int dump_oops = -1;
 module_param(dump_oops, int, 0400);
 MODULE_PARM_DESC(total_size, "whether dump oops");
@@ -85,6 +89,12 @@
 #define DEFAULT_PMSG_SIZE 0
 #endif
 
+#ifdef CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
+#define DEFAULT_CONSOLE_SIZE CONFIG_PSTORE_BLKOOPS_CONSOLE_SIZE
+#else
+#define DEFAULT_CONSOLE_SIZE 0
+#endif
+
 #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #else
@@ -139,6 +149,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 
 	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
 	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
+	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 1c9a75c21e39..1bb914a2622a 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -104,9 +104,11 @@ struct blkz_zone {
 struct blkz_context {
 	struct blkz_zone **dbzs;	/* dmesg block zones */
 	struct blkz_zone *pbz;		/* Pmsg block zone */
+	struct blkz_zone *cbz;		/* console block zone */
 	unsigned int dmesg_max_cnt;
 	unsigned int dmesg_read_cnt;
 	unsigned int pmsg_read_cnt;
+	unsigned int console_read_cnt;
 	unsigned int dmesg_write_cnt;
 	/*
 	 * the counter should be recovered when recover.
@@ -127,6 +129,9 @@ struct blkz_context {
 };
 static struct blkz_context blkz_cxt;
 
+static void blkz_flush_all_dirty_zones(struct work_struct *);
+static DECLARE_WORK(blkz_cleaner, blkz_flush_all_dirty_zones);
+
 enum blkz_flush_mode {
 	FLUSH_NONE = 0,
 	FLUSH_PART,
@@ -216,6 +221,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
 	return 0;
 set_dirty:
 	atomic_set(&zone->dirty, true);
+	/* flush dirty zones nicely */
+	if (wcnt == -EBUSY && !is_on_panic())
+		schedule_work(&blkz_cleaner);
 	return -EBUSY;
 }
 
@@ -282,6 +290,15 @@ static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new)
 	return 0;
 }
 
+static void blkz_flush_all_dirty_zones(struct work_struct *work)
+{
+	struct blkz_context *cxt = &blkz_cxt;
+
+	blkz_flush_dirty_zone(cxt->pbz);
+	blkz_flush_dirty_zone(cxt->cbz);
+	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
+}
+
 static int blkz_recover_dmesg_data(struct blkz_context *cxt)
 {
 	struct blkz_info *info = cxt->bzinfo;
@@ -435,15 +452,13 @@ static int blkz_recover_dmesg(struct blkz_context *cxt)
 	return ret;
 }
 
-static int blkz_recover_pmsg(struct blkz_context *cxt)
+static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
 {
 	struct blkz_info *info = cxt->bzinfo;
 	struct blkz_buffer *oldbuf;
-	struct blkz_zone *zone = NULL;
 	int ret = 0;
 	ssize_t rcnt, len;
 
-	zone = cxt->pbz;
 	if (!zone || zone->oldbuf)
 		return 0;
 
@@ -509,7 +524,11 @@ static inline int blkz_recovery(struct blkz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
-	ret = blkz_recover_pmsg(cxt);
+	ret = blkz_recover_zone(cxt, cxt->pbz);
+	if (ret)
+		goto recover_fail;
+
+	ret = blkz_recover_zone(cxt, cxt->cbz);
 	if (ret)
 		goto recover_fail;
 
@@ -528,6 +547,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
 
 	cxt->dmesg_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
+	cxt->console_read_cnt = 0;
 	return 0;
 }
 
@@ -555,7 +575,7 @@ static inline int blkz_dmesg_erase(struct blkz_context *cxt,
 	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
-static inline int blkz_pmsg_erase(struct blkz_context *cxt,
+static inline int blkz_record_erase(struct blkz_context *cxt,
 		struct blkz_zone *zone)
 {
 	if (unlikely(!blkz_old_ok(zone)))
@@ -582,9 +602,10 @@ static int blkz_pstore_erase(struct pstore_record *record)
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_erase(cxt, cxt->dbzs[record->id]);
 	case PSTORE_TYPE_PMSG:
-		return blkz_pmsg_erase(cxt, cxt->pbz);
-	default:
-		return -EINVAL;
+		return blkz_record_erase(cxt, cxt->pbz);
+	case PSTORE_TYPE_CONSOLE:
+		return blkz_record_erase(cxt, cxt->cbz);
+	default: return -EINVAL;
 	}
 }
 
@@ -669,17 +690,15 @@ static int notrace blkz_dmesg_write(struct blkz_context *cxt,
 	return 0;
 }
 
-static int notrace blkz_pmsg_write(struct blkz_context *cxt,
-		struct pstore_record *record)
+static int notrace blkz_record_write(struct blkz_context *cxt,
+		struct blkz_zone *zone, struct pstore_record *record)
 {
-	struct blkz_zone *zone;
 	size_t start, rem;
 	int cnt = record->size;
 	bool is_full_data = false;
 	char *buf = record->buf;
 
-	zone = cxt->pbz;
-	if (!zone)
+	if (!zone || !record)
 		return -ENOSPC;
 
 	if (atomic_read(&zone->buffer->datalen) >= zone->buffer_size)
@@ -726,11 +745,20 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 			record->reason == KMSG_DUMP_PANIC)
 		atomic_set(&cxt->on_panic, 1);
 
+	/*
+	 * On panic, write only dmesg records. This avoids the case where
+	 * panic_write prints a log message that wakes up the console recorder.
+	 */
+	if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
+		return -EBUSY;
+
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
 		return blkz_dmesg_write(cxt, record);
+	case PSTORE_TYPE_CONSOLE:
+		return blkz_record_write(cxt, cxt->cbz, record);
 	case PSTORE_TYPE_PMSG:
-		return blkz_pmsg_write(cxt, record);
+		return blkz_record_write(cxt, cxt->pbz, record);
 	default:
 		return -EINVAL;
 	}
@@ -754,6 +782,13 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 			return zone;
 	}
 
+	if (cxt->console_read_cnt == 0) {
+		cxt->console_read_cnt++;
+		zone = cxt->cbz;
+		if (blkz_old_ok(zone))
+			return zone;
+	}
+
 	return NULL;
 }
 
@@ -815,7 +850,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	return size + hlen;
 }
 
-static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
+static ssize_t blkz_record_read(struct blkz_zone *zone,
 		struct pstore_record *record)
 {
 	size_t size, start;
@@ -841,7 +876,7 @@ static ssize_t blkz_pmsg_read(struct blkz_zone *zone,
 static ssize_t blkz_pstore_read(struct pstore_record *record)
 {
 	struct blkz_context *cxt = record->psi->data;
-	ssize_t (*blkz_read)(struct blkz_zone *zone,
+	ssize_t (*readop)(struct blkz_zone *zone,
 			struct pstore_record *record);
 	struct blkz_zone *zone;
 	ssize_t ret;
@@ -859,17 +894,19 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 	record->type = zone->type;
 	switch (record->type) {
 	case PSTORE_TYPE_DMESG:
-		blkz_read = blkz_dmesg_read;
+		readop = blkz_dmesg_read;
 		record->id = cxt->dmesg_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_CONSOLE:
+		/* fallthrough */
 	case PSTORE_TYPE_PMSG:
-		blkz_read = blkz_pmsg_read;
+		readop = blkz_record_read;
 		break;
 	default:
 		goto next_zone;
 	}
 
-	ret = blkz_read(zone, record);
+	ret = readop(zone, record);
 	if (ret == READ_NEXT_ZONE)
 		goto next_zone;
 	return ret;
@@ -1017,15 +1054,25 @@ static int blkz_cut_zones(struct blkz_context *cxt)
 		goto fail_out;
 	}
 
+	off_size += info->console_size;
+	cxt->cbz = blkz_init_zone(PSTORE_TYPE_CONSOLE, &off,
+			info->console_size);
+	if (IS_ERR(cxt->cbz)) {
+		err = PTR_ERR(cxt->cbz);
+		goto free_pmsg;
+	}
+
 	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->dmesg_size, &cxt->dmesg_max_cnt);
 	if (IS_ERR(cxt->dbzs)) {
 		err = PTR_ERR(cxt->dbzs);
-		goto free_pmsg;
+		goto free_console;
 	}
 
 	return 0;
+free_console:
+	blkz_free_zone(&cxt->cbz);
 free_pmsg:
 	blkz_free_zone(&cxt->pbz);
 fail_out:
@@ -1043,7 +1090,7 @@ int blkz_register(struct blkz_info *info)
 		return -EINVAL;
 	}
 
-	if (!info->dmesg_size && !info->pmsg_size) {
+	if (!info->dmesg_size && !info->pmsg_size && !info->console_size) {
 		pr_warn("at least one of the records be non-zero\n");
 		return -EINVAL;
 	}
@@ -1071,6 +1118,7 @@ int blkz_register(struct blkz_info *info)
 	check_size(total_size, 4096);
 	check_size(dmesg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
+	check_size(console_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1103,6 +1151,7 @@ int blkz_register(struct blkz_info *info)
 	pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
 	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
+	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
 
 	err = blkz_cut_zones(cxt);
 	if (err) {
@@ -1124,11 +1173,15 @@ int blkz_register(struct blkz_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
 	if (info->pmsg_size)
 		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
+	if (info->console_size)
+		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
 
-	pr_info("Registered %s as blkzone backend for %s%s%s\n", info->name,
+	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
+			info->name,
 			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
 			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
-			cxt->pbz ? "Pmsg" : "");
+			cxt->pbz ? "Pmsg " : "",
+			cxt->cbz ? "Console" : "");
 
 	err = pstore_register(&cxt->pstore);
 	if (err) {
@@ -1155,6 +1208,8 @@ void blkz_unregister(struct blkz_info *info)
 {
 	struct blkz_context *cxt = &blkz_cxt;
 
+	flush_work(&blkz_cleaner);
+
 	pstore_unregister(&cxt->pstore);
 	kfree(cxt->pstore.buf);
 	cxt->pstore.bufsize = 0;
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index fe63739309aa..8f40f225545d 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -23,8 +23,10 @@
  *	Both of the @size and @offset parameters on this interface are
  *	the relative size of the space provided, not the whole disk/flash.
  *
- *	On success, the number of bytes read should be returned.
- *	On error, negative number should be returned.
+ *	On success, the number of bytes read or written should be returned.
+ *	On error, a negative number should be returned. The following error
+ *	code has a special meaning:
+ *	  -EBUSY: pstore/blk should try again later.
  * @panic_write:
  *	The write operation only used for panic.
  *
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index af06be25bd01..546375e04419 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -22,6 +22,9 @@
  * @pmsg_size:
 *	The size of zone for pmsg. Zero means disabled, otherwise, it must be
  *	multiple of SECTOR_SIZE(512).
+ * @console_size:
+ *	The size of zone for console. Zero means disabled, otherwise, it must
+ *	be multiple of SECTOR_SIZE(512).
  * @dump_oops:
  *	Dump oops and panic log or only panic.
  * @read, @write:
@@ -33,7 +36,9 @@
  *	the relative size of the space provided, not the whole disk/flash.
  *
  *	On success, the number of bytes read/write should be returned.
- *	On error, negative number should be returned.
+ *	On error, a negative number should be returned. The following error
+ *	code has a special meaning:
+ *	  -EBUSY: pstore/blk should try again later.
  * @panic_write:
  *	The write operation only used for panic. It's optional if you do not
  *	care panic record. If panic occur but blkzone do not recover yet, the
@@ -54,6 +59,7 @@ struct blkz_info {
 	unsigned long total_size;
 	unsigned long dmesg_size;
 	unsigned long pmsg_size;
+	unsigned long console_size;
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
-- 
1.9.1



* [PATCH v1 05/11] pstore/blk: blkoops: support ftrace recorder
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (3 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

Add an ftrace recorder. To enable it, set ftrace_size to a value greater
than 0 that is a multiple of 4096.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/Kconfig          | 12 ++++++++
 fs/pstore/blkoops.c        | 11 +++++++
 fs/pstore/blkzone.c        | 75 ++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/pstore_blk.h |  4 +++
 4 files changed, 99 insertions(+), 3 deletions(-)
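
For reference, the zone layout that blkz_cut_zones() produces after this patch
can be modelled in user space as below. This is only a sketch under stated
assumptions: the zones are laid out in the order pmsg, console, ftrace, dmesg
(as in the diff), ftrace_size divides evenly by the CPU count, and
blkz_init_zones() creates as many whole zones as fit (its body is not shown in
this patch). The struct zone_layout and cut_zones() names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the layout; not a structure from the patch. */
struct zone_layout {
	size_t pmsg_off;
	size_t console_off;
	size_t ftrace_off;
	size_t ftrace_zone_size;	/* size of each per-CPU ftrace zone */
	size_t dmesg_off;
	size_t dmesg_zones;		/* whole dmesg zones in the remainder */
};

static struct zone_layout cut_zones(size_t total, size_t pmsg, size_t console,
		size_t ftrace, size_t dmesg, unsigned int cpus)
{
	struct zone_layout l;
	size_t off = 0;

	assert(cpus > 0);
	assert(pmsg + console + ftrace <= total);

	l.pmsg_off = off;
	off += pmsg;
	l.console_off = off;
	off += console;
	l.ftrace_off = off;
	l.ftrace_zone_size = ftrace / cpus;
	off += ftrace;
	l.dmesg_off = off;
	/* Assumption: as many whole dmesg zones as fit in what is left. */
	l.dmesg_zones = dmesg ? (total - off) / dmesg : 0;
	return l;
}
```

For example, a 1 MiB device with 64 KiB each for pmsg, console, ftrace and
dmesg on a 4-CPU system gives four 16 KiB ftrace zones and leaves room for 13
dmesg zones.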

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index af83ae59f31a..5649218d2821 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -210,6 +210,18 @@ config PSTORE_BLKOOPS_CONSOLE_SIZE
 	  NOTE that, both kconfig and module parameters can configure blkoops,
 	  but module parameters have priority over kconfig.
 
+config PSTORE_BLKOOPS_FTRACE_SIZE
+	int "ftrace size in kbytes for blkoops"
+	depends on PSTORE_BLKOOPS
+	depends on PSTORE_FTRACE
+	default 64
+	help
+	  This sets the size of the ftrace zones (ftrace_size) for pstore/blk.
+	  The value must be a multiple of 4096.
+
+	  NOTE that, both kconfig and module parameters can configure blkoops,
+	  but module parameters have priority over kconfig.
+
 config PSTORE_BLKOOPS_BLKDEV
 	string "block device for blkoops"
 	depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 91754795d612..3f28e61c8ecf 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -39,6 +39,10 @@
 module_param(console_size, long, 0400);
 MODULE_PARM_DESC(console_size, "console size in kbytes");
 
+static long ftrace_size = -1;
+module_param(ftrace_size, long, 0400);
+MODULE_PARM_DESC(ftrace_size, "ftrace size in kbytes");
+
 static int dump_oops = -1;
 module_param(dump_oops, int, 0400);
 MODULE_PARM_DESC(total_size, "whether dump oops");
@@ -95,6 +99,12 @@
 #define DEFAULT_CONSOLE_SIZE 0
 #endif
 
+#ifdef CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
+#define DEFAULT_FTRACE_SIZE CONFIG_PSTORE_BLKOOPS_FTRACE_SIZE
+#else
+#define DEFAULT_FTRACE_SIZE 0
+#endif
+
 #ifdef CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #define DEFAULT_DUMP_OOPS CONFIG_PSTORE_BLKOOPS_DUMP_OOPS
 #else
@@ -150,6 +160,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
 	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
 	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
+	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 1bb914a2622a..66ae8e2a924b 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -105,10 +105,13 @@ struct blkz_context {
 	struct blkz_zone **dbzs;	/* dmesg block zones */
 	struct blkz_zone *pbz;		/* Pmsg block zone */
 	struct blkz_zone *cbz;		/* console block zone */
+	struct blkz_zone **fbzs;	/* Ftrace zones */
 	unsigned int dmesg_max_cnt;
 	unsigned int dmesg_read_cnt;
 	unsigned int pmsg_read_cnt;
 	unsigned int console_read_cnt;
+	unsigned int ftrace_max_cnt;
+	unsigned int ftrace_read_cnt;
 	unsigned int dmesg_write_cnt;
 	/*
 	 * the counter should be recovered when recover.
@@ -297,6 +300,7 @@ static void blkz_flush_all_dirty_zones(struct work_struct *work)
 	blkz_flush_dirty_zone(cxt->pbz);
 	blkz_flush_dirty_zone(cxt->cbz);
 	blkz_flush_dirty_zones(cxt->dbzs, cxt->dmesg_max_cnt);
+	blkz_flush_dirty_zones(cxt->fbzs, cxt->ftrace_max_cnt);
 }
 
 static int blkz_recover_dmesg_data(struct blkz_context *cxt)
@@ -513,6 +517,31 @@ static int blkz_recover_zone(struct blkz_context *cxt, struct blkz_zone *zone)
 	return ret;
 }
 
+static int blkz_recover_zones(struct blkz_context *cxt,
+		struct blkz_zone **zones, unsigned int cnt)
+{
+	int ret;
+	unsigned int i;
+	struct blkz_zone *zone;
+
+	if (!zones)
+		return 0;
+
+	for (i = 0; i < cnt; i++) {
+		zone = zones[i];
+		if (unlikely(!zone))
+			continue;
+		ret = blkz_recover_zone(cxt, zone);
+		if (ret)
+			goto recover_fail;
+	}
+
+	return 0;
+recover_fail:
+	pr_debug("recover %s[%u] failed\n", zone->name, i);
+	return ret;
+}
+
 static inline int blkz_recovery(struct blkz_context *cxt)
 {
 	int ret = -EBUSY;
@@ -532,6 +561,10 @@ static inline int blkz_recovery(struct blkz_context *cxt)
 	if (ret)
 		goto recover_fail;
 
+	ret = blkz_recover_zones(cxt, cxt->fbzs, cxt->ftrace_max_cnt);
+	if (ret)
+		goto recover_fail;
+
 	pr_debug("recover end!\n");
 	atomic_set(&cxt->recovered, 1);
 	return 0;
@@ -548,6 +581,7 @@ static int blkz_pstore_open(struct pstore_info *psi)
 	cxt->dmesg_read_cnt = 0;
 	cxt->pmsg_read_cnt = 0;
 	cxt->console_read_cnt = 0;
+	cxt->ftrace_read_cnt = 0;
 	return 0;
 }
 
@@ -605,6 +639,8 @@ static int blkz_pstore_erase(struct pstore_record *record)
 		return blkz_record_erase(cxt, cxt->pbz);
 	case PSTORE_TYPE_CONSOLE:
 		return blkz_record_erase(cxt, cxt->cbz);
+	case PSTORE_TYPE_FTRACE:
+		return blkz_record_erase(cxt, cxt->fbzs[record->id]);
 	default: return -EINVAL;
 	}
 }
@@ -759,6 +795,13 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 		return blkz_record_write(cxt, cxt->cbz, record);
 	case PSTORE_TYPE_PMSG:
 		return blkz_record_write(cxt, cxt->pbz, record);
+	case PSTORE_TYPE_FTRACE: {
+		int zonenum = smp_processor_id();
+
+		if (!cxt->fbzs)
+			return -ENOSPC;
+		return blkz_record_write(cxt, cxt->fbzs[zonenum], record);
+	}
 	default:
 		return -EINVAL;
 	}
@@ -775,6 +818,12 @@ static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 			return zone;
 	}
 
+	while (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt) {
+		zone = cxt->fbzs[cxt->ftrace_read_cnt++];
+		if (blkz_old_ok(zone))
+			return zone;
+	}
+
 	if (cxt->pmsg_read_cnt == 0) {
 		cxt->pmsg_read_cnt++;
 		zone = cxt->pbz;
@@ -897,6 +946,9 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 		readop = blkz_dmesg_read;
 		record->id = cxt->dmesg_read_cnt - 1;
 		break;
+	case PSTORE_TYPE_FTRACE:
+		record->id = cxt->ftrace_read_cnt - 1;
+		/* fallthrough */
 	case PSTORE_TYPE_CONSOLE:
 		/* fallthrough */
 	case PSTORE_TYPE_PMSG:
@@ -1062,15 +1114,27 @@ static int blkz_cut_zones(struct blkz_context *cxt)
 		goto free_pmsg;
 	}
 
+	off_size += info->ftrace_size;
+	cxt->fbzs = blkz_init_zones(PSTORE_TYPE_FTRACE, &off,
+			info->ftrace_size,
+			info->ftrace_size / nr_cpu_ids,
+			&cxt->ftrace_max_cnt);
+	if (IS_ERR(cxt->fbzs)) {
+		err = PTR_ERR(cxt->fbzs);
+		goto free_console;
+	}
+
 	cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off,
 			info->total_size - off_size,
 			info->dmesg_size, &cxt->dmesg_max_cnt);
 	if (IS_ERR(cxt->dbzs)) {
 		err = PTR_ERR(cxt->dbzs);
-		goto free_console;
+		goto free_ftrace;
 	}
 
 	return 0;
+free_ftrace:
+	blkz_free_zones(&cxt->fbzs, &cxt->ftrace_max_cnt);
 free_console:
 	blkz_free_zone(&cxt->cbz);
 free_pmsg:
@@ -1119,6 +1183,7 @@ int blkz_register(struct blkz_info *info)
 	check_size(dmesg_size, SECTOR_SIZE);
 	check_size(pmsg_size, SECTOR_SIZE);
 	check_size(console_size, SECTOR_SIZE);
+	check_size(ftrace_size, SECTOR_SIZE);
 
 #undef check_size
 
@@ -1152,6 +1217,7 @@ int blkz_register(struct blkz_info *info)
 	pr_debug("\tdmesg size : %ld Bytes\n", info->dmesg_size);
 	pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
 	pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
+	pr_debug("\tftrace size : %ld Bytes\n", info->ftrace_size);
 
 	err = blkz_cut_zones(cxt);
 	if (err) {
@@ -1175,13 +1241,16 @@ int blkz_register(struct blkz_info *info)
 		cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
 	if (info->console_size)
 		cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
+	if (info->ftrace_size)
+		cxt->pstore.flags |= PSTORE_FLAGS_FTRACE;
 
-	pr_info("Registered %s as blkzone backend for %s%s%s%s\n",
+	pr_info("Registered %s as blkzone backend for %s%s%s%s%s\n",
 			info->name,
 			cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "",
 			cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : "",
 			cxt->pbz ? "Pmsg " : "",
-			cxt->cbz ? "Console" : "");
+			cxt->cbz ? "Console " : "",
+			cxt->fbzs ? "Ftrace" : "");
 
 	err = pstore_register(&cxt->pstore);
 	if (err) {
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 546375e04419..77704c1b404a 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -25,6 +25,9 @@
  * @console_size:
 *	The size of zone for console. Zero means disabled, otherwise, it must
  *	be multiple of SECTOR_SIZE(512).
+ * @ftrace_size:
+ *	The size of zone for ftrace. Zero means disabled, otherwise, it must
+ *	be multiple of SECTOR_SIZE(512).
  * @dump_oops:
  *	Dump oops and panic log or only panic.
  * @read, @write:
@@ -60,6 +63,7 @@ struct blkz_info {
 	unsigned long dmesg_size;
 	unsigned long pmsg_size;
 	unsigned long console_size;
+	unsigned long ftrace_size;
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
-- 
1.9.1



* [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (4 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-21  4:13   ` Randy Dunlap
  2020-01-20  1:03 ` [PATCH v1 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

Add Documentation/admin-guide/pstore-block.rst, which describes how to use
pstore/blk and blkoops.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
 MAINTAINERS                                |   1 +
 fs/pstore/Kconfig                          |   2 +
 3 files changed, 281 insertions(+)
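
As an illustration of the <major>:<minor> form of the blkdev parameter that
the document below describes, here is a small user-space parser. It is only a
sketch: parse_major_minor() is a hypothetical helper, and it is not how the
kernel itself resolves the parameter.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "<major>:<minor>" (decimal), e.g. "179:7".
 * Returns 0 on success and -1 if the string has any other shape.
 * Note that sscanf() skips leading whitespace before each number.
 */
static int parse_major_minor(const char *s, unsigned int *major,
		unsigned int *minor)
{
	char trailing;

	assert(s && major && minor);
	/* Exactly two fields must match; a third means trailing garbage. */
	if (sscanf(s, "%u:%u%c", major, minor, &trailing) != 2)
		return -1;
	return 0;
}
```

The string "179:7" from the module parameter example parses to major 179 and
minor 7, while the hexadecimal form "b302" is rejected by this helper because
it has no colon.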
 create mode 100644 Documentation/admin-guide/pstore-block.rst

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
new file mode 100644
index 000000000000..58418d429c55
--- /dev/null
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -0,0 +1,278 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
+block device before the system crashes. It also supports non-block devices such
+as MTD devices.
+
+There is a wrapper named blkoops for pstore/blk, which makes pstore/blk easier
+for device drivers to use.
+
+Pstore block concepts
+---------------------
+
+Pstore/blk works as a zone manager: it cuts the block device or partition into
+several zones and stores the data of the different recorders in them. All a
+device driver has to do is provide read/write APIs.
+
+Pstore/blk is entered through ``blkz_register``. Blkoops, a wrapper of
+pstore/blk, is entered through ``blkoops_register_blkdev`` for block devices
+and ``blkoops_register_device`` for non-block devices. Using blkoops is
+recommended over using pstore/blk directly.
+
+Blkoops provides an efficient configuration method for pstore/blk. It divides
+all configurations of pstore/blk into two parts: configurations for the user
+and configurations for the driver.
+
+Configurations for user determine how pstore/blk works, such as pmsg_size,
+dmesg_size and so on. All of them support both kconfig and module parameters,
+but module parameters have priority over kconfig.
+
+Configurations for the driver describe the block/non-block device, such as the
+total_size of the device and its read/write operations. The device driver
+passes them in a ``blkoops_device`` structure defined in *linux/blkoops.h*.
+
+Configurations for user
+-----------------------
+
+All of these configurations support both kconfig and module parameters, but
+module parameters have priority over kconfig.
+Here is an example for module parameters::
+
+        blkoops.blkdev=179:7 blkoops.dmesg_size=64 blkoops.dump_oops=1
+
+The details of each configuration are described below.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of a block device.
+It can be ignored if the backing device is not a block device.
+
+It accepts the following variants:
+
+1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
+   leading 0x, for example b302.
+#. /dev/<disk_name> represents the device number of disk
+#. /dev/<disk_name><decimal> represents the device number of partition - device
+   number of disk plus the partition number
+#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
+   name of partitioned disk ends with a digit.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the unique id of
+   a partition if the partition table provides it. The UUID may be either an
+   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
+   where SSSSSSSS is a zero-filled hex representation of the 32-bit
+   "NT disk signature", and PP is a zero-filled hex representation of the
+   1-based partition number.
+#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
+   partition with a known unique id.
+#. <major>:<minor> major and minor number of the device separated by a colon.
+
+dmesg_size
+~~~~~~~~~~
+
+The chunk size in bytes for dmesg (oops/panic). It **MUST** be a multiple of
+4096. If you don't need it, set it to 0 or leave it unset.
+
+NOTE that the remaining space, after ``pmsg_size``, ``console_size`` and the
+others are taken out, belongs to dmesg. So there may be multiple dmesg chunks.
+
+Pstore/blk logs to the dmesg chunks one by one, and always overwrites the oldest
+chunk if there are no more free chunks.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in bytes for pmsg. It **MUST** be a multiple of 4096. If you
+do not need it, safely set it 0 or ignore it.
+
+There is only one chunk for pmsg.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+/sys/fs/pstore/pmsg-pstore-blk-0.
+
+console_size
+~~~~~~~~~~~~
+
+The chunk size in bytes for console. It **MUST** be a multiple of 4096. If you
+do not need it, safely set it 0 or ignore it.
+
+There is only one chunk for console.
+
+All console output is appended to the chunk. On reboot the contents are
+available in /sys/fs/pstore/console-pstore-blk-0.
+
+ftrace_size
+~~~~~~~~~~~
+
+The chunk size in bytes for ftrace. It **MUST** be a multiple of 4096. If you
+do not need it, safely set it 0 or ignore it.
+
+There may be several chunks for ftrace, depending on the number of processors
+in the system. Each chunk size is equal to (ftrace_size / processors_count).
+
+All ftrace output is appended to the chunk. On reboot the contents are
+available in /sys/fs/pstore/ftrace-pstore-blk-[N], where N is the processor
+number.
+
+Persistent function tracing might be useful for debugging software or hardware
+related hangs. Here is an example of usage::
+
+ # mount -t pstore pstore /sys/fs/pstore
+ # mount -t debugfs debugfs /sys/kernel/debug/
+ # echo 1 > /sys/kernel/debug/pstore/record_ftrace
+ # reboot -f
+ [...]
+ # mount -t pstore pstore /sys/fs/pstore
+ # tail /sys/fs/pstore/ftrace-pstore-blk-0
+ CPU:0 ts:109860 c03a4310  c0063ebc  cpuidle_select <- cpu_startup_entry+0x1a8/0x1e0
+ CPU:0 ts:109861 c03a5878  c03a4324  menu_select <- cpuidle_select+0x24/0x2c
+ CPU:0 ts:109862 c00670e8  c03a589c  pm_qos_request <- menu_select+0x38/0x4cc
+ CPU:0 ts:109863 c0092bbc  c03a5960  tick_nohz_get_sleep_length <- menu_select+0xfc/0x4cc
+ CPU:0 ts:109865 c004b2f4  c03a59d4  get_iowait_load <- menu_select+0x170/0x4cc
+ CPU:0 ts:109868 c0063b60  c0063ecc  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
+ CPU:0 ts:109869 c03a433c  c0063b94  cpuidle_enter <- call_cpuidle+0x44/0x48
+ CPU:0 ts:109871 c03a4000  c03a4350  cpuidle_enter_state <- cpuidle_enter+0x24/0x28
+ CPU:0 ts:109873 c0063ba8  c03a4090  sched_idle_set_state <- cpuidle_enter_state+0xa4/0x314
+ CPU:0 ts:109874 c03a605c  c03a40b4  arm_enter_idle_state <- cpuidle_enter_state+0xc8/0x314
+
+dump_oops
+~~~~~~~~~
+
+Setting ``dump_oops`` to 1 (any non-zero value) dumps both oopses and panics,
+while setting it to 0 dumps only panics.
+
+Configurations for driver
+-------------------------
+
+Only device drivers care about these configurations. A block device driver
+uses ``blkoops_register_blkdev``, while a non-block device driver uses
+``blkoops_register_device``.
+
+The parameters of these two APIs are described below.
+
+major
+~~~~~
+
+It is only requested by block device which is registered by
+``blkoops_register_blkdev``.  It's the major device number of registered
+devices, by which blkoops can get the matching driver for @blkdev.
+
+total_size
+~~~~~~~~~~
+
+It is only requested by non-block device which is registered by
+``blkoops_register_device``.  It tells pstore/blk that the total size
+pstore/blk can use. It **MUST** be greater than 4096 and a multiple of 4096.
+
+For a block device, blkoops gets the device/partition size automatically.
+
+read/write
+~~~~~~~~~~
+
+These are the generic read/write APIs for pstore/blk, which are required for a
+non-block device. The generic APIs are used for all data except panic data,
+such as pmsg, console, oops and ftrace.
+
+The parameter @offset is the relative position of the device.
+
+Normally the number of bytes read/written should be returned, while for error,
+negative number will be returned. The following return numbers mean more:
+
+-EBUSY: pstore/blk should try again later.
+
+panic_write (for non-block device)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It's an interface for the panic recorder and is used only when a panic occurs.
+A non-block device driver registers it via ``blkoops_register_device``. If the
+panic log is unnecessary, it's fine to leave it unset.
+
+Note that, by default, pstore/blk recovers data from the device while mounting
+the pstore filesystem. If a panic occurs before pstore/blk has recovered, the
+first zone of dmesg will be used.
+
+The parameter @offset is the relative position of the device.
+
+Normally the number of bytes written should be returned, while for error,
+negative number should be returned.
+
+panic_write (for block device)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It's very similar to panic_write for a non-block device, but panic_write for a
+block device writes in units aligned to SECTOR_SIZE, which is why the parameters
+are @sects and @start_sect. A block device driver should register it via
+``blkoops_register_blkdev``.
+
+The parameter @start_sect is the position relative to the start of the block
+device or partition. If the block driver requires an absolute position for
+panic_write, ``blkoops_blkdev_info`` will be helpful: it provides the absolute
+position of the block device (or partition) on the whole disk/flash.
+
+Normally zero should be returned, otherwise it indicates an error.
+
+Compression and header
+----------------------
+
+A block device is large enough for uncompressed dmesg data. In fact, we do not
+recommend data compression because pstore/blk inserts some information into
+the first line of dmesg data. For example::
+
+        Panic: Total 16 times
+
+It means that this is the 16th panic log since the first boot. For an embedded
+device, the number of oopses/panics since the image was burned is an important
+measure of whether the system is stable.
+
+The following line is inserted by pstore filesystem. For example::
+
+        Oops#2 Part1
+
+It means that this is the 2nd oops log of the last boot.
+
+Reading the data
+----------------
+
+The dump data can be read from the pstore filesystem. The format for these
+files is ``dmesg-pstore-blk-[N]`` for dmesg(oops|panic), ``pmsg-pstore-blk-0``
+for pmsg and so on, where N is the record number. To delete a stored
+record from block device, simply unlink the respective pstore file. The
+timestamp of the dump file records the trigger time.
+
+Attentions in panic read/write APIs
+-----------------------------------
+
+Once a panic occurs, the kernel is not going to run for much longer. Tasks will
+not be scheduled and most kernel resources will be out of service. It
+looks like a single-threaded program running on a single-core computer.
+
+The following points require special attention for panic read/write APIs:
+
+1. Can **NOT** allocate any memory.
+   If you need memory, allocate it while the block driver is initializing
+   rather than waiting until the panic.
+#. Must be polled, **NOT** interrupt driven.
+   Tasks are no longer scheduled. The block driver should busy-wait (delay) to
+   ensure the write succeeds, but NOT sleep.
+#. Can **NOT** take any lock.
+   There is no other task, nor any shared resource; you are safe to break all
+   locks.
+#. Just use the CPU to transfer.
+   Do not use DMA to transfer unless you are sure that DMA will not hold a lock.
+#. Operate on registers directly.
+   Try not to use Linux kernel resources. Do the I/O mapping while initializing
+   rather than waiting until the panic.
+#. Reset your block device and controller if necessary.
+   If you are not sure of the state of your block device and controller when a
+   panic occurs, you are safe to stop and reset them.
+
+Blkoops supports blkoops_blkdev_info(), which is defined in *linux/blkoops.h*,
+to get information of block device, such as the device number, sector count and
+start sector of the whole disk.
diff --git a/MAINTAINERS b/MAINTAINERS
index e4ba97130560..a5122e3aaf76 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13380,6 +13380,7 @@ F:	include/linux/pstore*
 F:	drivers/firmware/efi/efi-pstore.c
 F:	drivers/acpi/apei/erst.c
 F:	Documentation/admin-guide/ramoops.rst
+F:	Documentation/admin-guide/pstore-block.rst
 F:	Documentation/devicetree/bindings/reserved-memory/ramoops.txt
 K:	\b(pstore|ramoops|blkoops)
 
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 5649218d2821..24232e96a98a 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -162,6 +162,8 @@ config PSTORE_BLK
 	  This enables panic and oops message to be logged to a block dev
 	  where it can be read back at some later point.
 
+	  For more information, see Documentation/admin-guide/pstore-block.rst.
+
 	  If unsure, say N.
 
 config PSTORE_BLKOOPS
-- 
1.9.1



* [PATCH v1 07/11] pstore/blk: skip broken zone for mtd device
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (5 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device is not a block device. Because blocks of flash (an MTD device)
can go bad, pstore/blk needs to skip the broken block (bad block).

If a device driver returns -ENEXT, pstore/blk will try the next dmesg zone.
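
The retry behaviour can be illustrated with a small user-space sketch. It is
not the kernel code from this patch: `struct ctx`, `zone_write()` and the
`bad[]` table are hypothetical stand-ins, and only the ENEXT value and the
round-robin start at the write counter follow the hunks below.

```c
#include <errno.h>
#include <stdbool.h>

#define ENEXT 1024	/* same value the patch defines; not a standard errno */
#define MAX_ZONES 4	/* hypothetical zone count for this sketch */

struct ctx {
	unsigned int write_cnt;	/* index of the next zone to try */
	unsigned int max_cnt;	/* number of dmesg zones */
	bool bad[MAX_ZONES];	/* hypothetical: zones the device rejects */
};

/* Stand-in for the driver write: a bad zone answers -ENEXT, a good one 0. */
static int zone_write(const struct ctx *c, unsigned int zonenum)
{
	return c->bad[zonenum] ? -ENEXT : 0;
}

/*
 * Try each zone at most once, starting at write_cnt.
 * Returns the zone index used, or a negative error code.
 */
static int dmesg_write(struct ctx *c)
{
	unsigned int i;

	for (i = 0; i < c->max_cnt; i++) {
		unsigned int zonenum = (c->write_cnt + i) % c->max_cnt;
		int ret = zone_write(c, zonenum);

		if (ret != -ENEXT) {
			/* success or a hard error: advance past this zone */
			c->write_cnt = (zonenum + 1) % c->max_cnt;
			return ret < 0 ? ret : (int)zonenum;
		}
		/* -ENEXT: zone is used or broken, fall through to the next */
	}
	return -EBUSY;	/* every zone asked us to move on */
}
```

In this sketch the write counter only advances when a zone is actually
consumed, so a run of bad zones does not shift where the next record starts.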

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  3 +-
 fs/pstore/blkzone.c                        | 74 +++++++++++++++++++++++-------
 include/linux/blkoops.h                    |  4 +-
 include/linux/pstore_blk.h                 |  4 ++
 4 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index 58418d429c55..aea6d2664a22 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -185,7 +185,8 @@ The parameter @offset is the relative position of the device.
 Normally the number of bytes read/written should be returned, while for error,
 negative number will be returned. The following return numbers mean more:
 
--EBUSY: pstore/blk should try again later.
+1. -EBUSY: pstore/blk should try again later.
+#. -ENEXT: this zone is used or broken, pstore/blk should try next one.
 
 panic_write (for non-block device)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 66ae8e2a924b..3f58ff85f49c 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -223,6 +223,9 @@ static int blkz_zone_write(struct blkz_zone *zone,
 
 	return 0;
 set_dirty:
+	/* no need to mark dirty if going to try next zone */
+	if (wcnt == -ENEXT)
+		return -ENEXT;
 	atomic_set(&zone->dirty, true);
 	/* flush dirty zones nicely */
 	if (wcnt == -EBUSY && !is_on_panic())
@@ -376,7 +379,11 @@ static int blkz_recover_dmesg_meta(struct blkz_context *cxt)
 			return -EINVAL;
 
 		rcnt = info->read((char *)buf, len, zone->off);
-		if (rcnt != len) {
+		if (rcnt == -ENEXT) {
+			pr_debug("%s with id %lu may be broken, skip\n",
+					zone->name, i);
+			continue;
+		} else if (rcnt != len) {
 			pr_err("read %s with id %lu failed\n", zone->name, i);
 			return (int)rcnt < 0 ? (int)rcnt : -EIO;
 		}
@@ -666,24 +673,58 @@ static void blkz_write_kmsg_hdr(struct blkz_zone *zone,
 		hdr->counter = 0;
 }
 
+/*
+ * In case a zone is broken, which may happen on an MTD device, we try each
+ * zone, starting at cxt->dmesg_write_cnt.
+ */
 static inline int notrace blkz_dmesg_write_do(struct blkz_context *cxt,
 		struct pstore_record *record)
 {
+	int ret = -EBUSY;
 	size_t size, hlen;
 	struct blkz_zone *zone;
-	unsigned int zonenum;
+	unsigned int i;
 
-	zonenum = cxt->dmesg_write_cnt;
-	zone = cxt->dbzs[zonenum];
-	if (unlikely(!zone))
-		return -ENOSPC;
-	cxt->dmesg_write_cnt = (zonenum + 1) % cxt->dmesg_max_cnt;
+	for (i = 0; i < cxt->dmesg_max_cnt; i++) {
+		unsigned int zonenum, len;
+
+		zonenum = (cxt->dmesg_write_cnt + i) % cxt->dmesg_max_cnt;
+		zone = cxt->dbzs[zonenum];
+		if (unlikely(!zone))
+			return -ENOSPC;
 
-	pr_debug("write %s to zone id %d\n", zone->name, zonenum);
-	blkz_write_kmsg_hdr(zone, record);
-	hlen = sizeof(struct blkz_dmesg_header);
-	size = min_t(size_t, record->size, zone->buffer_size - hlen);
-	return blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		/* avoid destroying old data, allocate a new one */
+		len = zone->buffer_size + sizeof(*zone->buffer);
+		zone->oldbuf = zone->buffer;
+		zone->buffer = kzalloc(len, GFP_KERNEL);
+		if (!zone->buffer) {
+			zone->buffer = zone->oldbuf;
+			return -ENOMEM;
+		}
+		zone->buffer->sig = zone->oldbuf->sig;
+
+		pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+		blkz_write_kmsg_hdr(zone, record);
+		hlen = sizeof(struct blkz_dmesg_header);
+		size = min_t(size_t, record->size, zone->buffer_size - hlen);
+		ret = blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+		if (likely(!ret || ret != -ENEXT)) {
+			cxt->dmesg_write_cnt = zonenum + 1;
+			cxt->dmesg_write_cnt %= cxt->dmesg_max_cnt;
+			/* no need to try next zone, free last zone buffer */
+			kfree(zone->oldbuf);
+			zone->oldbuf = NULL;
+			return ret;
+		}
+
+		pr_debug("zone %u may be broken, try next dmesg zone\n",
+				zonenum);
+		kfree(zone->buffer);
+		zone->buffer = zone->oldbuf;
+		zone->oldbuf = NULL;
+	}
+
+	return -EBUSY;
 }
 
 static int notrace blkz_dmesg_write(struct blkz_context *cxt,
@@ -807,7 +848,6 @@ static int notrace blkz_pstore_write(struct pstore_record *record)
 	}
 }
 
-#define READ_NEXT_ZONE ((ssize_t)(-1024))
 static struct blkz_zone *blkz_read_next_zone(struct blkz_context *cxt)
 {
 	struct blkz_zone *zone = NULL;
@@ -868,7 +908,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	if (blkz_read_dmesg_hdr(zone, record)) {
 		atomic_set(&zone->buffer->datalen, 0);
 		atomic_set(&zone->dirty, 0);
-		return READ_NEXT_ZONE;
+		return -ENEXT;
 	}
 	size -= sizeof(struct blkz_dmesg_header);
 
@@ -893,7 +933,7 @@ static ssize_t blkz_dmesg_read(struct blkz_zone *zone,
 	if (unlikely(blkz_zone_read(zone, record->buf + hlen, size,
 				sizeof(struct blkz_dmesg_header)) < 0)) {
 		kfree(record->buf);
-		return READ_NEXT_ZONE;
+		return -ENEXT;
 	}
 
 	return size + hlen;
@@ -907,7 +947,7 @@ static ssize_t blkz_record_read(struct blkz_zone *zone,
 
 	buf = (struct blkz_buffer *)zone->oldbuf;
 	if (!buf)
-		return READ_NEXT_ZONE;
+		return -ENEXT;
 
 	size = atomic_read(&buf->datalen);
 	start = atomic_read(&buf->start);
@@ -959,7 +999,7 @@ static ssize_t blkz_pstore_read(struct pstore_record *record)
 	}
 
 	ret = readop(zone, record);
-	if (ret == READ_NEXT_ZONE)
+	if (ret == -ENEXT)
 		goto next_zone;
 	return ret;
 }
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index 8f40f225545d..71c596fd4cc8 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -27,6 +27,7 @@
  *	On error, negative number should be returned. The following returning
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
+ *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
  * @panic_write:
  *	The write operation only used for panic.
  *
@@ -45,7 +46,8 @@ struct blkoops_device {
 
 /*
  * Panic write for block device who should write alignmemt to SECTOR_SIZE.
- * On success, zero should be returned. Others mean error.
+ * On success, zero should be returned. Others mean error except that -ENEXT
+ * means the zone is used or broken, pstore/blk should try next one.
  */
 typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
 		sector_t sects);
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 77704c1b404a..bbbe4fe37f7c 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -6,6 +6,9 @@
 #include <linux/types.h>
 #include <linux/blkdev.h>
 
+/* read/write function return -ENEXT means try next zone */
+#define ENEXT ((ssize_t)(1024))
+
 /**
  * struct blkz_info - backend blkzone driver structure
  *
@@ -42,6 +45,7 @@
  *	On error, negative number should be returned. The following returning
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
+ *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
  * @panic_write:
  *	The write operation only used for panic. It's optional if you do not
  *	care panic record. If panic occur but blkzone do not recover yet, the
-- 
1.9.1



* [PATCH v1 08/11] blkoops: respect for device to pick recorders
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (6 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device is not a block device. A flash sector (MTD device) wears out
once it is erased more than a limited number of cycles. To avoid wearing
out blocks so quickly, we cannot write to a sector frequently. So the
pstore/blk recorders that write often, such as the console and ftrace
recorders, should not be supported.

Besides, an MTD device needs writes and erases aligned to its write/erase
size. To avoid over-erasing/writing the flash, we would have to keep an
aligned cache and read the old data into it before each write/erase, which
makes the code more complex. So pmsg is not supported for now because its
writes are misaligned.

How about dmesg? Luckily, pstore/blk keeps several aligned chunks for
dmesg and uses them one by one, which balances wear.

So an MTD device backing pstore should pick its recorders; that is what
this patch allows.
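
As a rough user-space illustration of how the flags gate each recorder, the
sketch below mirrors the verify_size() logic in the hunks that follow. The
macro names and values are taken from this patch; `recorder_size_bytes()` and
the local `BIT()` definition are hypothetical helpers for the sketch only.

```c
#include <limits.h>

#define BIT(n)				(1U << (n))
#define BLKOOPS_DEV_SUPPORT_ALL		UINT_MAX
#define BLKOOPS_DEV_SUPPORT_DEFAULT	0U
#define BLKOOPS_DEV_SUPPORT_DMESG	BIT(0)
#define BLKOOPS_DEV_SUPPORT_PMSG	BIT(1)
#define BLKOOPS_DEV_SUPPORT_CONSOLE	BIT(2)
#define BLKOOPS_DEV_SUPPORT_FTRACE	BIT(3)

/*
 * Hypothetical helper: the size in bytes a recorder ends up with.
 * @flags:        what the device claims to support (0 means "all")
 * @recorder_bit: the BLKOOPS_DEV_SUPPORT_* bit of this recorder
 * @requested_kb: user-supplied size in KiB, negative if not given
 * @default_kb:   fallback size in KiB
 */
static long recorder_size_bytes(unsigned int flags, unsigned int recorder_bit,
				long requested_kb, long default_kb)
{
	long kb;

	/* zero means all recorders, for compatibility */
	if (flags == BLKOOPS_DEV_SUPPORT_DEFAULT)
		flags = BLKOOPS_DEV_SUPPORT_ALL;

	if (!(flags & recorder_bit))
		kb = 0;			/* device does not support it */
	else if (requested_kb >= 0)
		kb = requested_kb;	/* explicit user setting wins */
	else
		kb = default_kb;	/* fall back to the default */

	return kb <= 0 ? 0 : kb * 1024;
}
```

With this shape, an MTD backend could pass only BLKOOPS_DEV_SUPPORT_DMESG and
the other recorders would end up with a size of zero, which disables them.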

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  9 +++++++++
 fs/pstore/blkoops.c                        | 29 +++++++++++++++++++++--------
 include/linux/blkoops.h                    | 14 +++++++++++++-
 3 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index aea6d2664a22..f4fc205406aa 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -164,6 +164,15 @@ It is only requested by block device which is registered by
 ``blkoops_register_blkdev``.  It's the major device number of registered
 devices, by which blkoops can get the matching driver for @blkdev.
 
+flags
+~~~~~
+
+Refer to the macros starting with *BLKOOPS_DEV_SUPPORT_*, which are defined in
+*linux/blkoops.h*. They tell us which pstore/blk recorders this device
+supports. The default, zero, means all recorders (for compatibility), which is
+the same as BLKOOPS_DEV_SUPPORT_ALL. A recorder works only when its chunk size
+is not zero and the device supports it.
+
 total_size
 ~~~~~~~~~~
 
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 3f28e61c8ecf..d9b51880144b 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -143,9 +143,16 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 		return -ENOMEM;
 	}
 
-#define verify_size(name, defsize, alignsize) {				\
-		long _##name_ = (name);					\
-		if (_##name_ < 0)					\
+	/* zero means all recorders for compatible */
+	if (bo_dev->flags == BLKOOPS_DEV_SUPPORT_DEFAULT)
+		bo_dev->flags = BLKOOPS_DEV_SUPPORT_ALL;
+#define verify_size(name, defsize, alignsize, enable) {			\
+		long _##name_;						\
+		if (!(enable))						\
+			_##name_ = 0;					\
+		else if ((name) >= 0)					\
+			_##name_ = (name);				\
+		else							\
 			_##name_ = (defsize);				\
 		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
 		if (_##name_ & (alignsize - 1)) {			\
@@ -157,10 +164,14 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 		bzinfo->name = _##name_;				\
 	}
 
-	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
-	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
-	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096);
-	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
+	verify_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_DMESG);
+	verify_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_PMSG);
+	verify_size(console_size, DEFAULT_CONSOLE_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_CONSOLE);
+	verify_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096,
+			bo_dev->flags & BLKOOPS_DEV_SUPPORT_FTRACE);
 #undef verify_size
 	dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
 
@@ -351,6 +362,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
  * register block device to blkoops
  * @major: the major device number of registering device
  * @panic_write: the write interface for panic case.
+ * @flags: Refer to macro starting with BLKOOPS_DEV_SUPPORT.
  *
  * It is ONLY used for block device to register to blkoops. In this case,
  * the module parameter @blkdev must be valid. Generic read/write interfaces
@@ -364,7 +376,7 @@ static ssize_t blkoops_blk_panic_write(const char *buf, size_t size,
  * panic occurs but pstore/blk does not recover yet, the first zone of dmesg
  * will be used.
  */
-int blkoops_register_blkdev(unsigned int major,
+int blkoops_register_blkdev(unsigned int major, unsigned int flags,
 		blkoops_blk_panic_write_op panic_write)
 {
 	struct block_device *bdev;
@@ -387,6 +399,7 @@ int blkoops_register_blkdev(unsigned int major,
 	if (bo_dev.total_size == 0)
 		goto err_put_bdev;
 	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
+	bo_dev.flags = flags;
 	bo_dev.read = blkoops_generic_blk_read;
 	bo_dev.write = blkoops_generic_blk_write;
 
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index 71c596fd4cc8..bc7665d14a98 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <linux/blkdev.h>
 #include <linux/pstore_blk.h>
+#include <linux/bitops.h>
 
 /**
  * struct blkoops_device - backend blkoops driver structure.
@@ -14,6 +15,10 @@
  * blkoops_register_device(). If block device, you are strongly recommended
  * to use blkoops_register_blkdev().
  *
+ * @flags:
+ *	Refer to macro starting with BLKOOPS_DEV_SUPPORT_. These macros tell
+ *	us that which pstore/blk recorders this device supports. Zero means
+ *	all recorders for compatible.
  * @total_size:
  *	The total size in bytes pstore/blk can use. It must be greater than
  *	4096 and be multiple of 4096.
@@ -38,6 +43,13 @@
  *	On error, negative number should be returned.
  */
 struct blkoops_device {
+	unsigned int flags;
+#define BLKOOPS_DEV_SUPPORT_ALL		UINT_MAX
+#define BLKOOPS_DEV_SUPPORT_DEFAULT	(0)
+#define BLKOOPS_DEV_SUPPORT_DMESG	BIT(0)
+#define BLKOOPS_DEV_SUPPORT_PMSG	BIT(1)
+#define BLKOOPS_DEV_SUPPORT_CONSOLE	BIT(2)
+#define BLKOOPS_DEV_SUPPORT_FTRACE	BIT(3)
 	unsigned long total_size;
 	blkz_read_op read;
 	blkz_write_op write;
@@ -54,7 +66,7 @@ typedef int (*blkoops_blk_panic_write_op)(const char *buf, sector_t start_sect,
 
 int  blkoops_register_device(struct blkoops_device *bo_dev);
 void blkoops_unregister_device(struct blkoops_device *bo_dev);
-int  blkoops_register_blkdev(unsigned int major,
+int  blkoops_register_blkdev(unsigned int major, unsigned int flags,
 		blkoops_blk_panic_write_op panic_write);
 void blkoops_unregister_blkdev(unsigned int major);
 int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
-- 
1.9.1



* [PATCH v1 09/11] pstore/blk: blkoops: support special removing jobs for dmesg.
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (7 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 10/11] blkoops: add interface for dirver to get information of blkoops WeiXiong Liao
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD device is not a block device. To write to a flash device through MTD,
an erase must be done first. However, pstore/blk just sets datalen to 0 when
removing a record, which is not enough for an MTD device. That is why this
patch adds support for special jobs when removing a pstore/blk record.
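
The dispatch can be sketched in user space as follows. All names here
(`struct zone`, `struct backend`, `record_remove()`, `fake_erase()`) are
hypothetical and only model the idea: call the device's erase hook if it has
one, otherwise fall back to rewriting the metadata. Note that the sketch reads
the record length before clearing it, whereas the blkzone.c hunk below clears
datalen first; that ordering is a choice of this sketch, not a statement about
the patch.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical erase hook: size and offset are relative to the pstore area. */
typedef ssize_t (*erase_op)(size_t size, long long off);

struct zone {
	size_t datalen;		/* bytes of record data currently stored */
	size_t header_size;	/* size of the per-zone metadata header */
	long long off;		/* offset of the zone within the pstore area */
};

struct backend {
	erase_op erase;		/* NULL if the device needs no special job */
	int meta_flushes;	/* counts metadata-only rewrites in this sketch */
};

static size_t last_erase_size;	/* records what fake_erase() was asked to do */

/* Example device hook: pretend the erase always succeeds. */
static ssize_t fake_erase(size_t size, long long off)
{
	(void)off;
	last_erase_size = size;
	return 0;
}

/* Remove a record: prefer the device erase hook, else flush metadata only. */
static int record_remove(struct backend *b, struct zone *z)
{
	size_t size = z->datalen + z->header_size;

	z->datalen = 0;
	if (b->erase)
		return (int)b->erase(size, z->off);

	b->meta_flushes++;	/* stand-in for writing the cleared header */
	return 0;
}
```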

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 Documentation/admin-guide/pstore-block.rst |  9 +++++++++
 fs/pstore/blkoops.c                        |  4 +++-
 fs/pstore/blkzone.c                        |  9 ++++++++-
 include/linux/blkoops.h                    | 10 ++++++++++
 include/linux/pstore_blk.h                 | 11 +++++++++++
 5 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index f4fc205406aa..e351b0ebd8e5 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -197,6 +197,15 @@ negative number will be returned. The following return numbers mean more:
 1. -EBUSY: pstore/blk should try again later.
 #. -ENEXT: this zone is used or broken, pstore/blk should try next one.
 
+erase
+~~~~~
+
+It's the generic erase API for pstore/blk, which is used by non-block devices.
+It is called when a pstore record is being removed. It's required only when the
+device has special removal jobs. For example, an MTD device erases the block.
+
+Normally zero should be returned, otherwise it indicates an error.
+
 panic_write (for non-block device)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index d9b51880144b..6b74189e5820 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -179,6 +179,7 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 	bzinfo->dump_oops = dump_oops;
 	bzinfo->read = bo_dev->read;
 	bzinfo->write = bo_dev->write;
+	bzinfo->erase = bo_dev->erase;
 	bzinfo->panic_write = bo_dev->panic_write;
 	bzinfo->name = "blkoops";
 	bzinfo->owner = THIS_MODULE;
@@ -398,10 +399,11 @@ int blkoops_register_blkdev(unsigned int major, unsigned int flags,
 	bo_dev.total_size = blkoops_bdev_size(bdev);
 	if (bo_dev.total_size == 0)
 		goto err_put_bdev;
-	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
 	bo_dev.flags = flags;
 	bo_dev.read = blkoops_generic_blk_read;
 	bo_dev.write = blkoops_generic_blk_write;
+	bo_dev.erase = NULL;
+	bo_dev.panic_write = panic_write ? blkoops_blk_panic_write : NULL;
 
 	ret = blkoops_register_device(&bo_dev);
 	if (ret)
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index 3f58ff85f49c..a006a4a5b012 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -609,11 +609,18 @@ static inline bool blkz_ok(struct blkz_zone *zone)
 static inline int blkz_dmesg_erase(struct blkz_context *cxt,
 		struct blkz_zone *zone)
 {
+	size_t size;
+
 	if (unlikely(!blkz_ok(zone)))
 		return 0;
 
 	atomic_set(&zone->buffer->datalen, 0);
-	return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
+
+	size = buffer_datalen(zone) + sizeof(*zone->buffer);
+	if (cxt->bzinfo->erase)
+		return cxt->bzinfo->erase(size, zone->off);
+	else
+		return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0);
 }
 
 static inline int blkz_record_erase(struct blkz_context *cxt,
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index bc7665d14a98..11cb3036ad5f 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -33,6 +33,15 @@
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
  *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
+ * @erase:
+ *	The general (not panic) erase operation. It will be call while pstore
+ *	record is removing. It's required only when device have special
+ *	removing jobs, for example, MTD device try to erase block.
+ *
+ *	Both of the @size and @offset parameters on this interface are
+ *	the relative size of the space provided, not the whole disk/flash.
+ *
+ *	On success, 0 should be returned. Others mean error.
  * @panic_write:
  *	The write operation only used for panic.
  *
@@ -53,6 +62,7 @@ struct blkoops_device {
 	unsigned long total_size;
 	blkz_read_op read;
 	blkz_write_op write;
+	blkz_erase_op erase;
 	blkz_write_op panic_write;
 };
 
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index bbbe4fe37f7c..9641969f888f 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -46,6 +46,15 @@
  *	number means more:
  *	  -EBUSY: pstore/blk should try again later.
  *	  -ENEXT: this zone is used or broken, pstore/blk should try next one.
+ * @erase:
+ *	The general (not panic) erase operation. It will be call while pstore
+ *	record is removing. It's required only when device have special
+ *	removing jobs, for example, MTD device try to erase block.
+ *
+ *	Both of the @size and @offset parameters on this interface are
+ *	the relative size of the space provided, not the whole disk/flash.
+ *
+ *	On success, 0 should be returned. Others mean error.
  * @panic_write:
  *	The write operation only used for panic. It's optional if you do not
  *	care panic record. If panic occur but blkzone do not recover yet, the
@@ -59,6 +68,7 @@
  */
 typedef ssize_t (*blkz_read_op)(char *, size_t, loff_t);
 typedef ssize_t (*blkz_write_op)(const char *, size_t, loff_t);
+typedef ssize_t (*blkz_erase_op)(size_t, loff_t);
 struct blkz_info {
 	struct module *owner;
 	const char *name;
@@ -71,6 +81,7 @@ struct blkz_info {
 	int dump_oops;
 	blkz_read_op read;
 	blkz_write_op write;
+	blkz_erase_op erase;
 	blkz_write_op panic_write;
 };
 
-- 
1.9.1



* [PATCH v1 10/11] blkoops: add interface for dirver to get information of blkoops
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (8 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20  1:03 ` [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
  2020-02-06  9:13 ` [PATCH v1 00/11] pstore: support crash log to block and mtd device Kees Cook
  11 siblings, 0 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

This is one of a series of patches adapting pstore/blk to MTD devices.

An MTD driver needs to check the recorder sizes and get the configured
device name or index to decide which MTD device to use. All of this is
defined in blkoops, so there should be an interface for the MTD driver
to get the information it needs.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
---
 fs/pstore/blkoops.c     | 47 ++++++++++++++++++++++++++++++++++++-----------
 include/linux/blkoops.h | 10 ++++++++++
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 6b74189e5820..a2f1e31488bb 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -117,6 +117,20 @@
 #define DEFAULT_BLKDEV ""
 #endif
 
+#define check_size(name, defsize, alignsize) ({			\
+	long _##name_ = (name);					\
+	if ((name) < 0)						\
+		_##name_ = (defsize);				\
+	_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
+	if (_##name_ & (alignsize - 1)) {			\
+		pr_info(#name " must align to %d\n",		\
+				(alignsize));			\
+		_##name_ = ALIGN(_##name_, alignsize);		\
+	}							\
+	_##name_;						\
+})
+
+
 /**
  * register device to blkoops
  *
@@ -148,18 +162,10 @@ int blkoops_register_device(struct blkoops_device *bo_dev)
 		bo_dev->flags = BLKOOPS_DEV_SUPPORT_ALL;
 #define verify_size(name, defsize, alignsize, enable) {			\
 		long _##name_;						\
-		if (!(enable))						\
-			_##name_ = 0;					\
-		else if ((name) >= 0)					\
-			_##name_ = (name);				\
+		if (enable)						\
+			_##name_ = check_size(name, defsize, alignsize);\
 		else							\
-			_##name_ = (defsize);				\
-		_##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024);	\
-		if (_##name_ & (alignsize - 1)) {			\
-			pr_info(#name " must align to %d\n",		\
-					(alignsize));			\
-			_##name_ = ALIGN(name, alignsize);		\
-		}							\
+			_##name_ = 0;					\
 		name = _##name_ / 1024;					\
 		bzinfo->name = _##name_;				\
 	}
@@ -460,6 +466,25 @@ int blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
 }
 EXPORT_SYMBOL_GPL(blkoops_blkdev_info);
 
+/* get information of blkoops */
+int  blkoops_info(struct blkoops_info *info)
+{
+	if (!blkdev[0] && strlen(DEFAULT_BLKDEV))
+		snprintf(blkdev, 80, "%s", DEFAULT_BLKDEV);
+
+	memcpy(info->device, blkdev, 80);
+	info->dump_oops = !!(dump_oops < 0 ? DEFAULT_DUMP_OOPS : dump_oops);
+
+	info->dmesg_size = check_size(dmesg_size, DEFAULT_DMESG_SIZE, 4096);
+	info->pmsg_size = check_size(pmsg_size, DEFAULT_PMSG_SIZE, 4096);
+	info->ftrace_size = check_size(ftrace_size, DEFAULT_FTRACE_SIZE, 4096);
+	info->console_size = check_size(console_size, DEFAULT_CONSOLE_SIZE,
+			4096);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkoops_info);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
 MODULE_DESCRIPTION("Wrapper for Pstore BLK with Oops logger");
diff --git a/include/linux/blkoops.h b/include/linux/blkoops.h
index 11cb3036ad5f..ea56f3f92360 100644
--- a/include/linux/blkoops.h
+++ b/include/linux/blkoops.h
@@ -81,4 +81,14 @@ int  blkoops_register_blkdev(unsigned int major, unsigned int flags,
 void blkoops_unregister_blkdev(unsigned int major);
 int  blkoops_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
 
+struct blkoops_info {
+	int dump_oops;
+	char device[80];
+	unsigned long dmesg_size;
+	unsigned long pmsg_size;
+	unsigned long console_size;
+	unsigned long ftrace_size;
+};
+int  blkoops_info(struct blkoops_info *info);
+
 #endif
-- 
1.9.1

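Editorial sketch (not part of the posted patch): the size handling that
check_size() aims for can be written as a plain function. A negative
value selects the default, the result is converted from KB to bytes, and
the byte count is rounded up to the alignment. The name sketch_check_size
and its parameters are hypothetical, and the alignment is assumed to be a
power of two, as it is for the 4096 used in blkoops_info().

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the intent of check_size():
 * size_kb < 0 selects the default, the result is converted from KB to
 * bytes and rounded up to a multiple of align (a power of two).
 */
static long sketch_check_size(const char *name, long size_kb,
			      long default_kb, long align)
{
	long bytes;

	if (size_kb < 0)
		size_kb = default_kb;
	if (size_kb <= 0)
		return 0;

	bytes = size_kb * 1024;
	if (bytes & (align - 1)) {
		printf("%s must align to %ld\n", name, align);
		/* round the byte count up to the next multiple of align */
		bytes = (bytes + align - 1) & ~(align - 1);
	}
	return bytes;
}
```

With a 4096-byte alignment, a 5 KB request becomes 8192 bytes, while 0
keeps the recorder disabled.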


* [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (9 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 10/11] blkoops: add interface for driver to get information of blkoops WeiXiong Liao
@ 2020-01-20  1:03 ` WeiXiong Liao
  2020-01-20 10:03   ` Miquel Raynal
  2020-01-23  4:24   ` Vignesh Raghavendra
  2020-02-06  9:13 ` [PATCH v1 00/11] pstore: support crash log to block and mtd device Kees Cook
  11 siblings, 2 replies; 32+ messages in thread
From: WeiXiong Liao @ 2020-01-20  1:03 UTC (permalink / raw)
  To: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron, WeiXiong Liao
  Cc: linux-doc, linux-kernel, linux-mtd

This is the last of a series of patches adapting pstore/blk to MTD devices.

mtdpstore is similar to mtdoops but more powerful. It is based on
pstore/blk and aims to store panic and oops logs to a flash partition,
where they can be read back as files after mounting the pstore filesystem.

pstore/blk and blkoops, a wrapper for pstore/blk, were originally designed
for block devices, but they are no longer limited to them. After this
series of patches, pstore/blk can also work with MTD devices. To make it
work, 'blkdev' in Kconfig or the blkoops module parameter should be set to
the MTD device name or MTD number.
See more about pstore/blk and blkoops in:
    Documentation/admin-guide/pstore-block.rst
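
Editorial sketch (not part of the posted patch): the name-or-number rule
for 'blkdev' described above can be illustrated in userspace C. The driver
in this patch uses kstrtoint() and an MTD notifier; the functions below are
hypothetical stand-ins built on strtol() that follow the same decision.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse "blkdev" as an MTD number; return -1 when it is a name instead. */
static int sketch_parse_index(const char *device)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(device, &end, 0);
	if (errno || end == device || *end != '\0' || val < 0 || val > INT_MAX)
		return -1;
	return (int)val;
}

/* Decide whether an MTD device (name, index) is the configured one. */
static int sketch_matches(const char *device, const char *mtd_name,
			  int mtd_index)
{
	int index = sketch_parse_index(device);

	/* a name match overrides whatever the number parse produced */
	if (!strcmp(mtd_name, device))
		index = mtd_index;
	return index >= 0 && index == mtd_index;
}
```

So "3" selects mtd3 regardless of its name, while "pstore" selects
whichever MTD partition carries that name.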

Why do we need mtdpstore?
1. Repetitive jobs between pstore and mtdoops
   Both pstore and mtdoops do the same job: storing panic/oops logs.
   They share very similar logic: register with the kmsg dumper and store
   logs to several chunks one by one.
2. Do what a driver should do
   To me, a driver should provide methods instead of policies. What MTD
   should do is provide read/write/erase operations, getting rid of the code
   for chunk management, the kmsg dumper and configuration.
3. Enhanced features
   Not only store logs, but also show them as files.
   Not only logs, but also trigger time and trigger count.
   Not only panic/oops logs, but also log recorders for pmsg, console and
   ftrace in the future.

Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
Reported-by: kbuild test robot <lkp@intel.com>
---
 drivers/mtd/Kconfig     |  10 +
 drivers/mtd/Makefile    |   1 +
 drivers/mtd/mtdpstore.c | 530 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 541 insertions(+)
 create mode 100644 drivers/mtd/mtdpstore.c

diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 42d401ea60ee..a6e59495a738 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -170,6 +170,16 @@ config MTD_OOPS
 	  buffer in a flash partition where it can be read back at some
 	  later point.
 
+config MTD_PSTORE
+	tristate "Log panic/oops to an MTD buffer based on pstore"
+	depends on PSTORE_BLKOOPS
+	help
+	  This enables panic and oops messages to be logged to a circular
+	  buffer in a flash partition where it can be read back as files after
+	  mounting pstore filesystem.
+
+	  If unsure, say N.
+
 config MTD_SWAP
 	tristate "Swap on MTD device support"
 	depends on MTD && SWAP
diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 56cc60ccc477..593d0593a038 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
 obj-$(CONFIG_SSFDC)		+= ssfdc.o
 obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
 obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
+obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
 obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
 
 nftl-objs		:= nftlcore.o nftlmount.o
diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
new file mode 100644
index 000000000000..ab4acd3a9011
--- /dev/null
+++ b/drivers/mtd/mtdpstore.c
@@ -0,0 +1,530 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * MTD Oops/Panic logger for pstore/blk
+ *
+ * Copyright (C) 2019 WeiXiong Liao <liaoweixiong@allwinnertech.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+#define pr_fmt(fmt) "mtdoops-pstore: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/blkoops.h>
+#include <linux/mtd/mtd.h>
+#include <linux/bitops.h>
+
+/* Maximum MTD partition size */
+#define MTDPSTORE_MAX_MTD_SIZE (8 * 1024 * 1024)
+
+static struct mtdpstore_context {
+	int index;
+	struct blkoops_info bo_info;
+	struct blkoops_device bo_dev;
+	struct mtd_info *mtd;
+	unsigned long *rmmap;		/* removed bit map */
+	unsigned long *usedmap;		/* used bit map */
+	/*
+	 * used for panic write
+	 * As there is no block_isbad for the panic case, we should record this
+	 * status before panic to ensure panic_write does not fail.
+	 */
+	unsigned long *badmap;		/* bad block bit map */
+} oops_cxt;
+
+static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret;
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	ret = mtd_block_isbad(mtd, off);
+	if (ret < 0) {
+		pr_err("mtd_block_isbad failed, aborting\n");
+		return ret;
+	} else if (ret > 0) {
+		set_bit(blknum, cxt->badmap);
+		return true;
+	}
+	return false;
+}
+
+static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u64 blknum = div_u64(off, mtd->erasesize);
+
+	return test_bit(blknum, cxt->badmap);
+}
+
+static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+
+	pr_debug("mark zone %llu used\n", zonenum);
+	set_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+
+	pr_debug("mark zone %llu unused\n", zonenum);
+	clear_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		pr_debug("mark zone %llu unused\n", zonenum);
+		clear_bit(zonenum, cxt->usedmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u64 blknum = div_u64(off, cxt->mtd->erasesize);
+
+	if (test_bit(blknum, cxt->badmap))
+		return true;
+	return test_bit(zonenum, cxt->usedmap);
+}
+
+static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->usedmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
+		size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	size_t sz;
+	int i;
+
+	sz = min_t(uint32_t, size, mtd->writesize / 4);
+	for (i = 0; i < sz; i++) {
+		if (buf[i] != (char)0xFF)
+			return false;
+	}
+	return true;
+}
+
+static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+
+	pr_debug("mark zone %llu removed\n", zonenum);
+	set_bit(zonenum, cxt->rmmap);
+}
+
+static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		clear_bit(zonenum, cxt->rmmap);
+		zonenum++;
+		zonecnt--;
+	}
+}
+
+static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
+		loff_t off)
+{
+	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
+
+	while (zonecnt > 0) {
+		if (test_bit(zonenum, cxt->rmmap))
+			return true;
+		zonenum++;
+		zonecnt--;
+	}
+	return false;
+}
+
+static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
+{
+	struct erase_info erase;
+	int ret;
+
+	pr_debug("try to erase off 0x%llx\n", off);
+	erase.len = cxt->mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(cxt->mtd, &erase);
+	if (!ret)
+		mtdpstore_block_clear_removed(cxt, off);
+	else
+		pr_err("erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
+		       (unsigned long long)erase.addr,
+		       (unsigned long long)erase.len, cxt->bo_info.device);
+	return ret;
+}
+
+/*
+ * called while removing file
+ *
+ * To avoid over-erasing, erase only when all zones are removed or unused.
+ * Removal is completed on unregister by reading, erasing and writing back.
+ */
+static ssize_t mtdpstore_erase(size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -EIO;
+
+	mtdpstore_mark_unused(cxt, off);
+
+	if (likely(mtdpstore_block_is_used(cxt, off))) {
+		mtdpstore_mark_removed(cxt, off);
+		return 0;
+	}
+
+	/* all zones are unused, erase it */
+	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
+	return mtdpstore_erase_do(cxt, off);
+}
+
+/*
+ * What is security for mtdpstore?
+ * As there is no erase for the panic case, we should ensure at least one
+ * zone is writable. Otherwise, panic write will fail.
+ * If a zone is used, the write operation will return -ENEXT, which means
+ * that pstore/blk will try one by one until it gets an empty zone. So it is
+ * not necessary for the next zone to be empty, only for at least one to be.
+ */
+static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
+{
+	int ret = 0, i;
+	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
+	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
+	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
+	u32 erasesize = cxt->mtd->erasesize;
+
+	for (i = 0; i < zonecnt; i++) {
+		u32 num = (zonenum + i) % zonecnt;
+
+		/* found empty zone */
+		if (!test_bit(num, cxt->usedmap))
+			return 0;
+	}
+
+	/* If there is no empty zone, we have no choice but to erase */
+	off = ALIGN_DOWN(off, erasesize);
+	while (blkcnt--) {
+		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
+
+		if (mtdpstore_block_isbad(cxt, off))
+			continue;
+
+		ret = mtdpstore_erase_do(cxt, off);
+		if (!ret) {
+			mtdpstore_block_mark_unused(cxt, off);
+			break;
+		}
+	}
+
+	if (ret)
+		pr_err("all blocks bad!\n");
+	pr_debug("end security\n");
+	return ret;
+}
+
+static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENEXT;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENEXT;
+
+	pr_debug("try to write off 0x%llx size %zu\n", off, size);
+	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || retlen != size) {
+		pr_err("write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+/*
+ * All zones will be read, as pstore/blk reads zones one by one during
+ * recovery.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_block_isbad(cxt, off))
+		return -ENEXT;
+
+	pr_debug("try to read off 0x%llx size %zu\n", off, size);
+	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
+		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+
+	if (mtdpstore_is_empty(cxt, buf, size))
+		mtdpstore_mark_unused(cxt, off);
+	else
+		mtdpstore_mark_used(cxt, off);
+
+	mtdpstore_security(cxt, off);
+	return retlen;
+}
+
+static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+	size_t retlen;
+	int ret;
+
+	if (mtdpstore_panic_block_isbad(cxt, off))
+		return -ENEXT;
+
+	/* zone is used, please try next one */
+	if (mtdpstore_is_used(cxt, off))
+		return -ENEXT;
+
+	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+	if (ret < 0 || size != retlen) {
+		pr_err("panic write failure at %lld (%zu of %zu written), err %d\n",
+				off, retlen, size, ret);
+		return -EIO;
+	}
+	mtdpstore_mark_used(cxt, off);
+
+	return retlen;
+}
+
+static void mtdpstore_notify_add(struct mtd_info *mtd)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct blkoops_info *info = &cxt->bo_info;
+	unsigned long longcnt;
+
+	if (!strcmp(mtd->name, info->device))
+		cxt->index = mtd->index;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	pr_debug("found matching MTD device %s\n", mtd->name);
+
+	if (mtd->size < info->dmesg_size * 2) {
+		pr_err("MTD partition %d not big enough\n", mtd->index);
+		return;
+	}
+	if (mtd->erasesize < info->dmesg_size) {
+		pr_err("eraseblock size of MTD partition %d too small\n",
+				mtd->index);
+		return;
+	}
+	if (unlikely(info->dmesg_size % mtd->writesize)) {
+		pr_err("record size %lu KB must align to write size %d KB\n",
+				info->dmesg_size / 1024,
+				mtd->writesize / 1024);
+		return;
+	}
+	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
+		pr_err("mtd%d is too large (limit is %d MiB)\n",
+				mtd->index,
+				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);
+		return;
+	}
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
+	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
+	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+	cxt->bo_dev.total_size = mtd->size;
+	/* just support dmesg right now */
+	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
+	cxt->bo_dev.read = mtdpstore_read;
+	cxt->bo_dev.write = mtdpstore_write;
+	cxt->bo_dev.erase = mtdpstore_erase;
+	cxt->bo_dev.panic_write = mtdpstore_panic_write;
+
+	ret = blkoops_register_device(&cxt->bo_dev);
+	if (ret) {
+		pr_err("mtd%d register to blkoops failed\n", mtd->index);
+		return;
+	}
+	cxt->mtd = mtd;
+	pr_info("Attached to MTD device %d\n", mtd->index);
+}
+
+static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
+		loff_t off, size_t size)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	u_char *buf;
+	int ret;
+	size_t retlen;
+	struct erase_info erase;
+
+	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* 1st. read to cache */
+	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
+	if (ret || retlen != mtd->erasesize)
+		goto free;
+
+	/* 2nd. erase block */
+	erase.len = mtd->erasesize;
+	erase.addr = off;
+	ret = mtd_erase(mtd, &erase);
+	if (ret)
+		goto free;
+
+	/* 3rd. write back */
+	while (size) {
+		unsigned int zonesize = cxt->bo_info.dmesg_size;
+
+		/* only write back zones that are still in use */
+		if (mtdpstore_is_used(cxt, off))
+			mtd_write(mtd, off, zonesize, &retlen, buf);
+
+		off += zonesize;
+		size -= min_t(unsigned int, zonesize, size);
+	}
+
+free:
+	kfree(buf);
+	return ret;
+}
+
+static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
+{
+	struct mtd_info *mtd = cxt->mtd;
+	int ret;
+	loff_t off;
+	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
+
+	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
+		ret = mtdpstore_block_is_removed(cxt, off);
+		if (!ret) {
+			/* the for loop already advances off */
+			continue;
+		}
+
+		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void mtdpstore_notify_remove(struct mtd_info *mtd)
+{
+	struct mtdpstore_context *cxt = &oops_cxt;
+
+	if (mtd->index != cxt->index || cxt->index < 0)
+		return;
+
+	mtdpstore_flush_removed(cxt);
+
+	blkoops_unregister_device(&cxt->bo_dev);
+	kfree(cxt->badmap);
+	kfree(cxt->usedmap);
+	kfree(cxt->rmmap);
+	cxt->mtd = NULL;
+	cxt->index = -1;
+}
+
+static struct mtd_notifier mtdpstore_notifier = {
+	.add	= mtdpstore_notify_add,
+	.remove	= mtdpstore_notify_remove,
+};
+
+static int __init mtdpstore_init(void)
+{
+	int ret;
+	struct mtdpstore_context *cxt = &oops_cxt;
+	struct blkoops_info *info = &cxt->bo_info;
+
+	ret = blkoops_info(info);
+	if (unlikely(ret))
+		return ret;
+
+	if (strlen(info->device) == 0) {
+		pr_err("mtd device must be supplied\n");
+		return -EINVAL;
+	}
+	if (!info->dmesg_size) {
+		pr_err("no recorder enabled\n");
+		return -EINVAL;
+	}
+
+	/* Setup the MTD device to use */
+	ret = kstrtoint((char *)info->device, 0, &cxt->index);
+	if (ret)
+		cxt->index = -1;
+
+	register_mtd_user(&mtdpstore_notifier);
+	return 0;
+}
+module_init(mtdpstore_init);
+
+static void __exit mtdpstore_exit(void)
+{
+	unregister_mtd_user(&mtdpstore_notifier);
+}
+module_exit(mtdpstore_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
+MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
-- 
1.9.1

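Editorial sketch (not part of the posted patch): the used/removed bitmaps
in mtdpstore defer the flash erase until no zone in an erase block holds a
live record. The userspace model below shows that bookkeeping with plain
bool arrays. The sk_ names and the fixed geometry (4 zones per block, 2
blocks) are assumptions for illustration, and a counter stands in for
mtd_erase().

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_ZONES_PER_BLOCK 4   /* assumed: erasesize / dmesg_size */
#define SK_BLOCKS          2   /* assumed partition size in erase blocks */
#define SK_ZONES (SK_ZONES_PER_BLOCK * SK_BLOCKS)

struct sk_ctx {
	bool used[SK_ZONES];     /* zone holds a live record */
	bool removed[SK_ZONES];  /* record deleted, flash not yet erased */
	int erase_count[SK_BLOCKS];
};

static bool sk_block_is_used(const struct sk_ctx *c, int blk)
{
	int first = blk * SK_ZONES_PER_BLOCK;

	for (int z = first; z < first + SK_ZONES_PER_BLOCK; z++)
		if (c->used[z])
			return true;
	return false;
}

/* Remove one record; erase the block only once no zone in it is live. */
static void sk_remove(struct sk_ctx *c, int zone)
{
	int blk = zone / SK_ZONES_PER_BLOCK;
	int first = blk * SK_ZONES_PER_BLOCK;

	c->used[zone] = false;
	if (sk_block_is_used(c, blk)) {
		c->removed[zone] = true;  /* defer: neighbours still hold data */
		return;
	}
	c->erase_count[blk]++;            /* stands in for mtd_erase() */
	for (int z = first; z < first + SK_ZONES_PER_BLOCK; z++)
		c->removed[z] = false;
}
```

Removing the first of two records in a block only marks it removed. The
erase happens when the second one goes away, which keeps erase cycles low.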

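Editorial sketch (not part of the posted patch): the first half of
mtdpstore_security() scans the zones starting at the current one and
wrapping around, looking for any unused zone. The hypothetical helper below
does the same scan in userspace but returns the zone index rather than just
success. In the patch, finding no free zone leads to erasing the next good
block, which is not modelled here.

```c
#include <stdbool.h>

/*
 * Return the first unused zone at or after 'start', wrapping around the
 * end of the partition, or -1 if every zone is in use.
 */
static int sk_find_free_zone(const bool *used, int zonecnt, int start)
{
	for (int i = 0; i < zonecnt; i++) {
		int num = (start + i) % zonecnt;

		if (!used[num])
			return num;
	}
	return -1;
}
```
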

* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-20  1:03 ` [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
@ 2020-01-20 10:03   ` Miquel Raynal
  2020-01-21  3:36     ` liaoweixiong
  2020-01-23  4:24   ` Vignesh Raghavendra
  1 sibling, 1 reply; 32+ messages in thread
From: Miquel Raynal @ 2020-01-20 10:03 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

Hi WeiXiong,

WeiXiong Liao <liaoweixiong@allwinnertech.com> wrote on Mon, 20 Jan
2020 09:03:53 +0800:

> It's the last one of a series of patches for adaptive to MTD device.
> 
> The mtdpstore is similar to mtdoops but more powerful. It bases on
> pstore/blk, aims to store panic and oops log to a flash partition,

                                           logs?

> where it can be read back as files after mounting pstore filesystem.
> 
> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
> block device at the very beginning, but now, compatible to not only
> block device. After this series of patches, pstore/blk can also work
> for MTD device. To make it work, 'blkdev' on kconfig or module
> parameter of blkoops should be set as mtd device name or mtd number.
> See more about pstore/blk and blkoops on:
>     Documentation/admin-guide/pstore-block.rst
> 
> Why do we need mtdpstore?
> 1. repetitive jobs between pstore and mtdoops
>    Both of pstore and mtdoops do the same jobs that store panic/oops log.
>    They have much similar logic that register to kmsg dumper and store
>    log to several chunks one by one.
> 2. do what a driver should do
>    To me, a driver should provide methods instead of policies. What MTD
>    should do is to provide read/write/erase operations, geting rid of codes
>    about chunk management, kmsg dumper and configuration.
> 3. enhanced feature
>    Not only store log, but also show it as files.
>    Not only log, but also trigger time and trigger count.
>    Not only panic/oops log, but also log recorder for pmsg, console and
>    ftrace in the future.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> Reported-by: kbuild test robot <lkp@intel.com>

I don't think the test robot has a meaning here.

> ---
>  drivers/mtd/Kconfig     |  10 +
>  drivers/mtd/Makefile    |   1 +
>  drivers/mtd/mtdpstore.c | 530 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 541 insertions(+)
>  create mode 100644 drivers/mtd/mtdpstore.c
> 
> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
> index 42d401ea60ee..a6e59495a738 100644
> --- a/drivers/mtd/Kconfig
> +++ b/drivers/mtd/Kconfig
> @@ -170,6 +170,16 @@ config MTD_OOPS
>  	  buffer in a flash partition where it can be read back at some
>  	  later point.
>  
> +config MTD_PSTORE
> +	tristate "Log panic/oops to an MTD buffer base on pstore"

                                                  based

> +	depends on PSTORE_BLKOOPS
> +	help
> +	  This enables panic and oops messages to be logged to a circular
> +	  buffer in a flash partition where it can be read back as files after
> +	  mounting pstore filesystem.
> +
> +	  If unsure, say N.
> +
>  config MTD_SWAP
>  	tristate "Swap on MTD device support"
>  	depends on MTD && SWAP
> diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
> index 56cc60ccc477..593d0593a038 100644
> --- a/drivers/mtd/Makefile
> +++ b/drivers/mtd/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
>  obj-$(CONFIG_SSFDC)		+= ssfdc.o
>  obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
>  obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
> +obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
>  obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
>  
>  nftl-objs		:= nftlcore.o nftlmount.o
> diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
> new file mode 100644
> index 000000000000..ab4acd3a9011
> --- /dev/null
> +++ b/drivers/mtd/mtdpstore.c
> @@ -0,0 +1,530 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * MTD Oops/Panic loger for pstore/blk
> + *
> + * Copyright (C) 2019 WeiXiong Liao <liaoweixiong@gallwinnertech.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.

The license text is not needed since you added SPDX tag.

> + *
> + */
> +#define pr_fmt(fmt) "mtdoops-pstore: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/blkoops.h>
> +#include <linux/mtd/mtd.h>
> +#include <linux/bitops.h>
> +
> +/* Maximum MTD partition size */
> +#define MTDPSTORE_MAX_MTD_SIZE (8 * 1024 * 1024)

                                  SZ_8M

> +
> +static struct mtdpstore_context {
> +	int index;
> +	struct blkoops_info bo_info;
> +	struct blkoops_device bo_dev;
> +	struct mtd_info *mtd;
> +	unsigned long *rmmap;		/* removed bit map */
> +	unsigned long *usedmap;		/* used bit map */
> +	/*
> +	 * used for panic write
> +	 * As there are no block_isbad for panic case, we should keep this
> +	 * status before panic to ensure panic_write not failed.
> +	 */
> +	unsigned long *badmap;		/* bad block bit map */
> +} oops_cxt;
> +
> +static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret;
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	ret = mtd_block_isbad(mtd, off);
> +	if (ret < 0) {
> +		pr_err("mtd_block_isbad failed, aborting\n");
> +		return ret;
> +	} else if (ret > 0) {
> +		set_bit(blknum, cxt->badmap);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	return test_bit(blknum, cxt->badmap);
> +}
> +
> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	pr_debug("mark zone %llu used\n", zonenum);
> +	set_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	pr_debug("mark zone %llu unused\n", zonenum);
> +	clear_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		pr_debug("mark zone %llu unused\n", zonenum);
> +		clear_bit(zonenum, cxt->usedmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	return test_bit(zonenum, cxt->usedmap);
> +}
> +
> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->usedmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
> +		size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t sz;
> +	int i;
> +
> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
> +	for (i = 0; i < sz; i++) {
> +		if (buf[i] != (char)0xFF)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	pr_debug("mark zone %llu removed\n", zonenum);
> +	set_bit(zonenum, cxt->rmmap);
> +}
> +
> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		clear_bit(zonenum, cxt->rmmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->rmmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct erase_info erase;
> +	int ret;
> +
> +	pr_debug("try to erase off 0x%llx\n", off);
> +	erase.len = cxt->mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(cxt->mtd, &erase);
> +	if (!ret)
> +		mtdpstore_block_clear_removed(cxt, off);
> +	else
> +		pr_err("erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
> +		       (unsigned long long)erase.addr,
> +		       (unsigned long long)erase.len, cxt->bo_info.device);
> +	return ret;
> +}
> +
> +/*
> + * called while removing file
> + *
> + * Avoiding over erasing, do erase only when all zones are removed or unused.
> + * Ensure to remove when unregister by reading, erasing and wrtiing back.
> + */
> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -EIO;
> +
> +	mtdpstore_mark_unused(cxt, off);
> +
> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
> +		mtdpstore_mark_removed(cxt, off);
> +		return 0;
> +	}
> +
> +	/* all zones are unused, erase it */
> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
> +	return mtdpstore_erase_do(cxt, off);
> +}
> +
> +/*
> + * What is securety for mtdpstore?

              security

> + * As there is no erase for panic case, we should ensure at least one zone
> + * is writable. Otherwise, panic write will be failed.

                                          will fail.

> + * If zone is used, write operation will return -ENEXT, which means that
> + * pstore/blk will try one by one until get a empty zone. So, it's no need

                                           it gets an empty zone. So it
                                           is not needed to ...
    
> + * to ensure next zone is empty, but at least one.

               the

> + */
> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret = 0, i;
> +	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
> +	u32 erasesize = cxt->mtd->erasesize;
> +
> +	for (i = 0; i < zonecnt; i++) {
> +		u32 num = (zonenum + i) % zonecnt;
> +
> +		/* found empty zone */
> +		if (!test_bit(num, cxt->usedmap))
> +			return 0;
> +	}
> +
> +	/* If there is no any empty zone, we have no way but to do erase */
> +	off = ALIGN_DOWN(off, erasesize);
> +	while (blkcnt--) {
> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
> +
> +		if (mtdpstore_block_isbad(cxt, off))
> +			continue;
> +
> +		ret = mtdpstore_erase_do(cxt, off);
> +		if (!ret) {
> +			mtdpstore_block_mark_unused(cxt, off);
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		pr_err("all blocks bad!\n");
> +	pr_debug("end security\n");
> +	return ret;
> +}
> +
> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENEXT;
> +
> +	pr_debug("try to write off 0x%llx size %zu\n", off, size);
> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || retlen != size) {
> +		pr_err("write failure at %lld (%zu of %zu written), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +/*
> + * All zones will be read as pstore/blk will read zone one by one when do
> + * recover.
> + */
> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {

IIRC size != retlen does not mean it failed, but that you should
continue reading after retlen bytes, no?

Also, mtd_is_bitflip() does not mean that you are reading a false
buffer, but that the data has been corrected as it contained bitflips.
mtd_is_eccerr() however, would be meaningful.

> +		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +
> +	if (mtdpstore_is_empty(cxt, buf, size))
> +		mtdpstore_mark_unused(cxt, off);
> +	else
> +		mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_panic_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENEXT;
> +
> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || size != retlen) {
> +		pr_err("panic write failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	return retlen;
> +}
> +
> +static void mtdpstore_notify_add(struct mtd_info *mtd)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct blkoops_info *info = &cxt->bo_info;
> +	unsigned long longcnt;
> +
> +	if (!strcmp(mtd->name, info->device))
> +		cxt->index = mtd->index;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	pr_debug("found matching MTD device %s\n", mtd->name);
> +
> +	if (mtd->size < info->dmesg_size * 2) {
> +		pr_err("MTD partition %d not big enough\n", mtd->index);
> +		return;
> +	}
> +	if (mtd->erasesize < info->dmesg_size) {
> +		pr_err("eraseblock size of MTD partition %d too small\n",
> +				mtd->index);

What is the usual size of dmesg? Could this check be too limiting?

> +		return;
> +	}
> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> +		pr_err("record size %lu KB must align to write size %d KB\n",
> +				info->dmesg_size / 1024,
> +				mtd->writesize / 1024);

This condition is weird, why would you check this?

> +		return;
> +	}
> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
> +				mtd->index,
> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);

Same question? I could understand that it is easier to manage blocks
knowing their maximum number though.

> +		return;
> +	}
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	cxt->bo_dev.total_size = mtd->size;
> +	/* just support dmesg right now */
> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
> +	cxt->bo_dev.read = mtdpstore_read;
> +	cxt->bo_dev.write = mtdpstore_write;
> +	cxt->bo_dev.erase = mtdpstore_erase;
> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
> +
> +	ret = blkoops_register_device(&cxt->bo_dev);
> +	if (ret) {
> +		pr_err("mtd%d register to blkoops failed\n", mtd->index);
> +		return;
> +	}
> +	cxt->mtd = mtd;
> +	pr_info("Attached to MTD device %d\n", mtd->index);
> +}
> +
> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
> +		loff_t off, size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u_char *buf;
> +	int ret;
> +	size_t retlen;
> +	struct erase_info erase;
> +
> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	/* 1st. read to cache */
> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
> +	if (ret || retlen != mtd->erasesize)
> +		goto free;
> +
> +	/* 2nd. erase block */
> +	erase.len = mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(mtd, &erase);
> +	if (ret)
> +		goto free;
> +
> +	/* 3rd. write back */
> +	while (size) {
> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
> +
> +		/* remove must clear used bit */
> +		if (mtdpstore_is_used(cxt, off))
> +			mtd_write(mtd, off, zonesize, &retlen, buf);

Besides the fact that you should definitely check the write return code, I
don't understand what you do in this function. What does
flush_removed_do mean?

> +
> +		off += zonesize;
> +		size -= min_t(unsigned int, zonesize, size);
> +	}
> +
> +free:
> +	kfree(buf);
> +	return ret;
> +}
> +
> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	int ret;
> +	loff_t off;
> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
> +
> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
> +		ret = mtdpstore_block_is_removed(cxt, off);
> +		if (!ret) {
> +			off += mtd->erasesize;
> +			continue;
> +		}
> +
> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	mtdpstore_flush_removed(cxt);
> +
> +	blkoops_unregister_device(&cxt->bo_dev);
> +	kfree(cxt->badmap);
> +	kfree(cxt->usedmap);
> +	kfree(cxt->rmmap);
> +	cxt->mtd = NULL;
> +	cxt->index = -1;
> +}
> +
> +static struct mtd_notifier mtdpstore_notifier = {
> +	.add	= mtdpstore_notify_add,
> +	.remove	= mtdpstore_notify_remove,
> +};
> +
> +static int __init mtdpstore_init(void)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct blkoops_info *info = &cxt->bo_info;
> +
> +	ret = blkoops_info(info);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (strlen(info->device) == 0) {
> +		pr_err("mtd device must be supplied\n");
> +		return -EINVAL;
> +	}
> +	if (!info->dmesg_size) {
> +		pr_err("no recorder enabled\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Setup the MTD device to use */
> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
> +	if (ret)
> +		cxt->index = -1;
> +
> +	register_mtd_user(&mtdpstore_notifier);
> +	return 0;
> +}
> +module_init(mtdpstore_init);
> +
> +static void __exit mtdpstore_exit(void)
> +{
> +	unregister_mtd_user(&mtdpstore_notifier);
> +}
> +module_exit(mtdpstore_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");




Thanks,
Miquèl


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-20 10:03   ` Miquel Raynal
@ 2020-01-21  3:36     ` liaoweixiong
  2020-01-21  8:48       ` Miquel Raynal
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-01-21  3:36 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

Hi Miquel Raynal,

On 2020/1/20 PM 6:03, Miquel Raynal wrote:
> Hi WeiXiong,
> 
> WeiXiong Liao <liaoweixiong@allwinnertech.com> wrote on Mon, 20 Jan
> 2020 09:03:53 +0800:
> 
>> It's the last one of a series of patches for adaptive to MTD device.
>>
>> The mtdpstore is similar to mtdoops but more powerful. It bases on
>> pstore/blk, aims to store panic and oops log to a flash partition,
> 
>                                            logs?
> 

I will fix it. Thanks.

>> where it can be read back as files after mounting pstore filesystem.
>>
>> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
>> block device at the very beginning, but now, compatible to not only
>> block device. After this series of patches, pstore/blk can also work
>> for MTD device. To make it work, 'blkdev' on kconfig or module
>> parameter of blkoops should be set as mtd device name or mtd number.
>> See more about pstore/blk and blkoops on:
>>     Documentation/admin-guide/pstore-block.rst
>>
>> Why do we need mtdpstore?
>> 1. repetitive jobs between pstore and mtdoops
>>    Both of pstore and mtdoops do the same jobs that store panic/oops log.
>>    They have much similar logic that register to kmsg dumper and store
>>    log to several chunks one by one.
>> 2. do what a driver should do
>>    To me, a driver should provide methods instead of policies. What MTD
>>    should do is to provide read/write/erase operations, geting rid of codes
>>    about chunk management, kmsg dumper and configuration.
>> 3. enhanced feature
>>    Not only store log, but also show it as files.
>>    Not only log, but also trigger time and trigger count.
>>    Not only panic/oops log, but also log recorder for pmsg, console and
>>    ftrace in the future.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> Reported-by: kbuild test robot <lkp@intel.com>
> 
> I don't think the test robot has a meaning here.
> 

I do not know what the test robot tag means, but the kbuild test robot
suggested that I add it. What should I do with it: drop the tag, keep it,
or something else?
The email from the kbuild test robot said:

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

>> ---
>>  drivers/mtd/Kconfig     |  10 +
>>  drivers/mtd/Makefile    |   1 +
>>  drivers/mtd/mtdpstore.c | 530 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 541 insertions(+)
>>  create mode 100644 drivers/mtd/mtdpstore.c
>>
>> diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
>> index 42d401ea60ee..a6e59495a738 100644
>> --- a/drivers/mtd/Kconfig
>> +++ b/drivers/mtd/Kconfig
>> @@ -170,6 +170,16 @@ config MTD_OOPS
>>  	  buffer in a flash partition where it can be read back at some
>>  	  later point.
>>  
>> +config MTD_PSTORE
>> +	tristate "Log panic/oops to an MTD buffer base on pstore"
> 
>                                                   based
> 

I will fix it. Thanks.

>> +	depends on PSTORE_BLKOOPS
>> +	help
>> +	  This enables panic and oops messages to be logged to a circular
>> +	  buffer in a flash partition where it can be read back as files after
>> +	  mounting pstore filesystem.
>> +
>> +	  If unsure, say N.
>> +
>>  config MTD_SWAP
>>  	tristate "Swap on MTD device support"
>>  	depends on MTD && SWAP
>> diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
>> index 56cc60ccc477..593d0593a038 100644
>> --- a/drivers/mtd/Makefile
>> +++ b/drivers/mtd/Makefile
>> @@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL)		+= rfd_ftl.o
>>  obj-$(CONFIG_SSFDC)		+= ssfdc.o
>>  obj-$(CONFIG_SM_FTL)		+= sm_ftl.o
>>  obj-$(CONFIG_MTD_OOPS)		+= mtdoops.o
>> +obj-$(CONFIG_MTD_PSTORE)	+= mtdpstore.o
>>  obj-$(CONFIG_MTD_SWAP)		+= mtdswap.o
>>  
>>  nftl-objs		:= nftlcore.o nftlmount.o
>> diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
>> new file mode 100644
>> index 000000000000..ab4acd3a9011
>> --- /dev/null
>> +++ b/drivers/mtd/mtdpstore.c
>> @@ -0,0 +1,530 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * MTD Oops/Panic loger for pstore/blk
>> + *
>> + * Copyright (C) 2019 WeiXiong Liao <liaoweixiong@gallwinnertech.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
> 
> The license text is not needed since you added SPDX tag.
> 

I will fix it. Thanks.

>> + *
>> + */
>> +#define pr_fmt(fmt) "mtdoops-pstore: " fmt
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/blkoops.h>
>> +#include <linux/mtd/mtd.h>
>> +#include <linux/bitops.h>
>> +
>> +/* Maximum MTD partition size */
>> +#define MTDPSTORE_MAX_MTD_SIZE (8 * 1024 * 1024)
> 
>                                   SZ_8M
> 

I will fix it. Thanks.

>> +
>> +static struct mtdpstore_context {
>> +	int index;
>> +	struct blkoops_info bo_info;
>> +	struct blkoops_device bo_dev;
>> +	struct mtd_info *mtd;
>> +	unsigned long *rmmap;		/* removed bit map */
>> +	unsigned long *usedmap;		/* used bit map */
>> +	/*
>> +	 * used for panic write
>> +	 * As there are no block_isbad for panic case, we should keep this
>> +	 * status before panic to ensure panic_write not failed.
>> +	 */
>> +	unsigned long *badmap;		/* bad block bit map */
>> +} oops_cxt;
>> +
>> +static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	int ret;
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 blknum = div_u64(off, mtd->erasesize);
>> +
>> +	if (test_bit(blknum, cxt->badmap))
>> +		return true;
>> +	ret = mtd_block_isbad(mtd, off);
>> +	if (ret < 0) {
>> +		pr_err("mtd_block_isbad failed, aborting\n");
>> +		return ret;
>> +	} else if (ret > 0) {
>> +		set_bit(blknum, cxt->badmap);
>> +		return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 blknum = div_u64(off, mtd->erasesize);
>> +
>> +	return test_bit(blknum, cxt->badmap);
>> +}
>> +
>> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	pr_debug("mark zone %llu used\n", zonenum);
>> +	set_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	pr_debug("mark zone %llu unused\n", zonenum);
>> +	clear_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		pr_debug("mark zone %llu unused\n", zonenum);
>> +		clear_bit(zonenum, cxt->usedmap);
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +}
>> +
>> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
>> +
>> +	if (test_bit(blknum, cxt->badmap))
>> +		return true;
>> +	return test_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		if (test_bit(zonenum, cxt->usedmap))
>> +			return true;
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +	return false;
>> +}
>> +
>> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
>> +		size_t size)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	size_t sz;
>> +	int i;
>> +
>> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
>> +	for (i = 0; i < sz; i++) {
>> +		if (buf[i] != (char)0xFF)
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	pr_debug("mark zone %llu removed\n", zonenum);
>> +	set_bit(zonenum, cxt->rmmap);
>> +}
>> +
>> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		clear_bit(zonenum, cxt->rmmap);
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +}
>> +
>> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		if (test_bit(zonenum, cxt->rmmap))
>> +			return true;
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +	return false;
>> +}
>> +
>> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	struct erase_info erase;
>> +	int ret;
>> +
>> +	pr_debug("try to erase off 0x%llx\n", off);
>> +	erase.len = cxt->mtd->erasesize;
>> +	erase.addr = off;
>> +	ret = mtd_erase(cxt->mtd, &erase);
>> +	if (!ret)
>> +		mtdpstore_block_clear_removed(cxt, off);
>> +	else
>> +		pr_err("erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
>> +		       (unsigned long long)erase.addr,
>> +		       (unsigned long long)erase.len, cxt->bo_info.device);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * called while removing file
>> + *
>> + * Avoiding over erasing, do erase only when all zones are removed or unused.
>> + * Ensure to remove when unregister by reading, erasing and wrtiing back.
>> + */
>> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -EIO;
>> +
>> +	mtdpstore_mark_unused(cxt, off);
>> +
>> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
>> +		mtdpstore_mark_removed(cxt, off);
>> +		return 0;
>> +	}
>> +
>> +	/* all zones are unused, erase it */
>> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
>> +	return mtdpstore_erase_do(cxt, off);
>> +}
>> +
>> +/*
>> + * What is securety for mtdpstore?
> 
>               security
> 

I will fix it. Thanks.

>> + * As there is no erase for panic case, we should ensure at least one zone
>> + * is writable. Otherwise, panic write will be failed.
> 
>                                           will fail.
> 
I will fix it. Thanks.

>> + * If zone is used, write operation will return -ENEXT, which means that
>> + * pstore/blk will try one by one until get a empty zone. So, it's no need
> 
>                                            it gets an empty zone. So it
>                                            is not needed to ...
>     

I will fix it. Thanks.

>> + * to ensure next zone is empty, but at least one.
> 
>                the
> 

I will fix it. Thanks.

>> + */
>> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	int ret = 0, i;
>> +	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
>> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
>> +	u32 erasesize = cxt->mtd->erasesize;
>> +
>> +	for (i = 0; i < zonecnt; i++) {
>> +		u32 num = (zonenum + i) % zonecnt;
>> +
>> +		/* found empty zone */
>> +		if (!test_bit(num, cxt->usedmap))
>> +			return 0;
>> +	}
>> +
>> +	/* If there is no any empty zone, we have no way but to do erase */
>> +	off = ALIGN_DOWN(off, erasesize);
>> +	while (blkcnt--) {
>> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
>> +
>> +		if (mtdpstore_block_isbad(cxt, off))
>> +			continue;
>> +
>> +		ret = mtdpstore_erase_do(cxt, off);
>> +		if (!ret) {
>> +			mtdpstore_block_mark_unused(cxt, off);
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (ret)
>> +		pr_err("all blocks bad!\n");
>> +	pr_debug("end security\n");
>> +	return ret;
>> +}
>> +
>> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	/* zone is used, please try next one */
>> +	if (mtdpstore_is_used(cxt, off))
>> +		return -ENEXT;
>> +
>> +	pr_debug("try to write off 0x%llx size %zu\n", off, size);
>> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if (ret < 0 || retlen != size) {
>> +		pr_err("write failure at %lld (%zu of %zu written), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +	mtdpstore_mark_used(cxt, off);
>> +
>> +	mtdpstore_security(cxt, off);
>> +	return retlen;
>> +}
>> +
>> +/*
>> + * All zones will be read as pstore/blk will read zone one by one when do
>> + * recover.
>> + */
>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
> 
> IIRC size != retlen does not mean it failed, but that you should
> continue reading after retlen bytes, no?
> 

Yes, you are right. I will fix it. Thanks.

> Also, mtd_is_bitflip() does not mean that you are reading a false
> buffer, but that the data has been corrected as it contained bitflips.
> mtd_is_eccerr() however, would be meaningful.
> 

Sure, I know mtd_is_bitflip() does not mean failure, but I do not think
mtd_is_eccerr() is needed here, since the condition is ret < 0 and NOT
mtd_is_bitflip(), which already treats an ECC error as a failure.
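
A minimal userspace sketch of the point above, assuming the usual MTD
convention that a corrected bitflip is reported as -EUCLEAN and an
uncorrectable ECC error as -EBADMSG. The helpers is_bitflip() and
read_failed() are hypothetical stand-ins written for this illustration,
not the kernel functions themselves:

```c
#include <errno.h>

/* Stand-in for mtd_is_bitflip(): bitflips were found and corrected. */
static int is_bitflip(int err)
{
	return err == -EUCLEAN;
}

/*
 * Hypothetical helper mirroring the error half of the condition in
 * mtdpstore_read(): any negative return other than a corrected
 * bitflip counts as a failed read.
 */
static int read_failed(int ret)
{
	return ret < 0 && !is_bitflip(ret);
}
```

With these definitions, read_failed(-EBADMSG) is true without an explicit
mtd_is_eccerr() test, while read_failed(-EUCLEAN) is false. The short-read
case (size != retlen) is a separate question and is not modelled here.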

>> +		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +
>> +	if (mtdpstore_is_empty(cxt, buf, size))
>> +		mtdpstore_mark_unused(cxt, off);
>> +	else
>> +		mtdpstore_mark_used(cxt, off);
>> +
>> +	mtdpstore_security(cxt, off);
>> +	return retlen;
>> +}
>> +
>> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_panic_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	/* zone is used, please try next one */
>> +	if (mtdpstore_is_used(cxt, off))
>> +		return -ENEXT;
>> +
>> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if (ret < 0 || size != retlen) {
>> +		pr_err("panic write failure at %lld (%zu of %zu read), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +	mtdpstore_mark_used(cxt, off);
>> +
>> +	return retlen;
>> +}
>> +
>> +static void mtdpstore_notify_add(struct mtd_info *mtd)
>> +{
>> +	int ret;
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct blkoops_info *info = &cxt->bo_info;
>> +	unsigned long longcnt;
>> +
>> +	if (!strcmp(mtd->name, info->device))
>> +		cxt->index = mtd->index;
>> +
>> +	if (mtd->index != cxt->index || cxt->index < 0)
>> +		return;
>> +
>> +	pr_debug("found matching MTD device %s\n", mtd->name);
>> +
>> +	if (mtd->size < info->dmesg_size * 2) {
>> +		pr_err("MTD partition %d not big enough\n", mtd->index);
>> +		return;
>> +	}
>> +	if (mtd->erasesize < info->dmesg_size) {
>> +		pr_err("eraseblock size of MTD partition %d too small\n",
>> +				mtd->index);
> 
> What is the usual size of dmesg? Could this check be too limiting?
> 

The size must be aligned to 4096, which is a limit imposed by blkoops. The
default value is 64K. If it is larger than erasesize, errors will occur,
since mtdpstore is designed around a record fitting within one erase block.

>> +		return;
>> +	}
>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
>> +		pr_err("record size %lu KB must align to write size %d KB\n",
>> +				info->dmesg_size / 1024,
>> +				mtd->writesize / 1024);
> 
> This condition is weird, why would you check this?
> 

pstore/blk writes 'record_size' bytes of dmesg log at a time.
Each write to flash must be aligned to 'writesize', and I am not sure that
all flash drivers can handle misaligned data. That is why I check this.
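
The size checks discussed so far can be summarised in a small userspace
sketch. The enum values and the check_geometry() name are hypothetical and
exist only for this illustration; the three conditions follow the ones in
the quoted mtdpstore_notify_add(), and writesize is assumed to be nonzero:

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum geom_err {
	GEOM_OK = 0,
	GEOM_PART_TOO_SMALL,	/* fewer than two records fit in the partition */
	GEOM_RECORD_GT_BLOCK,	/* one record would span erase blocks */
	GEOM_UNALIGNED_WRITE,	/* record is not a multiple of the write size */
};

static enum geom_err check_geometry(uint64_t part_size, uint32_t erasesize,
				    uint32_t writesize, uint32_t dmesg_size)
{
	if (part_size < (uint64_t)dmesg_size * 2)
		return GEOM_PART_TOO_SMALL;
	if (erasesize < dmesg_size)
		return GEOM_RECORD_GT_BLOCK;
	if (dmesg_size % writesize)
		return GEOM_UNALIGNED_WRITE;
	return GEOM_OK;
}
```

For example, an 8 MiB partition with 128 KiB erase blocks, 2 KiB pages and
the default 64 KiB record size passes all three checks.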

>> +		return;
>> +	}
>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
>> +				mtd->index,
>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);
> 
> Same question? I could understand that it is easier to manage blocks
> knowing their maximum number though.
> 

This limit is taken from mtdoops.

>> +		return;
>> +	}
>> +
>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
>> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +
>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
>> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +
>> +	cxt->bo_dev.total_size = mtd->size;
>> +	/* just support dmesg right now */
>> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
>> +	cxt->bo_dev.read = mtdpstore_read;
>> +	cxt->bo_dev.write = mtdpstore_write;
>> +	cxt->bo_dev.erase = mtdpstore_erase;
>> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
>> +
>> +	ret = blkoops_register_device(&cxt->bo_dev);
>> +	if (ret) {
>> +		pr_err("mtd%d register to blkoops failed\n", mtd->index);
>> +		return;
>> +	}
>> +	cxt->mtd = mtd;
>> +	pr_info("Attached to MTD device %d\n", mtd->index);
>> +}
>> +
>> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
>> +		loff_t off, size_t size)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u_char *buf;
>> +	int ret;
>> +	size_t retlen;
>> +	struct erase_info erase;
>> +
>> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
>> +	if (!buf)
>> +		return -ENOMEM;
>> +
>> +	/* 1st. read to cache */
>> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
>> +	if (ret || retlen != mtd->erasesize)
>> +		goto free;
>> +
>> +	/* 2nd. erase block */
>> +	erase.len = mtd->erasesize;
>> +	erase.addr = off;
>> +	ret = mtd_erase(mtd, &erase);
>> +	if (ret)
>> +		goto free;
>> +
>> +	/* 3rd. write back */
>> +	while (size) {
>> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
>> +
>> +		/* remove must clear used bit */
>> +		if (mtdpstore_is_used(cxt, off))
>> +			mtd_write(mtd, off, zonesize, &retlen, buf);
> 
> Besides the fact that you should definitely check the write return code, I
> don't understand what you do in this function. What does
> flush_removed_do mean?
> 

When a user removes a log file from the pstore filesystem, mtdpstore must
make sure the log is really removed. If the whole block is no longer used,
it can simply erase the block. However, if the block still contains valid
logs, what mtdpstore can do is erase the block and write the valid logs back.
That is what flush_removed_do() does.

To avoid repeated erases when users remove several log files, mtdpstore
defers this work until exit.

Besides, mtdpstore does not check the return code, so that it writes back as
many valid logs as possible.
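
The read, erase and write-back sequence described above can be sketched in
userspace as follows. The struct sim_block model, the tiny sizes and the
flush_removed() name are assumptions made for this illustration. The sketch
also assumes each zone should be restored from its own offset in the cache,
whereas the quoted loop passes buf unchanged to every mtd_write() call:

```c
#include <stdint.h>
#include <string.h>

#define ZONE_SIZE  4			/* bytes per record zone (tiny on purpose) */
#define ZONES      4			/* zones per erase block */
#define BLOCK_SIZE (ZONE_SIZE * ZONES)

/* Hypothetical in-memory model of one erase block and its used bitmap. */
struct sim_block {
	uint8_t data[BLOCK_SIZE];
	uint8_t used[ZONES];		/* 1 = zone still holds a valid record */
};

/* Cache the block, erase it, then restore only the zones still in use. */
static void flush_removed(struct sim_block *blk)
{
	uint8_t cache[BLOCK_SIZE];
	int z;

	memcpy(cache, blk->data, BLOCK_SIZE);	/* 1st: read to cache */
	memset(blk->data, 0xFF, BLOCK_SIZE);	/* 2nd: erase block */
	for (z = 0; z < ZONES; z++) {		/* 3rd: write back */
		if (blk->used[z])
			memcpy(blk->data + z * ZONE_SIZE,
			       cache + z * ZONE_SIZE, ZONE_SIZE);
	}
}
```

After the call, removed zones read back as 0xFF (erased flash) and used
zones keep their original contents.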

>> +
>> +		off += zonesize;
>> +		size -= min_t(unsigned int, zonesize, size);
>> +	}
>> +
>> +free:
>> +	kfree(buf);
>> +	return ret;
>> +}
>> +
>> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	int ret;
>> +	loff_t off;
>> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
>> +
>> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
>> +		ret = mtdpstore_block_is_removed(cxt, off);
>> +		if (!ret) {
>> +			off += mtd->erasesize;
>> +			continue;
>> +		}
>> +
>> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +
>> +	if (mtd->index != cxt->index || cxt->index < 0)
>> +		return;
>> +
>> +	mtdpstore_flush_removed(cxt);
>> +
>> +	blkoops_unregister_device(&cxt->bo_dev);
>> +	kfree(cxt->badmap);
>> +	kfree(cxt->usedmap);
>> +	kfree(cxt->rmmap);
>> +	cxt->mtd = NULL;
>> +	cxt->index = -1;
>> +}
>> +
>> +static struct mtd_notifier mtdpstore_notifier = {
>> +	.add	= mtdpstore_notify_add,
>> +	.remove	= mtdpstore_notify_remove,
>> +};
>> +
>> +static int __init mtdpstore_init(void)
>> +{
>> +	int ret;
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct blkoops_info *info = &cxt->bo_info;
>> +
>> +	ret = blkoops_info(info);
>> +	if (unlikely(ret))
>> +		return ret;
>> +
>> +	if (strlen(info->device) == 0) {
>> +		pr_err("mtd device must be supplied\n");
>> +		return -EINVAL;
>> +	}
>> +	if (!info->dmesg_size) {
>> +		pr_err("no recorder enabled\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Setup the MTD device to use */
>> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
>> +	if (ret)
>> +		cxt->index = -1;
>> +
>> +	register_mtd_user(&mtdpstore_notifier);
>> +	return 0;
>> +}
>> +module_init(mtdpstore_init);
>> +
>> +static void __exit mtdpstore_exit(void)
>> +{
>> +	unregister_mtd_user(&mtdpstore_notifier);
>> +}
>> +module_exit(mtdpstore_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>> +MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
> 
> 
> 
> 
> Thanks,
> Miquèl
> 

I will collect more suggestions and submit them together in the next version.

-- 
liaoweixiong


* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-20  1:03 ` [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
@ 2020-01-21  4:13   ` Randy Dunlap
  2020-01-21  5:23     ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Randy Dunlap @ 2020-01-21  4:13 UTC (permalink / raw)
  To: WeiXiong Liao, Kees Cook, Anton Vorontsov, Colin Cross,
	Tony Luck, Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

Hi,

I have some documentation comments for you:


On 1/19/20 5:03 PM, WeiXiong Liao wrote:
> The document, at Documentation/admin-guide/pstore-block.rst, tells us
> how to use pstore/blk and blkoops.
> 
> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> ---
>  Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>  MAINTAINERS                                |   1 +
>  fs/pstore/Kconfig                          |   2 +
>  3 files changed, 281 insertions(+)
>  create mode 100644 Documentation/admin-guide/pstore-block.rst
> 
> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
> new file mode 100644
> index 000000000000..58418d429c55
> --- /dev/null
> +++ b/Documentation/admin-guide/pstore-block.rst
> @@ -0,0 +1,278 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Pstore block oops/panic logger
> +==============================
> +
> +Introduction
> +------------
> +
> +Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
> +block device before the system crashes. It also supports non-block devices such
> +as mtd device.
> +
> +There is a trapper named blkoops for pstore/blk, which makes pstore/blk be
> +nicer to device drivers.
> +
> +Pstore block concepts
> +---------------------
> +
> +Pstore/blk works as a zone manager as it cuts the block device or partition
> +into several zones and stores data for different recorders. What device driver

"What a device driver" or "What device drivers" should do ...

> +should do is to provide read/write APIs.
> +
> +Pstore/blk begins at function ``blkz_register``. Besides, blkoops, a wrapper of
> +pstore/blk, begins at function ``blkoops_register_blkdev`` for block device and
> +``blkoops_register_device`` for non-block device, which is recommended instead
> +of directly using pstore/blk.
> +
> +Blkoops provides efficient configuration mothod for pstore/blk, which divides

                                            method

> +all configurations of pstore/blk into two parts, configurations for user and
> +configurations for driver.
> +
> +Configurations for user determine how pstore/blk works, such as pmsg_size,
> +dmesg_size and so on. All of them support both kconfig and module parameters,
> +but module parameters have priority over kconfig.
> +
> +Configurations for driver are all about block/non-block device, such as
> +total_size of device and read/write operations. Device driver transfers a
> +structure ``blkoops_device`` defined in *linux/blkoops.h*.
> +
> +Configurations for user
> +-----------------------
> +
> +All of these configurations support both kconfig and module parameters, but
> +module parameters have priority over kconfig.
> +Here is an example for module parameters::
> +
> +        blkoops.blkdev=179:7 blkoops.dmesg_size=64 blkoops.dump_oops=1
> +
> +The detail of each configurations may be of interest to you.
> +
> +blkdev
> +~~~~~~
> +
> +The block device to use. Most of the time, it is a partition of block device.
> +It's fine to ignore it if you are not block device.

                                 are not using a block device.

> +
> +It accepts the following variants:
> +
> +1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
> +   leading 0x, for example b302.
> +#. /dev/<disk_name> represents the device number of disk
> +#. /dev/<disk_name><decimal> represents the device number of partition - device
> +   number of disk plus the partition number
> +#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
> +   name of partitioned disk ends with a digit.
> +#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the unique id of

                                                    represents

> +   a partition if the partition table provides it. The UUID may be either an
> +   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
> +   where SSSSSSSS is a zero-filled hex representation of the 32-bit
> +   "NT disk signature", and PP is a zero-filled hex representation of the
> +   1-based partition number.
> +#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
> +   partition with a known unique id.
> +#. <major>:<minor> major and minor number of the device separated by a colon.
> +
> +dmesg_size
> +~~~~~~~~~~
> +
> +The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
> +4096. If you don't need it, safely set it 0 or ignore it.

                                      set it to 0 or ignore it.

The example above is:  blkoops.dmesg_size=64
where 64 is not a multiple of 4096. (?)

> +
> +NOTE that, the remaining space, except ``pmsg_size``, ``console_size``` and
> +others, belongs to dmesg. It means that there are multiple chunks for dmesg.
> +
> +Pstore/blk will log to dmesg chunks one by one, and always overwrite the oldest
> +chunk if there is no more free chunks.
> +
> +pmsg_size
> +~~~~~~~~~
> +
> +The chunk size in bytes for pmsg. It **MUST** be a multiple of 4096. If you
> +do not need it, safely set it 0 or ignore it.

                          set it to 0 or ignore it.

> +
> +There is only one chunk for pmsg.
> +
> +Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
> +appended to the chunk. On reboot the contents are available in
> +/sys/fs/pstore/pmsg-pstore-blk-0.
> +
> +console_size
> +~~~~~~~~~~~~
> +
> +The chunk size in bytes for console. It **MUST** be a multiple of 4096. If you
> +do not need it, safely set it 0 or ignore it.

                          set it to 0 or ignore it.

> +
> +There is only one chunk for console.
> +
> +All log of console will be appended to the chunk. On reboot the contents are
> +available in /sys/fs/pstore/console-pstore-blk-0.
> +
> +ftrace_size
> +~~~~~~~~~~~
> +
> +The chunk size in bytes for ftrace. It **MUST** be a multiple of 4096. If you
> +do not need it, safely set it 0 or ignore it.
> +
> +There may be several chunks for ftrace, according to how many processors on
> +your CPU. Each chunk size is equal to (ftrace_size / processors_count).

That is confusing (to me). It seems like it handles CPU packages separately,
so that a package that has 4 processors is collected together.
But what if the system has multiple CPU packages?  how is that handled?

> +
> +All log of ftrace will be appended to the chunk. On reboot the contents are
> +available in /sys/fs/pstore/ftrace-pstore-blk-[N], where N is the processor
> +number.
> +
> +Persistent function tracing might be useful for debugging software or hardware
> +related hangs. Here is an example of usage::
> +
> + # mount -t pstore pstore /sys/fs/pstore
> + # mount -t debugfs debugfs /sys/kernel/debug/
> + # echo 1 > /sys/kernel/debug/pstore/record_ftrace
> + # reboot -f
> + [...]
> + # mount -t pstore pstore /sys/fs/pstore
> + # tail /sys/fs/pstore/ftrace-pstore-blk-0
> + CPU:0 ts:109860 c03a4310  c0063ebc  cpuidle_select <- cpu_startup_entry+0x1a8/0x1e0
> + CPU:0 ts:109861 c03a5878  c03a4324  menu_select <- cpuidle_select+0x24/0x2c
> + CPU:0 ts:109862 c00670e8  c03a589c  pm_qos_request <- menu_select+0x38/0x4cc
> + CPU:0 ts:109863 c0092bbc  c03a5960  tick_nohz_get_sleep_length <- menu_select+0xfc/0x4cc
> + CPU:0 ts:109865 c004b2f4  c03a59d4  get_iowait_load <- menu_select+0x170/0x4cc
> + CPU:0 ts:109868 c0063b60  c0063ecc  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
> + CPU:0 ts:109869 c03a433c  c0063b94  cpuidle_enter <- call_cpuidle+0x44/0x48
> + CPU:0 ts:109871 c03a4000  c03a4350  cpuidle_enter_state <- cpuidle_enter+0x24/0x28
> + CPU:0 ts:109873 c0063ba8  c03a4090  sched_idle_set_state <- cpuidle_enter_state+0xa4/0x314
> + CPU:0 ts:109874 c03a605c  c03a40b4  arm_enter_idle_state <- cpuidle_enter_state+0xc8/0x314
> +
> +dump_oops
> +~~~~~~~~~
> +
> +Dumping both oopses and panics can be done by setting 1 (not zero) in the
> +``dump_oops`` member while setting 0 in that variable dumps only the panics.
> +
> +Configurations for driver
> +-------------------------
> +
> +Only device driver would care these configurations. Block device driver

   Only a device driver cares about these configurations. A block device driver

> +refers ``blkoops_register_blkdev`` while ``blkoops_register_device`` for

   uses ...                           while a non-block device [driver] uses 
   ``blkoops_register_device``.

> +non-block device.
> +
> +The parameters of these two APIs may be of interest to you.
> +
> +major
> +~~~~~
> +
> +It is only requested by block device which is registered by

              required (?)

> +``blkoops_register_blkdev``.  It's the major device number of registered
> +devices, by which blkoops can get the matching driver for @blkdev.
> +
> +total_size
> +~~~~~~~~~~
> +
> +It is only requested by non-block device which is registered by

              required (?)

> +``blkoops_register_device``.  It tells pstore/blk that the total size

                                              drop:  that

> +pstore/blk can use. It **MUST** be greater than 4096 and a multiple of 4096.

not greater than or equal to 4096?

> +
> +If block device, blkoops can get size of block device/partition automatically.

   For block devices, ...

> +
> +read/write
> +~~~~~~~~~~
> +
> +It's generic read/write APIs for pstore/blk, which are requested by non-block

                                                          required (?)

> +device. The generic APIs are used for almost all data but except panic data,

                                                drop:    but

> +such as pmsg, console, oops and ftrace.
> +
> +The parameter @offset is the relative position of the device.

I don't get that description. Can you improve it?

> +
> +Normally the number of bytes read/written should be returned, while for error,
> +negative number will be returned. The following return numbers mean more:
> +
> +-EBUSY: pstore/blk should try again later.
> +
> +panic_write (for non-block device)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +It's a interface for panic recorder and will be used only when panic occurs.
> +Non-block device driver registers it by ``blkoops_register_device``. If panic
> +log is unnecessary, it's fine to ignore it.
> +
> +Note that pstore/blk will recover data from device while mounting pstore
> +filesystem by default. If panic occurs but pstore/blk does not recover yet, the
> +first zone of dmesg will be used.
> +
> +The parameter @offset is the relative position of the device.

improve??

> +
> +Normally the number of bytes written should be returned, while for error,
> +negative number should be returned.
> +
> +panic_write (for block device)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +It's much similar to panic_write for non-block device, but panic_write for
> +block device writes alignment to SECTOR_SIZE, that's why the parameters are

                writes only aligned sectors of SECTOR_SIZE  (??)

> +@sects and @start_sect. Block device driver should register it by
> +``blkoops_register_blkdev``.
> +
> +The parameter @start_sect is the relative position of the block device and
> +partition. If block driver requires absolute position for panic_write,
> +``blkoops_blkdev_info`` will be helpful, which can provide the absolute
> +position of the block device (or partition) on the whole disk/flash.
> +
> +Normally zero should be returned, otherwise it indicates an error.
> +
> +Compression and header
> +----------------------
> +
> +Block device is large enough for uncompressed dmesg data. Actually we do not
> +recommend data compression because pstore/blk will insert some information into
> +the first line of dmesg data. For example::
> +
> +        Panic: Total 16 times
> +
> +It means that it's the 16th times panic log since the first booting. Sometimes

                               time of a panic log since ...

> +the oops|panic occurs since burning is very important for embedded device to

                               ^^^^^^^ huh??

> +judge whether the system is stable.
> +
> +The following line is inserted by pstore filesystem. For example::
> +
> +        Oops#2 Part1
> +
> +It means that it's the 2nd times oops log on last booting.

                          2nd time of an oops log on the last boot. (?)

> +
> +Reading the data
> +----------------
> +
> +The dump data can be read from the pstore filesystem. The format for these
> +files is ``dmesg-pstore-blk-[N]`` for dmesg(oops|panic), ``pmsg-pstore-blk-0``
> +for pmsg and so on, where N is the record number. To delete a stored
> +record from block device, simply unlink the respective pstore file. The
> +timestamp of the dump file records the trigger time.
> +
> +Attentions in panic read/write APIs
> +-----------------------------------
> +
> +If on panic, the kernel is not going to run for much longer. The tasks will not

                                                        longer, the tasks will not

> +be scheduled and the most kernel resources will be out of service. It

             drop:  the

> +looks like a single-threaded program running on a single-core computer.
> +
> +The following points require special attention for panic read/write APIs:
> +
> +1. Can **NOT** allocate any memory.
> +   If you need memory, just allocate while the block driver is initializing
> +   rather than waiting until the panic.
> +#. Must be polled, **NOT** interrupt driven.
> +   No task schedule any more. The block driver should delay to ensure the write
> +   succeeds, but NOT sleep.
> +#. Can **NOT** take any lock.
> +   There is no other task, nor any share resource; you are safe to break all

                                      shared

> +   locks.
> +#. Just use CPU to transfer.
> +   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
> +#. Operate register directly.

      Don't know what that means.

> +   Try not to use Linux kernel resources. Do I/O map while initializing rather
> +   than waiting until the panic.
> +#. Reset your block device and controller if necessary.
> +   If you are not sure the state of you block device and controller when panic,

                         of the state of your block device and controller when a panic occurs,


> +   you are safe to stop and reset them.
> +
> +Blkoops supports blkoops_blkdev_info(), which is defined in *linux/blkoops.h*,
> +to get information of block device, such as the device number, sector count and
> +start sector of the whole disk.


HTH.
-- 
~Randy



* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-21  4:13   ` Randy Dunlap
@ 2020-01-21  5:23     ` liaoweixiong
  2020-01-21  6:36       ` Randy Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-01-21  5:23 UTC (permalink / raw)
  To: Randy Dunlap, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

Hi Randy Dunlap,

On 2020/1/21 PM12:13, Randy Dunlap wrote:
> Hi,
> 
> I have some documentation comments for you:
> 
> 
> On 1/19/20 5:03 PM, WeiXiong Liao wrote:
>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>> how to use pstore/blk and blkoops.
>>
>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>> ---
>>  Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>>  MAINTAINERS                                |   1 +
>>  fs/pstore/Kconfig                          |   2 +
>>  3 files changed, 281 insertions(+)
>>  create mode 100644 Documentation/admin-guide/pstore-block.rst
>>
>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>> new file mode 100644
>> index 000000000000..58418d429c55
>> --- /dev/null
>> +++ b/Documentation/admin-guide/pstore-block.rst
>> @@ -0,0 +1,278 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +Pstore block oops/panic logger
>> +==============================
>> +
>> +Introduction
>> +------------
>> +
>> +Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
>> +block device before the system crashes. It also supports non-block devices such
>> +as mtd device.
>> +
>> +There is a trapper named blkoops for pstore/blk, which makes pstore/blk be
>> +nicer to device drivers.
>> +
>> +Pstore block concepts
>> +---------------------
>> +
>> +Pstore/blk works as a zone manager as it cuts the block device or partition
>> +into several zones and stores data for different recorders. What device driver
> 
> "What a device driver" or "What device drivers" should do ...
> 

I will fix it, thank you.

>> +should do is to provide read/write APIs.
>> +
>> +Pstore/blk begins at function ``blkz_register``. Besides, blkoops, a wrapper of
>> +pstore/blk, begins at function ``blkoops_register_blkdev`` for block device and
>> +``blkoops_register_device`` for non-block device, which is recommended instead
>> +of directly using pstore/blk.
>> +
>> +Blkoops provides efficient configuration mothod for pstore/blk, which divides
> 
>                                             method
> 

I will fix it, thank you.

>> +all configurations of pstore/blk into two parts, configurations for user and
>> +configurations for driver.
>> +
>> +Configurations for user determine how pstore/blk works, such as pmsg_size,
>> +dmesg_size and so on. All of them support both kconfig and module parameters,
>> +but module parameters have priority over kconfig.
>> +
>> +Configurations for driver are all about block/non-block device, such as
>> +total_size of device and read/write operations. Device driver transfers a
>> +structure ``blkoops_device`` defined in *linux/blkoops.h*.
>> +
>> +Configurations for user
>> +-----------------------
>> +
>> +All of these configurations support both kconfig and module parameters, but
>> +module parameters have priority over kconfig.
>> +Here is an example for module parameters::
>> +
>> +        blkoops.blkdev=179:7 blkoops.dmesg_size=64 blkoops.dump_oops=1
>> +
>> +The detail of each configurations may be of interest to you.
>> +
>> +blkdev
>> +~~~~~~
>> +
>> +The block device to use. Most of the time, it is a partition of block device.
>> +It's fine to ignore it if you are not block device.
> 
>                                  are not using a block device.
> 

I will fix it, thank you.

>> +
>> +It accepts the following variants:
>> +
>> +1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
>> +   leading 0x, for example b302.
>> +#. /dev/<disk_name> represents the device number of disk
>> +#. /dev/<disk_name><decimal> represents the device number of partition - device
>> +   number of disk plus the partition number
>> +#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
>> +   name of partitioned disk ends with a digit.
>> +#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the unique id of
> 
>                                                     represents
> 

I will fix it, thank you.

>> +   a partition if the partition table provides it. The UUID may be either an
>> +   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
>> +   where SSSSSSSS is a zero-filled hex representation of the 32-bit
>> +   "NT disk signature", and PP is a zero-filled hex representation of the
>> +   1-based partition number.
>> +#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
>> +   partition with a known unique id.
>> +#. <major>:<minor> major and minor number of the device separated by a colon.
>> +
>> +dmesg_size
>> +~~~~~~~~~~
>> +
>> +The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
>> +4096. If you don't need it, safely set it 0 or ignore it.
> 
>                                       set it to 0 or ignore it.
> 

I will fix it, thank you.

> The example above is:  blkoops.dmesg_size=64
> where 64 is not a multiple of 4096. (?)
> 

The module parameter dmesg_size is in units of KB, so dmesg_size=64 means
64 KB (65536 bytes), which is a multiple of 4096.
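
To show the arithmetic, here is a minimal sketch in plain user-space C.
It is not the pstore/blk code, and both helper names are made up for
illustration. It converts a size given in KB to bytes and checks the
4096-byte rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert a size given in KB to bytes. */
uint64_t kb_to_bytes(uint64_t size_kb)
{
	assert(size_kb <= UINT64_MAX / 1024);	/* guard against overflow */
	return size_kb * 1024;
}

/* Hypothetical helper: true if @bytes is a multiple of 4096 (0 included). */
bool is_multiple_of_4096(uint64_t bytes)
{
	return bytes % 4096 == 0;
}
```

With dmesg_size=64, kb_to_bytes(64) gives 65536, which passes the check.
A value of 6 (6144 bytes) would not.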

>> +
>> +NOTE that, the remaining space, except ``pmsg_size``, ``console_size``` and
>> +others, belongs to dmesg. It means that there are multiple chunks for dmesg.
>> +
>> +Pstore/blk will log to dmesg chunks one by one, and always overwrite the oldest
>> +chunk if there is no more free chunks.
>> +
>> +pmsg_size
>> +~~~~~~~~~
>> +
>> +The chunk size in bytes for pmsg. It **MUST** be a multiple of 4096. If you
>> +do not need it, safely set it 0 or ignore it.
> 
>                           set it to 0 or ignore it.
> 

I will fix it, thank you.

>> +
>> +There is only one chunk for pmsg.
>> +
>> +Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
>> +appended to the chunk. On reboot the contents are available in
>> +/sys/fs/pstore/pmsg-pstore-blk-0.
>> +
>> +console_size
>> +~~~~~~~~~~~~
>> +
>> +The chunk size in bytes for console. It **MUST** be a multiple of 4096. If you
>> +do not need it, safely set it 0 or ignore it.
> 
>                           set it to 0 or ignore it.
> 

I will fix it, thank you.

>> +
>> +There is only one chunk for console.
>> +
>> +All log of console will be appended to the chunk. On reboot the contents are
>> +available in /sys/fs/pstore/console-pstore-blk-0.
>> +
>> +ftrace_size
>> +~~~~~~~~~~~
>> +
>> +The chunk size in bytes for ftrace. It **MUST** be a multiple of 4096. If you
>> +do not need it, safely set it 0 or ignore it.
>> +
>> +There may be several chunks for ftrace, according to how many processors on
>> +your CPU. Each chunk size is equal to (ftrace_size / processors_count).
> 
> That is confusing (to me). It seems like it handles CPU packages separately,
> so that a package that has 4 processors is collected together.
> But what if the system has multiple CPU packages?  how is that handled?
> 

The ftrace space (ftrace_size) is divided evenly by processors_count, so
each processor has its own chunk: cpu0 writes to chunk0, cpu1 writes to
chunk1, and so on.
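
A rough sketch of that split, in user-space C rather than the real
pstore/blk code. The function names are hypothetical, and it assumes the
per-processor chunks are laid out back to back and that nr_cpus counts
every processor in the system:

```c
#include <assert.h>
#include <stdint.h>

/* Size of each per-processor chunk: the ftrace space split evenly. */
uint64_t ftrace_chunk_size(uint64_t ftrace_size, unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	return ftrace_size / nr_cpus;
}

/*
 * Byte offset of the chunk owned by @cpu inside the ftrace space,
 * assuming the chunks are contiguous (an assumption of this sketch).
 */
uint64_t ftrace_chunk_offset(uint64_t ftrace_size, unsigned int nr_cpus,
			     unsigned int cpu)
{
	assert(cpu < nr_cpus);
	return (uint64_t)cpu * ftrace_chunk_size(ftrace_size, nr_cpus);
}
```

For example, with ftrace_size of 65536 bytes and 4 processors, each chunk
is 16384 bytes and cpu1 would start at offset 16384.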

>> +
>> +All log of ftrace will be appended to the chunk. On reboot the contents are
>> +available in /sys/fs/pstore/ftrace-pstore-blk-[N], where N is the processor
>> +number.
>> +
>> +Persistent function tracing might be useful for debugging software or hardware
>> +related hangs. Here is an example of usage::
>> +
>> + # mount -t pstore pstore /sys/fs/pstore
>> + # mount -t debugfs debugfs /sys/kernel/debug/
>> + # echo 1 > /sys/kernel/debug/pstore/record_ftrace
>> + # reboot -f
>> + [...]
>> + # mount -t pstore pstore /sys/fs/pstore
>> + # tail /sys/fs/pstore/ftrace-pstore-blk-0
>> + CPU:0 ts:109860 c03a4310  c0063ebc  cpuidle_select <- cpu_startup_entry+0x1a8/0x1e0
>> + CPU:0 ts:109861 c03a5878  c03a4324  menu_select <- cpuidle_select+0x24/0x2c
>> + CPU:0 ts:109862 c00670e8  c03a589c  pm_qos_request <- menu_select+0x38/0x4cc
>> + CPU:0 ts:109863 c0092bbc  c03a5960  tick_nohz_get_sleep_length <- menu_select+0xfc/0x4cc
>> + CPU:0 ts:109865 c004b2f4  c03a59d4  get_iowait_load <- menu_select+0x170/0x4cc
>> + CPU:0 ts:109868 c0063b60  c0063ecc  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
>> + CPU:0 ts:109869 c03a433c  c0063b94  cpuidle_enter <- call_cpuidle+0x44/0x48
>> + CPU:0 ts:109871 c03a4000  c03a4350  cpuidle_enter_state <- cpuidle_enter+0x24/0x28
>> + CPU:0 ts:109873 c0063ba8  c03a4090  sched_idle_set_state <- cpuidle_enter_state+0xa4/0x314
>> + CPU:0 ts:109874 c03a605c  c03a40b4  arm_enter_idle_state <- cpuidle_enter_state+0xc8/0x314
>> +
>> +dump_oops
>> +~~~~~~~~~
>> +
>> +Dumping both oopses and panics can be done by setting 1 (not zero) in the
>> +``dump_oops`` member while setting 0 in that variable dumps only the panics.
>> +
>> +Configurations for driver
>> +-------------------------
>> +
>> +Only device driver would care these configurations. Block device driver
> 
>    Only a device driver cares about these configurations. A block device driver
> 

I will fix it, thank you.

>> +refers ``blkoops_register_blkdev`` while ``blkoops_register_device`` for
> 
>    uses ...                           while a non-block device [driver] uses 
>    ``blkoops_register_device``.
> 

I will fix it, thank you.

>> +non-block device.
>> +
>> +The parameters of these two APIs may be of interest to you.
>> +
>> +major
>> +~~~~~
>> +
>> +It is only requested by block device which is registered by
> 
>               required (?)
> 

Yes, you are right. I will fix it.

>> +``blkoops_register_blkdev``.  It's the major device number of registered
>> +devices, by which blkoops can get the matching driver for @blkdev.
>> +
>> +total_size
>> +~~~~~~~~~~
>> +
>> +It is only requested by non-block device which is registered by
> 
>               required (?)
> 

I will fix it, thank you.

>> +``blkoops_register_device``.  It tells pstore/blk that the total size
> 
>                                               drop:  that
> 

I will fix it, thank you.

>> +pstore/blk can use. It **MUST** be greater than 4096 and a multiple of 4096.
> 
> not greater than or equal to 4096?
> 

Yes, you are right. I will fix it.

>> +
>> +If block device, blkoops can get size of block device/partition automatically.
> 
>    For block devices, ...
> 

I will fix it, thank you.

>> +
>> +read/write
>> +~~~~~~~~~~
>> +
>> +It's generic read/write APIs for pstore/blk, which are requested by non-block
> 
>                                                           required (?)
> 

I will fix it, thank you.

>> +device. The generic APIs are used for almost all data but except panic data,
> 
>                                                 drop:    but
> 

I will fix it, thank you.

>> +such as pmsg, console, oops and ftrace.
>> +
>> +The parameter @offset is the relative position of the device.
> 
> I don't get that description. Can you improve it?
> 

The parameter @offset of these interfaces is the relative position of the
device.

>> +
>> +Normally the number of bytes read/written should be returned, while for error,
>> +negative number will be returned. The following return numbers mean more:
>> +
>> +-EBUSY: pstore/blk should try again later.
>> +
>> +panic_write (for non-block device)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +It's a interface for panic recorder and will be used only when panic occurs.
>> +Non-block device driver registers it by ``blkoops_register_device``. If panic
>> +log is unnecessary, it's fine to ignore it.
>> +
>> +Note that pstore/blk will recover data from device while mounting pstore
>> +filesystem by default. If panic occurs but pstore/blk does not recover yet, the
>> +first zone of dmesg will be used.
>> +
>> +The parameter @offset is the relative position of the device.
> 
> improve??
> 

The parameter @offset of this interface is the relative position of the
device.

>> +
>> +Normally the number of bytes written should be returned, while for error,
>> +negative number should be returned.
>> +
>> +panic_write (for block device)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +It's much similar to panic_write for non-block device, but panic_write for
>> +block device writes alignment to SECTOR_SIZE, that's why the parameters are
> 
>                 writes only aligned sectors of SECTOR_SIZE  (??)
> 

How about this?

It is very similar to panic_write for non-block devices, but the position
and data size of panic_write for a block device must be aligned to
SECTOR_SIZE, which is why the parameters are @sects and @start_sect. A
block device driver should register it by ``blkoops_register_blkdev``.
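
To illustrate the alignment rule, a small user-space sketch (not the
driver code). The helper names are hypothetical, and SECTOR_SIZE is
defined locally as 512 to match the kernel's value. It converts a byte
count to sectors and turns a relative start sector into an absolute one,
given the partition's start sector:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* local definition for this sketch */

/* True if @bytes is a whole number of sectors. */
bool is_sector_aligned(uint64_t bytes)
{
	return bytes % SECTOR_SIZE == 0;
}

/* Convert an aligned byte count (or byte offset) to a sector count. */
uint64_t bytes_to_sects(uint64_t bytes)
{
	assert(is_sector_aligned(bytes));
	return bytes / SECTOR_SIZE;
}

/*
 * Absolute sector on the whole disk: the partition's start sector
 * plus the relative @start_sect passed to panic_write.
 */
uint64_t abs_sector(uint64_t part_start_sect, uint64_t start_sect)
{
	return part_start_sect + start_sect;
}
```

For example, a 4096-byte write is 8 sectors, and relative sector 8 on a
partition starting at sector 2048 is absolute sector 2056.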

>> +@sects and @start_sect. Block device driver should register it by
>> +``blkoops_register_blkdev``.
>> +
>> +The parameter @start_sect is the relative position of the block device and
>> +partition. If block driver requires absolute position for panic_write,
>> +``blkoops_blkdev_info`` will be helpful, which can provide the absolute
>> +position of the block device (or partition) on the whole disk/flash.
>> +
>> +Normally zero should be returned, otherwise it indicates an error.
>> +
>> +Compression and header
>> +----------------------
>> +
>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>> +recommend data compression because pstore/blk will insert some information into
>> +the first line of dmesg data. For example::
>> +
>> +        Panic: Total 16 times
>> +
>> +It means that it's the 16th times panic log since the first booting. Sometimes
> 
>                                time of a panic log since ...
> 

Should it be like this?
It means the time of a panic log since the first booting.

>> +the oops|panic occurs since burning is very important for embedded device to
> 
>                                ^^^^^^^ huh??
> 

How about this?

Sometimes the number of occurrences of oops|panic since the first
booting is important to judge whether the system is stable.

>> +judge whether the system is stable.
>> +
>> +The following line is inserted by pstore filesystem. For example::
>> +
>> +        Oops#2 Part1
>> +
>> +It means that it's the 2nd times oops log on last booting.
> 
>                           2nd time of an oops log on the last boot. (?)
> 

How about this?

It means that it is the 2nd oops on the last boot.

>> +
>> +Reading the data
>> +----------------
>> +
>> +The dump data can be read from the pstore filesystem. The format for these
>> +files is ``dmesg-pstore-blk-[N]`` for dmesg(oops|panic), ``pmsg-pstore-blk-0``
>> +for pmsg and so on, where N is the record number. To delete a stored
>> +record from block device, simply unlink the respective pstore file. The
>> +timestamp of the dump file records the trigger time.
>> +
>> +Attentions in panic read/write APIs
>> +-----------------------------------
>> +
>> +If on panic, the kernel is not going to run for much longer. The tasks will not
> 
>                                                         longer, the tasks will not
> 

I will fix it, thank you.

>> +be scheduled and the most kernel resources will be out of service. It
> 
>              drop:  the
> 

I will fix it, thank you.

>> +looks like a single-threaded program running on a single-core computer.
>> +
>> +The following points require special attention for panic read/write APIs:
>> +
>> +1. Can **NOT** allocate any memory.
>> +   If you need memory, just allocate while the block driver is initializing
>> +   rather than waiting until the panic.
>> +#. Must be polled, **NOT** interrupt driven.
>> +   No task schedule any more. The block driver should delay to ensure the write
>> +   succeeds, but NOT sleep.
>> +#. Can **NOT** take any lock.
>> +   There is no other task, nor any share resource; you are safe to break all
> 
>                                       shared
> 

I will fix it, thank you.

>> +   locks.
>> +#. Just use CPU to transfer.
>> +   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
>> +#. Operate register directly.
> 
>       Don't know what that means.
> 

How about this?

#. Control registers directly.
    Please control registers directly rather than use Linux kernel resources.
    Do I/O map while initializing rather than wait until a panic occurs.

>> +   Try not to use Linux kernel resources. Do I/O map while initializing rather
>> +   than waiting until the panic.
>> +#. Reset your block device and controller if necessary.
>> +   If you are not sure the state of you block device and controller when panic,
> 
>                          of the state of your block device and controller when a panic occurs,
> 
> 

I will fix it, thank you.

>> +   you are safe to stop and reset them.
>> +
>> +Blkoops supports blkoops_blkdev_info(), which is defined in *linux/blkoops.h*,
>> +to get information of block device, such as the device number, sector count and
>> +start sector of the whole disk.
> 
> 
> HTH.
> 

I will collect more suggestions and submit the new version at one time.
Thank you very much.

-- 
liaoweixiong


* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-21  5:23     ` liaoweixiong
@ 2020-01-21  6:36       ` Randy Dunlap
  2020-01-21  8:19         ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Randy Dunlap @ 2020-01-21  6:36 UTC (permalink / raw)
  To: liaoweixiong, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

On 1/20/20 9:23 PM, liaoweixiong wrote:
> hi Randy Dunlap,
> 
> On 2020/1/21 PM12:13, Randy Dunlap wrote:
>> Hi,
>>
>> I have some documentation comments for you:
>>
>>
>> On 1/19/20 5:03 PM, WeiXiong Liao wrote:
>>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>>> how to use pstore/blk and blkoops.
>>>
>>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>>> ---
>>>  Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>>>  MAINTAINERS                                |   1 +
>>>  fs/pstore/Kconfig                          |   2 +
>>>  3 files changed, 281 insertions(+)
>>>  create mode 100644 Documentation/admin-guide/pstore-block.rst
>>>
>>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>>> new file mode 100644
>>> index 000000000000..58418d429c55
>>> --- /dev/null
>>> +++ b/Documentation/admin-guide/pstore-block.rst
>>> +
>>> +
>>> +dmesg_size
>>> +~~~~~~~~~~
>>> +
>>> +The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
>>> +4096. If you don't need it, safely set it 0 or ignore it.
>>
>>                                       set it to 0 or ignore it.
>>
> 
> I will fix it, thank you.
> 
>> The example above is:  blkoops.dmesg_size=64
>> where 64 is not a multiple of 4096. (?)
>>
> 
> The module parameter dmesg_size is in unit KB.

I didn't see that documented anywhere.


>>> +Normally the number of bytes written should be returned, while for error,
>>> +negative number should be returned.
>>> +
>>> +panic_write (for block device)
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +It's much similar to panic_write for non-block device, but panic_write for
>>> +block device writes alignment to SECTOR_SIZE, that's why the parameters are
>>
>>                 writes only aligned sectors of SECTOR_SIZE  (??)
>>
> 
> How about this?
> 
> It's much similar to panic_write for non-block device, but the position and
> data size of panic_write for block device must be aligned to SECTOR_SIZE,
> that's why the parameters are @sects and @start_sect. Block device driver
> should register it by ``blkoops_register_blkdev``.

OK.

>>> +@sects and @start_sect. Block device driver should register it by
>>> +``blkoops_register_blkdev``.
>>> +
>>> +The parameter @start_sect is the relative position of the block device and
>>> +partition. If block driver requires absolute position for panic_write,
>>> +``blkoops_blkdev_info`` will be helpful, which can provide the absolute
>>> +position of the block device (or partition) on the whole disk/flash.
>>> +
>>> +Normally zero should be returned, otherwise it indicates an error.
>>> +
>>> +Compression and header
>>> +----------------------
>>> +
>>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>>> +recommend data compression because pstore/blk will insert some information into
>>> +the first line of dmesg data. For example::
>>> +
>>> +        Panic: Total 16 times
>>> +
>>> +It means that it's the 16th times panic log since the first booting. Sometimes
>>
>>                                time of a panic log since ...
>>
> 
> Should it be like this?
> It means the time of a panic log since the first booting.

That sounds like clock time, not the number of instances or occurrences.

> 
>>> +the oops|panic occurs since burning is very important for embedded device to
>>
>>                                ^^^^^^^ huh??
>>
> 
> How about this?
> 
> Sometimes the number of occurrences of oops|panic since the first
> booting is important
> to judge whether the system is stable.

OK.

>>> +judge whether the system is stable.
>>> +
>>> +The following line is inserted by pstore filesystem. For example::
>>> +
>>> +        Oops#2 Part1
>>> +
>>> +It means that it's the 2nd times oops log on last booting.
>>
>>                           2nd time of an oops log on the last boot. (?)
>>
> 
> How about this?
> 
> It means that it's OOPS for the 2nd time on the last boot.

OK. It's an oops counter.

>>> +#. Just use CPU to transfer.
>>> +   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
>>> +#. Operate register directly.
>>
>>       Don't know what that means.
>>
> 
> How about this?
> 
> #. Control registers directly.
>     Please control registers directly rather than use Linux kernel
> resources.

OK.

>     Do I/O map while initializing rather than wait until a panic occurs.
> 
>>> +   Try not to use Linux kernel resources. Do I/O map while initializing rather
>>> +   than waiting until the panic.


-- 
~Randy



* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-21  6:36       ` Randy Dunlap
@ 2020-01-21  8:19         ` liaoweixiong
  2020-01-21 15:34           ` Randy Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-01-21  8:19 UTC (permalink / raw)
  To: Randy Dunlap, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

hi Randy Dunlap,

On 2020/1/21 2:36 PM, Randy Dunlap wrote:
> On 1/20/20 9:23 PM, liaoweixiong wrote:
>> hi Randy Dunlap,
>>
>> On 2020/1/21 PM12:13, Randy Dunlap wrote:
>>> Hi,
>>>
>>> I have some documentation comments for you:
>>>
>>>
>>> On 1/19/20 5:03 PM, WeiXiong Liao wrote:
>>>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>>>> how to use pstore/blk and blkoops.
>>>>
>>>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>>>> ---
>>>>   Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>>>>   MAINTAINERS                                |   1 +
>>>>   fs/pstore/Kconfig                          |   2 +
>>>>   3 files changed, 281 insertions(+)
>>>>   create mode 100644 Documentation/admin-guide/pstore-block.rst
>>>>
>>>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>>>> new file mode 100644
>>>> index 000000000000..58418d429c55
>>>> --- /dev/null
>>>> +++ b/Documentation/admin-guide/pstore-block.rst
>>>> +
>>>> +
>>>> +dmesg_size
>>>> +~~~~~~~~~~
>>>> +
>>>> +The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
>>>> +4096. If you don't need it, safely set it 0 or ignore it.
>>>
>>>                                        set it to 0 or ignore it.
>>>
>>
>> I will fix it, thank you.
>>
>>> The example above is:  blkoops.dmesg_size=64
>>> where 64 is not a multiple of 4096. (?)
>>>
>>
>> The module parameter dmesg_size is in unit KB.
> 
> I didn't see that documented anywhere.
> 

Oh, sorry, that is my oversight. It seems that the descriptions of the
other size parameters, and those in Kconfig, should be corrected as well.
Thank you very much. Is the following modification OK?

The chunk size in KB for dmesg(oops/panic). It **MUST** be a multiple of 4.
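
For illustration only (this is not code from the patch): a minimal
user-space sketch of the rule stated above, assuming the parameter is
given in KB, that 0 means "not used", and that the resulting byte count
must be a multiple of 4096. The helper name chunk_size_kb_to_bytes() and
the CHUNK_ALIGN_BYTES macro are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: the 4096-byte granularity mentioned in the doc. */
#define CHUNK_ALIGN_BYTES 4096ul

/*
 * Convert a size given in KB to bytes and check the alignment rule.
 * Returns true when the value is usable; 0 KB ("disabled") is accepted.
 */
static bool chunk_size_kb_to_bytes(unsigned long size_kb, unsigned long *bytes)
{
	assert(bytes != NULL);

	if (size_kb > ULONG_MAX / 1024)   /* the multiplication would overflow */
		return false;

	*bytes = size_kb * 1024;
	return (*bytes % CHUNK_ALIGN_BYTES) == 0;
}
```

With this rule, dmesg_size=64 is valid (65536 bytes), while a value such
as 3 is rejected because 3072 bytes is not a multiple of 4096.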

> 
>>>> +Normally the number of bytes written should be returned, while for error,
>>>> +negative number should be returned.
>>>> +
>>>> +panic_write (for block device)
>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +It's much similar to panic_write for non-block device, but panic_write for
>>>> +block device writes alignment to SECTOR_SIZE, that's why the parameters are
>>>
>>>                  writes only aligned sectors of SECTOR_SIZE  (??)
>>>
>>
>> How about this?
>>
>> It's much similar to panic_write for non-block device, but the position and
>> data size of panic_write for block device must be aligned to SECTOR_SIZE,
>> that's why the parameters are @sects and @start_sect. Block device driver
>> should register it by ``blkoops_register_blkdev``.
> 
> OK.
> 
>>>> +@sects and @start_sect. Block device driver should register it by
>>>> +``blkoops_register_blkdev``.
>>>> +
>>>> +The parameter @start_sect is the relative position of the block device and
>>>> +partition. If block driver requires absolute position for panic_write,
>>>> +``blkoops_blkdev_info`` will be helpful, which can provide the absolute
>>>> +position of the block device (or partition) on the whole disk/flash.
>>>> +
>>>> +Normally zero should be returned, otherwise it indicates an error.
>>>> +
>>>> +Compression and header
>>>> +----------------------
>>>> +
>>>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>>>> +recommend data compression because pstore/blk will insert some information into
>>>> +the first line of dmesg data. For example::
>>>> +
>>>> +        Panic: Total 16 times
>>>> +
>>>> +It means that it's the 16th times panic log since the first booting. Sometimes
>>>
>>>                                 time of a panic log since ...
>>>
>>
>> Should it be like this?
>> It means the time of a panic log since the first booting.
> 
> That sounds like clock time, not the number of instances or occurrences.
> 

It is an oops/panic counter too. How about this?

It means that it's OOPS/PANIC for the 16th time since the first booting.

>>
>>>> +the oops|panic occurs since burning is very important for embedded device to
>>>
>>>                                 ^^^^^^^ huh??
>>>
>>
>> How about this?
>>
>> Sometimes the number of occurrences of oops|panic since the first
>> booting is important
>> to judge whether the system is stable.
> 
> OK.
> 
>>>> +judge whether the system is stable.
>>>> +
>>>> +The following line is inserted by pstore filesystem. For example::
>>>> +
>>>> +        Oops#2 Part1
>>>> +
>>>> +It means that it's the 2nd times oops log on last booting.
>>>
>>>                            2nd time of an oops log on the last boot. (?)
>>>
>>
>> How about this?
>>
>> It means that it's OOPS for the 2nd time on the last boot.
> 
> OK. It's an oops counter.
> 
>>>> +#. Just use CPU to transfer.
>>>> +   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
>>>> +#. Operate register directly.
>>>
>>>        Don't know what that means.
>>>
>>
>> How about this?
>>
>> #. Control registers directly.
>>      Please control registers directly rather than use Linux kernel
>> resources.
> 
> OK.
> 
>>      Do I/O map while initializing rather than wait until a panic occurs.
>>
>>>> +   Try not to use Linux kernel resources. Do I/O map while initializing rather
>>>> +   than waiting until the panic.
> 
> 


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-21  3:36     ` liaoweixiong
@ 2020-01-21  8:48       ` Miquel Raynal
  2020-01-22 17:22         ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Miquel Raynal @ 2020-01-21  8:48 UTC (permalink / raw)
  To: liaoweixiong
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

Hello,

liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Tue, 21 Jan 2020
11:36:00 +0800:

> hi Miquel Raynal,
> 
> On 2020/1/20 PM 6:03, Miquel Raynal wrote:
> > Hi WeiXiong,
> > 
> > WeiXiong Liao <liaoweixiong@allwinnertech.com> wrote on Mon, 20 Jan
> > 2020 09:03:53 +0800:
> >   
> >> It's the last one of a series of patches for adaptive to MTD device.
> >>
> >> The mtdpstore is similar to mtdoops but more powerful. It bases on
> >> pstore/blk, aims to store panic and oops log to a flash partition,  
> > 
> >                                            logs?
> >   
> 
> I will fix it. Thanks.
> 
> >> where it can be read back as files after mounting pstore filesystem.
> >>
> >> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
> >> block device at the very beginning, but now, compatible to not only
> >> block device. After this series of patches, pstore/blk can also work
> >> for MTD device. To make it work, 'blkdev' on kconfig or module
> >> parameter of blkoops should be set as mtd device name or mtd number.
> >> See more about pstore/blk and blkoops on:
> >>     Documentation/admin-guide/pstore-block.rst
> >>
> >> Why do we need mtdpstore?
> >> 1. repetitive jobs between pstore and mtdoops
> >>    Both of pstore and mtdoops do the same jobs that store panic/oops log.
> >>    They have much similar logic that register to kmsg dumper and store
> >>    log to several chunks one by one.
> >> 2. do what a driver should do
> >>    To me, a driver should provide methods instead of policies. What MTD
> >>    should do is to provide read/write/erase operations, geting rid of codes
> >>    about chunk management, kmsg dumper and configuration.
> >> 3. enhanced feature
> >>    Not only store log, but also show it as files.
> >>    Not only log, but also trigger time and trigger count.
> >>    Not only panic/oops log, but also log recorder for pmsg, console and
> >>    ftrace in the future.
> >>
> >> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
> >> Reported-by: kbuild test robot <lkp@intel.com>  
> > 
> > I don't think the test robot tag is meaningful here.
> >   
> 
> I do not know what the test robot tag means, but the kbuild test robot
> suggested that I add it. What should I do with it? Drop the tag, keep
> it, or something else?
> The email from the kbuild test robot said:
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>

You probably pushed your work to a dedicated repository on which this
robot has run. It does not distinguish between upstream sources
and downstream contributions. You may add this tag when you are
fixing something reported by the robot against the upstream kernel.
Here, the driver is new, this is a feature you are adding, so please
drop the tag.

[...]

> >> +/*
> >> + * All zones will be read as pstore/blk will read zone one by one when do
> >> + * recover.
> >> + */
> >> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> >> +{
> >> +	struct mtdpstore_context *cxt = &oops_cxt;
> >> +	size_t retlen;
> >> +	int ret;
> >> +
> >> +	if (mtdpstore_block_isbad(cxt, off))
> >> +		return -ENEXT;
> >> +
> >> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> >> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {  
> > 
> > IIRC size != retlen does not mean it failed, but that you should
> > continue reading after retlen bytes, no?
> >   
> 
> Yes, you are right. I will fix it. Thanks.
> 
> > Also, mtd_is_bitflip() does not mean that you are reading a false
> > buffer, but that the data has been corrected as it contained bitflips.
> > mtd_is_eccerr() however, would be meaningful.
> >   
> 
> Sure, I know mtd_is_bitflip() does not mean failure, but I do not think
> mtd_is_eccerr() belongs here, since the code checks ret < 0 and NOT
> mtd_is_bitflip().

Yes, just drop this check, only keep ret < 0.
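
For illustration only: a user-space sketch of the read handling discussed
above, assuming (as suggested in the review) that a short read is not an
error and the caller should continue after retlen bytes, failing only on
ret < 0. fake_read() is a hypothetical stand-in for the real read call; it
deliberately returns at most 4 bytes per call so the loop is exercised.
None of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_FLASH_SIZE 16u

/* Hypothetical in-memory "flash" contents. */
static const unsigned char fake_flash[] = "0123456789abcdef";

/* Stand-in for a device read: at most 4 bytes per call, count in *retlen. */
static int fake_read(size_t off, size_t len, size_t *retlen, unsigned char *buf)
{
	size_t n;

	if (off >= FAKE_FLASH_SIZE)
		return -1;                  /* hard error: offset out of range */

	n = FAKE_FLASH_SIZE - off;
	if (n > len)
		n = len;
	if (n > 4)
		n = 4;                      /* short read on purpose */

	memcpy(buf, fake_flash + off, n);
	*retlen = n;
	return 0;
}

/* Read exactly len bytes; fail only on ret < 0 or when no progress is made. */
static int read_full(size_t off, size_t len, unsigned char *buf)
{
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		size_t retlen = 0;
		int ret = fake_read(off + done, len - done, &retlen, buf + done);

		if (ret < 0)
			return ret;
		if (retlen == 0)
			return -1;              /* avoid looping forever */
		done += retlen;
	}
	return 0;
}
```

The zero-progress check is an addition of this sketch, so that a device
which keeps returning 0 bytes cannot stall the caller.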

> 
> >> +		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
> >> +				off, retlen, size, ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	if (mtdpstore_is_empty(cxt, buf, size))
> >> +		mtdpstore_mark_unused(cxt, off);
> >> +	else
> >> +		mtdpstore_mark_used(cxt, off);
> >> +
> >> +	mtdpstore_security(cxt, off);
> >> +	return retlen;
> >> +}
> >> +
> >> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
> >> +{
> >> +	struct mtdpstore_context *cxt = &oops_cxt;
> >> +	size_t retlen;
> >> +	int ret;
> >> +
> >> +	if (mtdpstore_panic_block_isbad(cxt, off))
> >> +		return -ENEXT;
> >> +
> >> +	/* zone is used, please try next one */
> >> +	if (mtdpstore_is_used(cxt, off))
> >> +		return -ENEXT;
> >> +
> >> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >> +	if (ret < 0 || size != retlen) {
> >> +		pr_err("panic write failure at %lld (%zu of %zu read), err %d\n",
> >> +				off, retlen, size, ret);
> >> +		return -EIO;
> >> +	}
> >> +	mtdpstore_mark_used(cxt, off);
> >> +
> >> +	return retlen;
> >> +}
> >> +
> >> +static void mtdpstore_notify_add(struct mtd_info *mtd)
> >> +{
> >> +	int ret;
> >> +	struct mtdpstore_context *cxt = &oops_cxt;
> >> +	struct blkoops_info *info = &cxt->bo_info;
> >> +	unsigned long longcnt;
> >> +
> >> +	if (!strcmp(mtd->name, info->device))
> >> +		cxt->index = mtd->index;
> >> +
> >> +	if (mtd->index != cxt->index || cxt->index < 0)
> >> +		return;
> >> +
> >> +	pr_debug("found matching MTD device %s\n", mtd->name);
> >> +
> >> +	if (mtd->size < info->dmesg_size * 2) {
> >> +		pr_err("MTD partition %d not big enough\n", mtd->index);
> >> +		return;
> >> +	}
> >> +	if (mtd->erasesize < info->dmesg_size) {
> >> +		pr_err("eraseblock size of MTD partition %d too small\n",
> >> +				mtd->index);  
> > 
> > What is the usual size of dmesg? Could this check be too limiting?
> >   
> 
> The size must be aligned to 4096, a limit imposed by blkoops. The
> default value is 64K. If it is larger than erasesize, errors will occur,
> since mtdpstore is designed around that assumption.

Please add a comment with the above explanation.

> 
> >> +		return;
> >> +	}
> >> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> >> +		pr_err("record size %lu KB must align to write size %d KB\n",
> >> +				info->dmesg_size / 1024,
> >> +				mtd->writesize / 1024);  
> > 
> > This condition is weird, why would you check this?
> >   
> 
> pstore/blk will write 'record_size' bytes of dmesg log at one time.
> Each write must be aligned to 'writesize' for flash, and I am not sure
> all flash drivers can handle misaligned data. That's why I check this.

I think you should enforce this alignment instead of checking it.
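
For illustration only: one possible reading of "enforce" is to round the
record size up to the next multiple of the write size rather than reject
it. The sketch below is user-space C; round_up_to() is a hypothetical
helper playing the role the kernel's roundup() macro would play in a
driver. Whether rounding up is acceptable here also depends on the other
checks (for example the comparison against erasesize), which this sketch
does not model.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round size up to the next multiple of align.
 * align must be non-zero; it does not need to be a power of two.
 */
static size_t round_up_to(size_t size, size_t align)
{
	assert(align != 0);
	return ((size + align - 1) / align) * align;
}
```

For example, a 65000-byte record with a 4096-byte write size becomes
65536 bytes, while a 65536-byte record is left unchanged.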

> 
> >> +		return;
> >> +	}
> >> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
> >> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
> >> +				mtd->index,
> >> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);  
> > 
> > Same question? I could understand that it is easier to manage blocks
> > knowing their maximum number though.
> >   
> 
> It refers to mtdoops.

What do you mean?

> 
> >> +		return;
> >> +	}
> >> +
> >> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
> >> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> >> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> >> +
> >> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
> >> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> >> +
> >> +	cxt->bo_dev.total_size = mtd->size;
> >> +	/* just support dmesg right now */
> >> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
> >> +	cxt->bo_dev.read = mtdpstore_read;
> >> +	cxt->bo_dev.write = mtdpstore_write;
> >> +	cxt->bo_dev.erase = mtdpstore_erase;
> >> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
> >> +
> >> +	ret = blkoops_register_device(&cxt->bo_dev);
> >> +	if (ret) {
> >> +		pr_err("mtd%d register to blkoops failed\n", mtd->index);
> >> +		return;
> >> +	}
> >> +	cxt->mtd = mtd;
> >> +	pr_info("Attached to MTD device %d\n", mtd->index);
> >> +}
> >> +
> >> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
> >> +		loff_t off, size_t size)
> >> +{
> >> +	struct mtd_info *mtd = cxt->mtd;
> >> +	u_char *buf;
> >> +	int ret;
> >> +	size_t retlen;
> >> +	struct erase_info erase;
> >> +
> >> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
> >> +	if (!buf)
> >> +		return -ENOMEM;
> >> +
> >> +	/* 1st. read to cache */
> >> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
> >> +	if (ret || retlen != mtd->erasesize)
> >> +		goto free;
> >> +
> >> +	/* 2nd. erase block */
> >> +	erase.len = mtd->erasesize;
> >> +	erase.addr = off;
> >> +	ret = mtd_erase(mtd, &erase);
> >> +	if (ret)
> >> +		goto free;
> >> +
> >> +	/* 3rd. write back */
> >> +	while (size) {
> >> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
> >> +
> >> +		/* remove must clear used bit */
> >> +		if (mtdpstore_is_used(cxt, off))
> >> +			mtd_write(mtd, off, zonesize, &retlen, buf);  
> > 
> > Besides the fact that you should definitely check the write return code, I
> > don't understand what you do in this function. What does
> > flush_removed_do mean?
> >   
> 
> When a user removes a log file from the pstore filesystem, mtdpstore must
> make sure the log is really removed. If the whole block is no longer used,
> it is simply erased. However, if the block still contains valid logs,
> mtdpstore has to erase the block and write the valid logs back.
> That is what flush_removed_do() does.

Please explain with a comment.

> 
> To avoid repeated erases when users remove several log files, mtdpstore
> defers the removal work until exit.
> 
> Besides, mtdpstore does not check the return code so that it writes back
> as much of the valid log as possible.

You are not in a critical path, so I don't understand why you don't check
it. If it returns an error, it means the data was not written. IMHO it
is better to alert the user than to silently fail.
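
For illustration only: a user-space sketch of the read / erase / write-back
sequence described above, showing one way to keep writing back as many
valid zones as possible while still reporting the first write error to
the caller. Everything here (the in-memory flash[] array, fake_write(),
flush_removed(), the sizes, and the fail_write_at test hook) is
hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ZONE_SIZE        4u                    /* hypothetical record size */
#define ZONES            4u                    /* records per erase block */
#define ERASE_BLOCK_SIZE (ZONE_SIZE * ZONES)

static unsigned char flash[ERASE_BLOCK_SIZE];  /* one in-memory erase block */
static int fail_write_at = -1;                 /* zone to fail, for testing */

/* Stand-in for a flash write; returns 0 or a negative error code. */
static int fake_write(size_t off, const unsigned char *src, size_t len)
{
	if (fail_write_at >= 0 && off == (size_t)fail_write_at * ZONE_SIZE)
		return -5;                             /* pretend -EIO */
	memcpy(flash + off, src, len);
	return 0;
}

/*
 * Erase the block, then write back only the zones still marked used.
 * A failed write does not stop the loop, but the first error is returned.
 */
static int flush_removed(const bool used[ZONES])
{
	unsigned char cache[ERASE_BLOCK_SIZE];
	int err = 0;
	size_t z;

	assert(used != NULL);

	memcpy(cache, flash, ERASE_BLOCK_SIZE);    /* 1st: read to cache */
	memset(flash, 0xFF, ERASE_BLOCK_SIZE);     /* 2nd: erase block */

	for (z = 0; z < ZONES; z++) {              /* 3rd: write back */
		size_t off = z * ZONE_SIZE;
		int ret;

		if (!used[z])
			continue;                          /* removed zone stays erased */

		ret = fake_write(off, cache + off, ZONE_SIZE);
		if (ret < 0 && err == 0)
			err = ret;                         /* remember it, keep going */
	}
	return err;
}
```

Note that each zone is written back from its own offset in the cache, so
the data returns to the position it was read from.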

> 
> >> +
> >> +		off += zonesize;
> >> +		size -= min_t(unsigned int, zonesize, size);
> >> +	}
> >> +
> >> +free:
> >> +	kfree(buf);
> >> +	return ret;
> >> +}
> >> +


[...]

> > 
> > Thanks,
> > Miquèl
> >   
> 
> I will collect more suggestions and submit the new version at one time.
> 

Sure, no hurry.


Thanks,
Miquèl


* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-21  8:19         ` liaoweixiong
@ 2020-01-21 15:34           ` Randy Dunlap
  2020-01-22 15:01             ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Randy Dunlap @ 2020-01-21 15:34 UTC (permalink / raw)
  To: liaoweixiong, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

On 1/21/20 12:19 AM, liaoweixiong wrote:
> hi Randy Dunlap,
> 
> On 2020/1/21 2:36 PM, Randy Dunlap wrote:
>> On 1/20/20 9:23 PM, liaoweixiong wrote:
>>> hi Randy Dunlap,
>>>
>>> On 2020/1/21 PM12:13, Randy Dunlap wrote:
>>>> Hi,
>>>>
>>>> I have some documentation comments for you:
>>>>
>>>>
>>>> On 1/19/20 5:03 PM, WeiXiong Liao wrote:
>>>>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>>>>> how to use pstore/blk and blkoops.
>>>>>
>>>>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>>>>> ---
>>>>>   Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>>>>>   MAINTAINERS                                |   1 +
>>>>>   fs/pstore/Kconfig                          |   2 +
>>>>>   3 files changed, 281 insertions(+)
>>>>>   create mode 100644 Documentation/admin-guide/pstore-block.rst
>>>>>
>>>>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>>>>> new file mode 100644
>>>>> index 000000000000..58418d429c55
>>>>> --- /dev/null
>>>>> +++ b/Documentation/admin-guide/pstore-block.rst
>>>>> +
>>>>> +
>>>>> +dmesg_size
>>>>> +~~~~~~~~~~
>>>>> +
>>>>> +The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
>>>>> +4096. If you don't need it, safely set it 0 or ignore it.
>>>>
>>>>                                        set it to 0 or ignore it.
>>>>
>>>
>>> I will fix it, thank you.
>>>
>>>> The example above is:  blkoops.dmesg_size=64
>>>> where 64 is not a multiple of 4096. (?)
>>>>
>>>
>>> The module parameter dmesg_size is in unit KB.
>>
>> I didn't see that documented anywhere.
>>
> 
> Oh, sorry, that is my oversight. It seems that not only the other size introductions but also introductions on Kconfig should be corrected. Thank you very much and is the following modification OK?
> 
> The chunk size in KB for dmesg(oops/panic). It **MUST** be a multiple of 4.

OK.


>>>>> +Compression and header
>>>>> +----------------------
>>>>> +
>>>>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>>>>> +recommend data compression because pstore/blk will insert some information into
>>>>> +the first line of dmesg data. For example::
>>>>> +
>>>>> +        Panic: Total 16 times
>>>>> +
>>>>> +It means that it's the 16th times panic log since the first booting. Sometimes
>>>>
>>>>                                 time of a panic log since ...
>>>>
>>>
>>> Should it be like this?
>>> It means the time of a panic log since the first booting.
>>
>> That sounds like clock time, not the number of instances or occurrences.
>>
> 
> It is an oops/panic counter too. How about this?
> 
> It means that it's OOPS/PANIC for the 16th time since the first booting.

                                                  since the last booting {or boot}.

thanks.
-- 
~Randy



* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-21 15:34           ` Randy Dunlap
@ 2020-01-22 15:01             ` liaoweixiong
  2020-01-22 16:08               ` Randy Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-01-22 15:01 UTC (permalink / raw)
  To: Randy Dunlap, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

On 2020/1/21 11:34 PM, Randy Dunlap wrote:
> On 1/21/20 12:19 AM, liaoweixiong wrote:
>> hi Randy Dunlap,
>>
>> On 2020/1/21 2:36 PM, Randy Dunlap wrote:
>>> On 1/20/20 9:23 PM, liaoweixiong wrote:
>>>> hi Randy Dunlap,
>>>>
>>>> On 2020/1/21 PM12:13, Randy Dunlap wrote:
>>>>> Hi,
>>>>>
>>>>> I have some documentation comments for you:
>>>>>
>>>>>
>>>>> On 1/19/20 5:03 PM, WeiXiong Liao wrote:
>>>>>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>>>>>> how to use pstore/blk and blkoops.
>>>>>>
>>>>>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>>>>>> ---
>>>>>>    Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>>>>>>    MAINTAINERS                                |   1 +
>>>>>>    fs/pstore/Kconfig                          |   2 +
>>>>>>    3 files changed, 281 insertions(+)
>>>>>>    create mode 100644 Documentation/admin-guide/pstore-block.rst
>>>>>>
>>>>>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>>>>>> new file mode 100644
>>>>>> index 000000000000..58418d429c55
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/admin-guide/pstore-block.rst
>>>>>> +
>>>>>> +
>>>>>> +dmesg_size
>>>>>> +~~~~~~~~~~
>>>>>> +
>>>>>> +The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
>>>>>> +4096. If you don't need it, safely set it 0 or ignore it.
>>>>>
>>>>>                                         set it to 0 or ignore it.
>>>>>
>>>>
>>>> I will fix it, thank you.
>>>>
>>>>> The example above is:  blkoops.dmesg_size=64
>>>>> where 64 is not a multiple of 4096. (?)
>>>>>
>>>>
>>>> The module parameter dmesg_size is in unit KB.
>>>
>>> I didn't see that documented anywhere.
>>>
>>
>> Oh, sorry, that is my oversight. It seems that not only the other size introductions but also introductions on Kconfig should be corrected. Thank you very much and is the following modification OK?
>>
>> The chunk size in KB for dmesg(oops/panic). It **MUST** be a multiple of 4.
> 
> OK.
> 
> 
>>>>>> +Compression and header
>>>>>> +----------------------
>>>>>> +
>>>>>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>>>>>> +recommend data compression because pstore/blk will insert some information into
>>>>>> +the first line of dmesg data. For example::
>>>>>> +
>>>>>> +        Panic: Total 16 times
>>>>>> +
>>>>>> +It means that it's the 16th times panic log since the first booting. Sometimes
>>>>>
>>>>>                                  time of a panic log since ...
>>>>>
>>>>
>>>> Should it be like this?
>>>> It means the time of a panic log since the first booting.
>>>
>>> That sounds like clock time, not the number of instances or occurrences.
>>>
>>
>> It is an oops/panic counter too. How about this?
>>
>> It means that it's OOPS/PANIC for the 16th time since the first booting.
> 
>                                                    since the last booting {or boot}.
> 

Not the last booting but the first booting. This is the number of
triggers since the first time the system was installed.

> thanks.
> 


* Re: [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk
  2020-01-22 15:01             ` liaoweixiong
@ 2020-01-22 16:08               ` Randy Dunlap
  0 siblings, 0 replies; 32+ messages in thread
From: Randy Dunlap @ 2020-01-22 16:08 UTC (permalink / raw)
  To: liaoweixiong, Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

On 1/22/20 7:01 AM, liaoweixiong wrote:
> On 2020/1/21 11:34 PM, Randy Dunlap wrote:
>> On 1/21/20 12:19 AM, liaoweixiong wrote:
>>> hi Randy Dunlap,
>>>
>>> On 2020/1/21 2:36 PM, Randy Dunlap wrote:
>>>> On 1/20/20 9:23 PM, liaoweixiong wrote:
>>>>> hi Randy Dunlap,
>>>>>
>>>>> On 2020/1/21 PM12:13, Randy Dunlap wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have some documentation comments for you:
>>>>>>
>>>>>>
>>>>>> On 1/19/20 5:03 PM, WeiXiong Liao wrote:
>>>>>>> The document, at Documentation/admin-guide/pstore-block.rst, tells us
>>>>>>> how to use pstore/blk and blkoops.
>>>>>>>
>>>>>>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>>>>>>> ---
>>>>>>>    Documentation/admin-guide/pstore-block.rst | 278 +++++++++++++++++++++++++++++
>>>>>>>    MAINTAINERS                                |   1 +
>>>>>>>    fs/pstore/Kconfig                          |   2 +
>>>>>>>    3 files changed, 281 insertions(+)
>>>>>>>    create mode 100644 Documentation/admin-guide/pstore-block.rst
>>>>>>>
>>>>>>> diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..58418d429c55
>>>>>>> --- /dev/null
>>>>>>> +++ b/Documentation/admin-guide/pstore-block.rst

>>>>>>> +Compression and header
>>>>>>> +----------------------
>>>>>>> +
>>>>>>> +Block device is large enough for uncompressed dmesg data. Actually we do not
>>>>>>> +recommend data compression because pstore/blk will insert some information into
>>>>>>> +the first line of dmesg data. For example::
>>>>>>> +
>>>>>>> +        Panic: Total 16 times
>>>>>>> +
>>>>>>> +It means that it's the 16th times panic log since the first booting. Sometimes
>>>>>>
>>>>>>                                  time of a panic log since ...
>>>>>>
>>>>>
>>>>> Should it be like this?
>>>>> It means the time of a panic log since the first booting.
>>>>
>>>> That sounds like clock time, not the number of instances or occurrences.
>>>>
>>>
>>> It is an oops/panic counter too. How about this?
>>>
>>> It means that it's OOPS/PANIC for the 16th time since the first booting.
>>
>>                                                    since the last booting {or boot}.
>>
> 
> Not the last boot but the first boot. This is the number of
> triggers since the system was first installed.

OK, so it's a persistent counter.
Thanks for the clarification.
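
For anyone reading that header back, here is a minimal sketch of picking
the count out of the first line. It assumes the line keeps the
"Panic: Total 16 times" form quoted above; parse_trigger_count() is a
made-up name, not something pstore/blk provides:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of pstore/blk: extract the persistent
 * trigger count from a header line such as "Panic: Total 16 times".
 * Returns the count, or -1 if the line does not match that form.
 */
static int parse_trigger_count(const char *line)
{
	char kind[16];
	int count;

	assert(line != NULL);

	/* %15[^:] reads the reason word (e.g. "Panic") up to the colon. */
	if (sscanf(line, " %15[^:]: Total %d times", kind, &count) != 2)
		return -1;
	return count < 0 ? -1 : count;
}
```

Because the count is kept on the storage device, the value read back
this way keeps growing across reboots rather than restarting at 1.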

-- 
~Randy


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-21  8:48       ` Miquel Raynal
@ 2020-01-22 17:22         ` liaoweixiong
  2020-01-22 17:41           ` Miquel Raynal
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-01-22 17:22 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

hi Miquel Raynal,

On 2020/1/21 4:48 PM, Miquel Raynal wrote:
> Hello,
> 
> liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Tue, 21 Jan 2020
> 11:36:00 +0800:
> 
>> hi Miquel Raynal,
>>
>> On 2020/1/20 PM 6:03, Miquel Raynal wrote:
>>> Hi WeiXiong,
>>>
>>> WeiXiong Liao <liaoweixiong@allwinnertech.com> wrote on Mon, 20 Jan
>>> 2020 09:03:53 +0800:
>>>    
>>>> It's the last one of a series of patches for adaptive to MTD device.
>>>>
>>>> The mtdpstore is similar to mtdoops but more powerful. It bases on
>>>> pstore/blk, aims to store panic and oops log to a flash partition,
>>>
>>>                                             logs?
>>>    
>>
>> I will fix it. Thanks.
>>
>>>> where it can be read back as files after mounting pstore filesystem.
>>>>
>>>> The pstore/blk and blkoops, a wrapper for pstore/blk, are designed for
>>>> block device at the very beginning, but now, compatible to not only
>>>> block device. After this series of patches, pstore/blk can also work
>>>> for MTD device. To make it work, 'blkdev' on kconfig or module
>>>> parameter of blkoops should be set as mtd device name or mtd number.
>>>> See more about pstore/blk and blkoops on:
>>>>      Documentation/admin-guide/pstore-block.rst
>>>>
>>>> Why do we need mtdpstore?
>>>> 1. repetitive jobs between pstore and mtdoops
>>>>     Both of pstore and mtdoops do the same jobs that store panic/oops log.
>>>>     They have much similar logic that register to kmsg dumper and store
>>>>     log to several chunks one by one.
>>>> 2. do what a driver should do
>>>>     To me, a driver should provide methods instead of policies. What MTD
>>>>     should do is to provide read/write/erase operations, getting rid of codes
>>>>     about chunk management, kmsg dumper and configuration.
>>>> 3. enhanced feature
>>>>     Not only store log, but also show it as files.
>>>>     Not only log, but also trigger time and trigger count.
>>>>     Not only panic/oops log, but also log recorder for pmsg, console and
>>>>     ftrace in the future.
>>>>
>>>> Signed-off-by: WeiXiong Liao <liaoweixiong@allwinnertech.com>
>>>> Reported-by: kbuild test robot <lkp@intel.com>
>>>
>>> I don't think the test robot has a meaning here.
>>>    
>>
>> I do not know what meaning the test robot tag has, but I was asked
>> by the kbuild test robot to add it. What should I do with it? Drop the
>> tag, keep it, or something else?
>> The email from the kbuild test robot said:
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kbuild test robot <lkp@intel.com>
> 
> You probably pushed your work on a dedicated repository on which this
> robot has run. It does not make any difference between upstream sources
> and downstream contributions. You may add this tag when you are
> fixing something reported by the robot against the upstream kernel.
> Here, the driver is new, this is a feature you are adding, so please
> drop the tag.
> 

OK. I will fix it later. Thank you.

> [...]
> 
>>>> +/*
>>>> + * All zones will be read as pstore/blk will read zone one by one when do
>>>> + * recover.
>>>> + */
>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>>>> +{
>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
>>>> +	size_t retlen;
>>>> +	int ret;
>>>> +
>>>> +	if (mtdpstore_block_isbad(cxt, off))
>>>> +		return -ENEXT;
>>>> +
>>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
>>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
>>>
>>> IIRC size != retlen does not mean it failed, but that you should
>>> continue reading after retlen bytes, no?
>>>    
>>
>> Yes, you are right. I will fix it. Thanks.
>>
>>> Also, mtd_is_bitflip() does not mean that you are reading a false
>>> buffer, but that the data has been corrected as it contained bitflips.
>>> mtd_is_eccerr() however, would be meaningful.
>>>    
>>
>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
>> mtd_is_bitflip().
> 
> Yes, just drop this check, only keep ret < 0.
> 

If I understand correctly, this check should not be dropped. As you said,
"mtd_is_bitflip() does not mean that you are reading a false buffer,
but that the data has been corrected as it contained bitflips.", so the
data I get is valid even if mtd_is_bitflip() returns true. The data is
correct and there is no need to handle an error. To me, the code
should be:
	if (ret < 0 && !mtd_is_bitflip(ret))
		[error handling]
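
As a userspace sketch of what I mean (not the driver code): the two
helpers below are copied by hand from include/linux/mtd/mtd.h as I
understand them, and read_failed() is a hypothetical name used only for
illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Fallbacks for libcs without these; values match Linux's errno numbers. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#ifndef EBADMSG
#define EBADMSG 74
#endif

/* Userspace copies of the two kernel helpers (assumed, see above). */
static bool mtd_is_bitflip(int err) { return err == -EUCLEAN; }
static bool mtd_is_eccerr(int err) { return err == -EBADMSG; }

/* Hypothetical: true when the buffer filled by mtd_read() is not usable. */
static bool read_failed(int ret)
{
	/* mtd_read() returns 0 or a negative errno. */
	assert(ret <= 0);
	return ret < 0 && !mtd_is_bitflip(ret);
}
```

With this, a return of -EUCLEAN (bitflips corrected) is accepted, while
-EBADMSG (uncorrectable) and any other negative value go to error
handling.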

>>
>>>> +		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
>>>> +				off, retlen, size, ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	if (mtdpstore_is_empty(cxt, buf, size))
>>>> +		mtdpstore_mark_unused(cxt, off);
>>>> +	else
>>>> +		mtdpstore_mark_used(cxt, off);
>>>> +
>>>> +	mtdpstore_security(cxt, off);
>>>> +	return retlen;
>>>> +}
>>>> +
>>>> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
>>>> +{
>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
>>>> +	size_t retlen;
>>>> +	int ret;
>>>> +
>>>> +	if (mtdpstore_panic_block_isbad(cxt, off))
>>>> +		return -ENEXT;
>>>> +
>>>> +	/* zone is used, please try next one */
>>>> +	if (mtdpstore_is_used(cxt, off))
>>>> +		return -ENEXT;
>>>> +
>>>> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>>>> +	if (ret < 0 || size != retlen) {
>>>> +		pr_err("panic write failure at %lld (%zu of %zu read), err %d\n",
>>>> +				off, retlen, size, ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +	mtdpstore_mark_used(cxt, off);
>>>> +
>>>> +	return retlen;
>>>> +}
>>>> +
>>>> +static void mtdpstore_notify_add(struct mtd_info *mtd)
>>>> +{
>>>> +	int ret;
>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
>>>> +	struct blkoops_info *info = &cxt->bo_info;
>>>> +	unsigned long longcnt;
>>>> +
>>>> +	if (!strcmp(mtd->name, info->device))
>>>> +		cxt->index = mtd->index;
>>>> +
>>>> +	if (mtd->index != cxt->index || cxt->index < 0)
>>>> +		return;
>>>> +
>>>> +	pr_debug("found matching MTD device %s\n", mtd->name);
>>>> +
>>>> +	if (mtd->size < info->dmesg_size * 2) {
>>>> +		pr_err("MTD partition %d not big enough\n", mtd->index);
>>>> +		return;
>>>> +	}
>>>> +	if (mtd->erasesize < info->dmesg_size) {
>>>> +		pr_err("eraseblock size of MTD partition %d too small\n",
>>>> +				mtd->index);
>>>
>>> What is the usual size of dmesg? Could this check be too limiting?
>>>    
>>
>> The size must be aligned to 4096, which is limited by blkoops. The
>> default value is 64K. If it is larger than erasesize, some errors will occur
>> since mtdpstore is designed on it.
> 
> Please add a comment with the above explanation.
> 

OK, I will do it later. Thank you.

>>
>>>> +		return;
>>>> +	}
>>>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
>>>> +		pr_err("record size %lu KB must align to write size %d KB\n",
>>>> +				info->dmesg_size / 1024,
>>>> +				mtd->writesize / 1024);
>>>
>>> This condition is weird, why would you check this?
>>>    
>>
>> pstore/blk will write 'record_size' dmesg log at one time.
>> Since each write data must be aligned to 'writesize' for flash, I am not
>> sure
>> all flash drivers are compatible with misaligned data, that's why i
>> check this.
> 
> I think you should enforce this alignment instead of checking it.
> 

Do you mean that mtdpstore should enforce this alignment at run time?
The only way I can think of is to keep a buffer aligned to writesize and
write to flash from this aligned buffer.

That causes errors. The MTD device is divided into multiple
chunks according to dmesg_size. Each chunk stores an individual
OOPS/Panic log. If dmesg_size is not aligned to writesize, one
write makes the next write fail, because a flash page can only
be programmed once before the next erase and the page shared by the two
chunks has already been used by the previous write. Besides, we cannot
read to a buffer, erase and write back, as there is no read/erase in the
panic case.
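
The shared page can be shown with a little arithmetic. This is only a
sketch under the layout described above (chunks back to back,
dmesg_size bytes each); chunk_shares_page() is a hypothetical helper,
not part of mtdpstore:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does chunk 'n' start inside a flash page that
 * the previous chunk has already programmed? Chunks are 'dmesg_size'
 * bytes and laid out back to back; pages are 'writesize' bytes.
 */
static bool chunk_shares_page(uint64_t n, uint32_t dmesg_size,
			      uint32_t writesize)
{
	uint64_t start;

	assert(dmesg_size > 0 && writesize > 0);
	if (n == 0)
		return false;	/* nothing is stored before the first chunk */

	start = n * dmesg_size;
	/* The previous chunk ends at 'start'; mid-page means a shared page. */
	return start % writesize != 0;
}
```

For example, with dmesg_size = 3072 and writesize = 2048, chunk 1 starts
at byte 3072, in the middle of page 1, which chunk 0 has already
programmed.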

>>
>>>> +		return;
>>>> +	}
>>>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
>>>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
>>>> +				mtd->index,
>>>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);
>>>
>>> Same question? I could understand that it is easier to manage blocks
>>> knowing their maximum number though.
>>>    
>>
>> It refers to mtdoops.
> 
> What do you mean?
> 

To me, the check is unnecessary, but it is there in the mtdoops
code. I referred to the mtdoops module when I designed mtdpstore.
It may be helpful in some cases I have not thought of, which is why I
keep it.

>>
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
>>>> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>>>> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>>>> +
>>>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
>>>> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>>>> +
>>>> +	cxt->bo_dev.total_size = mtd->size;
>>>> +	/* just support dmesg right now */
>>>> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
>>>> +	cxt->bo_dev.read = mtdpstore_read;
>>>> +	cxt->bo_dev.write = mtdpstore_write;
>>>> +	cxt->bo_dev.erase = mtdpstore_erase;
>>>> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
>>>> +
>>>> +	ret = blkoops_register_device(&cxt->bo_dev);
>>>> +	if (ret) {
>>>> +		pr_err("mtd%d register to blkoops failed\n", mtd->index);
>>>> +		return;
>>>> +	}
>>>> +	cxt->mtd = mtd;
>>>> +	pr_info("Attached to MTD device %d\n", mtd->index);
>>>> +}
>>>> +
>>>> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
>>>> +		loff_t off, size_t size)
>>>> +{
>>>> +	struct mtd_info *mtd = cxt->mtd;
>>>> +	u_char *buf;
>>>> +	int ret;
>>>> +	size_t retlen;
>>>> +	struct erase_info erase;
>>>> +
>>>> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
>>>> +	if (!buf)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/* 1st. read to cache */
>>>> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
>>>> +	if (ret || retlen != mtd->erasesize)
>>>> +		goto free;
>>>> +
>>>> +	/* 2nd. erase block */
>>>> +	erase.len = mtd->erasesize;
>>>> +	erase.addr = off;
>>>> +	ret = mtd_erase(mtd, &erase);
>>>> +	if (ret)
>>>> +		goto free;
>>>> +
>>>> +	/* 3rd. write back */
>>>> +	while (size) {
>>>> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
>>>> +
>>>> +		/* remove must clear used bit */
>>>> +		if (mtdpstore_is_used(cxt, off))
>>>> +			mtd_write(mtd, off, zonesize, &retlen, buf);
>>>
>>> Besides the fact that should definitely check the write return code, I
>>> don't understand what you do in this function. What does
>>> flush_removed_do mean?
>>>    
>>
>> When user remove one log file on pstore filesystem, mtdpstore should do
>> something to ensure log file removed. If the whole block is no longer used,
>> it is nice to erase the block. However, if the block still contains
>> valid log,
>> what mtdpstore can do is to erase and write the valid log back.
>> That is what flush_removed_do() do.
> 
> Please explain with a comment.
> 

OK, I will do it later. Thank you.
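
Roughly, the read, erase and write-back sequence looks like the
following userspace simulation. The flash block is just a byte array
here, and flush_block() and its zone_used[] argument are hypothetical
stand-ins for the driver's function and used bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simulated flush of one erase block held in 'block' (erasesize bytes).
 * zone_used[i] says whether zone i (zonesize bytes each) still holds a
 * valid log. Returns 0 on success, -1 on a zero zonesize or if the
 * cache cannot be allocated.
 */
static int flush_block(unsigned char *block, size_t erasesize,
		       size_t zonesize, const bool *zone_used)
{
	unsigned char *cache;
	size_t off;

	assert(block != NULL && zone_used != NULL);
	if (!zonesize)
		return -1;

	cache = malloc(erasesize);
	if (!cache)
		return -1;

	/* 1st. read the whole block to cache */
	memcpy(cache, block, erasesize);

	/* 2nd. erase: flash reads back as 0xFF after an erase */
	memset(block, 0xFF, erasesize);

	/* 3rd. write back only the zones that are still in use */
	for (off = 0; off + zonesize <= erasesize; off += zonesize) {
		if (zone_used[off / zonesize])
			memcpy(block + off, cache + off, zonesize);
	}

	free(cache);
	return 0;
}
```

Each zone is written back from its own offset in the cache, so removed
zones end up erased while the remaining logs stay where they were.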

>>
>> In case of repeated erase when users remove several log files, mtdpstore
>> do remove jobs when exit.
>>
>> Besides, mtdpstore do not check the return code to ensure write back valid
>> log as much as possible.
> 
> You are not in a critical path, I don't understand why you don't check
> it? If it returns an error, it means the data is not written. IMHO it
> is best to alert the user than to silently fail.
> 

This function is called only when the mtd device is being removed. It's
useless to alert the user then; better to try to flush the other valid data to
flash as much as possible, to reduce losses. If it fails just
because the device is busy, what happens next time?

>>
>>>> +
>>>> +		off += zonesize;
>>>> +		size -= min_t(unsigned int, zonesize, size);
>>>> +	}
>>>> +
>>>> +free:
>>>> +	kfree(buf);
>>>> +	return ret;
>>>> +}
>>>> +
> 
> 
> [...]
> 
>>>
>>> Thanks,
>>> Miquèl
>>>    
>>
>> I will collect more suggestions and submit the new version at one time.
>>
> 
> Sure, no hurry.
> 

I am on holiday, please forgive me for my slow response.

> 
> Thanks,
> Miquèl
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-22 17:22         ` liaoweixiong
@ 2020-01-22 17:41           ` Miquel Raynal
  2020-02-06 13:10             ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Miquel Raynal @ 2020-01-22 17:41 UTC (permalink / raw)
  To: liaoweixiong
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

Hello,


> >>>> +/*
> >>>> + * All zones will be read as pstore/blk will read zone one by one when do
> >>>> + * recover.
> >>>> + */
> >>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> >>>> +{
> >>>> +	struct mtdpstore_context *cxt = &oops_cxt;
> >>>> +	size_t retlen;
> >>>> +	int ret;
> >>>> +
> >>>> +	if (mtdpstore_block_isbad(cxt, off))
> >>>> +		return -ENEXT;
> >>>> +
> >>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> >>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {  
> >>>
> >>> IIRC size != retlen does not mean it failed, but that you should
> >>> continue reading after retlen bytes, no?  
> >>>
> >>
> >> Yes, you are right. I will fix it. Thanks.
> >>  
> >>> Also, mtd_is_bitflip() does not mean that you are reading a false
> >>> buffer, but that the data has been corrected as it contained bitflips.
> >>> mtd_is_eccerr() however, would be meaningful.  
> >>>
> >>
> >> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
> >> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
> >> mtd_is_bitflip().  
> > 
> > Yes, just drop this check, only keep ret < 0.
> >   
> 
> If I understand correctly, this check should not be dropped. As you said,
> "mtd_is_bitflip() does not mean that you are reading a false buffer,
> but that the data has been corrected as it contained bitflips.", so the
> data I get is valid even if mtd_is_bitflip() returns true. The data is
> correct and there is no need to handle an error. To me, the code
> should be:
> 	if (ret < 0 && !mtd_is_bitflip(ret))
> 		[error handling]

Please check the implementation of mtd_is_bitflip(). You'll probably
figure out what I am saying.

https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585


[...]

> >>>> +		return;
> >>>> +	}
> >>>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> >>>> +		pr_err("record size %lu KB must align to write size %d KB\n",
> >>>> +				info->dmesg_size / 1024,
> >>>> +				mtd->writesize / 1024);  
> >>>
> >>> This condition is weird, why would you check this?  
> >>>
> >>
> >> pstore/blk will write 'record_size' dmesg log at one time.
> >> Since each write data must be aligned to 'writesize' for flash, I am not
> >> sure
> >> all flash drivers are compatible with misaligned data, that's why i
> >> check this.  
> > 
> > I think you should enforce this alignment instead of checking it.
> >   
> 
> Do you mean that mtdpstore should enforce this alignment at run time?
> The only way I can think of is to keep a buffer aligned to writesize and
> write to flash from this aligned buffer.
> 
> That causes errors. The MTD device is divided into multiple
> chunks according to dmesg_size. Each chunk stores an individual
> OOPS/Panic log. If dmesg_size is not aligned to writesize, one
> write makes the next write fail, because a flash page can only
> be programmed once before the next erase and the page shared by the two
> chunks has already been used by the previous write. Besides, we cannot
> read to a buffer, erase and write back, as there is no read/erase in the
> panic case.

I mean: what is the usual size of dmesg? I don't get why you need it to
be e.g. a multiple of 2k. It probably is, actually; I don't know if there
is a standard. But if dmesg_size is e.g. 3k, why not just skip the end of
the partially written page and start writing at the next page?
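
Something along these lines, as an untested userspace sketch;
chunk_stride() and chunk_offset() are names I am making up here, not
existing pstore/blk or MTD helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Round 'size' up to the next multiple of 'writesize' (must be non-zero). */
static uint32_t chunk_stride(uint32_t size, uint32_t writesize)
{
	assert(writesize > 0);
	return (size + writesize - 1) / writesize * writesize;
}

/* Flash offset of chunk 'n' when every chunk starts on a fresh page. */
static uint64_t chunk_offset(uint64_t n, uint32_t dmesg_size,
			     uint32_t writesize)
{
	return n * chunk_stride(dmesg_size, writesize);
}
```

With dmesg_size = 3k and writesize = 2k, each chunk then occupies 4k on
flash and the second chunk starts at offset 4096, so no page is ever
programmed twice. The cost is the unused tail of the last page of every
chunk.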

> 
> >>  
> >>>> +		return;
> >>>> +	}
> >>>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
> >>>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
> >>>> +				mtd->index,
> >>>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);  
> >>>
> >>> Same question? I could understand that it is easier to manage blocks
> >>> knowing their maximum number though.  
> >>>
> >>
> >> It refers to mtdoops.  
> > 
> > What do you mean?
> >   
> 
> To me, the check is unnecessary, but it is there in the mtdoops
> code. I referred to the mtdoops module when I designed mtdpstore.
> It may be helpful in some cases I have not thought of, which is why I
> keep it.

Why not.

[...]

> >>
> >> In case of repeated erase when users remove several log files, mtdpstore
> >> do remove jobs when exit.
> >>
> >> Besides, mtdpstore do not check the return code to ensure write back valid
> >> log as much as possible.  
> > 
> > You are not in a critical path, I don't understand why you don't check
> > it? If it returns an error, it means the data is not written. IMHO it
> > is best to alert the user than to silently fail.
> >   
> 
> This function is called only when the mtd device is being removed. It's
> useless to alert the user then; better to try to flush the other valid data to

It is useful to alert the user! It means something went wrong while
everything seems fine.

> flash as much as possible, to reduce losses. If it fails just
> because the device is busy, what happens next time?

Just because of busy? I don't get it.

I'm okay with the idea of trying to write the other chunks though:

	while (remaining_chunk) {
		ret = mtd_write()
		if (ret) {
			alert-user;
			continue;
		}
	}
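
A slightly fuller, userspace version of the same idea, only as a sketch:
write_fn stands in for mtd_write(), and write_back_all() and
fail_at_4096() are names invented for this example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for mtd_write(): returns 0 or a negative errno. */
typedef int (*write_fn)(size_t off, size_t len);

static size_t attempts;	/* how many chunk writes were tried */

/* Example writer: fails with -EIO on the chunk at offset 4096 only. */
static int fail_at_4096(size_t off, size_t len)
{
	(void)len;
	attempts++;
	return off == 4096 ? -EIO : 0;
}

/*
 * Try to write every chunk of 'total' bytes (assumed to be a multiple
 * of 'chunk'); report each failure and keep going.
 * Returns the first error seen, or 0 if every write succeeded.
 */
static int write_back_all(write_fn do_write, size_t total, size_t chunk)
{
	size_t off;
	int ret, first_err = 0;

	assert(do_write != NULL);
	if (!chunk)
		return -EINVAL;

	for (off = 0; off < total; off += chunk) {
		ret = do_write(off, chunk);
		if (ret) {
			/* alert the user, then carry on with the next chunk */
			fprintf(stderr, "write back failed at 0x%zx, err %d\n",
				off, ret);
			if (!first_err)
				first_err = ret;
		}
	}
	return first_err;
}
```

The loop always advances to the next chunk, so one failed write neither
stops the flush nor goes unreported.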

> 
> >>  
> >>>> +
> >>>> +		off += zonesize;
> >>>> +		size -= min_t(unsigned int, zonesize, size);
> >>>> +	}
> >>>> +
> >>>> +free:
> >>>> +	kfree(buf);
> >>>> +	return ret;
> >>>> +}
> >>>> +  
> > 
> > 
> > [...]
> >   
> >>>
> >>> Thanks,
> >>> Miquèl  
> >>>
> >>
> >> I will collect more suggestions and submit the new version at one time.
> >>  
> > 
> > Sure, no hurry.
> >   
> 
> I am on holiday, please forgive me for my slow response.

Take your time, as I said, no hurry.

> 
> > 
> > Thanks,
> > Miquèl
> >   




Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-20  1:03 ` [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
  2020-01-20 10:03   ` Miquel Raynal
@ 2020-01-23  4:24   ` Vignesh Raghavendra
  2020-01-23  7:03     ` liaoweixiong
  1 sibling, 1 reply; 32+ messages in thread
From: Vignesh Raghavendra @ 2020-01-23  4:24 UTC (permalink / raw)
  To: WeiXiong Liao, Kees Cook, Anton Vorontsov, Colin Cross,
	Tony Luck, Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

Hi

On 20/01/20 6:33 am, WeiXiong Liao wrote:
[...]
> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u64 blknum = div_u64(off, mtd->erasesize);
> +
> +	return test_bit(blknum, cxt->badmap);
> +}
> +
> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	pr_debug("mark zone %llu used\n", zonenum);

Please replace pr_*() with dev_*() throughout the patch. The device
pointer should be available via struct mtd_info.

Regards
Vignesh

> +	set_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	pr_debug("mark zone %llu unused\n", zonenum);
> +	clear_bit(zonenum, cxt->usedmap);
> +}
> +
> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		pr_debug("mark zone %llu unused\n", zonenum);
> +		clear_bit(zonenum, cxt->usedmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
> +
> +	if (test_bit(blknum, cxt->badmap))
> +		return true;
> +	return test_bit(zonenum, cxt->usedmap);
> +}
> +
> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->usedmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
> +		size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	size_t sz;
> +	int i;
> +
> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
> +	for (i = 0; i < sz; i++) {
> +		if (buf[i] != (char)0xFF)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +
> +	pr_debug("mark zone %llu removed\n", zonenum);
> +	set_bit(zonenum, cxt->rmmap);
> +}
> +
> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		clear_bit(zonenum, cxt->rmmap);
> +		zonenum++;
> +		zonecnt--;
> +	}
> +}
> +
> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
> +		loff_t off)
> +{
> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
> +
> +	while (zonecnt > 0) {
> +		if (test_bit(zonenum, cxt->rmmap))
> +			return true;
> +		zonenum++;
> +		zonecnt--;
> +	}
> +	return false;
> +}
> +
> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	struct erase_info erase;
> +	int ret;
> +
> +	pr_debug("try to erase off 0x%llx\n", off);
> +	erase.len = cxt->mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(cxt->mtd, &erase);
> +	if (!ret)
> +		mtdpstore_block_clear_removed(cxt, off);
> +	else
> +		pr_err("erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
> +		       (unsigned long long)erase.addr,
> +		       (unsigned long long)erase.len, cxt->bo_info.device);
> +	return ret;
> +}
> +
> +/*
> + * called while removing file
> + *
> + * Avoiding over erasing, do erase only when all zones are removed or unused.
> + * Ensure to remove when unregister by reading, erasing and writing back.
> + */
> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -EIO;
> +
> +	mtdpstore_mark_unused(cxt, off);
> +
> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
> +		mtdpstore_mark_removed(cxt, off);
> +		return 0;
> +	}
> +
> +	/* all zones are unused, erase it */
> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
> +	return mtdpstore_erase_do(cxt, off);
> +}
> +
> +/*
> + * What is security for mtdpstore?
> + * As there is no erase for panic case, we should ensure at least one zone
> + * is writable. Otherwise, panic write will be failed.
> + * If zone is used, write operation will return -ENEXT, which means that
> + * pstore/blk will try one by one until get a empty zone. So, it's no need
> + * to ensure next zone is empty, but at least one.
> + */
> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
> +{
> +	int ret = 0, i;
> +	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
> +	u32 erasesize = cxt->mtd->erasesize;
> +
> +	for (i = 0; i < zonecnt; i++) {
> +		u32 num = (zonenum + i) % zonecnt;
> +
> +		/* found empty zone */
> +		if (!test_bit(num, cxt->usedmap))
> +			return 0;
> +	}
> +
> +	/* If there is no any empty zone, we have no way but to do erase */
> +	off = ALIGN_DOWN(off, erasesize);
> +	while (blkcnt--) {
> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
> +
> +		if (mtdpstore_block_isbad(cxt, off))
> +			continue;
> +
> +		ret = mtdpstore_erase_do(cxt, off);
> +		if (!ret) {
> +			mtdpstore_block_mark_unused(cxt, off);
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		pr_err("all blocks bad!\n");
> +	pr_debug("end security\n");
> +	return ret;
> +}
> +
> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENEXT;
> +
> +	pr_debug("try to write off 0x%llx size %zu\n", off, size);
> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || retlen != size) {
> +		pr_err("write failure at %lld (%zu of %zu written), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +/*
> + * All zones will be read as pstore/blk will read zone one by one when do
> + * recover.
> + */
> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
> +		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +
> +	if (mtdpstore_is_empty(cxt, buf, size))
> +		mtdpstore_mark_unused(cxt, off);
> +	else
> +		mtdpstore_mark_used(cxt, off);
> +
> +	mtdpstore_security(cxt, off);
> +	return retlen;
> +}
> +
> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	size_t retlen;
> +	int ret;
> +
> +	if (mtdpstore_panic_block_isbad(cxt, off))
> +		return -ENEXT;
> +
> +	/* zone is used, please try next one */
> +	if (mtdpstore_is_used(cxt, off))
> +		return -ENEXT;
> +
> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
> +	if (ret < 0 || size != retlen) {
> +		pr_err("panic write failure at %lld (%zu of %zu read), err %d\n",
> +				off, retlen, size, ret);
> +		return -EIO;
> +	}
> +	mtdpstore_mark_used(cxt, off);
> +
> +	return retlen;
> +}
> +
> +static void mtdpstore_notify_add(struct mtd_info *mtd)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct blkoops_info *info = &cxt->bo_info;
> +	unsigned long longcnt;
> +
> +	if (!strcmp(mtd->name, info->device))
> +		cxt->index = mtd->index;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	pr_debug("found matching MTD device %s\n", mtd->name);
> +
> +	if (mtd->size < info->dmesg_size * 2) {
> +		pr_err("MTD partition %d not big enough\n", mtd->index);
> +		return;
> +	}
> +	if (mtd->erasesize < info->dmesg_size) {
> +		pr_err("eraseblock size of MTD partition %d too small\n",
> +				mtd->index);
> +		return;
> +	}
> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> +		pr_err("record size %lu KB must align to write size %d KB\n",
> +				info->dmesg_size / 1024,
> +				mtd->writesize / 1024);
> +		return;
> +	}
> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
> +				mtd->index,
> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);
> +		return;
> +	}
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
> +
> +	cxt->bo_dev.total_size = mtd->size;
> +	/* just support dmesg right now */
> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
> +	cxt->bo_dev.read = mtdpstore_read;
> +	cxt->bo_dev.write = mtdpstore_write;
> +	cxt->bo_dev.erase = mtdpstore_erase;
> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
> +
> +	ret = blkoops_register_device(&cxt->bo_dev);
> +	if (ret) {
> +		pr_err("mtd%d register to blkoops failed\n", mtd->index);
> +		return;
> +	}
> +	cxt->mtd = mtd;
> +	pr_info("Attached to MTD device %d\n", mtd->index);
> +}
> +
> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
> +		loff_t off, size_t size)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	u_char *buf;
> +	int ret;
> +	size_t retlen;
> +	struct erase_info erase;
> +
> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	/* 1st. read to cache */
> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
> +	if (ret || retlen != mtd->erasesize)
> +		goto free;
> +
> +	/* 2nd. erase block */
> +	erase.len = mtd->erasesize;
> +	erase.addr = off;
> +	ret = mtd_erase(mtd, &erase);
> +	if (ret)
> +		goto free;
> +
> +	/* 3rd. write back */
> +	while (size) {
> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
> +
> +		/* remove must clear used bit */
> +		if (mtdpstore_is_used(cxt, off))
> +			mtd_write(mtd, off, zonesize, &retlen, buf);
> +
> +		off += zonesize;
> +		size -= min_t(unsigned int, zonesize, size);
> +	}
> +
> +free:
> +	kfree(buf);
> +	return ret;
> +}
> +
> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
> +{
> +	struct mtd_info *mtd = cxt->mtd;
> +	int ret;
> +	loff_t off;
> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
> +
> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
> +		ret = mtdpstore_block_is_removed(cxt, off);
> +		if (!ret) {
> +			off += mtd->erasesize;
> +			continue;
> +		}
> +
> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
> +{
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +
> +	if (mtd->index != cxt->index || cxt->index < 0)
> +		return;
> +
> +	mtdpstore_flush_removed(cxt);
> +
> +	blkoops_unregister_device(&cxt->bo_dev);
> +	kfree(cxt->badmap);
> +	kfree(cxt->usedmap);
> +	kfree(cxt->rmmap);
> +	cxt->mtd = NULL;
> +	cxt->index = -1;
> +}
> +
> +static struct mtd_notifier mtdpstore_notifier = {
> +	.add	= mtdpstore_notify_add,
> +	.remove	= mtdpstore_notify_remove,
> +};
> +
> +static int __init mtdpstore_init(void)
> +{
> +	int ret;
> +	struct mtdpstore_context *cxt = &oops_cxt;
> +	struct blkoops_info *info = &cxt->bo_info;
> +
> +	ret = blkoops_info(info);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (strlen(info->device) == 0) {
> +		pr_err("mtd device must be supplied\n");
> +		return -EINVAL;
> +	}
> +	if (!info->dmesg_size) {
> +		pr_err("no recorder enabled\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Setup the MTD device to use */
> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
> +	if (ret)
> +		cxt->index = -1;
> +
> +	register_mtd_user(&mtdpstore_notifier);
> +	return 0;
> +}
> +module_init(mtdpstore_init);
> +
> +static void __exit mtdpstore_exit(void)
> +{
> +	unregister_mtd_user(&mtdpstore_notifier);
> +}
> +module_exit(mtdpstore_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
> +MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
> 

-- 
Regards
Vignesh


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-23  4:24   ` Vignesh Raghavendra
@ 2020-01-23  7:03     ` liaoweixiong
  0 siblings, 0 replies; 32+ messages in thread
From: liaoweixiong @ 2020-01-23  7:03 UTC (permalink / raw)
  To: Vignesh Raghavendra, Kees Cook, Anton Vorontsov, Colin Cross,
	Tony Luck, Jonathan Corbet, Miquel Raynal, Richard Weinberger,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron
  Cc: linux-doc, linux-kernel, linux-mtd

hi Vignesh Raghavendra,

On 2020/1/23 PM 12:24, Vignesh Raghavendra wrote:
> Hi
> 
> On 20/01/20 6:33 am, WeiXiong Liao wrote:
> [...]
>> +static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u64 blknum = div_u64(off, mtd->erasesize);
>> +
>> +	return test_bit(blknum, cxt->badmap);
>> +}
>> +
>> +static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	pr_debug("mark zone %llu used\n", zonenum);
> 
> Please replace pr_*() with dev_*() throughout the patch. Device pointer
> should be available via struct mtd_info
> 

OK. I will fix it later. Thank you.

> Regards
> Vignesh
> 
>> +	set_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	pr_debug("mark zone %llu unused\n", zonenum);
>> +	clear_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		pr_debug("mark zone %llu unused\n", zonenum);
>> +		clear_bit(zonenum, cxt->usedmap);
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +}
>> +
>> +static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u64 blknum = div_u64(off, cxt->mtd->erasesize);
>> +
>> +	if (test_bit(blknum, cxt->badmap))
>> +		return true;
>> +	return test_bit(zonenum, cxt->usedmap);
>> +}
>> +
>> +static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		if (test_bit(zonenum, cxt->usedmap))
>> +			return true;
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +	return false;
>> +}
>> +
>> +static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
>> +		size_t size)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	size_t sz;
>> +	int i;
>> +
>> +	sz = min_t(uint32_t, size, mtd->writesize / 4);
>> +	for (i = 0; i < sz; i++) {
>> +		if (buf[i] != (char)0xFF)
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +
>> +	pr_debug("mark zone %llu removed\n", zonenum);
>> +	set_bit(zonenum, cxt->rmmap);
>> +}
>> +
>> +static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		clear_bit(zonenum, cxt->rmmap);
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +}
>> +
>> +static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
>> +		loff_t off)
>> +{
>> +	u64 zonenum = div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = cxt->mtd->erasesize / cxt->bo_info.dmesg_size;
>> +
>> +	while (zonecnt > 0) {
>> +		if (test_bit(zonenum, cxt->rmmap))
>> +			return true;
>> +		zonenum++;
>> +		zonecnt--;
>> +	}
>> +	return false;
>> +}
>> +
>> +static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	struct erase_info erase;
>> +	int ret;
>> +
>> +	pr_debug("try to erase off 0x%llx\n", off);
>> +	erase.len = cxt->mtd->erasesize;
>> +	erase.addr = off;
>> +	ret = mtd_erase(cxt->mtd, &erase);
>> +	if (!ret)
>> +		mtdpstore_block_clear_removed(cxt, off);
>> +	else
>> +		pr_err("erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
>> +		       (unsigned long long)erase.addr,
>> +		       (unsigned long long)erase.len, cxt->bo_info.device);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * called while removing file
>> + *
>> + * To avoid over-erasing, erase only when all zones are removed or unused.
>> + * Ensure removal on unregister by reading, erasing and writing back.
>> + */
>> +static ssize_t mtdpstore_erase(size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -EIO;
>> +
>> +	mtdpstore_mark_unused(cxt, off);
>> +
>> +	if (likely(mtdpstore_block_is_used(cxt, off))) {
>> +		mtdpstore_mark_removed(cxt, off);
>> +		return 0;
>> +	}
>> +
>> +	/* all zones are unused, erase it */
>> +	off = ALIGN_DOWN(off, cxt->mtd->erasesize);
>> +	return mtdpstore_erase_do(cxt, off);
>> +}
>> +
>> +/*
>> + * What is security for mtdpstore?
>> + * As there is no erase in the panic case, we should ensure at least one zone
>> + * is writable. Otherwise, the panic write will fail.
>> + * If a zone is used, the write operation returns -ENEXT, which means that
>> + * pstore/blk will try zones one by one until it gets an empty one. So there
>> + * is no need to ensure the next zone is empty, only that at least one is.
>> + */
>> +static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
>> +{
>> +	int ret = 0, i;
>> +	u32 zonenum = (u32)div_u64(off, cxt->bo_info.dmesg_size);
>> +	u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->bo_info.dmesg_size);
>> +	u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
>> +	u32 erasesize = cxt->mtd->erasesize;
>> +
>> +	for (i = 0; i < zonecnt; i++) {
>> +		u32 num = (zonenum + i) % zonecnt;
>> +
>> +		/* found empty zone */
>> +		if (!test_bit(num, cxt->usedmap))
>> +			return 0;
>> +	}
>> +
>> +	/* If there is no any empty zone, we have no way but to do erase */
>> +	off = ALIGN_DOWN(off, erasesize);
>> +	while (blkcnt--) {
>> +		div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
>> +
>> +		if (mtdpstore_block_isbad(cxt, off))
>> +			continue;
>> +
>> +		ret = mtdpstore_erase_do(cxt, off);
>> +		if (!ret) {
>> +			mtdpstore_block_mark_unused(cxt, off);
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (ret)
>> +		pr_err("all blocks bad!\n");
>> +	pr_debug("end security\n");
>> +	return ret;
>> +}
>> +
>> +static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	/* zone is used, please try next one */
>> +	if (mtdpstore_is_used(cxt, off))
>> +		return -ENEXT;
>> +
>> +	pr_debug("try to write off 0x%llx size %zu\n", off, size);
>> +	ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if (ret < 0 || retlen != size) {
>> +		pr_err("write failure at %lld (%zu of %zu written), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +	mtdpstore_mark_used(cxt, off);
>> +
>> +	mtdpstore_security(cxt, off);
>> +	return retlen;
>> +}
>> +
>> +/*
>> + * All zones will be read as pstore/blk will read zone one by one when do
>> + * recover.
>> + */
>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
>> +		pr_err("read failure at %lld (%zu of %zu read), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +
>> +	if (mtdpstore_is_empty(cxt, buf, size))
>> +		mtdpstore_mark_unused(cxt, off);
>> +	else
>> +		mtdpstore_mark_used(cxt, off);
>> +
>> +	mtdpstore_security(cxt, off);
>> +	return retlen;
>> +}
>> +
>> +static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	size_t retlen;
>> +	int ret;
>> +
>> +	if (mtdpstore_panic_block_isbad(cxt, off))
>> +		return -ENEXT;
>> +
>> +	/* zone is used, please try next one */
>> +	if (mtdpstore_is_used(cxt, off))
>> +		return -ENEXT;
>> +
>> +	ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
>> +	if (ret < 0 || size != retlen) {
>> +		pr_err("panic write failure at %lld (%zu of %zu read), err %d\n",
>> +				off, retlen, size, ret);
>> +		return -EIO;
>> +	}
>> +	mtdpstore_mark_used(cxt, off);
>> +
>> +	return retlen;
>> +}
>> +
>> +static void mtdpstore_notify_add(struct mtd_info *mtd)
>> +{
>> +	int ret;
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct blkoops_info *info = &cxt->bo_info;
>> +	unsigned long longcnt;
>> +
>> +	if (!strcmp(mtd->name, info->device))
>> +		cxt->index = mtd->index;
>> +
>> +	if (mtd->index != cxt->index || cxt->index < 0)
>> +		return;
>> +
>> +	pr_debug("found matching MTD device %s\n", mtd->name);
>> +
>> +	if (mtd->size < info->dmesg_size * 2) {
>> +		pr_err("MTD partition %d not big enough\n", mtd->index);
>> +		return;
>> +	}
>> +	if (mtd->erasesize < info->dmesg_size) {
>> +		pr_err("eraseblock size of MTD partition %d too small\n",
>> +				mtd->index);
>> +		return;
>> +	}
>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
>> +		pr_err("record size %lu KB must align to write size %d KB\n",
>> +				info->dmesg_size / 1024,
>> +				mtd->writesize / 1024);
>> +		return;
>> +	}
>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
>> +				mtd->index,
>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);
>> +		return;
>> +	}
>> +
>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->dmesg_size));
>> +	cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +	cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +
>> +	longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
>> +	cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
>> +
>> +	cxt->bo_dev.total_size = mtd->size;
>> +	/* just support dmesg right now */
>> +	cxt->bo_dev.flags = BLKOOPS_DEV_SUPPORT_DMESG;
>> +	cxt->bo_dev.read = mtdpstore_read;
>> +	cxt->bo_dev.write = mtdpstore_write;
>> +	cxt->bo_dev.erase = mtdpstore_erase;
>> +	cxt->bo_dev.panic_write = mtdpstore_panic_write;
>> +
>> +	ret = blkoops_register_device(&cxt->bo_dev);
>> +	if (ret) {
>> +		pr_err("mtd%d register to blkoops failed\n", mtd->index);
>> +		return;
>> +	}
>> +	cxt->mtd = mtd;
>> +	pr_info("Attached to MTD device %d\n", mtd->index);
>> +}
>> +
>> +static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
>> +		loff_t off, size_t size)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	u_char *buf;
>> +	int ret;
>> +	size_t retlen;
>> +	struct erase_info erase;
>> +
>> +	buf = kmalloc(mtd->erasesize, GFP_KERNEL);
>> +	if (!buf)
>> +		return -ENOMEM;
>> +
>> +	/* 1st. read to cache */
>> +	ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
>> +	if (ret || retlen != mtd->erasesize)
>> +		goto free;
>> +
>> +	/* 2nd. erase block */
>> +	erase.len = mtd->erasesize;
>> +	erase.addr = off;
>> +	ret = mtd_erase(mtd, &erase);
>> +	if (ret)
>> +		goto free;
>> +
>> +	/* 3rd. write back */
>> +	while (size) {
>> +		unsigned int zonesize = cxt->bo_info.dmesg_size;
>> +
>> +		/* remove must clear used bit */
>> +		if (mtdpstore_is_used(cxt, off))
>> +			mtd_write(mtd, off, zonesize, &retlen, buf);
>> +
>> +		off += zonesize;
>> +		size -= min_t(unsigned int, zonesize, size);
>> +	}
>> +
>> +free:
>> +	kfree(buf);
>> +	return ret;
>> +}
>> +
>> +static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
>> +{
>> +	struct mtd_info *mtd = cxt->mtd;
>> +	int ret;
>> +	loff_t off;
>> +	u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
>> +
>> +	for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
>> +		ret = mtdpstore_block_is_removed(cxt, off);
>> +		if (!ret) {
>> +			off += mtd->erasesize;
>> +			continue;
>> +		}
>> +
>> +		ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void mtdpstore_notify_remove(struct mtd_info *mtd)
>> +{
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +
>> +	if (mtd->index != cxt->index || cxt->index < 0)
>> +		return;
>> +
>> +	mtdpstore_flush_removed(cxt);
>> +
>> +	blkoops_unregister_device(&cxt->bo_dev);
>> +	kfree(cxt->badmap);
>> +	kfree(cxt->usedmap);
>> +	kfree(cxt->rmmap);
>> +	cxt->mtd = NULL;
>> +	cxt->index = -1;
>> +}
>> +
>> +static struct mtd_notifier mtdpstore_notifier = {
>> +	.add	= mtdpstore_notify_add,
>> +	.remove	= mtdpstore_notify_remove,
>> +};
>> +
>> +static int __init mtdpstore_init(void)
>> +{
>> +	int ret;
>> +	struct mtdpstore_context *cxt = &oops_cxt;
>> +	struct blkoops_info *info = &cxt->bo_info;
>> +
>> +	ret = blkoops_info(info);
>> +	if (unlikely(ret))
>> +		return ret;
>> +
>> +	if (strlen(info->device) == 0) {
>> +		pr_err("mtd device must be supplied\n");
>> +		return -EINVAL;
>> +	}
>> +	if (!info->dmesg_size) {
>> +		pr_err("no recorder enabled\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Setup the MTD device to use */
>> +	ret = kstrtoint((char *)info->device, 0, &cxt->index);
>> +	if (ret)
>> +		cxt->index = -1;
>> +
>> +	register_mtd_user(&mtdpstore_notifier);
>> +	return 0;
>> +}
>> +module_init(mtdpstore_init);
>> +
>> +static void __exit mtdpstore_exit(void)
>> +{
>> +	unregister_mtd_user(&mtdpstore_notifier);
>> +}
>> +module_exit(mtdpstore_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("WeiXiong Liao <liaoweixiong@allwinnertech.com>");
>> +MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
>>
> 


* Re: [PATCH v1 00/11] pstore: support crash log to block and mtd device
  2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
                   ` (10 preceding siblings ...)
  2020-01-20  1:03 ` [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
@ 2020-02-06  9:13 ` Kees Cook
  11 siblings, 0 replies; 32+ messages in thread
From: Kees Cook @ 2020-02-06  9:13 UTC (permalink / raw)
  To: WeiXiong Liao
  Cc: Anton Vorontsov, Colin Cross, Tony Luck, Jonathan Corbet,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

On Mon, Jan 20, 2020 at 09:03:42AM +0800, WeiXiong Liao wrote:
> Why do we need to log to block (mtd) device?
> 1. Most embedded intelligent equipment has no persistent RAM, since it
>    increases costs. We prefer cheaper solutions, like block devices.
> 2. Not all equipment has a battery, which means that all data in
>    general RAM is lost on power failure. Pstore has little to offer for
>    such equipment.
> 
> Why do we need mtdpstore instead of mtdoops?
> 1. repetitive jobs between pstore and mtdoops
>    Both pstore and mtdoops do the same job: storing the panic/oops log.
> 2. do what a driver should do
>    To me, a driver should provide methods instead of policies. What MTD
>    should do is to provide read/write/erase operations, getting rid of code
>    for chunk management, kmsg dumper and configuration.
> 3. enhanced feature
>    Not only store log, but also show it as files.
>    Not only log, but also trigger time and trigger count.
>    Not only panic/oops log, but also log recorder for pmsg, console and
>    ftrace in the future.

Hi! Sorry for the delay in my review of this series -- it's been a busy
couple of weeks for me. :) I'm still travelling this week, but I want to
give this a good review. I really like the idea of having a block device
backend for pstore; I'm excited to get this feature landed.

I think there may be a lot of redundancy between ramoops and the block
code in this series, but I suspect the refactoring of that can happen at
a later time. I'd like to get this reviewed and tested and see if I can
land it in the v5.7 merge window.

I hope to have time to focus on this next week once I'm back in my
normal timezone. ;)

Thanks again!

-Kees

-- 
Kees Cook


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-01-22 17:41           ` Miquel Raynal
@ 2020-02-06 13:10             ` liaoweixiong
  2020-02-06 15:45               ` Miquel Raynal
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-02-06 13:10 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

hi Miquel Raynal,

On 2020/1/23 AM 1:41, Miquel Raynal wrote:
> Hello,
> 
> 
>>>>>> +/*
>>>>>> + * All zones will be read as pstore/blk will read zone one by one when do
>>>>>> + * recover.
>>>>>> + */
>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>>>>>> +{
>>>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
>>>>>> +	size_t retlen;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	if (mtdpstore_block_isbad(cxt, off))
>>>>>> +		return -ENEXT;
>>>>>> +
>>>>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
>>>>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>>>>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
>>>>>
>>>>> IIRC size != retlen does not mean it failed, but that you should
>>>>> continue reading after retlen bytes, no?
>>>>>
>>>> Yes, you are right. I will fix it. Thanks.
>>>>   
>>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
>>>>> buffer, but that the data has been corrected as it contained bitflips.
>>>>> mtd_is_eccerr() however, would be meaningful.
>>>>>
>>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
>>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
>>>> mtd_is_bitflip().
>>>
>>> Yes, just drop this check, only keep ret < 0.
>>>    
>>
>> If I am not mistaken, it should not be dropped here. As you said,
>> "mtd_is_bitflip() does not mean that you are reading a false buffer,
>> but that the data has been corrected as it contained bitflips.", so the
>> data I get is valid even if mtd_is_bitflip() returns true. It is correct
>> data and there is no need to handle an error. To me, the code
>> should be:
>> 	if (ret < 0 && !mtd_is_bitflip(ret))
>> 		[error handling]
> 
> Please check the implementation of mtd_is_bitflip(). You'll probably
> figure out what I am saying.
> 
> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585
> 

How about code like the following:

for (done = 0, retlen = 0; done < size; done += retlen) {
	ret = mtd_read(..., &retlen, ...);
	if (!ret)
		continue;
	/*
	 * Do nothing special on a bitflip or an ECC error: either way
	 * only a small number of bits have flipped, so the impact on the
	 * log data is small. mtdpstore just hands over what it read and
	 * the user can judge whether the data is valid.
	 */
	if (mtd_is_bitflip(ret)) {
		dev_warn("bitflip at....");
		continue;
	} else if (mtd_is_eccerr(ret)) {
		dev_warn("eccerr at....");
		retlen = retlen == 0 ? size : retlen;
		continue;
	} else {
		dev_err("read failure at...");
		/* this zone is broken, try next one */
		return -ENEXT;
	}
}
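
To make the "continue reading after retlen bytes" idea concrete, here is
a minimal userspace sketch, not the driver code. The names read_fn,
read_fully, short_read and demo_flash are hypothetical and exist only for
this illustration; short_read imitates a device that returns at most
three bytes per call. The point it tries to show is that the offset, the
remaining length and the destination pointer all have to advance by the
number of bytes already read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mtd_read(): returns 0 or a negative errno and
 * reports the number of bytes actually copied through *retlen.
 */
typedef int (*read_fn)(long long off, size_t len, size_t *retlen,
		       unsigned char *buf);

/* Read 'size' bytes starting at 'off', resuming after every short read. */
static long read_fully(read_fn rd, unsigned char *buf, size_t size,
		       long long off)
{
	size_t done = 0;

	assert(rd != NULL && buf != NULL);
	while (done < size) {
		size_t retlen = 0;
		int ret = rd(off + (long long)done, size - done, &retlen,
			     buf + done);

		if (ret < 0)
			return ret;	/* hard failure: give up */
		if (retlen == 0)
			return -EIO;	/* no progress: do not spin forever */
		done += retlen;
	}
	return (long)done;
}

/* Mock device for the example: hands out at most 3 bytes per call. */
static const unsigned char demo_flash[] = "0123456789abcdef";

static int short_read(long long off, size_t len, size_t *retlen,
		      unsigned char *buf)
{
	size_t n = len < 3 ? len : 3;

	memcpy(buf, demo_flash + off, n);
	*retlen = n;
	return 0;
}
```

How bitflip and ECC return codes should be treated inside such a loop is
a separate policy question and is deliberately left out of this sketch.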

> 
> [...]
> 
>>>>>> +		return;
>>>>>> +	}
>>>>>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
>>>>>> +		pr_err("record size %lu KB must align to write size %d KB\n",
>>>>>> +				info->dmesg_size / 1024,
>>>>>> +				mtd->writesize / 1024);
>>>>>
>>>>> This condition is weird, why would you check this?
>>>>>
>>>> pstore/blk will write 'record_size' bytes of dmesg log at a time.
>>>> Since each write must be aligned to 'writesize' for flash, and I am not
>>>> sure all flash drivers are compatible with misaligned data, I check
>>>> this.
>>>
>>> I think you should enforce this alignment instead of checking it.
>>>    
>>
>> Do you mean that mtdpstore should enforce this alignment while running?
>> The way I can think of is to handle a buffer aligned to writesize and
>> write to flash with this aligned buffer.
>>
>> That causes some errors. The MTD device will be divided into multiple
>> chunks according to dmesg_size. Each chunk stores an individual
>> OOPS/Panic log. If dmesg_size is misaligned to writesize, the last
>> write causes the next write to fail, because a flash page can only
>> be programmed once before the next erase and the page shared by two chunks
>> has been used by the last write. Besides, we cannot read to a buffer,
>> erase and write back, as there is no read/erase in the panic case.
> 
> I mean: what is the usual size of dmesg? I don't get why you need it to

The usual size of dmesg is 64K, usually equal to the log_buf size.

> be ie. a multiple of 2k. It probably is actually, I don't know if there
> is a standard. But if dmesg_size is eg 3k, just skip the end of the
> partially written page and start writing at the next page?
> 

1. The upper layer does not support skipping a partially written page
The upper layer, pstore/blk, will not skip the end of a partially
written page, because it serves not only MTD devices but also block
devices, which have no page restriction. The common practice in the
upper layer is to check the size and require it to be aligned. We
make dmesg_size a multiple of 4K for greater compatibility.

2. Chunk management and size per write
mtdpstore tells pstore/blk how large the device is. pstore/blk then
divides it into several chunks according to dmesg_size, and writes
dmesg_size bytes of data at a time.

In short, the amount of data written each time must not split a page,
so dmesg_size must be aligned to writesize.
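
The arithmetic behind this can be sketched in a few lines of userspace C.
This is only an illustration under assumed names: zones_page_aligned and
zone_shares_page_with_next are hypothetical helpers, not functions from
the patch. With the 3k record and 2k page mentioned above, zone 0 ends in
the middle of a page, so zone 1 would have to program a page that has
already been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when every zone of dmesg_size bytes starts and ends on a page boundary. */
static bool zones_page_aligned(uint32_t dmesg_size, uint32_t writesize)
{
	return dmesg_size != 0 && writesize != 0 &&
	       dmesg_size % writesize == 0;
}

/*
 * True when zone number 'zone' ends inside a flash page, which means the
 * following zone begins in a page that zone 'zone' has already programmed.
 */
static bool zone_shares_page_with_next(uint32_t dmesg_size, uint32_t writesize,
				       uint32_t zone)
{
	uint64_t end;

	assert(writesize != 0);
	end = ((uint64_t)zone + 1) * dmesg_size;	/* first byte after the zone */
	return end % writesize != 0;
}
```

With dmesg_size = 64K and writesize = 2K no zone ever shares a page,
which is the situation the check in mtdpstore_notify_add() insists on.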

>>
>>>>   
>>>>>> +		return;
>>>>>> +	}
>>>>>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
>>>>>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
>>>>>> +				mtd->index,
>>>>>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);
>>>>>
>>>>> Same question? I could understand that it is easier to manage blocks
>>>>> knowing their maximum number though.
>>>>>
>>>> It refers to mtdoops.
>>>
>>> What do you mean?
>>>    
>>
>> To me, it's unnecessary to check at all; however, the check really is there
>> in the mtdoops code. I referred to the mtdoops module when I designed mtdpstore.
>> It may be helpful in some cases I have not thought of, that's why I keep it.
> 
> Why not.
> 

OK, I will drop it.

> [...]
> 
>>>>
>>>> To avoid repeated erases when users remove several log files, mtdpstore
>>>> does the removal work on exit.
>>>>
>>>> Besides, mtdpstore does not check the return code, so that it writes
>>>> back as much of the valid log as possible.
>>>
>>> You are not in a critical path, I don't understand why you don't check
>>> it? If it returns an error, it means the data is not written. IMHO it
>>> is best to alert the user than to silently fail.
>>>    
>>
>> This function will be called only when the mtd device is being removed. It's
>> useless to alert the user; better to try to flush the other valid data to
> 
> It is useful to alert the user! It means something went wrong while
> everything seems fine.
> 
>> flash as much as possible, which reduces losses. If it's just
>> because of busy, what happens next time?
> 
> Just because of busy? I don't get it.

What I meant is that if a write fails because the device is busy, the
next one may succeed.

> 
> I'm okay with the idea of trying to write the other chunks though:
> 
> 	while (remaining_chunk) {
> 		ret = mtd_write()
> 		if (ret) {
> 			alert-user;
> 			continue;
> 		}
> 	}
> 

OK, I will fix it.
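
A possible shape for that loop, written as a userspace sketch rather than
driver code: every remaining zone is attempted, each failure is reported,
and the first error is still returned to the caller. zone_write_fn,
write_back_all and demo_write are hypothetical names used only here; in
the driver the report would presumably go through dev_err() instead of
fprintf().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-zone writer: 0 on success, negative errno on failure. */
typedef int (*zone_write_fn)(size_t zone);

/* Try every zone; report each failure, keep going, return the first error. */
static int write_back_all(zone_write_fn wr, size_t zonecnt, size_t *written)
{
	int first_err = 0;
	size_t ok = 0;
	size_t zone;

	assert(wr != NULL);
	for (zone = 0; zone < zonecnt; zone++) {
		int ret = wr(zone);

		if (ret) {
			/* alert the user, then move on to the next zone */
			fprintf(stderr, "write back of zone %zu failed: %d\n",
				zone, ret);
			if (!first_err)
				first_err = ret;
			continue;
		}
		ok++;
	}
	if (written)
		*written = ok;
	return first_err;
}

/* Mock writer for the example: only zone 1 fails. */
static int demo_write(size_t zone)
{
	return zone == 1 ? -EIO : 0;
}
```

Returning the first error while still counting the successful zones
keeps both goals from the discussion: nothing fails silently, and one bad
zone does not prevent the others from being flushed.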

>>
>>>>   
>>>>>> +		off += zonesize;
>>>>>> +		size -= min_t(unsigned int, zonesize, size);
>>>>>> +	}
>>>>>> +
>>>>>> +free:
>>>>>> +	kfree(buf);
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>
>>>
>>> [...]
>>>    
>>>>>
>>>>> Thanks,
>>>>> Miquèl
>>>>>
>>>> I will collect more suggestions and submit the new version at one time.
>>>>   
>>>
>>> Sure, no hurry.
>>>    
>>
>> I am on holiday, please forgive me for my slow response.
> 
> Take your time, as I said, no hurry.
> 
>>
>>>
>>> Thanks,
>>> Miquèl
>>>    
> 
> 
> 
> 
> Thanks,
> Miquèl
> 


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-06 13:10             ` liaoweixiong
@ 2020-02-06 15:45               ` Miquel Raynal
  2020-02-07  4:13                 ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Miquel Raynal @ 2020-02-06 15:45 UTC (permalink / raw)
  To: liaoweixiong
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

Hi liao,

liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Thu, 6 Feb 2020
21:10:47 +0800:

> hi Miquel Raynal,
> 
> On 2020/1/23 AM 1:41, Miquel Raynal wrote:
> > Hello,
> > 
> >   
> >>>>>> +/*
> >>>>>> + * All zones will be read as pstore/blk will read zone one by one when do
> >>>>>> + * recover.
> >>>>>> + */
> >>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> >>>>>> +{
> >>>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
> >>>>>> +	size_t retlen;
> >>>>>> +	int ret;
> >>>>>> +
> >>>>>> +	if (mtdpstore_block_isbad(cxt, off))
> >>>>>> +		return -ENEXT;
> >>>>>> +
> >>>>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> >>>>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >>>>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {  
> >>>>>
> >>>>> IIRC size != retlen does not mean it failed, but that you should
> >>>>> continue reading after retlen bytes, no?  
> >>>>>
> >>>> Yes, you are right. I will fix it. Thanks.
> >>>>
> >>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
> >>>>> buffer, but that the data has been corrected as it contained bitflips.
> >>>>> mtd_is_eccerr() however, would be meaningful.  
> >>>>>
> >>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
> >>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
> >>>> mtd_is_bitflip().  
> >>>
> >>> Yes, just drop this check, only keep ret < 0.  
> >>>
> >>
> >> If I am not mistaken, it should not be dropped here. As you said,
> >> "mtd_is_bitflip() does not mean that you are reading a false buffer,
> >> but that the data has been corrected as it contained bitflips.", so the
> >> data I get is valid even if mtd_is_bitflip() returns true. It is correct
> >> data and there is no need to handle an error. To me, the code
> >> should be:
> >> 	if (ret < 0 && !mtd_is_bitflip(ret))
> >> 		[error handling]
> > 
> > Please check the implementation of mtd_is_bitflip(). You'll probably
> > figure out what I am saying.
> > 
> > https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585
> >   
> 
> How about the codes as follows:
> 
> for (done = 0, retlen = 0; done < size; done += retlen) {
> 	ret = mtd_read(..., &retlen, ...);
> 	if (!ret)
> 		continue;
> 	/*
> 	 * Do nothing special on a bitflip or an ECC error: either way
> 	 * only a small number of bits have flipped, so the impact on the
> 	 * log data is small. mtdpstore just hands over what it read and
> 	 * the user can judge whether the data is valid.
> 	 */
> 	if (mtd_is_bitflip(ret)) {
> 		dev_warn("bitflip at....");
> 		continue;

I don't understand why you check for bitflips. Bitflips have already
been corrected at this stage; you just get the information that there
have been bitflips, but the data integrity is fine.

I am not against ignoring ECC errors in this case though. I would
propose:

	for (...) {
		if (ret < 0) {
			complain;
			return;
		}

		if (mtd_is_eccerr())
			complain;
	}
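
For reference while reading the pseudocode, the return codes involved can
be classified as in the userspace sketch below. In mainline
include/linux/mtd/mtd.h, mtd_is_bitflip(err) tests err == -EUCLEAN and
mtd_is_eccerr(err) tests err == -EBADMSG; is_bitflip, is_eccerr,
classify_read and the read_verdict enum are hypothetical stand-ins written
for this example only. Both codes are negative, so a bare "ret < 0" test
on its own does not tell them apart from other failures.

```c
#include <assert.h>	/* only used when trying the sketch out */
#include <errno.h>
#include <stdbool.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; fallback for libcs that lack it */
#endif

enum read_verdict {
	READ_OK,		/* no error reported */
	READ_CORRECTED,		/* bitflips were found and corrected by ECC */
	READ_UNCORRECTABLE,	/* ECC could not repair the data */
	READ_FAILED,		/* any other failure */
};

/* Userspace mirrors of the kernel helpers, assumed names. */
static bool is_bitflip(int err)
{
	return err == -EUCLEAN;
}

static bool is_eccerr(int err)
{
	return err == -EBADMSG;
}

static enum read_verdict classify_read(int ret)
{
	if (ret >= 0)
		return READ_OK;
	if (is_bitflip(ret))
		return READ_CORRECTED;
	if (is_eccerr(ret))
		return READ_UNCORRECTABLE;
	return READ_FAILED;
}
```

Which of these verdicts should abort the read and which should only
produce a warning is the policy being discussed in this thread; the
sketch takes no position on it.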
		
> 	} else if (mtd_is_eccerr(ret)) {
> 		dev_warn("eccerr at....");
> 		retlen = retlen == 0 ? size : retlen;
> 		continue;
> 	} else {
> 		dev_err("read failure at...");
> 		/* this zone is broken, try next one */
> 		return -ENEXT;
> 	}
> }
> 


Thanks,
Miquèl


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-06 15:45               ` Miquel Raynal
@ 2020-02-07  4:13                 ` liaoweixiong
  2020-02-07  8:41                   ` Miquel Raynal
  0 siblings, 1 reply; 32+ messages in thread
From: liaoweixiong @ 2020-02-07  4:13 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

hi Miquel Raynal,

On 2020/2/6 PM 11:45, Miquel Raynal wrote:
> Hi liao,
> 
> liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Thu, 6 Feb 2020
> 21:10:47 +0800:
> 
>> hi Miquel Raynal,
>>
>> On 2020/1/23 AM 1:41, Miquel Raynal wrote:
>>> Hello,
>>>
>>>    
>>>>>>>> +/*
>>>>>>>> + * All zones will be read as pstore/blk will read zone one by one when do
>>>>>>>> + * recover.
>>>>>>>> + */
>>>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>>>>>>>> +{
>>>>>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
>>>>>>>> +	size_t retlen;
>>>>>>>> +	int ret;
>>>>>>>> +
>>>>>>>> +	if (mtdpstore_block_isbad(cxt, off))
>>>>>>>> +		return -ENEXT;
>>>>>>>> +
>>>>>>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
>>>>>>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>>>>>>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
>>>>>>>
>>>>>>> IIRC size != retlen does not mean it failed, but that you should
>>>>>>> continue reading after retlen bytes, no?
>>>>>>>
>>>>>> Yes, you are right. I will fix it. Thanks.
>>>>>>
>>>>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
>>>>>>> buffer, but that the data has been corrected as it contained bitflips.
>>>>>>> mtd_is_eccerr() however, would be meaningful.
>>>>>>>
>>>>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
>>>>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
>>>>>> mtd_is_bitflip().
>>>>>
>>>>> Yes, just drop this check, only keep ret < 0.
>>>>>
>>>> If I don't get it wrong, it should not be dropped here. Like your words,
>>>> "mtd_is_bitflip() does not mean that you are reading a false buffer,
>>>> but that the data has been corrected as it contained bitflips.", the
>>>> data I get are valid even if mtd_is_bitflip() return true. It's correct
>>>> data and it's no need to go to handle error. To me, the codes
>>>> should be:
>>>> 	if (ret < 0 && !mtd_is_bitflip())
>>>> 		[error handling]
>>>
>>> Please check the implementation of mtd_is_bitflip(). You'll probably
>>> figure out what I am saying.
>>>
>>> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585
>>>    
>>
>> How about the codes as follows:
>>
>> for (done = 0, retlen = 0; done < size; done += retlen) {
>> 	ret = mtd_read(..., &retlen, ...);
>> 	if (!ret)
>> 		continue;
>> 	/*
>> 	 * do nothing if bitflip and ecc error occurs because whether
>> 	 * it's bitflip or ECC error, just a small number of bits flip
>> 	 * and the impact on log data is so small. The mtdpstore just
>> 	 * hands over what it gets and user can judge whether the data
>> 	 * is valid or not.
>> 	 */
>> 	if (mtd_is_bitflip(ret)) {
>> 		dev_warn("bitflip at....");
>> 		continue;

> I don't understand why do you check for bitflips. Bitflips have been
> corrected at this stage, you just get the information that there
> has been bitflips, but the data integrity is fine.
> 

Neither a bitflip nor an ECC error is a real failure in this case,
so we must check for both of them.

> I am not against ignoring ECC errors in this case though. I would
> propose:
> 
> 	for (...) {
> 		if (ret < 0) {
> 			complain;
> 			return;
> 		}
> 

-117 (-EUCLEAN) means bitflips occurred but were corrected.
-74 (-EBADMSG) means an uncorrectable ECC error.
Both are negative numbers. If the code only checks "ret < 0", it can
never tell a bitflip or ECC error apart from any other failure.
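
To make the point concrete, here is a minimal userspace sketch (not code
from the patch) of how the return value of mtd_read() falls into the cases
discussed above. The enum and classify_read() are hypothetical names used
only for illustration, and the fallback errno values are an assumption
taken from the Linux numbers quoted above.

```c
#include <errno.h>

/* Linux values quoted in this thread; only used if <errno.h> lacks them. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#ifndef EBADMSG
#define EBADMSG 74
#endif

/* Hypothetical categories, for illustration only. */
enum read_status {
	READ_OK,            /* clean read */
	READ_CORRECTED,     /* -EUCLEAN: bitflips seen, data corrected by ECC */
	READ_UNCORRECTABLE, /* -EBADMSG: ECC could not repair the data */
	READ_IO_ERROR,      /* any other negative code */
};

/* Sort a read return code into one of the categories above. */
static enum read_status classify_read(int ret)
{
	if (ret >= 0)
		return READ_OK;
	if (ret == -EUCLEAN)
		return READ_CORRECTED;
	if (ret == -EBADMSG)
		return READ_UNCORRECTABLE;
	return READ_IO_ERROR;
}
```

A bare "ret < 0" test would merge the last three categories into one, which
is the problem described above.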

> 		if (mtd_is_eccerr())
> 			complain;
> 	}
> 		
>> 	} else if (mtd_is_eccerr(ret)) {
>> 		dev_warn("eccerr at....");
>> 		retlen = retlen == 0 ? size : retlen;
>> 		continue;
>> 	} else {
>> 		dev_err("read failure at...");
>> 		/* this zone is broken, try next one */
>> 		return -ENEXT;
>> 	}
>> }
>>
> 
> 
> Thanks,
> Miquèl
> 


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-07  4:13                 ` liaoweixiong
@ 2020-02-07  8:41                   ` Miquel Raynal
  2020-02-07 10:30                     ` liaoweixiong
  0 siblings, 1 reply; 32+ messages in thread
From: Miquel Raynal @ 2020-02-07  8:41 UTC (permalink / raw)
  To: liaoweixiong
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

Hi Liao,

liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Fri, 7 Feb 2020
12:13:08 +0800:

> hi Miquel Raynal,
> 
> On 2020/2/6 PM 11:45, Miquel Raynal wrote:
> > Hi liao,
> > 
> > liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Thu, 6 Feb 2020
> > 21:10:47 +0800:
> >   
> >> hi Miquel Raynal,
> >>
> >> On 2020/1/23 AM 1:41, Miquel Raynal wrote:  
> >>> Hello,
> >>>  
> >>>
> >>>>>>>> +/*
> >>>>>>>> + * All zones will be read as pstore/blk will read zone one by one when do
> >>>>>>>> + * recover.
> >>>>>>>> + */
> >>>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> >>>>>>>> +{
> >>>>>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
> >>>>>>>> +	size_t retlen;
> >>>>>>>> +	int ret;
> >>>>>>>> +
> >>>>>>>> +	if (mtdpstore_block_isbad(cxt, off))
> >>>>>>>> +		return -ENEXT;
> >>>>>>>> +
> >>>>>>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> >>>>>>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >>>>>>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {  
> >>>>>>>
> >>>>>>> IIRC size != retlen does not mean it failed, but that you should
> >>>>>>> continue reading after retlen bytes, no?  
> >>>>>>>
> >>>>>> Yes, you are right. I will fix it. Thanks.
> >>>>>>
> >>>>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
> >>>>>>> buffer, but that the data has been corrected as it contained bitflips.
> >>>>>>> mtd_is_eccerr() however, would be meaningful.
> >>>>>>>
> >>>>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
> >>>>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
> >>>>>> mtd_is_bitflip().  
> >>>>>
> >>>>> Yes, just drop this check, only keep ret < 0.  
> >>>>>
> >>>> If I don't get it wrong, it should not be dropped here. Like your words,
> >>>> "mtd_is_bitflip() does not mean that you are reading a false buffer,
> >>>> but that the data has been corrected as it contained bitflips.", the
> >>>> data I get are valid even if mtd_is_bitflip() return true. It's correct
> >>>> data and it's no need to go to handle error. To me, the codes
> >>>> should be:
> >>>> 	if (ret < 0 && !mtd_is_bitflip())
> >>>> 		[error handling]  
> >>>
> >>> Please check the implementation of mtd_is_bitflip(). You'll probably
> >>> figure out what I am saying.
> >>>
> >>> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585  
> >>>
> >>
> >> How about the codes as follows:
> >>
> >> for (done = 0, retlen = 0; done < size; done += retlen) {
> >> 	ret = mtd_read(..., &retlen, ...);
> >> 	if (!ret)
> >> 		continue;
> >> 	/*
> >> 	 * do nothing if bitflip and ecc error occurs because whether
> >> 	 * it's bitflip or ECC error, just a small number of bits flip
> >> 	 * and the impact on log data is so small. The mtdpstore just
> >> 	 * hands over what it gets and user can judge whether the data
> >> 	 * is valid or not.
> >> 	 */
> >> 	if (mtd_is_bitflip(ret)) {
> >> 		dev_warn("bitflip at....");
> >> 		continue;  
> 
> > I don't understand why do you check for bitflips. Bitflips have been
> > corrected at this stage, you just get the information that there
> > has been bitflips, but the data integrity is fine.
> >   
> 
> Both of bitflip and eccerror are not real wrong in this
> case. So we must check them.
> 
> > I am not against ignoring ECC errors in this case though. I would
> > propose:
> > 
> > 	for (...) {
> > 		if (ret < 0) {
> > 			complain;
> > 			return;
> > 		}
> >   
> 
> -117 (-EUCLEAN) means bitflip but be corrected.
> -74 (-EBADMSG) means ecc error that uncorrectable
> All of them are negative number that smaller than 0. If it just keeps
> "ret < 0", it can never make a difference between bitflip/eccerror
> and others.

I forgot about these, you're right, so what about:

	static bool mtdpstore_is_io_error(int ret)
	{
		return ret < 0 && !mtd_is_eccerr(ret) && !mtd_is_bitflip(ret);
	}

	for (...) {
		if (mtdpstore_is_io_error(ret))
			return ret;

		/*
		 * Comment explaining why we still return valid data
		 * despite ECC errors.
		 */
		if (mtd_is_eccerr(ret))
			just-complain();
	}

This snippet distinguishes between a "controller/bus timeout/error:
data could not be retrieved" and an "ECC error, maybe we can still read
the data and try to understand it (risky, the user must be warned)".
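
For illustration, the idea above combined with the "continue reading after
retlen bytes" loop from earlier in the thread could look like the userspace
sketch below. This is not the final driver code: read_fn, read_zone() and
demo_read() are hypothetical stand-ins, the two mtd_is_*() helpers are
re-implemented locally to mirror the checks discussed here, and the
zero-progress guard (-EIO) is an assumption added so the loop cannot spin.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, used only if <errno.h> lacks it */
#endif
#ifndef EBADMSG
#define EBADMSG 74
#endif

/* Userspace stand-ins for the helpers in include/linux/mtd/mtd.h. */
static bool mtd_is_bitflip(int err) { return err == -EUCLEAN; }
static bool mtd_is_eccerr(int err) { return err == -EBADMSG; }

static bool mtdpstore_is_io_error(int ret)
{
	return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
}

/* Hypothetical read callback with the same shape as mtd_read(). */
typedef int (*read_fn)(size_t off, size_t len, size_t *retlen, char *buf);

/* Returns the number of bytes read, or a negative errno on I/O error. */
static long read_zone(read_fn rd, char *buf, size_t size, size_t off)
{
	size_t done, retlen;
	int ret;

	for (done = 0; done < size; done += retlen) {
		retlen = 0;
		ret = rd(off + done, size - done, &retlen, buf + done);
		if (mtdpstore_is_io_error(ret))
			return ret;
		/*
		 * An ECC error still hands back data: a few flipped bits
		 * rarely make a crash log unreadable, so warn and go on.
		 */
		if (mtd_is_eccerr(ret)) {
			fprintf(stderr, "ECC error at 0x%zx\n", off + done);
			if (retlen == 0)
				retlen = size - done;
		}
		if (retlen == 0)	/* assumption: treat no progress as failure */
			return -EIO;
	}
	return (long)done;
}

/* Demo backend: 16 bytes of "flash", at most 4 bytes per call. */
static const char demo_flash[] = "0123456789abcdef";

static int demo_read(size_t off, size_t len, size_t *retlen, char *buf)
{
	if (off + len > sizeof(demo_flash) - 1)
		return -EIO;		/* out of range: hard failure */
	if (len > 4)
		len = 4;		/* short read, caller must continue */
	memcpy(buf, demo_flash + off, len);
	*retlen = len;
	return off == 8 ? -EBADMSG : 0;	/* fake ECC error at offset 8 */
}
```

With this demo backend, reading 16 bytes from offset 0 takes four calls, one
of which reports an ECC error that is only warned about, while a request
that runs past the end of the fake device is returned as a hard I/O error.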

> 
> > 		if (mtd_is_eccerr())
> > 			complain;
> > 	}
> > 		  
> >> 	} else if (mtd_is_eccerr(ret)) {
> >> 		dev_warn("eccerr at....");
> >> 		retlen = retlen == 0 ? size : retlen;
> >> 		continue;
> >> 	} else {
> >> 		dev_err("read failure at...");
> >> 		/* this zone is broken, try next one */
> >> 		return -ENEXT;
> >> 	}
> >> }
> >>  
> > 
> > 
> > Thanks,
> > Miquèl
> >   

Thanks,
Miquèl


* Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
  2020-02-07  8:41                   ` Miquel Raynal
@ 2020-02-07 10:30                     ` liaoweixiong
  0 siblings, 0 replies; 32+ messages in thread
From: liaoweixiong @ 2020-02-07 10:30 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Kees Cook, Anton Vorontsov, Colin Cross, Tony Luck,
	Jonathan Corbet, Richard Weinberger, Vignesh Raghavendra,
	Mauro Carvalho Chehab, David S. Miller, Rob Herring,
	Greg Kroah-Hartman, Jonathan Cameron, linux-doc, linux-kernel,
	linux-mtd

hi Miquel Raynal,

On 2020/2/7 PM 4:41, Miquel Raynal wrote:
> Hi Liao,
> 
> liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Fri, 7 Feb 2020
> 12:13:08 +0800:
> 
>> hi Miquel Raynal,
>>
>> On 2020/2/6 PM 11:45, Miquel Raynal wrote:
>>> Hi liao,
>>>
>>> liaoweixiong <liaoweixiong@allwinnertech.com> wrote on Thu, 6 Feb 2020
>>> 21:10:47 +0800:
>>>    
>>>> hi Miquel Raynal,
>>>>
>>>> On 2020/1/23 AM 1:41, Miquel Raynal wrote:
>>>>> Hello,
>>>>>   
>>>>>
>>>>>>>>>> +/*
>>>>>>>>>> + * All zones will be read as pstore/blk will read zone one by one when do
>>>>>>>>>> + * recover.
>>>>>>>>>> + */
>>>>>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>>>>>>>>>> +{
>>>>>>>>>> +	struct mtdpstore_context *cxt = &oops_cxt;
>>>>>>>>>> +	size_t retlen;
>>>>>>>>>> +	int ret;
>>>>>>>>>> +
>>>>>>>>>> +	if (mtdpstore_block_isbad(cxt, off))
>>>>>>>>>> +		return -ENEXT;
>>>>>>>>>> +
>>>>>>>>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
>>>>>>>>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>>>>>>>>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {
>>>>>>>>>
>>>>>>>>> IIRC size != retlen does not mean it failed, but that you should
>>>>>>>>> continue reading after retlen bytes, no?
>>>>>>>>>
>>>>>>>> Yes, you are right. I will fix it. Thanks.
>>>>>>>>
>>>>>>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
>>>>>>>>> buffer, but that the data has been corrected as it contained bitflips.
>>>>>>>>> mtd_is_eccerr() however, would be meaningful.
>>>>>>>>>
>>>>>>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
>>>>>>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
>>>>>>>> mtd_is_bitflip().
>>>>>>>
>>>>>>> Yes, just drop this check, only keep ret < 0.
>>>>>>>
>>>>>> If I don't get it wrong, it should not be dropped here. Like your words,
>>>>>> "mtd_is_bitflip() does not mean that you are reading a false buffer,
>>>>>> but that the data has been corrected as it contained bitflips.", the
>>>>>> data I get are valid even if mtd_is_bitflip() return true. It's correct
>>>>>> data and it's no need to go to handle error. To me, the codes
>>>>>> should be:
>>>>>> 	if (ret < 0 && !mtd_is_bitflip())
>>>>>> 		[error handling]
>>>>>
>>>>> Please check the implementation of mtd_is_bitflip(). You'll probably
>>>>> figure out what I am saying.
>>>>>
>>>>> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585
>>>>>
>>>>
>>>> How about the codes as follows:
>>>>
>>>> for (done = 0, retlen = 0; done < size; done += retlen) {
>>>> 	ret = mtd_read(..., &retlen, ...);
>>>> 	if (!ret)
>>>> 		continue;
>>>> 	/*
>>>> 	 * do nothing if bitflip and ecc error occurs because whether
>>>> 	 * it's bitflip or ECC error, just a small number of bits flip
>>>> 	 * and the impact on log data is so small. The mtdpstore just
>>>> 	 * hands over what it gets and user can judge whether the data
>>>> 	 * is valid or not.
>>>> 	 */
>>>> 	if (mtd_is_bitflip(ret)) {
>>>> 		dev_warn("bitflip at....");
>>>> 		continue;
>>
>>> I don't understand why do you check for bitflips. Bitflips have been
>>> corrected at this stage, you just get the information that there
>>> has been bitflips, but the data integrity is fine.
>>>    
>>
>> Both of bitflip and eccerror are not real wrong in this
>> case. So we must check them.
>>
>>> I am not against ignoring ECC errors in this case though. I would
>>> propose:
>>>
>>> 	for (...) {
>>> 		if (ret < 0) {
>>> 			complain;
>>> 			return;
>>> 		}
>>>    
>>
>> -117 (-EUCLEAN) means bitflip but be corrected.
>> -74 (-EBADMSG) means ecc error that uncorrectable
>> All of them are negative number that smaller than 0. If it just keeps
>> "ret < 0", it can never make a difference between bitflip/eccerror
>> and others.
> 
> I forgot about these, your're right, so what about:
> 
> 	static bool mtdpstore_is_io_error(int ret)
> 	{
> 		return ret < 0 && !mtd_is_eccerr(ret) && !mtd_is_bitflip(ret);
> 	}
> 
> 	for (...) {
> 		if (mtdpstore_is_io_error(ret))
> 			return ret;
> 
> 		/*
> 		 * Comment explaining why we still return valid data
> 		 * despite ECC errors.
> 		 */
> 		if (mtd_is_eccerr(ret))
> 			just-complain();
> 	}
> 
> This snippet makes a difference between any "controller/bus
> timeout/error : data could not be retrieved" and "ECC error, maybe we
> can still read it and try to understand (risky, must be warned)".
> 

That seems good to me. I will fix it later.
Thanks for your review.

>>
>>> 		if (mtd_is_eccerr())
>>> 			complain;
>>> 	}
>>> 		
>>>> 	} else if (mtd_is_eccerr(ret)) {
>>>> 		dev_warn("eccerr at....");
>>>> 		retlen = retlen == 0 ? size : retlen;
>>>> 		continue;
>>>> 	} else {
>>>> 		dev_err("read failure at...");
>>>> 		/* this zone is broken, try next one */
>>>> 		return -ENEXT;
>>>> 	}
>>>> }
>>>>   
>>>
>>>
>>> Thanks,
>>> Miquèl
>>>    
> 
> Thanks,
> Miquèl
> 


end of thread, other threads:[~2020-02-07 10:30 UTC | newest]

Thread overview: 32+ messages
2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 02/11] blkoops: add blkoops, a warpper for pstore/blk WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 03/11] pstore/blk: support pmsg recorder WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
2020-01-21  4:13   ` Randy Dunlap
2020-01-21  5:23     ` liaoweixiong
2020-01-21  6:36       ` Randy Dunlap
2020-01-21  8:19         ` liaoweixiong
2020-01-21 15:34           ` Randy Dunlap
2020-01-22 15:01             ` liaoweixiong
2020-01-22 16:08               ` Randy Dunlap
2020-01-20  1:03 ` [PATCH v1 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 10/11] blkoops: add interface for dirver to get information of blkoops WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
2020-01-20 10:03   ` Miquel Raynal
2020-01-21  3:36     ` liaoweixiong
2020-01-21  8:48       ` Miquel Raynal
2020-01-22 17:22         ` liaoweixiong
2020-01-22 17:41           ` Miquel Raynal
2020-02-06 13:10             ` liaoweixiong
2020-02-06 15:45               ` Miquel Raynal
2020-02-07  4:13                 ` liaoweixiong
2020-02-07  8:41                   ` Miquel Raynal
2020-02-07 10:30                     ` liaoweixiong
2020-01-23  4:24   ` Vignesh Raghavendra
2020-01-23  7:03     ` liaoweixiong
2020-02-06  9:13 ` [PATCH v1 00/11] pstore: support crash log to block and mtd device Kees Cook
