* [RFC][PATCH] add blockconsole
@ 2012-04-24 20:59 Jörn Engel
  2012-04-25 13:42 ` Jeff Moyer
From: Jörn Engel @ 2012-04-24 20:59 UTC
  To: linux-kernel

Console driver similar to netconsole, except it writes to a block
device.  Can be useful in a setup where netconsole, for whatever
reasons, is impractical.

Signed-off-by: Joern Engel <joern@logfs.org>
---
 Documentation/block/blockconsole.txt |   61 ++++
 block/partitions/Makefile            |    1 +
 block/partitions/blockconsole.c      |   22 ++
 block/partitions/check.c             |    4 +
 drivers/block/Kconfig                |    5 +
 drivers/block/Makefile               |    1 +
 drivers/block/blockconsole.c         |  523 ++++++++++++++++++++++++++++++++++
 include/linux/blockconsole.h         |    7 +
 8 files changed, 624 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c
 create mode 100644 include/linux/blockconsole.h

diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt
new file mode 100644
index 0000000..e84d4ae
--- /dev/null
+++ b/Documentation/block/blockconsole.txt
@@ -0,0 +1,61 @@
+
+started by Jörn Engel <joern@logfs.org> 2012.03.17
+
+Introduction:
+=============
+
+This module logs kernel printk messages to block devices, e.g. usb
+sticks.  It allows after-the-fact debugging when the main
+disk/filesystem fails and serial consoles and netconsole are
+impractical.
+
+It can currently only be used built-in.  Blockconsole hooks into the
+partition scanning code and will bring up configured block devices as
+soon as possible.  While this doesn't allow capture of early kernel
+panics, it does capture most of the boot process.
+
+Block device configuration:
+===========================
+
+Blockconsole has no configuration parameters.  In order to use a block
+device for logging, the blockconsole header has to be written to the
+device in question.  Logging to partitions is not supported.
+
+Example:
+  echo "Linux blockconsole version 1.0" > /dev/sdc
+
+If the string "Linux blockconsole version 1.0" is present at the
+beginning of the device, this device will be used by blockconsole upon
+next boot.  It is possible but not required to add an additional
+character before the string.  Usually that would be a newline.
+
+Miscellaneous notes:
+====================
+
+Once every megabyte blockconsole will write a copy of its header to
+the device.  This header consists of a newline, the string "Linux
+blockconsole version 1.0", another newline, a 64bit big-endian
+sequence number, plus another eight newlines for a total of 48 bytes.
+This means that log messages can be interrupted by the header in
+mid-line and continue after the header.
+
+The 64bit big-endian sequence number is used by blockconsole to
+determine where to continue logging after a reboot.  New logs will be
+written to the first megabyte that wasn't written to by the last
+instance of blockconsole.  Therefore users might want to read the log
+device in a hex editor and look for the place where the header
+sequence number changes.  This marks the end of the log, or at least
+it marks a location less than one megabyte from the end of the log.
+
+The blockconsole header is constructed such that opening the log
+device in a text editor, ignoring memory constraints due to large
+devices, should just work and be reasonably non-confusing to readers.
+
+Writing to the log device is strictly circular.  This should give
+optimal performance and reliability on cheap devices, like usb sticks.
+
+Writing to block devices has to happen in sector granularity, while
+kernel logging happens in byte granularity.  In order not to lose
+messages in important cases like kernel crashes, a timer will write
+out partial sectors if no new messages appear for a while.  The rest
+of the sector will be filled with spaces and a single newline.
diff --git a/block/partitions/Makefile b/block/partitions/Makefile
index 03af8ea..bf26d4a 100644
--- a/block/partitions/Makefile
+++ b/block/partitions/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
 obj-$(CONFIG_EFI_PARTITION) += efi.o
 obj-$(CONFIG_KARMA_PARTITION) += karma.o
 obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
+obj-$(CONFIG_BLOCKCONSOLE) += blockconsole.o
diff --git a/block/partitions/blockconsole.c b/block/partitions/blockconsole.c
new file mode 100644
index 0000000..79796a8
--- /dev/null
+++ b/block/partitions/blockconsole.c
@@ -0,0 +1,22 @@
+#include <linux/blockconsole.h>
+
+#include "check.h"
+
+int blockconsole_partition(struct parsed_partitions *state)
+{
+	Sector sect;
+	void *data;
+	int err = 0;
+
+	data = read_part_sector(state, 0, &sect);
+	if (!data)
+		return -EIO;
+	if (!bcon_magic_present(data))
+		goto out;
+
+	bcon_add(state->name);
+	err = 1;
+out:
+	put_dev_sector(sect);
+	return err;
+}
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..8de99fa 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -36,11 +36,15 @@
 
 int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/
 
+int blockconsole_partition(struct parsed_partitions *state);
 static int (*check_part[])(struct parsed_partitions *) = {
 	/*
 	 * Probe partition formats with tables at disk address 0
 	 * that also have an ADFS boot block at 0xdc0.
 	 */
+#ifdef CONFIG_BLOCKCONSOLE
+	blockconsole_partition,
+#endif
 #ifdef CONFIG_ACORN_PARTITION_ICS
 	adfspart_check_ICS,
 #endif
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a796407..7ce033d 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -555,4 +555,9 @@ config BLK_DEV_RBD
 
 	  If unsure, say N.
 
+config BLOCKCONSOLE
+	bool "Block device console logging support"
+	help
+	  This enables logging to block devices.
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 5b79505..1eb7f902 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
 obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
 obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
+obj-$(CONFIG_BLOCKCONSOLE)	+= blockconsole.o
 
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
new file mode 100644
index 0000000..e72bb64
--- /dev/null
+++ b/drivers/block/blockconsole.c
@@ -0,0 +1,523 @@
+#include <linux/bio.h>
+#include <linux/blockconsole.h>
+#include <linux/console.h>
+#include <linux/fs.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
+#define BLOCKCONSOLE_MAGIC	"Linux blockconsole version 1.0"
+#define BCON_HEADERSIZE		(48)
+#define PAGE_COUNT		(256)
+#define SECTOR_COUNT		(PAGE_COUNT * (PAGE_SIZE >> 9))
+#define CACHE_PAGE_MASK		(PAGE_COUNT - 1)
+#define CACHE_SECTOR_MASK	(SECTOR_COUNT - 1)
+#define CACHE_SIZE		(PAGE_COUNT << PAGE_SHIFT)
+#define CACHE_MASK		(CACHE_SIZE - 1)
+#define SECTOR_SHIFT		(9)
+#define SECTOR_SIZE		(1 << SECTOR_SHIFT)
+#define SECTOR_MASK		(~(SECTOR_SIZE-1))
+#define PG_SECTOR_MASK		((PAGE_SIZE >> 9) - 1)
+
+struct bcon_bio {
+	struct bio bio;
+	struct bio_vec bvec;
+	int in_flight;
+};
+
+struct blockconsole {
+	struct spinlock write_lock;
+	struct spinlock end_io_lock;
+	struct timer_list pad_timer;
+	int error_count;
+	int lost_bytes;
+	struct kref kref;
+	u64 console_bytes;
+	u64 write_bytes;
+	u64 max_bytes;
+	u64 round;
+	void *sector_array[SECTOR_COUNT];
+	struct bcon_bio bio_array[SECTOR_COUNT];
+	struct page *pages;
+	struct bcon_bio zero_bios[PAGE_COUNT];
+	struct page *zero_page;
+	struct block_device *bdev;
+	struct console console;
+	struct work_struct unregister_work;
+	struct task_struct *writeback_thread;
+};
+
+static void bcon_get(struct blockconsole *bc)
+{
+	kref_get(&bc->kref);
+}
+
+static void bcon_release(struct kref *kref)
+{
+	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+
+	__free_pages(bc->zero_page, 0);
+	__free_pages(bc->pages, 8);
+	kfree(bc);
+}
+
+static void bcon_put(struct blockconsole *bc)
+{
+	kref_put(&bc->kref, bcon_release);
+}
+
+static int bcon_console_ofs(struct blockconsole *bc)
+{
+	return bc->console_bytes & ~SECTOR_MASK;
+}
+
+static int bcon_console_sector(struct blockconsole *bc)
+{
+	return (bc->console_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static int bcon_write_sector(struct blockconsole *bc)
+{
+	return (bc->write_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static void bcon_advance_console_bytes(struct blockconsole *bc, int bytes)
+{
+	bc->console_bytes += bytes;
+	if (bc->console_bytes >= bc->max_bytes)
+		bc->console_bytes = 0;
+	if ((bc->console_bytes & CACHE_MASK) == 0)
+		bc->console_bytes += BCON_HEADERSIZE;
+}
+
+static void request_complete(struct bio *bio, int err)
+{
+	complete((struct completion *)bio->bi_private);
+}
+
+static void bcon_init_first_page(struct blockconsole *bc)
+{
+	void *buf = page_address(bc->pages);
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+	__be64 *be_round = buf + 32;
+	u64 round = ++(bc->round);
+
+	/* XXX memset to spaces */
+	memset(buf, 10, BCON_HEADERSIZE);
+	memcpy(buf + 1, BLOCKCONSOLE_MAGIC, len);
+	*be_round = cpu_to_be64(round);
+}
+
+static int sync_read(struct blockconsole *bc, u64 ofs)
+{
+	struct bio bio;
+	struct bio_vec bio_vec;
+	struct completion complete;
+
+	bio_init(&bio);
+	bio.bi_io_vec = &bio_vec;
+	bio_vec.bv_page = bc->pages;
+	bio_vec.bv_len = SECTOR_SIZE;
+	bio_vec.bv_offset = 0;
+	bio.bi_vcnt = 1;
+	bio.bi_idx = 0;
+	bio.bi_size = SECTOR_SIZE;
+	bio.bi_bdev = bc->bdev;
+	bio.bi_sector = ofs >> SECTOR_SHIFT;
+	init_completion(&complete);
+	bio.bi_private = &complete;
+	bio.bi_end_io = request_complete;
+
+	submit_bio(READ, &bio);
+	wait_for_completion(&complete);
+	return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO;
+}
+
+static void bcon_erase_segment(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio *bio = &bcon_bio->bio;
+
+		/*
+		 * If the last erase hasn't finished yet, just skip it.  The log will
+		 * look messy, but that's all.
+		 */
+		rmb();
+		if (bcon_bio->in_flight)
+			continue;
+		bio_init(bio);
+		bio->bi_io_vec = &bcon_bio->bvec;
+		bio->bi_vcnt = 1;
+		bio->bi_size = PAGE_SIZE;
+		bio->bi_bdev = bc->bdev;
+		bio->bi_private = bc;
+		bio->bi_idx = 0;
+		bio->bi_sector = (bc->write_bytes + i * PAGE_SIZE) >> 9;
+		bcon_bio->in_flight = 1;
+		wmb();
+		/* We want the erase to go to the device first somehow */
+		submit_bio(WRITE | REQ_SOFTBARRIER, bio);
+	}
+}
+
+static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
+{
+	bc->write_bytes += bytes;
+	if (bc->write_bytes >= bc->max_bytes) {
+		bc->write_bytes = 0;
+		bcon_init_first_page(bc);
+	}
+}
+
+static int bcon_find_end_of_log(struct blockconsole *bc)
+{
+	u64 start = 0, end = bc->max_bytes, middle;
+	__be64 *be_round = (bc->sector_array[1]) + 32;
+	int err;
+
+	sync_read(bc, 0);
+	memcpy(bc->sector_array[1], bc->sector_array[0], BCON_HEADERSIZE);
+	for (;;) {
+		middle = (start + end) / 2;
+		middle &= ~CACHE_MASK;
+		if (middle == start)
+			break;
+		err = sync_read(bc, middle);
+		if (err)
+			return err;
+		if (memcmp(bc->sector_array[1], bc->sector_array[0],
+					BCON_HEADERSIZE)) {
+			/* If the two differ, we haven't written that far yet */
+			end = middle;
+		} else {
+			start = middle;
+		}
+	}
+	bc->round = be64_to_cpu(*be_round);
+	if (middle == 0 && (bc->round == 0 || bc->round > 0x100000000ull)) {
+		/* Chances are, this device is brand-new */
+		bc->round = 0;
+		bc->console_bytes = bc->write_bytes = 0;
+		bcon_init_first_page(bc);
+	} else {
+		bc->console_bytes = bc->write_bytes = end;
+		memcpy(bc->sector_array[0], bc->sector_array[1], BCON_HEADERSIZE);
+	}
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	return 0;
+}
+
+static void bcon_unregister(struct work_struct *work)
+{
+	struct blockconsole *bc = container_of(work, struct blockconsole,
+			unregister_work);
+
+	unregister_console(&bc->console);
+	del_timer_sync(&bc->pad_timer);
+	kthread_stop(bc->writeback_thread);
+	/* No new io will be scheduled anymore now */
+	bcon_put(bc);
+}
+
+#define BCON_MAX_ERRORS	10
+static void bcon_end_io(struct bio *bio, int err)
+{
+	struct bcon_bio *bcon_bio = container_of(bio, struct bcon_bio, bio);
+	struct blockconsole *bc = bio->bi_private;
+	unsigned long flags;
+
+	/*
+	 * We want to assume the device broken and free this console if
+	 * we accumulate too many errors.  But if errors are transient,
+	 * we also want to forget about them once writes succeed again.
+	 * Oh, and we only want to reset the counter if it hasn't reached
+	 * the limit yet, so we don't bcon_put() twice from here.
+	 */
+	spin_lock_irqsave(&bc->end_io_lock, flags);
+	if (err) {
+		if (bc->error_count++ == BCON_MAX_ERRORS) {
+			schedule_work(&bc->unregister_work);
+		}
+	} else {
+		if (bc->error_count && bc->error_count < BCON_MAX_ERRORS)
+			bc->error_count = 0;
+	}
+	bcon_bio->in_flight = 0;
+	wmb(); /* FIXME: isn't this implicit in the spin_unlock already? */
+	spin_unlock_irqrestore(&bc->end_io_lock, flags);
+	bcon_put(bc);
+}
+
+static void bcon_writesector(struct blockconsole *bc, int index)
+{
+	struct bcon_bio *bcon_bio = bc->bio_array + index;
+	struct bio *bio = &bcon_bio->bio;
+
+	rmb();
+	if (bcon_bio->in_flight)
+		return;
+	bcon_get(bc);
+
+	bio_init(bio);
+	bio->bi_io_vec = &bcon_bio->bvec;
+	bio->bi_vcnt = 1;
+	bio->bi_size = SECTOR_SIZE;
+	bio->bi_bdev = bc->bdev;
+	bio->bi_private = bc;
+	bio->bi_end_io = bcon_end_io;
+
+	bio->bi_idx = 0;
+	bio->bi_sector = bc->write_bytes >> 9;
+	bcon_bio->in_flight = 1;
+	wmb();
+	submit_bio(WRITE, bio);
+}
+
+static int bcon_writeback(void *_bc)
+{
+	struct blockconsole *bc = _bc;
+	struct sched_param sp;
+
+	sp.sched_priority = MAX_RT_PRIO - 1; /* Highest realtime prio */
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+		if (kthread_should_stop())
+			break;
+		while (bcon_write_sector(bc) != bcon_console_sector(bc)) {
+			bcon_writesector(bc, bcon_write_sector(bc));
+			bcon_advance_write_bytes(bc, SECTOR_SIZE);
+			if (bcon_write_sector(bc) == 0) {
+				bcon_erase_segment(bc);
+			}
+		}
+	}
+	return 0;
+}
+
+static void bcon_pad(unsigned long data)
+{
+	struct blockconsole *bc = (void *)data;
+	unsigned int n;
+	unsigned long flags;
+
+	spin_lock_irqsave(&bc->write_lock, flags);
+	if (bcon_console_ofs(bc) != 0) {
+		n = SECTOR_SIZE - bcon_console_ofs(bc);
+		memset(bc->sector_array[bcon_console_sector(bc)]
+				+ bcon_console_ofs(bc), ' ', n);
+		memset(bc->sector_array[bcon_console_sector(bc)] + 511, 10, 1);
+		bcon_advance_console_bytes(bc, n);
+		wake_up_process(bc->writeback_thread);
+	}
+	spin_unlock_irqrestore(&bc->write_lock, flags);
+}
+
+static int bcon_handle_lost_lines(struct blockconsole *bc, void *buf, size_t n)
+{
+	int written;
+
+	if (!bc->lost_bytes)
+		return 0;
+	written = snprintf(buf, n, "blockconsole dropped %d bytes\n",
+			bc->lost_bytes);
+	if (written < n)
+		return 0;
+	bc->lost_bytes = 0;
+	return 1;
+}
+
+static void bcon_write(struct console *console, const char *msg,
+		unsigned int len)
+{
+	struct blockconsole *bc = container_of(console, struct blockconsole,
+			console);
+	unsigned int n;
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&bc->write_lock, flags);
+	while (len) {
+		i = bcon_console_sector(bc);
+		rmb();
+		if (bc->bio_array[i].in_flight) {
+			bc->lost_bytes += len;
+			break;
+		}
+		n = min_t(int, len, SECTOR_SIZE - bcon_console_ofs(bc));
+		if (bcon_handle_lost_lines(bc, bc->sector_array[i]
+					+ bcon_console_ofs(bc), n) == 0) {
+			memcpy(bc->sector_array[i] + bcon_console_ofs(bc), msg, n);
+			len -= n;
+			msg += n;
+		}
+		bcon_advance_console_bytes(bc, n);
+		if (bcon_console_ofs(bc) == 0) {
+			wake_up_process(bc->writeback_thread);
+		}
+	}
+	if (bcon_console_ofs(bc) != 0)
+		mod_timer(&bc->pad_timer, jiffies + HZ);
+	spin_unlock_irqrestore(&bc->write_lock, flags);
+}
+
+static void bcon_init_bios(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < SECTOR_COUNT; i++) {
+		int page_index = i >> (PAGE_SHIFT - SECTOR_SHIFT);
+		struct page *page = bc->pages + page_index;
+		struct bcon_bio *bcon_bio = bc->bio_array + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bc->sector_array[i] = page_address(bc->pages + page_index)
+			+ SECTOR_SIZE * (i & PG_SECTOR_MASK);
+		bvec->bv_page = page;
+		bvec->bv_len = SECTOR_SIZE;
+		bvec->bv_offset = SECTOR_SIZE * (i & PG_SECTOR_MASK);
+	}
+}
+
+static void bcon_init_zero_bio(struct blockconsole *bc)
+{
+	int i;
+
+	memset(page_address(bc->zero_page), 0, PAGE_SIZE);
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bvec->bv_page = bc->zero_page;
+		bvec->bv_len = PAGE_SIZE;
+		bvec->bv_offset = 0;
+	}
+}
+
+static int bcon_create(const char *devname)
+{
+	const fmode_t mode = FMODE_READ | FMODE_WRITE;
+	struct blockconsole *bc;
+	int err;
+
+	bc = kzalloc(sizeof(*bc), GFP_KERNEL);
+	if (!bc)
+		return -ENOMEM;
+	spin_lock_init(&bc->write_lock);
+	spin_lock_init(&bc->end_io_lock);
+	strcpy(bc->console.name, "blockcon");
+	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED; /* FIXME: document flags */
+	bc->console.write = bcon_write;
+	bc->bdev = blkdev_get_by_path(devname, mode, NULL);
+#ifndef MODULE
+	if (IS_ERR(bc->bdev)) {
+		dev_t devt = name_to_dev_t(devname);
+		if (devt)
+			bc->bdev = blkdev_get_by_dev(devt, mode, NULL);
+	}
+#endif
+	if (IS_ERR(bc->bdev))
+		goto out;
+	bc->pages = alloc_pages(GFP_KERNEL, 8);
+	if (!bc->pages)
+		goto out;
+	bc->zero_page = alloc_pages(GFP_KERNEL, 0);
+	if (!bc->zero_page)
+		goto out1;
+	bcon_init_bios(bc);
+	bcon_init_zero_bio(bc);
+	setup_timer(&bc->pad_timer, bcon_pad, (unsigned long)bc);
+	bc->max_bytes = bc->bdev->bd_inode->i_size & ~CACHE_MASK;
+	err = bcon_find_end_of_log(bc);
+	if (err)
+		goto out2;
+	kref_init(&bc->kref); /* This reference gets freed on errors */
+	bc->writeback_thread = kthread_run(bcon_writeback, bc, "bcon_%s",
+			devname);
+	if (IS_ERR(bc->writeback_thread))
+		goto out2;
+	INIT_WORK(&bc->unregister_work, bcon_unregister);
+	register_console(&bc->console);
+	printk(KERN_INFO "blockconsole: now logging to %s\n", devname);
+	return 0;
+
+out2:
+	__free_pages(bc->zero_page, 0);
+out1:
+	__free_pages(bc->pages, 8);
+out:
+	kfree(bc);
+	/* Not strictly correct, but the caller doesn't care */
+	return -ENOMEM;
+}
+
+static void bcon_create_fuzzy(const char *name)
+{
+	char *longname;
+	int err;
+
+	err = bcon_create(name);
+	if (err) {
+		longname = kzalloc(strlen(name) + 6, GFP_KERNEL);
+		if (!longname)
+			return;
+		strcpy(longname, "/dev/");
+		strcat(longname, name);
+		bcon_create(longname);
+		kfree(longname);
+	}
+}
+
+static DEFINE_SPINLOCK(device_lock);
+static char scanned_devices[80];
+
+static void bcon_do_add(struct work_struct *work)
+{
+	char local_devices[80], *name, *remainder = local_devices;
+
+	spin_lock(&device_lock);
+	memcpy(local_devices, scanned_devices, sizeof(local_devices));
+	memset(scanned_devices, 0, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+
+	while (remainder && remainder[0]) {
+		name = strsep(&remainder, ",");
+		bcon_create_fuzzy(name);
+	}
+}
+
+DECLARE_WORK(bcon_add_work, bcon_do_add);
+
+void bcon_add(const char *name)
+{
+	/*
+	 * We add each name to a small static buffer and ask for a workqueue
+	 * to go pick it up asap.  Once it is picked up, the buffer is empty
+	 * again, so hopefully it will suffice for all sane users.
+	 */
+	spin_lock(&device_lock);
+	if (scanned_devices[0])
+		strlcat(scanned_devices, ",", sizeof(scanned_devices));
+	strlcat(scanned_devices, name, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+	schedule_work(&bcon_add_work);
+}
+
+int bcon_magic_present(const void *data)
+{
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+
+	return memcmp(data + 1, BLOCKCONSOLE_MAGIC, len) == 0 ||
+		memcmp(data, BLOCKCONSOLE_MAGIC, len) == 0;
+}
diff --git a/include/linux/blockconsole.h b/include/linux/blockconsole.h
new file mode 100644
index 0000000..114f7c5
--- /dev/null
+++ b/include/linux/blockconsole.h
@@ -0,0 +1,7 @@
+#ifndef LINUX_BLOCKCONSOLE_H
+#define LINUX_BLOCKCONSOLE_H
+
+int bcon_magic_present(const void *data);
+void bcon_add(const char *name);
+
+#endif
-- 
1.7.9.1
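
The 48-byte header and the restart search described in the documentation above can be modelled in userspace.  What follows is a minimal sketch, not code from the patch: the sk_ names and the in-memory device image are assumptions made for illustration.  The layout (newline fill, magic at offset 1, big-endian sequence number at offset 32) follows bcon_init_first_page(), and the search follows bcon_find_end_of_log().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_MAGIC      "Linux blockconsole version 1.0"
#define SK_HEADERSIZE 48              /* bytes per header */
#define SK_ROUND_OFS  32              /* offset of the sequence number */
#define SK_TILE       (1024 * 1024)   /* one header per megabyte */

/* Build a header: all newlines, magic at offset 1, round at offset 32. */
static void sk_make_header(unsigned char *buf, uint64_t round)
{
	int i;

	assert(buf != NULL);
	memset(buf, '\n', SK_HEADERSIZE);
	memcpy(buf + 1, SK_MAGIC, strlen(SK_MAGIC));
	for (i = 0; i < 8; i++)       /* big-endian: most significant byte first */
		buf[SK_ROUND_OFS + i] = (unsigned char)(round >> (56 - 8 * i));
}

/* Decode the big-endian sequence number from a header. */
static uint64_t sk_read_round(const unsigned char *buf)
{
	uint64_t round = 0;
	int i;

	for (i = 0; i < 8; i++)
		round = (round << 8) | buf[SK_ROUND_OFS + i];
	return round;
}

/* Same test as bcon_magic_present(): magic at offset 1 or at offset 0. */
static int sk_magic_present(const unsigned char *buf)
{
	size_t len = strlen(SK_MAGIC);

	return memcmp(buf + 1, SK_MAGIC, len) == 0 ||
	       memcmp(buf, SK_MAGIC, len) == 0;
}

/*
 * Binary search over an in-memory device image of ntiles megabytes.
 * Returns the byte offset of the first tile whose header differs from
 * the header of tile 0, or the device size if every header matches.
 */
static size_t sk_find_end_of_log(const unsigned char *dev, size_t ntiles)
{
	size_t start = 0, end = ntiles, middle;

	for (;;) {
		middle = (start + end) / 2;
		if (middle == start)
			break;
		if (memcmp(dev + middle * SK_TILE, dev, SK_HEADERSIZE))
			end = middle;   /* not written in this round yet */
		else
			start = middle;
	}
	return end * SK_TILE;
}
```

With tiles 0 to 2 of a four-tile image carrying sequence number 7 and tile 3 still carrying 6, sk_find_end_of_log() returns the offset of tile 3, which is where logging would resume.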



* Re: [RFC][PATCH] add blockconsole
  2012-04-25 13:42 ` Jeff Moyer
@ 2012-04-25 13:25   ` Jörn Engel
  2012-04-25 15:52     ` Jeff Moyer
From: Jörn Engel @ 2012-04-25 13:25 UTC
  To: Jeff Moyer; +Cc: linux-kernel

On Wed, 25 April 2012 09:42:39 -0400, Jeff Moyer wrote:
> 
> Neat idea, but I'm curious to know how it works when the system panics
> and you no longer can schedule the writeback thread.  What are the
> limitations you've seen in practice?

If the writeback thread doesn't get scheduled, you lose that
information.  Formerly I did the submit_bio directly, but lockdep
reminded me that I shouldn't do so from interrupt context.  Bummer.

Also, this is pretty young code so far.  We are starting to deploy it
in numbers just about now.  The obvious hope is that it will record a
non-zero class of problems that doesn't make it to syslog.  And the
obvious fear is that it will fail to record another non-zero class of
problems.  As to the extent of those two classes, we just don't have
good data yet.

But as they say, release early, release often, and hope some reviewer
finds a bug you can fix before having to debug it the hard way. ;)

Jörn
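
To make the exposure concrete: whatever sits between the write cursor and the console cursor is what gets lost if the writeback thread never runs again.  The sketch below is a userspace model under stated assumptions, not code from the patch.  The sk_ names are hypothetical, and the sector count of 2048 assumes 256 cache pages of 4096 bytes each.

```c
#include <stdint.h>

#define SK_SECTOR_SHIFT 9
#define SK_SECTOR_SIZE  (1u << SK_SECTOR_SHIFT)
#define SK_SECTOR_COUNT 2048u   /* 256 pages of 4096 bytes (assumed page size) */

/* Full sectors the console has filled but writeback has not yet submitted. */
static unsigned sk_pending_sectors(uint64_t console_bytes, uint64_t write_bytes)
{
	unsigned c = (unsigned)(console_bytes >> SK_SECTOR_SHIFT) &
		     (SK_SECTOR_COUNT - 1);
	unsigned w = (unsigned)(write_bytes >> SK_SECTOR_SHIFT) &
		     (SK_SECTOR_COUNT - 1);

	/* Both indexes live in the same ring, so subtract modulo its size. */
	return (c - w) & (SK_SECTOR_COUNT - 1);
}

/* Bytes sitting in the current, partially filled sector. */
static unsigned sk_partial_bytes(uint64_t console_bytes)
{
	return (unsigned)(console_bytes & (SK_SECTOR_SIZE - 1));
}
```

For a console cursor at byte 1048 and a write cursor at byte 0, this reports two complete sectors and 24 bytes of a third that have not reached the device.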


* Re: [RFC][PATCH] add blockconsole
  2012-04-24 20:59 [RFC][PATCH] add blockconsole Jörn Engel
@ 2012-04-25 13:42 ` Jeff Moyer
  2012-04-25 13:25   ` Jörn Engel
From: Jeff Moyer @ 2012-04-25 13:42 UTC
  To: Jörn Engel; +Cc: linux-kernel

Jörn Engel <joern@logfs.org> writes:

> Console driver similar to netconsole, except it writes to a block
> device.  Can be useful in a setup where netconsole, for whatever
> reasons, is impractical.

Hi, Joern,

Neat idea, but I'm curious to know how it works when the system panics
and you no longer can schedule the writeback thread.  What are the
limitations you've seen in practice?

Cheers,
Jeff


* Re: [RFC][PATCH] add blockconsole
  2012-04-25 13:25   ` Jörn Engel
@ 2012-04-25 15:52     ` Jeff Moyer
  2012-07-12 17:46       ` [PATCH] add blockconsole version 1.1 Jörn Engel
From: Jeff Moyer @ 2012-04-25 15:52 UTC
  To: Jörn Engel; +Cc: linux-kernel

Jörn Engel <joern@logfs.org> writes:

> On Wed, 25 April 2012 09:42:39 -0400, Jeff Moyer wrote:
>> 
>> Neat idea, but I'm curious to know how it works when the system panics
>> and you no longer can schedule the writeback thread.  What are the
>> limitations you've seen in practice?
>
> If the writeback thread doesn't get scheduled, you lose that
> information.  Formerly I did the submit_bio directly, but lockdep
> reminded me that I shouldn't do so from interrupt context.  Bummer.

Well, submit_bio can sleep, obviously.  Perhaps you could explore
registering a panic notifier, and then flush the logs from there?
Unfortunately, there is always the possibility that some required locks
will be taken, but it might get you a little bit further.  Or, you could
explore the route that the diskdump or lkcd folks took, implementing
their own polling mode drivers.

Cheers,
Jeff
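
Any flush path, whether the pad timer in the patch or a panic-time hook, first has to round the partially filled sector up to a full one.  The v1.0 bcon_pad() does this with spaces and a closing newline.  Below is a minimal userspace sketch of that step only; sk_pad_sector() is a hypothetical name, and nothing here shows how a real panic notifier would submit the I/O.

```c
#include <stddef.h>
#include <string.h>

#define SK_SECTOR_SIZE 512

/*
 * Pad a partially filled sector so it can be written out as a whole.
 * 'used' is the number of valid bytes at the start of 'sector'.
 * Returns the number of padding bytes added (0 if nothing was needed).
 */
static size_t sk_pad_sector(char *sector, size_t used)
{
	size_t n;

	if (used == 0 || used >= SK_SECTOR_SIZE)
		return 0;
	n = SK_SECTOR_SIZE - used;
	memset(sector + used, ' ', n);      /* spaces keep the log readable */
	sector[SK_SECTOR_SIZE - 1] = '\n';  /* end the padding with a newline */
	return n;
}
```

The existing log bytes are left untouched; only the tail of the sector is overwritten, so a reader of the device sees the message followed by blank space and a line break.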


* [PATCH] add blockconsole version 1.1
  2012-04-25 15:52     ` Jeff Moyer
@ 2012-07-12 17:46       ` Jörn Engel
  2012-07-13 13:03         ` Borislav Petkov
  2012-07-23 14:33         ` Tvrtko Ursulin
From: Jörn Engel @ 2012-07-12 17:46 UTC
  To: Andrew Morton; +Cc: linux-kernel, Jeff Moyer, Steve Hodgson

Console driver similar to netconsole, except it writes to a block
device.  Can be useful in a setup where netconsole, for whatever
reasons, is impractical.

Changes since version 1.0:
- Header format overhaul, addressing several annoyances when actually
  using blockconsole for production.
- Steve Hodgson added a panic notifier.

Signed-off-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Joern Engel <joern@logfs.org>
---
 Documentation/block/blockconsole.txt            |   75 +++
 Documentation/block/blockconsole/bcon_tail      |   52 ++
 Documentation/block/blockconsole/mkblockconsole |   24 +
 block/partitions/Makefile                       |    1 +
 block/partitions/blockconsole.c                 |   22 +
 block/partitions/check.c                        |    4 +
 drivers/block/Kconfig                           |    5 +
 drivers/block/Makefile                          |    1 +
 drivers/block/blockconsole.c                    |  606 +++++++++++++++++++++++
 include/linux/mount.h                           |    2 +-
 init/do_mounts.c                                |    4 +-
 11 files changed, 793 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100755 Documentation/block/blockconsole/bcon_tail
 create mode 100755 Documentation/block/blockconsole/mkblockconsole
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c

diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt
new file mode 100644
index 0000000..a906e61
--- /dev/null
+++ b/Documentation/block/blockconsole.txt
@@ -0,0 +1,75 @@
+
+started by Jörn Engel <joern@logfs.org> 2012.03.17
+
+Introduction:
+=============
+
+This module logs kernel printk messages to block devices, e.g. usb
+sticks.  It allows after-the-fact debugging when the main
+disk/filesystem fails and serial consoles and netconsole are
+impractical.
+
+It can currently only be used built-in.  Blockconsole hooks into the
+partition scanning code and will bring up configured block devices as
+soon as possible.  While this doesn't allow capture of early kernel
+panics, it does capture most of the boot process.
+
+Block device configuration:
+===========================
+
+Blockconsole has no configuration parameters.  In order to use a block
+device for logging, the blockconsole header has to be written to the
+device in question.  Logging to partitions is not supported.
+
+The example program mkblockconsole can be used to generate such a
+header on a device.
+
+Header format:
+==============
+
+A legal header looks like this:
+
+Linux blockconsole version 1.1
+818cf322
+00000000
+00000000
+
+It consists of a newline, the "Linux blockconsole version 1.1" string
+plus three numbers, each on a separate line.  Numbers are all 32bit,
+represented as 8-byte hex strings, with letters in lowercase.  The
+first number is a uuid for this particular console device.  Just pick
+a random number when generating the device.  The second number is a
+wrap counter and unlikely to ever increment.  The third is a tile
+counter, with a tile being one megabyte in size.
+
+Miscellaneous notes:
+====================
+
+Blockconsole will write a new header for every tile or once every
+megabyte.  The header starts with a newline in order to ensure the
+"Linux blockconsole..." string always ends up at the beginning of a
+line if you read the blockconsole in a text editor.
+
+The blockconsole header is constructed such that opening the log
+device in a text editor, ignoring memory constraints due to large
+devices, should just work and be reasonably non-confusing to readers.
+However, the example program bcon_tail can be used to copy the last 16
+tiles of the log device to /var/log/bcon.<uuid>, which should be much
+easier to handle.
+
+The wrap counter is used by blockconsole to determine where to
+continue logging after a reboot.  New logs will be written to the
+first tile that wasn't written to by the last instance of
+blockconsole.  Similarly bcon_tail is doing a binary search to find
+the end of the log.
+
+Writing to the log device is strictly circular.  This should give
+optimal performance and reliability on cheap devices, like usb sticks.
+
+Writing to block devices has to happen in sector granularity, while
+kernel logging happens in byte granularity.  In order not to lose
+messages in important cases like kernel crashes, a timer will write
+out partial sectors if no new messages appear for a while.  The
+unwritten part of the sector will be filled with spaces and a single
+newline.  In a quiet system, these empty lines can make up the bulk of
+the log.
diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
new file mode 100755
index 0000000..950bfd1
--- /dev/null
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -0,0 +1,52 @@
+#!/bin/bash
+
+TAIL_LEN=16
+TEMPLATE=/tmp/bcon_template
+BUF=/tmp/bcon_buf
+
+end_of_log() {
+	DEV=$1
+	UUID=`head -c40 $DEV|tail -c8`
+	LOGFILE=/var/log/bcon.$UUID
+	SECTORS=`hdparm -g $DEV|grep sectors|sed 's/.*sectors = \([0-9]*\).*/\1/'`
+	#MSIZE=`expr $SECTORS / 2048`
+	dd if=$DEV iflag=direct bs=512 2>/dev/null|head -c50 > $TEMPLATE
+	#START, MIDDLE and END are in sectors
+	START=0
+	MIDDLE=$SECTORS
+	END=$SECTORS
+	while true; do
+		MIDDLE=`expr \( \( $END + $START \) / 4096 \) \* 2048`
+		if [ $MIDDLE -eq $START ]; then
+			break
+		fi
+		dd if=$DEV iflag=direct bs=512 count=1 skip=$MIDDLE 2>/dev/null|head -c50 > $BUF
+		if diff -q $BUF $TEMPLATE > /dev/null; then
+			START=$MIDDLE
+		else
+			END=$MIDDLE
+		fi
+	done
+	#switch to megabytes
+	END=`expr $END / 2048`
+	START=`expr $START / 2048`
+	if [ $START -lt $TAIL_LEN ]; then
+		START=0
+	else
+		START=`expr $START - $TAIL_LEN + 1`
+	fi
+	LEN=`expr $END - $START`
+	dd if=$DEV iflag=direct bs=1M count=$LEN skip=$START >$LOGFILE 2>/dev/null
+	echo $LOGFILE
+}
+
+# HEADER contains a newline, so the funny quoting is necessary
+HEADER='
+Linux blockconsole version 1.1'
+CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`
+
+for DEV in $CANDIDATES; do
+	if [ "`head -c32 $DEV`" == "$HEADER" ]; then
+		end_of_log $DEV
+	fi
+done
diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
new file mode 100755
index 0000000..d9514e7
--- /dev/null
+++ b/Documentation/block/blockconsole/mkblockconsole
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+if [ ! $# -eq 1 ]; then
+	echo "Usage: mkblockconsole <dev>"
+	exit 1
+elif mount|fgrep -q $1; then
+	echo Device appears to be mounted - aborting
+	exit 1
+else
+	dd if=/dev/zero bs=1M count=1 > $1
+	# The funky formatting is actually needed!
+	UUID=`head -c4 /dev/urandom |hexdump -e '/4 "%08x"'`
+	echo > /tmp/$UUID
+	echo 'Linux blockconsole version 1.1' >> /tmp/$UUID
+	echo "$UUID" >> /tmp/$UUID
+	echo 00000000 >> /tmp/$UUID
+	echo 00000000 >> /tmp/$UUID
+	for i in `seq 452`; do echo -n " " >> /tmp/$UUID; done
+	echo >> /tmp/$UUID
+	cat /tmp/$UUID > $1
+	rm /tmp/$UUID
+	sync
+	exit 0
+fi
diff --git a/block/partitions/Makefile b/block/partitions/Makefile
index 03af8ea..bf26d4a 100644
--- a/block/partitions/Makefile
+++ b/block/partitions/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
 obj-$(CONFIG_EFI_PARTITION) += efi.o
 obj-$(CONFIG_KARMA_PARTITION) += karma.o
 obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
+obj-$(CONFIG_BLOCKCONSOLE) += blockconsole.o
diff --git a/block/partitions/blockconsole.c b/block/partitions/blockconsole.c
new file mode 100644
index 0000000..79796a8
--- /dev/null
+++ b/block/partitions/blockconsole.c
@@ -0,0 +1,22 @@
+#include <linux/blockconsole.h>
+
+#include "check.h"
+
+int blockconsole_partition(struct parsed_partitions *state)
+{
+	Sector sect;
+	void *data;
+	int err = 0;
+
+	data = read_part_sector(state, 0, &sect);
+	if (!data)
+		return -EIO;
+	if (!bcon_magic_present(data))
+		goto out;
+
+	bcon_add(state->name);
+	err = 1;
+out:
+	put_dev_sector(sect);
+	return err;
+}
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..8de99fa 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -36,11 +36,15 @@
 
 int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/
 
+int blockconsole_partition(struct parsed_partitions *state);
 static int (*check_part[])(struct parsed_partitions *) = {
 	/*
 	 * Probe partition formats with tables at disk address 0
 	 * that also have an ADFS boot block at 0xdc0.
 	 */
+#ifdef CONFIG_BLOCKCONSOLE
+	blockconsole_partition,
+#endif
 #ifdef CONFIG_ACORN_PARTITION_ICS
 	adfspart_check_ICS,
 #endif
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a796407..7ce033d 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -555,4 +555,9 @@ config BLK_DEV_RBD
 
 	  If unsure, say N.
 
+config BLOCKCONSOLE
+	tristate "Block device console logging support"
+	help
+	  This enables logging to block devices.
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 5b79505..1eb7f902 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
 obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
 obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
+obj-$(CONFIG_BLOCKCONSOLE)	+= blockconsole.o
 
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
new file mode 100644
index 0000000..d13203f
--- /dev/null
+++ b/drivers/block/blockconsole.c
@@ -0,0 +1,606 @@
+#include <linux/bio.h>
+#include <linux/blockconsole.h>
+#include <linux/console.h>
+#include <linux/fs.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
+#define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
+#define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"
+#define BCON_UUID_OFS		(32)
+#define BCON_ROUND_OFS		(41)
+#define BCON_TILE_OFS		(50)
+#define BCON_HEADERSIZE		(50)
+#define BCON_LONG_HEADERSIZE	(59) /* with tile index */
+
+#define PAGE_COUNT		(256)
+#define SECTOR_COUNT		(PAGE_COUNT * (PAGE_SIZE >> 9))
+#define CACHE_PAGE_MASK		(PAGE_COUNT - 1)
+#define CACHE_SECTOR_MASK	(SECTOR_COUNT - 1)
+#define CACHE_SIZE		(PAGE_COUNT << PAGE_SHIFT)
+#define CACHE_MASK		(CACHE_SIZE - 1)
+#define SECTOR_SHIFT		(9)
+#define SECTOR_SIZE		(1 << SECTOR_SHIFT)
+#define SECTOR_MASK		(~(SECTOR_SIZE-1))
+#define PG_SECTOR_MASK		((PAGE_SIZE >> 9) - 1)
+
+struct bcon_bio {
+	struct bio bio;
+	struct bio_vec bvec;
+	void *sector;
+	int in_flight;
+};
+
+struct blockconsole {
+	char devname[32];
+	struct spinlock end_io_lock;
+	struct timer_list pad_timer;
+	int error_count;
+	struct kref kref;
+	u64 console_bytes;
+	u64 write_bytes;
+	u64 max_bytes;
+	u32 round;
+	u32 uuid;
+	struct bcon_bio bio_array[SECTOR_COUNT];
+	struct page *pages;
+	struct bcon_bio zero_bios[PAGE_COUNT];
+	struct page *zero_page;
+	struct block_device *bdev;
+	struct console console;
+	struct work_struct unregister_work;
+	struct task_struct *writeback_thread;
+	struct notifier_block panic_block;
+};
+
+static void bcon_get(struct blockconsole *bc)
+{
+	kref_get(&bc->kref);
+}
+
+static void bcon_release(struct kref *kref)
+{
+	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+
+	__free_pages(bc->zero_page, 0);
+	__free_pages(bc->pages, 8);
+	invalidate_mapping_pages(bc->bdev->bd_inode->i_mapping, 0, -1);
+	blkdev_put(bc->bdev, FMODE_READ|FMODE_WRITE);
+	kfree(bc);
+}
+
+static void bcon_put(struct blockconsole *bc)
+{
+	kref_put(&bc->kref, bcon_release);
+}
+
+static int __bcon_console_ofs(u64 console_bytes)
+{
+	return console_bytes & ~SECTOR_MASK;
+}
+
+static int bcon_console_ofs(struct blockconsole *bc)
+{
+	return __bcon_console_ofs(bc->console_bytes);
+}
+
+static int __bcon_console_sector(u64 console_bytes)
+{
+	return (console_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static int bcon_console_sector(struct blockconsole *bc)
+{
+	return __bcon_console_sector(bc->console_bytes);
+}
+
+static int bcon_write_sector(struct blockconsole *bc)
+{
+	return (bc->write_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static void clear_sector(void *sector)
+{
+	memset(sector, ' ', 511);
+	memset(sector + 511, 10, 1);
+}
+
+static void bcon_init_first_page(struct blockconsole *bc)
+{
+	char *buf = page_address(bc->pages);
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+	u32 tile = bc->console_bytes >> 20; /* We overflow after 4TB - fine */
+
+	clear_sector(buf);
+	memcpy(buf, BLOCKCONSOLE_MAGIC, len);
+	sprintf(buf + BCON_UUID_OFS, "%08x", bc->uuid);
+	sprintf(buf + BCON_ROUND_OFS, "%08x", bc->round);
+	sprintf(buf + BCON_TILE_OFS, "%08x", tile);
+	/* replace NUL with newline */
+	buf[BCON_UUID_OFS + 8] = 10;
+	buf[BCON_ROUND_OFS + 8] = 10;
+	buf[BCON_TILE_OFS + 8] = 10;
+}
+
+static void bcon_advance_console_bytes(struct blockconsole *bc, int bytes)
+{
+	u64 old, new;
+
+	do {
+		old = bc->console_bytes;
+		new = old + bytes;
+		if (new >= bc->max_bytes)
+			new = 0;
+		if ((new & CACHE_MASK) == 0) {
+			bcon_init_first_page(bc);
+			new += BCON_LONG_HEADERSIZE;
+		}
+	} while (cmpxchg64(&bc->console_bytes, old, new) != old);
+}
+
+static void request_complete(struct bio *bio, int err)
+{
+	complete((struct completion *)bio->bi_private);
+}
+
+static int sync_read(struct blockconsole *bc, u64 ofs)
+{
+	struct bio bio;
+	struct bio_vec bio_vec;
+	struct completion complete;
+
+	bio_init(&bio);
+	bio.bi_io_vec = &bio_vec;
+	bio_vec.bv_page = bc->pages;
+	bio_vec.bv_len = SECTOR_SIZE;
+	bio_vec.bv_offset = 0;
+	bio.bi_vcnt = 1;
+	bio.bi_idx = 0;
+	bio.bi_size = SECTOR_SIZE;
+	bio.bi_bdev = bc->bdev;
+	bio.bi_sector = ofs >> SECTOR_SHIFT;
+	init_completion(&complete);
+	bio.bi_private = &complete;
+	bio.bi_end_io = request_complete;
+
+	submit_bio(READ, &bio);
+	wait_for_completion(&complete);
+	return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO;
+}
+
+static void bcon_erase_segment(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio *bio = &bcon_bio->bio;
+
+		/*
+		 * If the last erase hasn't finished yet, just skip it.  The log will
+		 * look messy, but that's all.
+		 */
+		rmb();
+		if (bcon_bio->in_flight)
+			continue;
+		bio_init(bio);
+		bio->bi_io_vec = &bcon_bio->bvec;
+		bio->bi_vcnt = 1;
+		bio->bi_size = PAGE_SIZE;
+		bio->bi_bdev = bc->bdev;
+		bio->bi_private = bc;
+		bio->bi_idx = 0;
+		bio->bi_sector = (bc->write_bytes + i * PAGE_SIZE) >> 9;
+		bcon_bio->in_flight = 1;
+		wmb();
+		/* We want the erase to go to the device first somehow */
+		submit_bio(WRITE | REQ_SOFTBARRIER, bio);
+	}
+}
+
+static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
+{
+	bc->write_bytes += bytes;
+	if (bc->write_bytes >= bc->max_bytes) {
+		bc->write_bytes = 0;
+		bcon_init_first_page(bc);
+		bc->round++;
+	}
+}
+
+static int bcon_convert_old_format(struct blockconsole *bc)
+{
+	bc->uuid = get_random_int();
+	bc->round = 0;
+	bc->console_bytes = bc->write_bytes = 0;
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	printk(KERN_INFO"blockconsole: converted %s from old format\n",
+			bc->devname);
+	return 0;
+}
+
+static int bcon_find_end_of_log(struct blockconsole *bc)
+{
+	u64 start = 0, end = bc->max_bytes, middle;
+	void *sec0 = bc->bio_array[0].sector;
+	void *sec1 = bc->bio_array[1].sector;
+	int err, version;
+
+	err = sync_read(bc, 0);
+	if (err)
+		return err;
+	/* Second sanity check, out of sheer paranoia */
+	version = bcon_magic_present(sec0);
+	if (version == 10)
+		return bcon_convert_old_format(bc);
+	bc->uuid = simple_strtoull(sec0 + BCON_UUID_OFS, NULL, 16);
+	bc->round = simple_strtoull(sec0 + BCON_ROUND_OFS, NULL, 16);
+
+	memcpy(sec1, sec0, BCON_HEADERSIZE);
+	for (;;) {
+		middle = (start + end) / 2;
+		middle &= ~CACHE_MASK;
+		if (middle == start)
+			break;
+		err = sync_read(bc, middle);
+		if (err)
+			return err;
+		if (memcmp(sec1, sec0, BCON_HEADERSIZE)) {
+			/* If the two differ, we haven't written that far yet */
+			end = middle;
+		} else {
+			start = middle;
+		}
+	}
+	bc->console_bytes = bc->write_bytes = end;
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	return 0;
+}
+
+static void bcon_unregister(struct work_struct *work)
+{
+	struct blockconsole *bc = container_of(work, struct blockconsole,
+			unregister_work);
+
+	atomic_notifier_chain_unregister(&panic_notifier_list, &bc->panic_block);
+	unregister_console(&bc->console);
+	del_timer_sync(&bc->pad_timer);
+	kthread_stop(bc->writeback_thread);
+	/* No new io will be scheduled anymore now */
+	bcon_put(bc);
+}
+
+#define BCON_MAX_ERRORS	10
+static void bcon_end_io(struct bio *bio, int err)
+{
+	struct bcon_bio *bcon_bio = container_of(bio, struct bcon_bio, bio);
+	struct blockconsole *bc = bio->bi_private;
+	unsigned long flags;
+
+	/*
+	 * We want to assume the device broken and free this console if
+	 * we accumulate too many errors.  But if errors are transient,
+	 * we also want to forget about them once writes succeed again.
+	 * Oh, and we only want to reset the counter if it hasn't reached
+	 * the limit yet, so we don't bcon_put() twice from here.
+	 */
+	spin_lock_irqsave(&bc->end_io_lock, flags);
+	if (err) {
+		if (bc->error_count++ == BCON_MAX_ERRORS) {
+			printk(KERN_INFO"blockconsole: no longer logging to %s\n", bc->devname);
+			schedule_work(&bc->unregister_work);
+		}
+	} else {
+		if (bc->error_count && bc->error_count < BCON_MAX_ERRORS)
+			bc->error_count = 0;
+	}
+	/*
+	 * Add padding (a bunch of spaces and a newline) early so bcon_pad
+	 * only has to advance a pointer.
+	 */
+	clear_sector(bcon_bio->sector);
+	bcon_bio->in_flight = 0;
+	spin_unlock_irqrestore(&bc->end_io_lock, flags);
+	bcon_put(bc);
+}
+
+static void bcon_writesector(struct blockconsole *bc, int index)
+{
+	struct bcon_bio *bcon_bio = bc->bio_array + index;
+	struct bio *bio = &bcon_bio->bio;
+
+	rmb();
+	if (bcon_bio->in_flight)
+		return;
+	bcon_get(bc);
+
+	bio_init(bio);
+	bio->bi_io_vec = &bcon_bio->bvec;
+	bio->bi_vcnt = 1;
+	bio->bi_size = SECTOR_SIZE;
+	bio->bi_bdev = bc->bdev;
+	bio->bi_private = bc;
+	bio->bi_end_io = bcon_end_io;
+
+	bio->bi_idx = 0;
+	bio->bi_sector = bc->write_bytes >> 9;
+	bcon_bio->in_flight = 1;
+	wmb();
+	submit_bio(WRITE, bio);
+}
+
+static int bcon_writeback(void *_bc)
+{
+	struct blockconsole *bc = _bc;
+	struct sched_param sp;
+
+	sp.sched_priority = MAX_RT_PRIO - 1; /* Highest realtime prio */
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+		if (kthread_should_stop())
+			break;
+		while (bcon_write_sector(bc) != bcon_console_sector(bc)) {
+			bcon_writesector(bc, bcon_write_sector(bc));
+			bcon_advance_write_bytes(bc, SECTOR_SIZE);
+			if (bcon_write_sector(bc) == 0) {
+				bcon_erase_segment(bc);
+			}
+		}
+	}
+	return 0;
+}
+
+static void bcon_pad(unsigned long data)
+{
+	struct blockconsole *bc = (void *)data;
+	unsigned int n;
+
+	/*
+	 * We deliberately race against bcon_write here.  If we lose the race,
+	 * our padding is no longer where we expected it to be, i.e. it is
+	 * no longer a bunch of spaces with a newline at the end.  There could
+	 * not be a newline at all or it could be somewhere in the middle.
+	 * Either way, the log corruption is fairly obvious to spot and ignore
+	 * for human readers.
+	 */
+	n = SECTOR_SIZE - bcon_console_ofs(bc);
+	if (n != SECTOR_SIZE) {
+		bcon_advance_console_bytes(bc, n);
+		wake_up_process(bc->writeback_thread);
+	}
+}
+
+static void bcon_write(struct console *console, const char *msg,
+		unsigned int len)
+{
+	struct blockconsole *bc = container_of(console, struct blockconsole,
+			console);
+	unsigned int n;
+	u64 console_bytes;
+	int i;
+
+	while (len) {
+		console_bytes = bc->console_bytes;
+		i = __bcon_console_sector(console_bytes);
+		rmb();
+		if (bc->bio_array[i].in_flight)
+			break;
+		n = min_t(int, len, SECTOR_SIZE -
+				__bcon_console_ofs(console_bytes));
+		memcpy(bc->bio_array[i].sector +
+				__bcon_console_ofs(console_bytes), msg, n);
+		len -= n;
+		msg += n;
+		bcon_advance_console_bytes(bc, n);
+	}
+	wake_up_process(bc->writeback_thread);
+	mod_timer(&bc->pad_timer, jiffies + HZ);
+}
+
+static void bcon_init_bios(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < SECTOR_COUNT; i++) {
+		int page_index = i >> (PAGE_SHIFT - SECTOR_SHIFT);
+		struct page *page = bc->pages + page_index;
+		struct bcon_bio *bcon_bio = bc->bio_array + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bcon_bio->sector = page_address(bc->pages + page_index)
+			+ SECTOR_SIZE * (i & PG_SECTOR_MASK);
+		clear_sector(bcon_bio->sector);
+		bvec->bv_page = page;
+		bvec->bv_len = SECTOR_SIZE;
+		bvec->bv_offset = SECTOR_SIZE * (i & PG_SECTOR_MASK);
+	}
+}
+
+static void bcon_init_zero_bio(struct blockconsole *bc)
+{
+	int i;
+
+	memset(page_address(bc->zero_page), 0, PAGE_SIZE);
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bvec->bv_page = bc->zero_page;
+		bvec->bv_len = PAGE_SIZE;
+		bvec->bv_offset = 0;
+	}
+}
+
+static int blockconsole_panic(struct notifier_block *this, unsigned long event,
+		void *ptr)
+{
+	struct blockconsole *bc = container_of(this, struct blockconsole,
+			panic_block);
+	unsigned int n;
+
+	n = SECTOR_SIZE - bcon_console_ofs(bc);
+	if (n != SECTOR_SIZE)
+		bcon_advance_console_bytes(bc, n);
+	bcon_writeback(bc);
+	return NOTIFY_DONE;
+}
+
+static int bcon_create(const char *devname)
+{
+	const fmode_t mode = FMODE_READ | FMODE_WRITE;
+	struct blockconsole *bc;
+	int err;
+
+	bc = kzalloc(sizeof(*bc), GFP_KERNEL);
+	if (!bc)
+		return -ENOMEM;
+	memset(bc->devname, ' ', sizeof(bc->devname));
+	strlcpy(bc->devname, devname, sizeof(bc->devname));
+	spin_lock_init(&bc->end_io_lock);
+	strcpy(bc->console.name, "bcon");
+	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED; /* FIXME: document flags */
+	bc->console.write = bcon_write;
+	bc->bdev = blkdev_get_by_path(devname, mode, NULL);
+#ifndef MODULE
+	if (IS_ERR(bc->bdev)) {
+		dev_t devt = name_to_dev_t(devname);
+		if (devt)
+			bc->bdev = blkdev_get_by_dev(devt, mode, NULL);
+	}
+#endif
+	if (IS_ERR(bc->bdev))
+		goto out;
+	bc->pages = alloc_pages(GFP_KERNEL, 8);
+	if (!bc->pages)
+		goto out;
+	bc->zero_page = alloc_pages(GFP_KERNEL, 0);
+	if (!bc->zero_page)
+		goto out1;
+	bcon_init_bios(bc);
+	bcon_init_zero_bio(bc);
+	setup_timer(&bc->pad_timer, bcon_pad, (unsigned long)bc);
+	bc->max_bytes = bc->bdev->bd_inode->i_size & ~CACHE_MASK;
+	err = bcon_find_end_of_log(bc);
+	if (err)
+		goto out2;
+	kref_init(&bc->kref); /* This reference gets freed on errors */
+	bc->writeback_thread = kthread_run(bcon_writeback, bc, "bcon_%s",
+			devname);
+	if (IS_ERR(bc->writeback_thread))
+		goto out2;
+	INIT_WORK(&bc->unregister_work, bcon_unregister);
+	register_console(&bc->console);
+	bc->panic_block.notifier_call = blockconsole_panic;
+	bc->panic_block.priority = INT_MAX;
+	atomic_notifier_chain_register(&panic_notifier_list, &bc->panic_block);
+	printk(KERN_INFO"blockconsole: now logging to %s at %llx\n", devname,
+			bc->console_bytes >> 20);
+	return 0;
+
+out2:
+	__free_pages(bc->zero_page, 0);
+out1:
+	__free_pages(bc->pages, 8);
+out:
+	kfree(bc);
+	/* Not strictly correct, but the caller doesn't care */
+	return -ENOMEM;
+}
+
+static void bcon_create_fuzzy(const char *name)
+{
+	char *longname;
+	int err;
+
+	err = bcon_create(name);
+	if (err) {
+		longname = kzalloc(strlen(name) + 6, GFP_KERNEL);
+		if (!longname)
+			return;
+		strcpy(longname, "/dev/");
+		strcat(longname, name);
+		bcon_create(longname);
+		kfree(longname);
+	}
+}
+
+static DEFINE_SPINLOCK(device_lock);
+static char scanned_devices[80];
+
+static void bcon_do_add(struct work_struct *work)
+{
+	char local_devices[80], *name, *remainder = local_devices;
+
+	spin_lock(&device_lock);
+	memcpy(local_devices, scanned_devices, sizeof(local_devices));
+	memset(scanned_devices, 0, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+
+	while (remainder && remainder[0]) {
+		name = strsep(&remainder, ",");
+		bcon_create_fuzzy(name);
+	}
+}
+
+DECLARE_WORK(bcon_add_work, bcon_do_add);
+
+void bcon_add(const char *name)
+{
+	/*
+	 * We add each name to a small static buffer and ask for a workqueue
+	 * to go pick it up asap.  Once it is picked up, the buffer is empty
+	 * again, so hopefully it will suffice for all sane users.
+	 */
+	spin_lock(&device_lock);
+	if (scanned_devices[0])
+		strlcat(scanned_devices, ",", sizeof(scanned_devices));
+	strlcat(scanned_devices, name, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+	schedule_work(&bcon_add_work);
+}
+
+static int isnum(const void *data)
+{
+	unsigned long long num;
+	char *end;
+
+	/* Must be an 8-digit hex number followed by newline */
+	num = simple_strtoull(data, &end, 16);
+	if (end != data + 8)
+		return 0;
+	if (*end != 10)
+		return 0;
+	if (num > 0xffffffffull)
+		return 0;
+	return 1;
+}
+
+int bcon_magic_present(const void *data)
+{
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+
+	if (!memcmp(data, BLOCKCONSOLE_MAGIC_OLD, len))
+		return 10;
+	if (memcmp(data, BLOCKCONSOLE_MAGIC, len))
+		return 0;
+	if (!isnum(data + BCON_UUID_OFS))
+		return 0;
+	if (!isnum(data + BCON_ROUND_OFS))
+		return 0;
+	if (!isnum(data + BCON_TILE_OFS))
+		return 0;
+	return 11;
+}
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..6b5fa77 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -74,6 +74,6 @@ extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);
 
-extern dev_t name_to_dev_t(char *name);
+extern dev_t name_to_dev_t(const char *name);
 
 #endif /* _LINUX_MOUNT_H */
diff --git a/init/do_mounts.c b/init/do_mounts.c
index d3f0aee..a6d9bcb 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -106,7 +106,7 @@ no_match:
  *
  * Returns the matching dev_t on success or 0 on failure.
  */
-static dev_t devt_from_partuuid(char *uuid_str)
+static dev_t devt_from_partuuid(const char *uuid_str)
 {
 	dev_t res = 0;
 	struct device *dev = NULL;
@@ -183,7 +183,7 @@ done:
  *	bangs.
  */
 
-dev_t name_to_dev_t(char *name)
+dev_t name_to_dev_t(const char *name)
 {
 	char s[32];
 	char *p;
-- 
1.7.10



* Re: [PATCH] add blockconsole version 1.1
  2012-07-12 17:46       ` [PATCH] add blockconsole version 1.1 Jörn Engel
@ 2012-07-13 13:03         ` Borislav Petkov
  2012-07-13 16:20           ` Jörn Engel
  2012-07-23 14:33         ` Tvrtko Ursulin
  1 sibling, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2012-07-13 13:03 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Thu, Jul 12, 2012 at 01:46:34PM -0400, Jörn Engel wrote:
> Console driver similar to netconsole, except it writes to a block
> device.  Can be useful in a setup where netconsole, for whatever
> reasons, is impractical.
> 
> Changes since version 1.0:
> - Header format overhaul, addressing several annoyances when actually
>   using blockconsole for production.
> - Steve Hodgson added a panic notifier.
> 
> Signed-off-by: Steve Hodgson <steve@purestorage.com>
> Signed-off-by: Joern Engel <joern@logfs.org>
> ---
>  Documentation/block/blockconsole.txt            |   75 +++
>  Documentation/block/blockconsole/bcon_tail      |   52 ++
>  Documentation/block/blockconsole/mkblockconsole |   24 +
>  block/partitions/Makefile                       |    1 +
>  block/partitions/blockconsole.c                 |   22 +
>  block/partitions/check.c                        |    4 +
>  drivers/block/Kconfig                           |    5 +
>  drivers/block/Makefile                          |    1 +
>  drivers/block/blockconsole.c                    |  606 +++++++++++++++++++++++
>  include/linux/mount.h                           |    2 +-
>  init/do_mounts.c                                |    4 +-
>  11 files changed, 793 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/block/blockconsole.txt
>  create mode 100755 Documentation/block/blockconsole/bcon_tail
>  create mode 100755 Documentation/block/blockconsole/mkblockconsole
>  create mode 100644 block/partitions/blockconsole.c
>  create mode 100644 drivers/block/blockconsole.c
> 
> diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt
> new file mode 100644
> index 0000000..a906e61
> --- /dev/null
> +++ b/Documentation/block/blockconsole.txt
> @@ -0,0 +1,75 @@
> +
> +started by Jörn Engel <joern@logfs.org> 2012.03.17
> +
> +Introduction:
> +=============
> +
> +This module logs kernel printk messages to block devices, e.g. usb
> +sticks.  It allows after-the-fact debugging when the main
> +disk/filesystem fails and serial consoles and netconsole are
> +impractical.
> +
> +It can currently only be used built-in.

If so, you need to change the tristate in Kconfig below to bool.

> Blockconsole hooks into the
> +partition scanning code and will bring up configured block devices as
> +soon as possible.  While this doesn't allow capture of early kernel
> +panics, it does capture most of the boot process.
> +
> +Block device configuration:
> +==================================
> +
> +Blockconsole has no configuration parameter.  In order to use a block
> +device for logging, the blockconsole header has to be written to the
> +device in questions.

	     question.

> Logging to partitions is not supported.

That could be useful though. We have a setup here where we create a
partition on the block device and install the OS there for testing
purposes while leaving room on the device after it for other OS installs
and other people to test stuff.

If blockconsole could log to partitions, one could create an additional
small partition exactly for such logs.

I don't know how much work adding logging to partitions is though.

> +
> +The example program mkblockconsole can be used to generate such a
> +header on a device.
> +
> +Header format:
> +==============
> +
> +A legal header looks like this:
> +
> +Linux blockconsole version 1.1
> +818cf322
> +00000000
> +00000000
> +
> +It consists of a newline, the "Linux blockconsole version 1.1" string
> +plus three numbers on seperate lines each.  Numbers are all 32bit,

			separate

> +represented as 8-byte hex strings, with letters in lowercase.  The
> +first number is a uuid for this particular console device.  Just pick
> +a random number when generating the device.  The second number is a
> +wrap counter and unlikely to ever increment.  The third is a tile
> +counter, with a tile being one megabyte in size.
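
For illustration, the header rules described above could be checked from
userspace roughly as follows.  This is a minimal sketch and not the code
from the patch: bcon_is_field() and bcon_header_valid() are made-up
names, and the offsets are taken from the BCON_*_OFS defines in
drivers/block/blockconsole.c.  Unlike the kernel's isnum(), the sketch
assumes lowercase hex only, as the documentation states.

```c
#include <stddef.h>
#include <string.h>

#define BCON_MAGIC      "\nLinux blockconsole version 1.1\n"
#define BCON_UUID_OFS   32	/* length of BCON_MAGIC */
#define BCON_ROUND_OFS  41	/* uuid: 8 hex digits + newline */
#define BCON_TILE_OFS   50	/* round: 8 hex digits + newline */
#define BCON_HEADER_LEN 59	/* tile: 8 hex digits + newline */

/* Return 1 if p points at eight lowercase hex digits followed by '\n'. */
static int bcon_is_field(const char *p)
{
	int i;

	for (i = 0; i < 8; i++) {
		char c = p[i];

		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return p[8] == '\n';
}

/* Return 1 if buf (len bytes) starts with a well-formed 1.1 header. */
static int bcon_header_valid(const char *buf, size_t len)
{
	if (len < BCON_HEADER_LEN)
		return 0;
	if (memcmp(buf, BCON_MAGIC, BCON_UUID_OFS) != 0)
		return 0;
	return bcon_is_field(buf + BCON_UUID_OFS) &&
	       bcon_is_field(buf + BCON_ROUND_OFS) &&
	       bcon_is_field(buf + BCON_TILE_OFS);
}
```

A caller would read the first sector of the device and pass it in; only
the first 59 bytes are inspected.
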
> +
> +Miscellaneous notes:
> +====================
> +
> +Blockconsole will write a new header for every tile or once every
> +megabyte.  The header starts with a newline in order to ensure the
> +"Linux blockconsole...' string always ends up at the beginning of a
> +line if you read the blockconsole in a text editor.
> +
> +The blockconsole header is constructed such that opening the log
> +device in a text editor, ignoring memory constraints due to large
> +devices, should just work and be reasonably non-confusing to readers.
> +However, the example program bcon_tail can be used to copy the last 16
> +tiles of the log device to /var/log/bcon.<uuid>, which should be much
> +easier to handle.
> +
> +The wrap counter is used by blockconsole to determine where to
> +continue logging after a reboot.  New logs will be written to the
> +first tile that wasn't written to by the last instance of
> +blockconsole.  Similarly bcon_tail is doing a binary search to find
> +the end of the log.
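
The search described above can be sketched in plain C as follows.  This
is an illustration under stated assumptions, not the patch's code: the
callback type, bcon_find_end_tile() and the demo structure are
hypothetical names, and tile 0 is assumed to always match because it
holds the reference header.  The loop mirrors the start/middle/end logic
of bcon_find_end_of_log(), but counts in tiles instead of bytes.

```c
#include <stdint.h>

/* Hypothetical callback: nonzero if the tile carries the current header. */
typedef int (*bcon_tile_match_fn)(void *ctx, uint64_t tile);

/*
 * Return the index of the first tile that was not written in the
 * current round, or ntiles if every tile matches (the log then wraps).
 */
static uint64_t bcon_find_end_tile(uint64_t ntiles,
				   bcon_tile_match_fn matches, void *ctx)
{
	uint64_t start = 0, end = ntiles, middle;

	for (;;) {
		middle = (start + end) / 2;
		if (middle == start)
			break;
		if (matches(ctx, middle))
			start = middle;	/* written this round, look later */
		else
			end = middle;	/* not written yet, look earlier */
	}
	return end;
}

/* Demo backend: one wrap counter per tile, held in memory. */
struct bcon_demo {
	const uint32_t *rounds;
	uint32_t current;
};

static int bcon_demo_match(void *ctx, uint64_t tile)
{
	const struct bcon_demo *d = ctx;

	return d->rounds[tile] == d->current;
}
```

On a real device the callback would read the first sector of the given
tile and compare its first 50 bytes (magic, uuid and wrap counter)
against those of tile 0.
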
> +
> +Writing to the log device is strictly circular.  This should give
> +optimal performance and reliability on cheap devices, like usb sticks.
> +
> +Writing to block devices has to happen in sector granularity, while
> +kernel logging happens in byte granularity.  In order not to lose
> +messages in important cases like kernel crashes, a timer will write
> +out partial sectors if no new messages appear for a while.  The
> +unwritten part of the sector will be filled with spaces and a single
> +newline.  In a quiet system, these empty lines can make up the bulk of
> +the log.
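
The padding scheme above can be illustrated with a small userspace
sketch.  It assumes, as clear_sector() in the patch does, that every
sector buffer is pre-filled with 511 spaces and a trailing newline, so a
partially filled sector is already padded and can be written out as-is.
bcon_clear_sector() and bcon_fill_sector() are hypothetical names used
only for this example.

```c
#include <stddef.h>
#include <string.h>

#define BCON_SECTOR_SIZE 512

/* Pre-fill a sector with spaces and terminate it with a newline. */
static void bcon_clear_sector(char *sector)
{
	memset(sector, ' ', BCON_SECTOR_SIZE - 1);
	sector[BCON_SECTOR_SIZE - 1] = '\n';
}

/*
 * Copy as much of msg as fits into the sector, starting at byte ofs.
 * Returns the number of bytes copied; the caller continues with the
 * next sector if that is less than len.
 */
static size_t bcon_fill_sector(char *sector, size_t ofs,
			       const char *msg, size_t len)
{
	size_t n;

	if (ofs >= BCON_SECTOR_SIZE)
		return 0;
	n = BCON_SECTOR_SIZE - ofs;
	if (len < n)
		n = len;
	memcpy(sector + ofs, msg, n);
	return n;
}
```

With this layout, flushing a partial sector only requires advancing the
write position to the next sector boundary, which is what the pad timer
in the patch does.
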
> diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
> new file mode 100755
> index 0000000..950bfd1
> --- /dev/null
> +++ b/Documentation/block/blockconsole/bcon_tail
> @@ -0,0 +1,52 @@
> +#!/bin/bash
> +
> +TAIL_LEN=16
> +TEMPLATE=/tmp/bcon_template
> +BUF=/tmp/bcon_buf
> +
> +end_of_log() {
> +	DEV=$1
> +	UUID=`head -c40 $DEV|tail -c8`
> +	LOGFILE=/var/log/bcon.$UUID
> +	SECTORS=`hdparm -g $DEV|grep sectors|sed 's/.*sectors = \([0-9]*\).*/\1/'`
> +	#MSIZE=`expr $SECTORS / 2048`
> +	dd if=$DEV iflag=direct bs=512 2>/dev/null|head -c50 > $TEMPLATE
> +	#START, MIDDLE and END are in sectors
> +	START=0
> +	MIDDLE=$SECTORS
> +	END=$SECTORS
> +	while true; do
> +		MIDDLE=`expr \( \( $END + $START \) / 4096 \) \* 2048`
> +		if [ $MIDDLE -eq $START ]; then
> +			break
> +		fi
> +		dd if=$DEV iflag=direct bs=512 count=1 skip=$MIDDLE 2>/dev/null|head -c50 > $BUF
> +		if diff -q $BUF $TEMPLATE > /dev/null; then
> +			START=$MIDDLE
> +		else
> +			END=$MIDDLE
> +		fi
> +	done
> +	#switch to megabytes
> +	END=`expr $END / 2048`
> +	START=`expr $START / 2048`
> +	if [ $START -lt $TAIL_LEN ]; then
> +		START=0
> +	else
> +		START=`expr $START - $TAIL_LEN + 1`
> +	fi
> +	LEN=`expr $END - $START`
> +	dd if=$DEV iflag=direct bs=1M count=$LEN skip=$START >$LOGFILE 2>/dev/null
> +	echo $LOGFILE
> +}
> +
> +# HEADER contains a newline, so the funny quoting is necessary
> +HEADER='
> +Linux blockconsole version 1.1'
> +CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`
> +
> +for DEV in $CANDIDATES; do
> +	if [ "`head -c32 $DEV`" == "$HEADER" ]; then
> +		end_of_log $DEV
> +	fi
> +done
> diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
> new file mode 100755
> index 0000000..d9514e7
> --- /dev/null
> +++ b/Documentation/block/blockconsole/mkblockconsole
> @@ -0,0 +1,24 @@
> +#!/bin/sh
> +
> +if [ ! $# -eq 1 ]; then
> +	echo "Usage: mkblockconsole <dev>"
> +	exit 1
> +elif mount|fgrep -q $1; then
> +	echo Device appears to be mounted - aborting
> +	exit 1
> +else
> +	dd if=/dev/zero bs=1M count=1 > $1
> +	# The funky formatting is actually needed!
> +	UUID=`head -c4 /dev/urandom |hexdump -e '/4 "%08x"'`
> +	echo > /tmp/$UUID
> +	echo 'Linux blockconsole version 1.1' >> /tmp/$UUID
> +	echo "$UUID" >> /tmp/$UUID
> +	echo 00000000 >> /tmp/$UUID
> +	echo 00000000 >> /tmp/$UUID
> +	for i in `seq 452`; do echo -n " " >> /tmp/$UUID; done
> +	echo >> /tmp/$UUID
> +	cat /tmp/$UUID > $1
> +	rm /tmp/$UUID
> +	sync
> +	exit 0
> +fi
> diff --git a/block/partitions/Makefile b/block/partitions/Makefile
> index 03af8ea..bf26d4a 100644
> --- a/block/partitions/Makefile
> +++ b/block/partitions/Makefile
> @@ -18,3 +18,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
>  obj-$(CONFIG_EFI_PARTITION) += efi.o
>  obj-$(CONFIG_KARMA_PARTITION) += karma.o
>  obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
> +obj-$(CONFIG_BLOCKCONSOLE) += blockconsole.o
> diff --git a/block/partitions/blockconsole.c b/block/partitions/blockconsole.c
> new file mode 100644
> index 0000000..79796a8
> --- /dev/null
> +++ b/block/partitions/blockconsole.c
> @@ -0,0 +1,22 @@
> +#include <linux/blockconsole.h>

This one is kinda missing from the patch:

block/partitions/blockconsole.c:1:32: fatal error: linux/blockconsole.h: No such file or directory
compilation terminated.
make[2]: *** [block/partitions/blockconsole.o] Error 1
make[1]: *** [block/partitions] Error 2
make: *** [block] Error 2
make: *** Waiting for unfinished jobs....

> +
> +#include "check.h"
> +
> +int blockconsole_partition(struct parsed_partitions *state)
> +{
> +	Sector sect;
> +	void *data;
> +	int err = 0;
> +
> +	data = read_part_sector(state, 0, &sect);
> +	if (!data)
> +		return -EIO;
> +	if (!bcon_magic_present(data))
> +		goto out;
> +
> +	bcon_add(state->name);
> +	err = 1;
> +out:
> +	put_dev_sector(sect);
> +	return err;
> +}
> diff --git a/block/partitions/check.c b/block/partitions/check.c
> index bc90867..8de99fa 100644
> --- a/block/partitions/check.c
> +++ b/block/partitions/check.c
> @@ -36,11 +36,15 @@
>  
>  int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/
>  
> +int blockconsole_partition(struct parsed_partitions *state);
>  static int (*check_part[])(struct parsed_partitions *) = {
>  	/*
>  	 * Probe partition formats with tables at disk address 0
>  	 * that also have an ADFS boot block at 0xdc0.
>  	 */
> +#ifdef CONFIG_BLOCKCONSOLE
> +	blockconsole_partition,
> +#endif
>  #ifdef CONFIG_ACORN_PARTITION_ICS
>  	adfspart_check_ICS,
>  #endif
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index a796407..7ce033d 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -555,4 +555,9 @@ config BLK_DEV_RBD
>  
>  	  If unsure, say N.
>  
> +config BLOCKCONSOLE
> +	tristate "Block device console logging support"
> +	help
> +	  This enables logging to block devices.
> +

This help text should be expanded to be more verbose, maybe add
reference to the documentation files.

>  endif # BLK_DEV
> diff --git a/drivers/block/Makefile b/drivers/block/Makefile

...

So I'll gladly give it a run once you have a patch that builds :-)

Thanks.

-- 
Regards/Gruss,
Boris.


* Re: [PATCH] add blockconsole version 1.1
  2012-07-13 13:03         ` Borislav Petkov
@ 2012-07-13 16:20           ` Jörn Engel
  2012-07-13 21:14             ` Borislav Petkov
  2012-07-16 12:46             ` Borislav Petkov
  0 siblings, 2 replies; 27+ messages in thread
From: Jörn Engel @ 2012-07-13 16:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Fri, 13 July 2012 15:03:38 +0200, Borislav Petkov wrote:
> On Thu, Jul 12, 2012 at 01:46:34PM -0400, Jörn Engel wrote:
> > +
> > +It can currently only be used built-in.
> 
> If so, you need to change the tristate in Kconfig below to bool.

Fair point, fixed.

> > Logging to partitions is not supported.
> 
> That could be useful though. We have a setup here where we create a
> partition on the block device and install the OS there for testing
> purposes while leaving room on the device after it for other OS installs
> and other people to test stuff.
> 
> If blockconsole could log to partitions, one could create an additional
> small partition exactly for such logs.
> 
> I don't know how much work adding logging to partitions is though.

The actual logging shouldn't care one bit.  But abusing the
partitioning code to detect a blockconsole device would no longer
work, so some alternative for that is needed.

What I like about abusing the partitioning code is that blockconsole
just works, without any command line parameters or other setup, either
on boot or by plugging in a new device.  And because our particular use
case is a dedicated usb stick, we don't mind the drawbacks much.

So maybe the best option would be a module_param_call() parser,
allowing either command line options or some userspace helper to do
the detection.

> > +#include <linux/blockconsole.h>
> 
> This one is kinda missing from the patch:

Doh!

> > +config BLOCKCONSOLE
> > +	tristate "Block device console logging support"
> > +	help
> > +	  This enables logging to block devices.
> > +
> 
> This help text should be expanded to be more verbose, maybe add
> reference to the documentation files.

Added.

> So I'll gladly give it a run once you have a patch that builds :-)

Thanks!  The patch below should do that - provided my brain is slightly
less broken than it must have been yesterday.

Jörn

--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers


Console driver similar to netconsole, except it writes to a block
device.  Can be useful in a setup where netconsole, for whatever
reasons, is impractical.

Changes since version 1.0:
- Header format overhaul, addressing several annoyances when actually
  using blockconsole for production.
- Steve Hodgson added a panic notifier.

Signed-off-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Joern Engel <joern@logfs.org>
---
 Documentation/block/blockconsole.txt            |   75 +++
 Documentation/block/blockconsole/bcon_tail      |   52 ++
 Documentation/block/blockconsole/mkblockconsole |   24 +
 block/partitions/Makefile                       |    1 +
 block/partitions/blockconsole.c                 |   22 +
 block/partitions/check.c                        |    4 +
 drivers/block/Kconfig                           |    6 +
 drivers/block/Makefile                          |    1 +
 drivers/block/blockconsole.c                    |  606 +++++++++++++++++++++++
 include/linux/blockconsole.h                    |    7 +
 include/linux/mount.h                           |    2 +-
 init/do_mounts.c                                |    4 +-
 12 files changed, 801 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100755 Documentation/block/blockconsole/bcon_tail
 create mode 100755 Documentation/block/blockconsole/mkblockconsole
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c
 create mode 100644 include/linux/blockconsole.h

diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt
new file mode 100644
index 0000000..09de185
--- /dev/null
+++ b/Documentation/block/blockconsole.txt
@@ -0,0 +1,75 @@
+
+started by Jörn Engel <joern@logfs.org> 2012.03.17
+
+Introduction:
+=============
+
+This module logs kernel printk messages to block devices, e.g. usb
+sticks.  It allows after-the-fact debugging when the main
+disk/filesystem fails and serial consoles and netconsole are
+impractical.
+
+It can currently only be used built-in.  Blockconsole hooks into the
+partition scanning code and will bring up configured block devices as
+soon as possible.  While this doesn't allow capture of early kernel
+panics, it does capture most of the boot process.
+
+Block device configuration:
+===========================
+
+Blockconsole has no configuration parameter.  In order to use a block
+device for logging, the blockconsole header has to be written to the
+device in question.  Logging to partitions is not supported.
+
+The example program mkblockconsole can be used to generate such a
+header on a device.
+
+Header format:
+==============
+
+A legal header looks like this:
+
+Linux blockconsole version 1.1
+818cf322
+00000000
+00000000
+
+It consists of a newline, the "Linux blockconsole version 1.1" string
+plus three numbers on separate lines each.  Numbers are all 32bit,
+represented as 8-byte hex strings, with letters in lowercase.  The
+first number is a uuid for this particular console device.  Just pick
+a random number when generating the device.  The second number is a
+wrap counter and unlikely to ever increment.  The third is a tile
+counter, with a tile being one megabyte in size.
+
+Miscellaneous notes:
+====================
+
+Blockconsole will write a new header for every tile or once every
+megabyte.  The header starts with a newline in order to ensure the
+"Linux blockconsole..." string always ends up at the beginning of a
+line if you read the blockconsole in a text editor.
+
+The blockconsole header is constructed such that opening the log
+device in a text editor, ignoring memory constraints due to large
+devices, should just work and be reasonably non-confusing to readers.
+However, the example program bcon_tail can be used to copy the last 16
+tiles of the log device to /var/log/bcon.<uuid>, which should be much
+easier to handle.
+
+The wrap counter is used by blockconsole to determine where to
+continue logging after a reboot.  New logs will be written to the
+first tile that wasn't written to by the last instance of
+blockconsole.  Similarly bcon_tail is doing a binary search to find
+the end of the log.
+
+Writing to the log device is strictly circular.  This should give
+optimal performance and reliability on cheap devices, like usb sticks.
+
+Writing to block devices has to happen in sector granularity, while
+kernel logging happens in byte granularity.  In order not to lose
+messages in important cases like kernel crashes, a timer will write
+out partial sectors if no new messages appear for a while.  The
+unwritten part of the sector will be filled with spaces and a single
+newline.  In a quiet system, these empty lines can make up the bulk of
+the log.
diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
new file mode 100755
index 0000000..950bfd1
--- /dev/null
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -0,0 +1,52 @@
+#!/bin/bash
+
+TAIL_LEN=16
+TEMPLATE=/tmp/bcon_template
+BUF=/tmp/bcon_buf
+
+end_of_log() {
+	DEV=$1
+	UUID=`head -c40 $DEV|tail -c8`
+	LOGFILE=/var/log/bcon.$UUID
+	SECTORS=`hdparm -g $DEV|grep sectors|sed 's/.*sectors = \([0-9]*\).*/\1/'`
+	#MSIZE=`expr $SECTORS / 2048`
+	dd if=$DEV iflag=direct bs=512 2>/dev/null|head -c50 > $TEMPLATE
+	#START, MIDDLE and END are in sectors
+	START=0
+	MIDDLE=$SECTORS
+	END=$SECTORS
+	while true; do
+		MIDDLE=`expr \( \( $END + $START \) / 4096 \) \* 2048`
+		if [ $MIDDLE -eq $START ]; then
+			break
+		fi
+		dd if=$DEV iflag=direct bs=512 count=1 skip=$MIDDLE 2>/dev/null|head -c50 > $BUF
+		if diff -q $BUF $TEMPLATE > /dev/null; then
+			START=$MIDDLE
+		else
+			END=$MIDDLE
+		fi
+	done
+	#switch to megabytes
+	END=`expr $END / 2048`
+	START=`expr $START / 2048`
+	if [ $START -lt $TAIL_LEN ]; then
+		START=0
+	else
+		START=`expr $START - $TAIL_LEN + 1`
+	fi
+	LEN=`expr $END - $START`
+	dd if=$DEV iflag=direct bs=1M count=$LEN skip=$START >$LOGFILE 2>/dev/null
+	echo $LOGFILE
+}
+
+# HEADER contains a newline, so the funny quoting is necessary
+HEADER='
+Linux blockconsole version 1.1'
+CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`
+
+for DEV in $CANDIDATES; do
+	if [ "`head -c32 $DEV`" == "$HEADER" ]; then
+		end_of_log $DEV
+	fi
+done
diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
new file mode 100755
index 0000000..d9514e7
--- /dev/null
+++ b/Documentation/block/blockconsole/mkblockconsole
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+if [ ! $# -eq 1 ]; then
+	echo "Usage: mkblockconsole <dev>"
+	exit 1
+elif mount|fgrep -q $1; then
+	echo Device appears to be mounted - aborting
+	exit 1
+else
+	dd if=/dev/zero bs=1M count=1 > $1
+	# The funky formatting is actually needed!
+	UUID=`head -c4 /dev/urandom |hexdump -e '/4 "%08x"'`
+	echo > /tmp/$UUID
+	echo 'Linux blockconsole version 1.1' >> /tmp/$UUID
+	echo "$UUID" >> /tmp/$UUID
+	echo 00000000 >> /tmp/$UUID
+	echo 00000000 >> /tmp/$UUID
+	for i in `seq 452`; do echo -n " " >> /tmp/$UUID; done
+	echo >> /tmp/$UUID
+	cat /tmp/$UUID > $1
+	rm /tmp/$UUID
+	sync
+	exit 0
+fi
diff --git a/block/partitions/Makefile b/block/partitions/Makefile
index 03af8ea..bf26d4a 100644
--- a/block/partitions/Makefile
+++ b/block/partitions/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
 obj-$(CONFIG_EFI_PARTITION) += efi.o
 obj-$(CONFIG_KARMA_PARTITION) += karma.o
 obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
+obj-$(CONFIG_BLOCKCONSOLE) += blockconsole.o
diff --git a/block/partitions/blockconsole.c b/block/partitions/blockconsole.c
new file mode 100644
index 0000000..79796a8
--- /dev/null
+++ b/block/partitions/blockconsole.c
@@ -0,0 +1,22 @@
+#include <linux/blockconsole.h>
+
+#include "check.h"
+
+int blockconsole_partition(struct parsed_partitions *state)
+{
+	Sector sect;
+	void *data;
+	int err = 0;
+
+	data = read_part_sector(state, 0, &sect);
+	if (!data)
+		return -EIO;
+	if (!bcon_magic_present(data))
+		goto out;
+
+	bcon_add(state->name);
+	err = 1;
+out:
+	put_dev_sector(sect);
+	return err;
+}
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..8de99fa 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -36,11 +36,15 @@
 
 int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/
 
+int blockconsole_partition(struct parsed_partitions *state);
 static int (*check_part[])(struct parsed_partitions *) = {
 	/*
 	 * Probe partition formats with tables at disk address 0
 	 * that also have an ADFS boot block at 0xdc0.
 	 */
+#ifdef CONFIG_BLOCKCONSOLE
+	blockconsole_partition,
+#endif
 #ifdef CONFIG_ACORN_PARTITION_ICS
 	adfspart_check_ICS,
 #endif
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a796407..637c952 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -555,4 +555,10 @@ config BLK_DEV_RBD
 
 	  If unsure, say N.
 
+config BLOCKCONSOLE
+	bool "Block device console logging support"
+	help
+	  This enables logging to block devices.
+	  See <file:Documentation/block/blockconsole.txt> for details.
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 5b79505..1eb7f902 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
 obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
 obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
+obj-$(CONFIG_BLOCKCONSOLE)	+= blockconsole.o
 
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
new file mode 100644
index 0000000..d13203f
--- /dev/null
+++ b/drivers/block/blockconsole.c
@@ -0,0 +1,606 @@
+#include <linux/bio.h>
+#include <linux/blockconsole.h>
+#include <linux/console.h>
+#include <linux/fs.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
+#define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
+#define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"
+#define BCON_UUID_OFS		(32)
+#define BCON_ROUND_OFS		(41)
+#define BCON_TILE_OFS		(50)
+#define BCON_HEADERSIZE		(50)
+#define BCON_LONG_HEADERSIZE	(59) /* with tile index */
+
+#define PAGE_COUNT		(256)
+#define SECTOR_COUNT		(PAGE_COUNT * (PAGE_SIZE >> 9))
+#define CACHE_PAGE_MASK		(PAGE_COUNT - 1)
+#define CACHE_SECTOR_MASK	(SECTOR_COUNT - 1)
+#define CACHE_SIZE		(PAGE_COUNT << PAGE_SHIFT)
+#define CACHE_MASK		(CACHE_SIZE - 1)
+#define SECTOR_SHIFT		(9)
+#define SECTOR_SIZE		(1 << SECTOR_SHIFT)
+#define SECTOR_MASK		(~(SECTOR_SIZE-1))
+#define PG_SECTOR_MASK		((PAGE_SIZE >> 9) - 1)
+
+struct bcon_bio {
+	struct bio bio;
+	struct bio_vec bvec;
+	void *sector;
+	int in_flight;
+};
+
+struct blockconsole {
+	char devname[32];
+	struct spinlock end_io_lock;
+	struct timer_list pad_timer;
+	int error_count;
+	struct kref kref;
+	u64 console_bytes;
+	u64 write_bytes;
+	u64 max_bytes;
+	u32 round;
+	u32 uuid;
+	struct bcon_bio bio_array[SECTOR_COUNT];
+	struct page *pages;
+	struct bcon_bio zero_bios[PAGE_COUNT];
+	struct page *zero_page;
+	struct block_device *bdev;
+	struct console console;
+	struct work_struct unregister_work;
+	struct task_struct *writeback_thread;
+	struct notifier_block panic_block;
+};
+
+static void bcon_get(struct blockconsole *bc)
+{
+	kref_get(&bc->kref);
+}
+
+static void bcon_release(struct kref *kref)
+{
+	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+
+	__free_pages(bc->zero_page, 0);
+	__free_pages(bc->pages, 8);
+	invalidate_mapping_pages(bc->bdev->bd_inode->i_mapping, 0, -1);
+	blkdev_put(bc->bdev, FMODE_READ|FMODE_WRITE);
+	kfree(bc);
+}
+
+static void bcon_put(struct blockconsole *bc)
+{
+	kref_put(&bc->kref, bcon_release);
+}
+
+static int __bcon_console_ofs(u64 console_bytes)
+{
+	return console_bytes & ~SECTOR_MASK;
+}
+
+static int bcon_console_ofs(struct blockconsole *bc)
+{
+	return __bcon_console_ofs(bc->console_bytes);
+}
+
+static int __bcon_console_sector(u64 console_bytes)
+{
+	return (console_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static int bcon_console_sector(struct blockconsole *bc)
+{
+	return __bcon_console_sector(bc->console_bytes);
+}
+
+static int bcon_write_sector(struct blockconsole *bc)
+{
+	return (bc->write_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static void clear_sector(void *sector)
+{
+	memset(sector, ' ', 511);
+	memset(sector + 511, 10, 1);
+}
+
+static void bcon_init_first_page(struct blockconsole *bc)
+{
+	char *buf = page_address(bc->pages);
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+	u32 tile = bc->console_bytes >> 20; /* We overflow after 4TB - fine */
+
+	clear_sector(buf);
+	memcpy(buf, BLOCKCONSOLE_MAGIC, len);
+	sprintf(buf + BCON_UUID_OFS, "%08x", bc->uuid);
+	sprintf(buf + BCON_ROUND_OFS, "%08x", bc->round);
+	sprintf(buf + BCON_TILE_OFS, "%08x", tile);
+	/* replace NUL with newline */
+	buf[BCON_UUID_OFS + 8] = 10;
+	buf[BCON_ROUND_OFS + 8] = 10;
+	buf[BCON_TILE_OFS + 8] = 10;
+}
+
+static void bcon_advance_console_bytes(struct blockconsole *bc, int bytes)
+{
+	u64 old, new;
+
+	do {
+		old = bc->console_bytes;
+		new = old + bytes;
+		if (new >= bc->max_bytes)
+			new = 0;
+		if ((new & CACHE_MASK) == 0) {
+			bcon_init_first_page(bc);
+			new += BCON_LONG_HEADERSIZE;
+		}
+	} while (cmpxchg64(&bc->console_bytes, old, new) != old);
+}
+
+static void request_complete(struct bio *bio, int err)
+{
+	complete((struct completion *)bio->bi_private);
+}
+
+static int sync_read(struct blockconsole *bc, u64 ofs)
+{
+	struct bio bio;
+	struct bio_vec bio_vec;
+	struct completion complete;
+
+	bio_init(&bio);
+	bio.bi_io_vec = &bio_vec;
+	bio_vec.bv_page = bc->pages;
+	bio_vec.bv_len = SECTOR_SIZE;
+	bio_vec.bv_offset = 0;
+	bio.bi_vcnt = 1;
+	bio.bi_idx = 0;
+	bio.bi_size = SECTOR_SIZE;
+	bio.bi_bdev = bc->bdev;
+	bio.bi_sector = ofs >> SECTOR_SHIFT;
+	init_completion(&complete);
+	bio.bi_private = &complete;
+	bio.bi_end_io = request_complete;
+
+	submit_bio(READ, &bio);
+	wait_for_completion(&complete);
+	return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO;
+}
+
+static void bcon_erase_segment(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio *bio = &bcon_bio->bio;
+
+		/*
+		 * If the last erase hasn't finished yet, just skip it.  The log will
+		 * look messy, but that's all.
+		 */
+		rmb();
+		if (bcon_bio->in_flight)
+			continue;
+		bio_init(bio);
+		bio->bi_io_vec = &bcon_bio->bvec;
+		bio->bi_vcnt = 1;
+		bio->bi_size = PAGE_SIZE;
+		bio->bi_bdev = bc->bdev;
+		bio->bi_private = bc;
+		bio->bi_idx = 0;
+		bio->bi_sector = (bc->write_bytes + i * PAGE_SIZE) >> 9;
+		bcon_bio->in_flight = 1;
+		wmb();
+		/* We want the erase to go to the device first somehow */
+		submit_bio(WRITE | REQ_SOFTBARRIER, bio);
+	}
+}
+
+static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
+{
+	bc->write_bytes += bytes;
+	if (bc->write_bytes >= bc->max_bytes) {
+		bc->write_bytes = 0;
+		bcon_init_first_page(bc);
+		bc->round++;
+	}
+}
+
+static int bcon_convert_old_format(struct blockconsole *bc)
+{
+	bc->uuid = get_random_int();
+	bc->round = 0;
+	bc->console_bytes = bc->write_bytes = 0;
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	printk(KERN_INFO"blockconsole: converted %s from old format\n",
+			bc->devname);
+	return 0;
+}
+
+static int bcon_find_end_of_log(struct blockconsole *bc)
+{
+	u64 start = 0, end = bc->max_bytes, middle;
+	void *sec0 = bc->bio_array[0].sector;
+	void *sec1 = bc->bio_array[1].sector;
+	int err, version;
+
+	err = sync_read(bc, 0);
+	if (err)
+		return err;
+	/* Second sanity check, out of sheer paranoia */
+	version = bcon_magic_present(sec0);
+	if (version == 10)
+		return bcon_convert_old_format(bc);
+	bc->uuid = simple_strtoull(sec0 + BCON_UUID_OFS, NULL, 16);
+	bc->round = simple_strtoull(sec0 + BCON_ROUND_OFS, NULL, 16);
+
+	memcpy(sec1, sec0, BCON_HEADERSIZE);
+	for (;;) {
+		middle = (start + end) / 2;
+		middle &= ~CACHE_MASK;
+		if (middle == start)
+			break;
+		err = sync_read(bc, middle);
+		if (err)
+			return err;
+		if (memcmp(sec1, sec0, BCON_HEADERSIZE)) {
+			/* If the two differ, we haven't written that far yet */
+			end = middle;
+		} else {
+			start = middle;
+		}
+	}
+	bc->console_bytes = bc->write_bytes = end;
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	return 0;
+}
+
+static void bcon_unregister(struct work_struct *work)
+{
+	struct blockconsole *bc = container_of(work, struct blockconsole,
+			unregister_work);
+
+	atomic_notifier_chain_unregister(&panic_notifier_list, &bc->panic_block);
+	unregister_console(&bc->console);
+	del_timer_sync(&bc->pad_timer);
+	kthread_stop(bc->writeback_thread);
+	/* No new io will be scheduled anymore now */
+	bcon_put(bc);
+}
+
+#define BCON_MAX_ERRORS	10
+static void bcon_end_io(struct bio *bio, int err)
+{
+	struct bcon_bio *bcon_bio = container_of(bio, struct bcon_bio, bio);
+	struct blockconsole *bc = bio->bi_private;
+	unsigned long flags;
+
+	/*
+	 * We want to assume the device broken and free this console if
+	 * we accumulate too many errors.  But if errors are transient,
+	 * we also want to forget about them once writes succeed again.
+	 * Oh, and we only want to reset the counter if it hasn't reached
+	 * the limit yet, so we don't bcon_put() twice from here.
+	 */
+	spin_lock_irqsave(&bc->end_io_lock, flags);
+	if (err) {
+		if (bc->error_count++ == BCON_MAX_ERRORS) {
+			printk(KERN_INFO"blockconsole: no longer logging to %s\n", bc->devname);
+			schedule_work(&bc->unregister_work);
+		}
+	} else {
+		if (bc->error_count && bc->error_count < BCON_MAX_ERRORS)
+			bc->error_count = 0;
+	}
+	/*
+	 * Add padding (a bunch of spaces and a newline) early so bcon_pad
+	 * only has to advance a pointer.
+	 */
+	clear_sector(bcon_bio->sector);
+	bcon_bio->in_flight = 0;
+	spin_unlock_irqrestore(&bc->end_io_lock, flags);
+	bcon_put(bc);
+}
+
+static void bcon_writesector(struct blockconsole *bc, int index)
+{
+	struct bcon_bio *bcon_bio = bc->bio_array + index;
+	struct bio *bio = &bcon_bio->bio;
+
+	rmb();
+	if (bcon_bio->in_flight)
+		return;
+	bcon_get(bc);
+
+	bio_init(bio);
+	bio->bi_io_vec = &bcon_bio->bvec;
+	bio->bi_vcnt = 1;
+	bio->bi_size = SECTOR_SIZE;
+	bio->bi_bdev = bc->bdev;
+	bio->bi_private = bc;
+	bio->bi_end_io = bcon_end_io;
+
+	bio->bi_idx = 0;
+	bio->bi_sector = bc->write_bytes >> 9;
+	bcon_bio->in_flight = 1;
+	wmb();
+	submit_bio(WRITE, bio);
+}
+
+static int bcon_writeback(void *_bc)
+{
+	struct blockconsole *bc = _bc;
+	struct sched_param(sp);
+
+	sp.sched_priority = MAX_RT_PRIO - 1; /* Highest realtime prio */
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+		if (kthread_should_stop())
+			break;
+		while (bcon_write_sector(bc) != bcon_console_sector(bc)) {
+			bcon_writesector(bc, bcon_write_sector(bc));
+			bcon_advance_write_bytes(bc, SECTOR_SIZE);
+			if (bcon_write_sector(bc) == 0) {
+				bcon_erase_segment(bc);
+			}
+		}
+	}
+	return 0;
+}
+
+static void bcon_pad(unsigned long data)
+{
+	struct blockconsole *bc = (void *)data;
+	unsigned int n;
+
+	/*
+	 * We deliberately race against bcon_write here.  If we lose the race,
+	 * our padding is no longer where we expected it to be, i.e. it is
+	 * no longer a bunch of spaces with a newline at the end.  There could
+	 * not be a newline at all or it could be somewhere in the middle.
+	 * Either way, the log corruption is fairly obvious to spot and ignore
+	 * for human readers.
+	 */
+	n = SECTOR_SIZE - bcon_console_ofs(bc);
+	if (n != SECTOR_SIZE) {
+		bcon_advance_console_bytes(bc, n);
+		wake_up_process(bc->writeback_thread);
+	}
+}
+
+static void bcon_write(struct console *console, const char *msg,
+		unsigned int len)
+{
+	struct blockconsole *bc = container_of(console, struct blockconsole,
+			console);
+	unsigned int n;
+	u64 console_bytes;
+	int i;
+
+	while (len) {
+		console_bytes = bc->console_bytes;
+		i = __bcon_console_sector(console_bytes);
+		rmb();
+		if (bc->bio_array[i].in_flight)
+			break;
+		n = min_t(int, len, SECTOR_SIZE -
+				__bcon_console_ofs(console_bytes));
+		memcpy(bc->bio_array[i].sector +
+				__bcon_console_ofs(console_bytes), msg, n);
+		len -= n;
+		msg += n;
+		bcon_advance_console_bytes(bc, n);
+	}
+	wake_up_process(bc->writeback_thread);
+	mod_timer(&bc->pad_timer, jiffies + HZ);
+}
+
+static void bcon_init_bios(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < SECTOR_COUNT; i++) {
+		int page_index = i >> (PAGE_SHIFT - SECTOR_SHIFT);
+		struct page *page = bc->pages + page_index;
+		struct bcon_bio *bcon_bio = bc->bio_array + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bcon_bio->sector = page_address(bc->pages + page_index)
+			+ SECTOR_SIZE * (i & PG_SECTOR_MASK);
+		clear_sector(bcon_bio->sector);
+		bvec->bv_page = page;
+		bvec->bv_len = SECTOR_SIZE;
+		bvec->bv_offset = SECTOR_SIZE * (i & PG_SECTOR_MASK);
+	}
+}
+
+static void bcon_init_zero_bio(struct blockconsole *bc)
+{
+	int i;
+
+	memset(page_address(bc->zero_page), 0, PAGE_SIZE);
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bvec->bv_page = bc->zero_page;
+		bvec->bv_len = PAGE_SIZE;
+		bvec->bv_offset = 0;
+	}
+}
+
+static int blockconsole_panic(struct notifier_block *this, unsigned long event,
+		void *ptr)
+{
+	struct blockconsole *bc = container_of(this, struct blockconsole,
+			panic_block);
+	unsigned int n;
+
+	n = SECTOR_SIZE - bcon_console_ofs(bc);
+	if (n != SECTOR_SIZE)
+		bcon_advance_console_bytes(bc, n);
+	bcon_writeback(bc);
+	return NOTIFY_DONE;
+}
+
+static int bcon_create(const char *devname)
+{
+	const fmode_t mode = FMODE_READ | FMODE_WRITE;
+	struct blockconsole *bc;
+	int err;
+
+	bc = kzalloc(sizeof(*bc), GFP_KERNEL);
+	if (!bc)
+		return -ENOMEM;
+	memset(bc->devname, ' ', sizeof(bc->devname));
+	strlcpy(bc->devname, devname, sizeof(bc->devname));
+	spin_lock_init(&bc->end_io_lock);
+	strcpy(bc->console.name, "bcon");
+	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED; /* FIXME: document flags */
+	bc->console.write = bcon_write;
+	bc->bdev = blkdev_get_by_path(devname, mode, NULL);
+#ifndef MODULE
+	if (IS_ERR(bc->bdev)) {
+		dev_t devt = name_to_dev_t(devname);
+		if (devt)
+			bc->bdev = blkdev_get_by_dev(devt, mode, NULL);
+	}
+#endif
+	if (IS_ERR(bc->bdev))
+		goto out;
+	bc->pages = alloc_pages(GFP_KERNEL, 8);
+	if (!bc->pages)
+		goto out;
+	bc->zero_page = alloc_pages(GFP_KERNEL, 0);
+	if (!bc->zero_page)
+		goto out1;
+	bcon_init_bios(bc);
+	bcon_init_zero_bio(bc);
+	setup_timer(&bc->pad_timer, bcon_pad, (unsigned long)bc);
+	bc->max_bytes = bc->bdev->bd_inode->i_size & ~CACHE_MASK;
+	err = bcon_find_end_of_log(bc);
+	if (err)
+		goto out2;
+	kref_init(&bc->kref); /* This reference gets freed on errors */
+	bc->writeback_thread = kthread_run(bcon_writeback, bc, "bcon_%s",
+			devname);
+	if (IS_ERR(bc->writeback_thread))
+		goto out2;
+	INIT_WORK(&bc->unregister_work, bcon_unregister);
+	register_console(&bc->console);
+	bc->panic_block.notifier_call = blockconsole_panic;
+	bc->panic_block.priority = INT_MAX;
+	atomic_notifier_chain_register(&panic_notifier_list, &bc->panic_block);
+	printk(KERN_INFO"blockconsole: now logging to %s at %llx\n", devname,
+			bc->console_bytes >> 20);
+	return 0;
+
+out2:
+	__free_pages(bc->zero_page, 0);
+out1:
+	__free_pages(bc->pages, 8);
+out:
+	kfree(bc);
+	/* Not strictly correct, but the caller doesn't care */
+	return -ENOMEM;
+}
+
+static void bcon_create_fuzzy(const char *name)
+{
+	char *longname;
+	int err;
+
+	err = bcon_create(name);
+	if (err) {
+		longname = kzalloc(strlen(name) + 6, GFP_KERNEL);
+		if (!longname)
+			return;
+		strcpy(longname, "/dev/");
+		strcat(longname, name);
+		bcon_create(longname);
+		kfree(longname);
+	}
+}
+
+static DEFINE_SPINLOCK(device_lock);
+static char scanned_devices[80];
+
+static void bcon_do_add(struct work_struct *work)
+{
+	char local_devices[80], *name, *remainder = local_devices;
+
+	spin_lock(&device_lock);
+	memcpy(local_devices, scanned_devices, sizeof(local_devices));
+	memset(scanned_devices, 0, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+
+	while (remainder && remainder[0]) {
+		name = strsep(&remainder, ",");
+		bcon_create_fuzzy(name);
+	}
+}
+
+DECLARE_WORK(bcon_add_work, bcon_do_add);
+
+void bcon_add(const char *name)
+{
+	/*
+	 * We add each name to a small static buffer and ask for a workqueue
+	 * to go pick it up asap.  Once it is picked up, the buffer is empty
+	 * again, so hopefully it will suffice for all sane users.
+	 */
+	spin_lock(&device_lock);
+	if (scanned_devices[0])
+		strlcat(scanned_devices, ",", sizeof(scanned_devices));
+	strlcat(scanned_devices, name, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+	schedule_work(&bcon_add_work);
+}
+
+static int isnum(const void *data)
+{
+	unsigned long long num;
+	char *end;
+
+	/* Must be an 8-digit hex number followed by newline */
+	num = simple_strtoull(data, &end, 16);
+	if (end != data + 8)
+		return 0;
+	if (*end != 10)
+		return 0;
+	if (num > 0xffffffffull)
+		return 0;
+	return 1;
+}
+
+int bcon_magic_present(const void *data)
+{
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+
+	if (!memcmp(data, BLOCKCONSOLE_MAGIC_OLD, len))
+		return 10;
+	if (memcmp(data, BLOCKCONSOLE_MAGIC, len))
+		return 0;
+	if (!isnum(data + BCON_UUID_OFS))
+		return 0;
+	if (!isnum(data + BCON_ROUND_OFS))
+		return 0;
+	if (!isnum(data + BCON_TILE_OFS))
+		return 0;
+	return 11;
+}
diff --git a/include/linux/blockconsole.h b/include/linux/blockconsole.h
new file mode 100644
index 0000000..114f7c5
--- /dev/null
+++ b/include/linux/blockconsole.h
@@ -0,0 +1,7 @@
+#ifndef LINUX_BLOCKCONSOLE_H
+#define LINUX_BLOCKCONSOLE_H
+
+int bcon_magic_present(const void *data);
+void bcon_add(const char *name);
+
+#endif
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..6b5fa77 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -74,6 +74,6 @@ extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);
 
-extern dev_t name_to_dev_t(char *name);
+extern dev_t name_to_dev_t(const char *name);
 
 #endif /* _LINUX_MOUNT_H */
diff --git a/init/do_mounts.c b/init/do_mounts.c
index d3f0aee..a6d9bcb 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -106,7 +106,7 @@ no_match:
  *
  * Returns the matching dev_t on success or 0 on failure.
  */
-static dev_t devt_from_partuuid(char *uuid_str)
+static dev_t devt_from_partuuid(const char *uuid_str)
 {
 	dev_t res = 0;
 	struct device *dev = NULL;
@@ -183,7 +183,7 @@ done:
  *	bangs.
  */
 
-dev_t name_to_dev_t(char *name)
+dev_t name_to_dev_t(const char *name)
 {
 	char s[32];
 	char *p;
-- 
1.7.10

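For anyone following the header format described in blockconsole.txt
above: the 59-byte header and the padded first sector can also be
produced without a temporary file.  What follows is only a minimal
sketch, assuming a POSIX shell whose printf accepts hex arguments such
as 0x818cf322.  The names bcon_header and bcon_first_sector are
hypothetical and not part of the posted scripts.

```shell
# Hypothetical helpers illustrating the version 1.1 header layout;
# they are not part of the posted mkblockconsole script.

# Print the 59-byte header: leading newline, magic string, then
# uuid, wrap counter and tile counter as 8-digit lowercase hex lines.
bcon_header() {
	printf '\nLinux blockconsole version 1.1\n%08x\n%08x\n%08x\n' "$1" "$2" "$3"
}

# Print a full 512-byte first sector: header, 452 spaces, one newline.
bcon_first_sector() {
	bcon_header "$1" "$2" "$3"
	printf '%452s\n' ''
}
```

Redirecting the output of bcon_first_sector to the device, the way
mkblockconsole does with cat, should give the same on-disk layout.  The
same caveats apply: the device must not be mounted and its contents are
destroyed.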

* Re: [PATCH] add blockconsole version 1.1
  2012-07-13 16:20           ` Jörn Engel
@ 2012-07-13 21:14             ` Borislav Petkov
  2012-07-16 12:46             ` Borislav Petkov
  1 sibling, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2012-07-13 21:14 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Fri, Jul 13, 2012 at 12:20:09PM -0400, Jörn Engel wrote:
> > > Logging to partitions is not supported.
> > 
> > That could be useful though. We have a setup here where we create a
> > partition on the block device and install the OS there for testing
> > purposes while leaving room on the device after it for other OS installs
> > and other people to test stuff.
> > 
> > If blockconsole could log to partitions, one could create an additional
> > small partition exactly for such logs.
> > 
> > I don't know how much work adding logging to partitions is though.
> 
> The actual logging shouldn't care one bit.  But abusing the
> partitioning code to detect a blockconsole device would no longer
> work, so some alternative for that is needed.
> 
> What I like about abusing the partitioning code is that blockconsole
> just works, without any command line parameters or other setup, either
> on boot or by plugging in a new device.  And because our particular use
> case is a dedicated usb stick, we don't mind the drawbacks much.

Ok, actually using a dedicated usb stick obviates the need to log to
partitions - el cheapo usb sticks are ubiquitous. And I didn't realize
the usb stick use case when talking about the partitions example above
so forget what I said, logging to a dedicated usb stick is the easiest.

You probably could mention this in the docs as the most natural use case
for blockconsole if you haven't done so.

[ … ]

> Thanks!  The patch below should do that - provided my brain is slightly
> less broken than it must have been yesterday.

Thanks, will run it next week and let you know.

-- 
Regards/Gruss,
    Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-13 16:20           ` Jörn Engel
  2012-07-13 21:14             ` Borislav Petkov
@ 2012-07-16 12:46             ` Borislav Petkov
  2012-07-18 18:53               ` Jörn Engel
  1 sibling, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2012-07-16 12:46 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Fri, Jul 13, 2012 at 12:20:09PM -0400, Jörn Engel wrote:

[ … ]

> diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
> new file mode 100755
> index 0000000..950bfd1
> --- /dev/null
> +++ b/Documentation/block/blockconsole/bcon_tail
> @@ -0,0 +1,52 @@
> +#!/bin/bash
> +
> +TAIL_LEN=16
> +TEMPLATE=/tmp/bcon_template
> +BUF=/tmp/bcon_buf
> +
> +end_of_log() {
> +	DEV=$1
> +	UUID=`head -c40 $DEV|tail -c8`
> +	LOGFILE=/var/log/bcon.$UUID
> +	SECTORS=`hdparm -g $DEV|grep sectors|sed 's/.*sectors = \([0-9]*\).*/\1/'`
> +	#MSIZE=`expr $SECTORS / 2048`
> +	dd if=$DEV iflag=direct bs=512 2>/dev/null|head -c50 > $TEMPLATE
> +	#START, MIDDLE and END are in sectors
> +	START=0
> +	MIDDLE=$SECTORS
> +	END=$SECTORS
> +	while true; do
> +		MIDDLE=`expr \( \( $END + $START \) / 4096 \) \* 2048`
> +		if [ $MIDDLE -eq $START ]; then
> +			break
> +		fi
> +		dd if=$DEV iflag=direct bs=512 count=1 skip=$MIDDLE 2>/dev/null|head -c50 > $BUF
> +		if diff -q $BUF $TEMPLATE > /dev/null; then
> +			START=$MIDDLE
> +		else
> +			END=$MIDDLE
> +		fi
> +	done
> +	#switch to megabytes
> +	END=`expr $END / 2048`
> +	START=`expr $START / 2048`
> +	if [ $START -lt $TAIL_LEN ]; then
> +		START=0
> +	else
> +		START=`expr $START - $TAIL_LEN + 1`
> +	fi
> +	LEN=`expr $END - $START`
> +	dd if=$DEV iflag=direct bs=1M count=$LEN skip=$START >$LOGFILE 2>/dev/null
> +	echo $LOGFILE
> +}
> +
> +# HEADER contains a newline, so the funny quoting is necessary
> +HEADER='
> +Linux blockconsole version 1.1'
> +CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`

You probably want to check lsscsi presence on the system, wasn't
installed by default on my debian testing image, for example. See diff
at the end of this mail.
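
A minimal sketch of such a check, written for a POSIX shell. It uses
`command -v` rather than `which`; the helper name require_tool is made
up for illustration and is not part of the patch:

```shell
# Hypothetical helper: succeed if the named tool is in PATH,
# otherwise print a hint to stderr and return non-zero.
require_tool() {
	if ! command -v "$1" >/dev/null 2>&1; then
		echo "You need to install the $1 package on your distro." >&2
		return 1
	fi
}
```

bcon_tail could then start with "require_tool lsscsi || exit 1".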

> +
> +for DEV in $CANDIDATES; do
> +	if [ "`head -c32 $DEV`" == "$HEADER" ]; then
> +		end_of_log $DEV
> +	fi
> +done
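
For readers following the loop in end_of_log above: it is a binary
search for the last tile that still carries the first tile's 50-byte
header prefix (magic, uuid and wrap counter).  A standalone sketch of
the same idea, assuming the matching tiles form one contiguous run
starting at tile 0.  The function name, the tile-count argument and the
optional tile size are hypothetical and only there to make it easy to
try on a small test file; the script itself works in 2048-sector steps:

```shell
# bcon_last_tile FILE TILES [TILE_BYTES]
# Print the index of the last tile whose first 50 bytes equal those of
# tile 0.  Invariant: tile $lo matches, tile $hi does not (or is the end).
bcon_last_tile() {
	file=$1
	lo=0
	hi=$2
	tile=${3:-1048576}
	# cksum avoids storing binary data (NUL bytes) in a shell variable
	want=$(dd if="$file" bs=1 count=50 2>/dev/null | cksum)
	while [ $((hi - lo)) -gt 1 ]; do
		mid=$(( (lo + hi) / 2 ))
		got=$(dd if="$file" bs=1 skip=$((mid * tile)) count=50 2>/dev/null | cksum)
		if [ "$got" = "$want" ]; then
			lo=$mid
		else
			hi=$mid
		fi
	done
	echo "$lo"
}
```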
> diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
> new file mode 100755
> index 0000000..d9514e7
> --- /dev/null
> +++ b/Documentation/block/blockconsole/mkblockconsole
> @@ -0,0 +1,24 @@
> +#!/bin/sh
> +
> +if [ ! $# -eq 1 ]; then
> +	echo "Usage: mkblockconsole <dev>"

	echo "Usage: $0 <dev>"

in case the name of the script changes.

> +	exit 1
> +elif mount|fgrep -q $1; then
> +	echo Device appears to be mounted - aborting
> +	exit 1
> +else
> +	dd if=/dev/zero bs=1M count=1 > $1
> +	# The funky formatting is actually needed!
> +	UUID=`head -c4 /dev/urandom |hexdump -e '/4 "%08x"'`
> +	echo > /tmp/$UUID
> +	echo 'Linux blockconsole version 1.1' >> /tmp/$UUID
> +	echo "$UUID" >> /tmp/$UUID
> +	echo 00000000 >> /tmp/$UUID
> +	echo 00000000 >> /tmp/$UUID
> +	for i in `seq 452`; do echo -n " " >> /tmp/$UUID; done
> +	echo >> /tmp/$UUID
> +	cat /tmp/$UUID > $1
> +	rm /tmp/$UUID
> +	sync
> +	exit 0
> +fi
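
The header block that the echo/seq sequence above assembles can also be
produced with a single printf.  A sketch, assuming the layout visible in
the quoted script (32-byte magic, three 8-digit hex fields each followed
by a newline, 452 spaces, closing newline, 512 bytes in total);
bcon_header is a hypothetical helper, not part of the patch:

```shell
# bcon_header UUID
# Emit one 512-byte blockconsole 1.1 header with both counters at zero.
bcon_header() {
	uuid=$1
	# %452s with an empty argument expands to 452 spaces of padding
	printf '\nLinux blockconsole version 1.1\n%s\n00000000\n00000000\n%452s\n' "$uuid" ''
}
```

The output could then be written to the device in place of the
temporary file under /tmp.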
> diff --git a/block/partitions/Makefile b/block/partitions/Makefile
> index 03af8ea..bf26d4a 100644
> --- a/block/partitions/Makefile
> +++ b/block/partitions/Makefile
> @@ -18,3 +18,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
>  obj-$(CONFIG_EFI_PARTITION) += efi.o
>  obj-$(CONFIG_KARMA_PARTITION) += karma.o
>  obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
> +obj-$(CONFIG_BLOCKCONSOLE) += blockconsole.o
> diff --git a/block/partitions/blockconsole.c b/block/partitions/blockconsole.c
> new file mode 100644
> index 0000000..79796a8
> --- /dev/null
> +++ b/block/partitions/blockconsole.c
> @@ -0,0 +1,22 @@
> +#include <linux/blockconsole.h>
> +
> +#include "check.h"
> +
> +int blockconsole_partition(struct parsed_partitions *state)
> +{
> +	Sector sect;
> +	void *data;
> +	int err = 0;
> +
> +	data = read_part_sector(state, 0, &sect);
> +	if (!data)
> +		return -EIO;
> +	if (!bcon_magic_present(data))
> +		goto out;
> +
> +	bcon_add(state->name);
> +	err = 1;
> +out:
> +	put_dev_sector(sect);
> +	return err;
> +}
> diff --git a/block/partitions/check.c b/block/partitions/check.c
> index bc90867..8de99fa 100644
> --- a/block/partitions/check.c
> +++ b/block/partitions/check.c
> @@ -36,11 +36,15 @@
>  
>  int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/
>  
> +int blockconsole_partition(struct parsed_partitions *state);
>  static int (*check_part[])(struct parsed_partitions *) = {
>  	/*
>  	 * Probe partition formats with tables at disk address 0
>  	 * that also have an ADFS boot block at 0xdc0.
>  	 */
> +#ifdef CONFIG_BLOCKCONSOLE
> +	blockconsole_partition,
> +#endif
>  #ifdef CONFIG_ACORN_PARTITION_ICS
>  	adfspart_check_ICS,
>  #endif
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index a796407..637c952 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -555,4 +555,10 @@ config BLK_DEV_RBD
>  
>  	  If unsure, say N.
>  
> +config BLOCKCONSOLE
> +	bool "Block device console logging support"
> +	help
> +	  This enables logging to block devices.
> +	  See <file:Documentation/block/blockconsole.txt> for details.
> +
>  endif # BLK_DEV
> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
> index 5b79505..1eb7f902 100644
> --- a/drivers/block/Makefile
> +++ b/drivers/block/Makefile
> @@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
>  obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
>  obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
>  obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
> +obj-$(CONFIG_BLOCKCONSOLE)	+= blockconsole.o
>  
>  swim_mod-y	:= swim.o swim_asm.o
> diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
> new file mode 100644
> index 0000000..d13203f
> --- /dev/null
> +++ b/drivers/block/blockconsole.c
> @@ -0,0 +1,606 @@
> +#include <linux/bio.h>
> +#include <linux/blockconsole.h>
> +#include <linux/console.h>
> +#include <linux/fs.h>
> +#include <linux/kthread.h>
> +#include <linux/mm.h>
> +#include <linux/mount.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include <linux/workqueue.h>
> +
> +#define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"

blockconsole is not yet upstream, you probably want to get rid of the
_OLD handling completely?

> +#define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"
> +#define BCON_UUID_OFS		(32)
> +#define BCON_ROUND_OFS		(41)
> +#define BCON_TILE_OFS		(50)
> +#define BCON_HEADERSIZE		(50)
> +#define BCON_LONG_HEADERSIZE	(59) /* with tile index */
> +
> +#define PAGE_COUNT		(256)
> +#define SECTOR_COUNT		(PAGE_COUNT * (PAGE_SIZE >> 9))
> +#define CACHE_PAGE_MASK		(PAGE_COUNT - 1)
> +#define CACHE_SECTOR_MASK	(SECTOR_COUNT - 1)
> +#define CACHE_SIZE		(PAGE_COUNT << PAGE_SHIFT)
> +#define CACHE_MASK		(CACHE_SIZE - 1)
> +#define SECTOR_SHIFT		(9)
> +#define SECTOR_SIZE		(1 << SECTOR_SHIFT)
> +#define SECTOR_MASK		(~(SECTOR_SIZE-1))
> +#define PG_SECTOR_MASK		((PAGE_SIZE >> 9) - 1)
> +
> +struct bcon_bio {
> +	struct bio bio;
> +	struct bio_vec bvec;
> +	void *sector;
> +	int in_flight;
> +};
> +
> +struct blockconsole {
> +	char devname[32];
> +	struct spinlock end_io_lock;
> +	struct timer_list pad_timer;
> +	int error_count;
> +	struct kref kref;

Another build failure, this one from a missing

#include <linux/kref.h>

drivers/block/blockconsole.c:44:14: error: field ‘kref’ has incomplete type
drivers/block/blockconsole.c: In function ‘bcon_get’:
drivers/block/blockconsole.c:63:2: error: implicit declaration of function ‘kref_get’ [-Werror=implicit-function-declaration]
drivers/block/blockconsole.c: In function ‘bcon_release’:
drivers/block/blockconsole.c:68:28: warning: initialization from incompatible pointer type [enabled by default]
drivers/block/blockconsole.c:68:28: warning: (near initialization for ‘bc’) [enabled by default]
drivers/block/blockconsole.c: In function ‘bcon_put’:
drivers/block/blockconsole.c:79:2: error: implicit declaration of function ‘kref_put’ [-Werror=implicit-function-declaration]
drivers/block/blockconsole.c: In function ‘bcon_create’:
drivers/block/blockconsole.c:499:2: error: implicit declaration of function ‘kref_init’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors

make[2]: *** [drivers/block/blockconsole.o] Error 1
make[1]: *** [drivers/block] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

Below's a diff of what I did here to make this work, feel free to take
anything from it.

With the include added, it builds fine. Then I took a USB stick and
did:

$ ./mkblockconsole /dev/sdc

<reboot>

$ ./bcon_tail

which created a file called /var/log/bcon.32ea1561.

Doing a

$ less /var/log/bcon.32ea1561
Linux blockconsole version 1.1
32ea1561
00000000
00000000
                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@
...

gives a lot of zeros, right up to the first megabyte to be exact, after
which the log starts:

$ hexdump -C /var/log/bcon.32ea1561

00000000  0a 4c 69 6e 75 78 20 62  6c 6f 63 6b 63 6f 6e 73  |.Linux blockcons|
00000010  6f 6c 65 20 76 65 72 73  69 6f 6e 20 31 2e 31 0a  |ole version 1.1.|
00000020  33 32 65 61 31 35 36 31  0a 30 30 30 30 30 30 30  |32ea1561.0000000|
00000030  30 0a 30 30 30 30 30 30  30 30 0a 20 20 20 20 20  |0.00000000.     |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000001f0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 0a  |               .|
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00100000  0a 4c 69 6e 75 78 20 62  6c 6f 63 6b 63 6f 6e 73  |.Linux blockcons|
00100010  6f 6c 65 20 76 65 72 73  69 6f 6e 20 31 2e 31 0a  |ole version 1.1.|
00100020  33 32 65 61 31 35 36 31  0a 30 30 30 30 30 30 30  |32ea1561.0000000|
00100030  30 0a 30 30 30 30 30 30  30 31 0a 5b 20 20 20 20  |0.00000001.[    |
00100040  30 2e 30 30 30 30 30 30  5d 20 49 6e 69 74 69 61  |0.000000] Initia|
00100050  6c 69 7a 69 6e 67 20 63  67 72 6f 75 70 20 73 75  |lizing cgroup su|
....

So I can read the log by doing

$ strings /var/log/bcon.32ea1561 | less
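
The hexdump shows the fixed layout each tile header follows: 32 bytes
of magic, then the uuid, wrap counter and tile counter as 8 hex digits
plus a newline each.  A small sketch that pulls those three fields out
of the extracted log; the function name and its optional byte-offset
argument are assumptions for illustration:

```shell
# bcon_header_fields FILE [BYTE_OFFSET]
# Print "uuid wrap tile" for the tile header starting at BYTE_OFFSET.
bcon_header_fields() {
	file=$1
	offset=${2:-0}
	# Skip the 32-byte magic, read three 9-byte fields, and turn the
	# separating newlines into spaces.
	hdr=$(dd if="$file" bs=1 skip=$((offset + 32)) count=27 2>/dev/null | tr '\n' ' ')
	echo "${hdr% }"
}
```

For the dump above, "bcon_header_fields /var/log/bcon.32ea1561 1048576"
would read the header of the second tile, the one with tile counter
00000001.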

So why is that first megabyte full of zeros there?

Other than that, it works like a charm and I like the idea that no
kernel cmdline args are needed.

Also, you might want to add a step-by-step fast howto to the docs with
concrete steps like the above so that people can try this out faster.

Thanks.

--
diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
index 950bfd1..e415b6f 100755
--- a/Documentation/block/blockconsole/bcon_tail
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -4,6 +4,12 @@ TAIL_LEN=16
 TEMPLATE=/tmp/bcon_template
 BUF=/tmp/bcon_buf
 
+if [ -z "$(which lsscsi)" ];
+then
+	echo "You need to install the lsscsi package on your distro."
+	exit 1
+fi
+
 end_of_log() {
 	DEV=$1
 	UUID=`head -c40 $DEV|tail -c8`
diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
index d9514e7..05c4ad8 100755
--- a/Documentation/block/blockconsole/mkblockconsole
+++ b/Documentation/block/blockconsole/mkblockconsole
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 if [ ! $# -eq 1 ]; then
-	echo "Usage: mkblockconsole <dev>"
+	echo "Usage: $0 <dev>"
 	exit 1
 elif mount|fgrep -q $1; then
 	echo Device appears to be mounted - aborting
diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index d13203f..b4e995d 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/workqueue.h>
+#include <linux/kref.h>
 
 #define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
 #define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"

-- 
Regards/Gruss,
Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-16 12:46             ` Borislav Petkov
@ 2012-07-18 18:53               ` Jörn Engel
  2012-07-18 21:45                 ` Borislav Petkov
  2012-08-14 11:54                 ` Jan Engelhardt
  0 siblings, 2 replies; 27+ messages in thread
From: Jörn Engel @ 2012-07-18 18:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Mon, 16 July 2012 14:46:15 +0200, Borislav Petkov wrote:
> On Fri, Jul 13, 2012 at 12:20:09PM -0400, Jörn Engel wrote:
> 
> > +CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`
> 
> You probably want to check lsscsi presence on the system, wasn't
> installed by default on my debian testing image, for example. See diff
> at the end of this mail.

Ack.

> > +	echo "Usage: mkblockconsole <dev>"
> 
> 	echo "Usage: $0 <dev>"
> 
> in case the name of the script changes.

Ack.

> > +#define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
> 
> blockconsole is not yet upstream, you probably want to get rid of the
> _OLD handling completely?

Agreed.

> > +	struct kref kref;
> 
> Another build failure, this one from a missing
> 
> #include <linux/kref.h>

Ack.

> With the include added, it builds fine. Then I took a USB stick and
> did:
> 
> $ ./mkblockconsole /dev/sdc
> 
> <reboot>

You can also run hdparm -z <dev> instead.  Or replug the device.  Main
danger of hdparm is that running the command twice will cause two
instances of blockconsole to use the same device.  Not sure how to
solve that problem - or if.

> So why is that first megabyte full of zeros there?

It gives you some scratch space to store information in.  How useful
that actually is may be a matter of opinion.  But independent of that,
you will find large amounts of zeroes all over.  Every time you
reboot, the new blockconsole will start writing at a megabyte-aligned
offset and whatever remains of the last megabyte should be zero-filled
as well.  Vim treats this as a single line, which makes it only mildly
annoying to me.
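
If the zero-filled stretches get in the way, they can be dropped before
viewing.  A sketch using tr, which, unlike strings, keeps every non-NUL
byte including the padding spaces; bcon_text is a hypothetical helper
name:

```shell
# bcon_text FILE
# Write FILE to stdout with all NUL bytes removed.
bcon_text() {
	tr -d '\000' < "$1"
}
```

With a single blockconsole device attached, something like
"bcon_text `./bcon_tail` | less" should then show the log without the
zeroed gaps.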

> Other than that, it works like a charm and I like the idea that no
> kernel cmdline args are needed.
> 
> Also, you might want to add a step-by-step fast howto to the docs with
> concrete steps like the above so that people can try this out faster.

I will try to find a quiet moment for that.  If you happened to beat
me to it, you certainly won't hear any complaints.

> --
> diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
> index 950bfd1..e415b6f 100755
> --- a/Documentation/block/blockconsole/bcon_tail
> +++ b/Documentation/block/blockconsole/bcon_tail
> @@ -4,6 +4,12 @@ TAIL_LEN=16
>  TEMPLATE=/tmp/bcon_template
>  BUF=/tmp/bcon_buf
>  
> +if [ -z "$(which lsscsi)" ];
> +then
> +	echo "You need to install the lsscsi package on your distro."
> +	exit 1
> +fi
> +
>  end_of_log() {
>  	DEV=$1
>  	UUID=`head -c40 $DEV|tail -c8`
> diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
> index d9514e7..05c4ad8 100755
> --- a/Documentation/block/blockconsole/mkblockconsole
> +++ b/Documentation/block/blockconsole/mkblockconsole
> @@ -1,7 +1,7 @@
>  #!/bin/sh
>  
>  if [ ! $# -eq 1 ]; then
> -	echo "Usage: mkblockconsole <dev>"
> +	echo "Usage: $0 <dev>"
>  	exit 1
>  elif mount|fgrep -q $1; then
>  	echo Device appears to be mounted - aborting
> diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
> index d13203f..b4e995d 100644
> --- a/drivers/block/blockconsole.c
> +++ b/drivers/block/blockconsole.c
> @@ -9,6 +9,7 @@
>  #include <linux/slab.h>
>  #include <linux/string.h>
>  #include <linux/workqueue.h>
> +#include <linux/kref.h>
>  
>  #define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
>  #define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"

Acked-by: Joern Engel <joern@logfs.org>

Thanks for the testing and the patch!  I will fold it in and resend
when I deal with the other two details.

Jörn

--
It does not matter how slowly you go, so long as you do not stop.
-- Confucius

* Re: [PATCH] add blockconsole version 1.1
  2012-07-18 21:45                 ` Borislav Petkov
@ 2012-07-18 21:08                   ` Jörn Engel
  2012-07-19  9:26                     ` Borislav Petkov
  2012-07-23 20:04                   ` Jörn Engel
  1 sibling, 1 reply; 27+ messages in thread
From: Jörn Engel @ 2012-07-18 21:08 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Wed, 18 July 2012 23:45:21 +0200, Borislav Petkov wrote:
> 
> > > So why is that first megabyte full of zeros there?
> > 
> > It gives you some scratch space to store information in.
> 
> How? By me writing something in that empty line in vim? Or something
> else storing stuff there?

Assuming you want to do it in an automated fashion - by patching or
replacing mkblockconsole.  Again, I have no opinion on whether this
actually makes sense.  It is possible, it does not really hurt the
primary function and people have explicitly asked me for it.  Good
enough for me.

> > How useful that actually is may be a matter of opinion. But
> > independent of that, you will find large amounts of zeroes all over.
> > Every time you reboot, the new blockconsole will start writing at a
> > megabyte-aligned offset and whatever remains of the last megabyte
> > should be zero-filled as well.
> 
> Ah, those are the tiles you're talking about in the docs, right?

Yes.

> Oh, I didn't mean anything involved but rather a quick steps write-up
> (steps can always be expanded and made more verbose later):
> 
> Blockconsole in three easy steps
> ================================
> 
> 1. Find an unused USB stick and prepare it for blockconsole by writing
> the blockconsole signature to it:
> 
> $ ./mkblockconsole /dev/sdc
> 
>   [ Assuming /dev/sdc is the device node of the USB stick you just plugged in. ]
> 
> 2. USB stick is ready for use, replug it so that the kernel can start
> logging to it.
> 
> 3. After you're done logging, read out the logs from it like this:
> 
> $ ./bcon_tail
> 
>   [ This creates a file called /var/log/bcon.<random number> which
>   contains the logs. Open it with a sane editor like vim which can
>   display zeroed gaps as a single line and start staring at the logs. ]

Or show off your geekiness by using back ticks:
$ vi `./bcon_tail`

> Something like the above, just slap it at the beginning of
> Documentation/block/blockconsole.txt for impatient people like me and
> that's it :-).

Will do.

Jörn

--
There's nothing better for promoting creativity in a medium than
making an audience feel "Hmm - I could do better than that!"
-- Douglas Adams in a slashdot interview

* Re: [PATCH] add blockconsole version 1.1
  2012-07-18 18:53               ` Jörn Engel
@ 2012-07-18 21:45                 ` Borislav Petkov
  2012-07-18 21:08                   ` Jörn Engel
  2012-07-23 20:04                   ` Jörn Engel
  2012-08-14 11:54                 ` Jan Engelhardt
  1 sibling, 2 replies; 27+ messages in thread
From: Borislav Petkov @ 2012-07-18 21:45 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Wed, Jul 18, 2012 at 02:53:35PM -0400, Jörn Engel wrote:

[ … ]

> > $ ./mkblockconsole /dev/sdc
> > 
> > <reboot>
> 
> You can also run hdparm -z <dev> instead.  Or replug the device.  Main
> danger of hdparm is that running the command twice will cause two
> instances of blockconsole to use the same device.  Not sure how to
> solve that problem - or if.

Actually, I meant <reboot> in the sense here that I wanted to test the
case where user has a prepared stick and wants to catch full boot log of
the booting system.

> > So why is that first megabyte full of zeros there?
> 
> It gives you some scratch space to store information in.

How? By me writing something in that empty line in vim? Or something
else storing stuff there?

> How useful that actually is may be a matter of opinion. But
> independent of that, you will find large amounts of zeroes all over.
> Every time you reboot, the new blockconsole will start writing at a
> megabyte-aligned offset and whatever remains of the last megabyte
> should be zero-filled as well.

Ah, those are the tiles you're talking about in the docs, right?

> Vim treats this as a single line, which makes it only mildly annoying
> to me.

Ok, I should try that.

> > Other than that, it works like a charm and I like the idea that no
> > kernel cmdline args are needed.
> > 
> > Also, you might want to add a step-by-step fast howto to the docs with
> > concrete steps like the above so that people can try this out faster.
> 
> I will try to find a quiet moment for that.  If you happened to beat
> me to it, you certainly won't hear any complaints.

Oh, I didn't mean anything involved but rather a quick steps write-up
(steps can always be expanded and made more verbose later):

Blockconsole in three easy steps
================================

1. Find an unused USB stick and prepare it for blockconsole by writing
the blockconsole signature to it:

$ ./mkblockconsole /dev/sdc

  [ Assuming /dev/sdc is the device node of the USB stick you just plugged in. ]

2. USB stick is ready for use, replug it so that the kernel can start
logging to it.

3. After you're done logging, read out the logs from it like this:

$ ./bcon_tail

  [ This creates a file called /var/log/bcon.<random number> which
  contains the logs. Open it with a sane editor like vim which can
  display zeroed gaps as a single line and start staring at the logs. ]

---

Something like the above, just slap it at the beginning of
Documentation/block/blockconsole.txt for impatient people like me and
that's it :-).

Thanks.

-- 
Regards/Gruss,
    Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-18 21:08                   ` Jörn Engel
@ 2012-07-19  9:26                     ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2012-07-19  9:26 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Wed, Jul 18, 2012 at 05:08:15PM -0400, Jörn Engel wrote:
> On Wed, 18 July 2012 23:45:21 +0200, Borislav Petkov wrote:
> > 
> > > > So why is that first megabyte full of zeros there?
> > > 
> > > It gives you some scratch space to store information in.
> > 
> > How? By me writing something in that empty line in vim? Or something
> > else storing stuff there?
> 
> Assuming you want to do it in an automated fashion - by patching or
> replacing mkblockconsole.  Again, I have no opinion on whether this
> actually makes sense.  It is possible, it does not really hurt the
> primary function and people have explicitly asked me for it.  Good
> enough for me.

I see. It would be interesting to know what the use cases of those
people are. In any case, this is not an interface since you only have
one-way data movement from kernel to userspace and you can change the
formatting/layout of that data later with no obvious issues, AFAICT.

[ … ]

> Or show off your geekiness by using back ticks:
> $ vi `./bcon_tail`

Uuh, magic. Definitely! :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-12 17:46       ` [PATCH] add blockconsole version 1.1 Jörn Engel
  2012-07-13 13:03         ` Borislav Petkov
@ 2012-07-23 14:33         ` Tvrtko Ursulin
  2012-07-23 20:02           ` Jörn Engel
  1 sibling, 1 reply; 27+ messages in thread
From: Tvrtko Ursulin @ 2012-07-23 14:33 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson


Hi,

On Thursday 12 Jul 2012 18:46:34 Jörn Engel wrote:
> Console driver similar to netconsole, except it writes to a block
> device.  Can be useful in a setup where netconsole, for whatever
> reasons, is impractical.

Perhaps you need to add a word or two about the limitations compared to
netconsole to the documentation, because there is quite a significant
difference in reliability? I mean, so that it is not assumed to be analogous
to netconsole with just a different underlying medium. I don't know if anyone
would expect that, but better said than not.

I second the notion that logging to partitions would be useful.

Also, and I haven't checked what the swap format is, if it could somehow be 
integrated together that could be useful.

Tvrtko

* Re: [PATCH] add blockconsole version 1.1
  2012-07-23 14:33         ` Tvrtko Ursulin
@ 2012-07-23 20:02           ` Jörn Engel
  2012-07-24  8:01             ` Tvrtko Ursulin
  0 siblings, 1 reply; 27+ messages in thread
From: Jörn Engel @ 2012-07-23 20:02 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Mon, 23 July 2012 15:33:16 +0100, Tvrtko Ursulin wrote:
> On Thursday 12 Jul 2012 18:46:34 Jörn Engel wrote:
> > Console driver similar to netconsole, except it writes to a block
> > device.  Can be useful in a setup where netconsole, for whatever
> > reasons, is impractical.
> 
> Perhaps you need to add a word or two about the limitations compared to
> netconsole to the documentation, because there is quite a significant
> difference in reliability? I mean, so that it is not assumed to be analogous
> to netconsole with just a different underlying medium. I don't know if anyone
> would expect that, but better said than not.

Given that I don't even know the limitations, that's a bit tough.  As
a general rule, I would always prefer netconsole.  It appears to be
more reliable than blockconsole and beats serial console by half a
lightyear.  But as a fallback when netconsole is not realistic,
blockconsole has proven useful.

> I second the notion that logging to partitions would be useful.

Below is a compile-tested patch to do that.  Feel free to give it a
spin and fix any bugs.

> Also, and I haven't checked what the swap format is, if it could somehow be 
> integrated together that could be useful.

That appears to be slightly less likely than crossbreeding a rabbit
with a chicken.  Is there something obvious I have missed?

Jörn

--
The story so far:
In the beginning the Universe was created.  This has made a lot
of people very angry and been widely regarded as a bad move.
-- Douglas Adams

[PATCH 2/2] bcon: Add a module parameter to support partitions

The usual method of hooking into the partition scanner does not work
for partitions.  Allow those who care to pass in a module parameter.

Signed-off-by: Joern Engel <joern@logfs.org>
---
 drivers/block/blockconsole.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 09f239c..91c27ce 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -10,6 +10,7 @@
 #include <linux/kref.h>
 #include <linux/kthread.h>
 #include <linux/mm.h>
+#include <linux/moduleparam.h>
 #include <linux/mount.h>
 #include <linux/random.h>
 #include <linux/slab.h>
@@ -543,6 +544,14 @@ static void bcon_create_fuzzy(const char *name)
 	}
 }
 
+static int bcon_setup(const char *val, struct kernel_param *kp)
+{
+	bcon_create_fuzzy(val);
+	return 0;
+}
+
+module_param_call(device, bcon_setup, NULL, NULL, 0200);
+
 static DEFINE_SPINLOCK(device_lock);
 static char scanned_devices[80];
 
-- 
1.7.10.4


* Re: [PATCH] add blockconsole version 1.1
  2012-07-18 21:45                 ` Borislav Petkov
  2012-07-18 21:08                   ` Jörn Engel
@ 2012-07-23 20:04                   ` Jörn Engel
  2012-07-24 15:42                     ` Borislav Petkov
  1 sibling, 1 reply; 27+ messages in thread
From: Jörn Engel @ 2012-07-23 20:04 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Wed, 18 July 2012 23:45:21 +0200, Borislav Petkov wrote:
> 
> Something like the above, just slap it at the beginning of
> Documentation/block/blockconsole.txt for impatient people like me and
> that's it :-).

And below is an updated patch with your changes folded in.  I did a
few minor tweaks, so there is every possibility I may have messed it
up.

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike

[PATCH 1/2] add blockconsole version 1.1

Console driver similar to netconsole, except it writes to a block
device.  Can be useful in a setup where netconsole, for whatever
reasons, is impractical.

Changes since version 1.0:
- Header format overhaul, addressing several annoyances when actually
  using blockconsole for production.
- Steve Hodgson added a panic notifier.
- Added improvements from Borislav Petkov.

Signed-off-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Joern Engel <joern@logfs.org>
---
 Documentation/block/blockconsole.txt            |   94 ++++
 Documentation/block/blockconsole/bcon_tail      |   57 +++
 Documentation/block/blockconsole/mkblockconsole |   24 +
 block/partitions/Makefile                       |    1 +
 block/partitions/blockconsole.c                 |   22 +
 block/partitions/check.c                        |    4 +
 drivers/block/Kconfig                           |    6 +
 drivers/block/Makefile                          |    1 +
 drivers/block/blockconsole.c                    |  612 +++++++++++++++++++++++
 include/linux/blockconsole.h                    |    7 +
 include/linux/mount.h                           |    2 +-
 init/do_mounts.c                                |    4 +-
 12 files changed, 831 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100755 Documentation/block/blockconsole/bcon_tail
 create mode 100755 Documentation/block/blockconsole/mkblockconsole
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c
 create mode 100644 include/linux/blockconsole.h

diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt
new file mode 100644
index 0000000..2b45516
--- /dev/null
+++ b/Documentation/block/blockconsole.txt
@@ -0,0 +1,94 @@
+started by Jörn Engel <joern@logfs.org> 2012.03.17
+
+Blockconsole for the impatient
+==============================
+
+1. Find an unused USB stick and prepare it for blockconsole by writing
+   the blockconsole signature to it:
+   $ ./mkblockconsole /dev/<usb_stick>
+
+2. USB stick is ready for use, replug it so that the kernel can start
+   logging to it.
+
+3. After you're done logging, read out the logs from it like this:
+   $ ./bcon_tail
+
+   This creates a file called /var/log/bcon.<UUID> which contains the
+   last 16M of the logs.  Open it with a sane editor like vim which
+   can display zeroed gaps as a single line and start staring at the
+   logs.
+   For the really impatient, use:
+   $ vi `./bcon_tail`
+
+Introduction:
+=============
+
+This module logs kernel printk messages to block devices, e.g. usb
+sticks.  It allows after-the-fact debugging when the main
+disk/filesystem fails and serial consoles and netconsole are
+impractical.
+
+It can currently only be used built-in.  Blockconsole hooks into the
+partition scanning code and will bring up configured block devices as
+soon as possible.  While this doesn't allow capture of early kernel
+panics, it does capture most of the boot process.
+
+Block device configuration:
+==================================
+
+Blockconsole has no configuration parameter.  In order to use a block
+device for logging, the blockconsole header has to be written to the
+device in question.  Logging to partitions is not supported.
+
+The example program mkblockconsole can be used to generate such a
+header on a device.
+
+Header format:
+==============
+
+A legal header looks like this:
+
+Linux blockconsole version 1.1
+818cf322
+00000000
+00000000
+
+It consists of a newline, the "Linux blockconsole version 1.1" string
+plus three numbers on separate lines each.  Numbers are all 32bit,
+represented as 8-byte hex strings, with letters in lowercase.  The
+first number is a uuid for this particular console device.  Just pick
+a random number when generating the device.  The second number is a
+wrap counter and unlikely to ever increment.  The third is a tile
+counter, with a tile being one megabyte in size.
+
+Miscellaneous notes:
+====================
+
+Blockconsole will write a new header for every tile or once every
+megabyte.  The header starts with a newline in order to ensure the
+"Linux blockconsole..." string always ends up at the beginning of a
+line if you read the blockconsole in a text editor.
+
+The blockconsole header is constructed such that opening the log
+device in a text editor, ignoring memory constraints due to large
+devices, should just work and be reasonably non-confusing to readers.
+However, the example program bcon_tail can be used to copy the last 16
+tiles of the log device to /var/log/bcon.<uuid>, which should be much
+easier to handle.
+
+The wrap counter is used by blockconsole to determine where to
+continue logging after a reboot.  New logs will be written to the
+first tile that wasn't written to by the last instance of
+blockconsole.  Similarly bcon_tail is doing a binary search to find
+the end of the log.
+
+Writing to the log device is strictly circular.  This should give
+optimal performance and reliability on cheap devices, like usb sticks.
+
+Writing to block devices has to happen in sector granularity, while
+kernel logging happens in byte granularity.  In order not to lose
+messages in important cases like kernel crashes, a timer will write
+out partial sectors if no new messages appear for a while.  The
+unwritten part of the sector will be filled with spaces and a single
+newline.  In a quiet system, these empty lines can make up the bulk of
+the log.
diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
new file mode 100755
index 0000000..5d788c9
--- /dev/null
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -0,0 +1,57 @@
+#!/bin/bash
+
+TAIL_LEN=16
+TEMPLATE=/tmp/bcon_template
+BUF=/tmp/bcon_buf
+
+if [ -z "$(which lsscsi)" ]; then
+	echo "You need to install the lsscsi package on your distro."
+	exit 1
+fi
+
+end_of_log() {
+	DEV=$1
+	UUID=`head -c40 $DEV|tail -c8`
+	LOGFILE=/var/log/bcon.$UUID
+	SECTORS=`hdparm -g $DEV|grep sectors|sed 's/.*sectors = \([0-9]*\).*/\1/'`
+	#MSIZE=`expr $SECTORS / 2048`
+	dd if=$DEV iflag=direct bs=512 2>/dev/null|head -c50 > $TEMPLATE
+	#START, MIDDLE and END are in sectors
+	START=0
+	MIDDLE=$SECTORS
+	END=$SECTORS
+	while true; do
+		MIDDLE=`expr \( \( $END + $START \) / 4096 \) \* 2048`
+		if [ $MIDDLE -eq $START ]; then
+			break
+		fi
+		dd if=$DEV iflag=direct bs=512 count=1 skip=$MIDDLE 2>/dev/null|head -c50 > $BUF
+		if diff -q $BUF $TEMPLATE > /dev/null; then
+			START=$MIDDLE
+		else
+			END=$MIDDLE
+		fi
+	done
+	#switch to megabytes
+	END=`expr $END / 2048`
+	START=`expr $START / 2048`
+	if [ $START -lt $TAIL_LEN ]; then
+		START=0
+	else
+		START=`expr $START - $TAIL_LEN + 1`
+	fi
+	LEN=`expr $END - $START`
+	dd if=$DEV iflag=direct bs=1M count=$LEN skip=$START >$LOGFILE 2>/dev/null
+	echo $LOGFILE
+}
+
+# HEADER contains a newline, so the funny quoting is necessary
+HEADER='
+Linux blockconsole version 1.1'
+CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`
+
+for DEV in $CANDIDATES; do
+	if [ "`head -c32 $DEV`" == "$HEADER" ]; then
+		end_of_log $DEV
+	fi
+done
diff --git a/Documentation/block/blockconsole/mkblockconsole b/Documentation/block/blockconsole/mkblockconsole
new file mode 100755
index 0000000..05c4ad8
--- /dev/null
+++ b/Documentation/block/blockconsole/mkblockconsole
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+if [ ! $# -eq 1 ]; then
+	echo "Usage: $0 <dev>"
+	exit 1
+elif mount|fgrep -q $1; then
+	echo Device appears to be mounted - aborting
+	exit 1
+else
+	dd if=/dev/zero bs=1M count=1 > $1
+	# The funky formatting is actually needed!
+	UUID=`head -c4 /dev/urandom |hexdump -e '/4 "%08x"'`
+	echo > /tmp/$UUID
+	echo 'Linux blockconsole version 1.1' >> /tmp/$UUID
+	echo "$UUID" >> /tmp/$UUID
+	echo 00000000 >> /tmp/$UUID
+	echo 00000000 >> /tmp/$UUID
+	for i in `seq 452`; do echo -n " " >> /tmp/$UUID; done
+	echo >> /tmp/$UUID
+	cat /tmp/$UUID > $1
+	rm /tmp/$UUID
+	sync
+	exit 0
+fi
diff --git a/block/partitions/Makefile b/block/partitions/Makefile
index 03af8ea..bf26d4a 100644
--- a/block/partitions/Makefile
+++ b/block/partitions/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
 obj-$(CONFIG_EFI_PARTITION) += efi.o
 obj-$(CONFIG_KARMA_PARTITION) += karma.o
 obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
+obj-$(CONFIG_BLOCKCONSOLE) += blockconsole.o
diff --git a/block/partitions/blockconsole.c b/block/partitions/blockconsole.c
new file mode 100644
index 0000000..79796a8
--- /dev/null
+++ b/block/partitions/blockconsole.c
@@ -0,0 +1,22 @@
+#include <linux/blockconsole.h>
+
+#include "check.h"
+
+int blockconsole_partition(struct parsed_partitions *state)
+{
+	Sector sect;
+	void *data;
+	int err = 0;
+
+	data = read_part_sector(state, 0, &sect);
+	if (!data)
+		return -EIO;
+	if (!bcon_magic_present(data))
+		goto out;
+
+	bcon_add(state->name);
+	err = 1;
+out:
+	put_dev_sector(sect);
+	return err;
+}
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..8de99fa 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -36,11 +36,15 @@
 
 int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/
 
+int blockconsole_partition(struct parsed_partitions *state);
 static int (*check_part[])(struct parsed_partitions *) = {
 	/*
 	 * Probe partition formats with tables at disk address 0
 	 * that also have an ADFS boot block at 0xdc0.
 	 */
+#ifdef CONFIG_BLOCKCONSOLE
+	blockconsole_partition,
+#endif
 #ifdef CONFIG_ACORN_PARTITION_ICS
 	adfspart_check_ICS,
 #endif
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a796407..637c952 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -555,4 +555,10 @@ config BLK_DEV_RBD
 
 	  If unsure, say N.
 
+config BLOCKCONSOLE
+	bool "Block device console logging support"
+	help
+	  This enables logging to block devices.
+	  See <file:Documentation/block/blockconsole.txt> for details.
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 5b79505..1eb7f902 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -41,5 +41,6 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
 obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
 obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
+obj-$(CONFIG_BLOCKCONSOLE)	+= blockconsole.o
 
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
new file mode 100644
index 0000000..09f239c
--- /dev/null
+++ b/drivers/block/blockconsole.c
@@ -0,0 +1,612 @@
+/*
+ * Blockconsole - write kernel console to a block device
+ *
+ * Copyright (C) 2012  Joern Engel <joern@logfs.org>
+ */
+#include <linux/bio.h>
+#include <linux/blockconsole.h>
+#include <linux/console.h>
+#include <linux/fs.h>
+#include <linux/kref.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
+#define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
+#define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"
+#define BCON_UUID_OFS		(32)
+#define BCON_ROUND_OFS		(41)
+#define BCON_TILE_OFS		(50)
+#define BCON_HEADERSIZE		(50)
+#define BCON_LONG_HEADERSIZE	(59) /* with tile index */
+
+#define PAGE_COUNT		(256)
+#define SECTOR_COUNT		(PAGE_COUNT * (PAGE_SIZE >> 9))
+#define CACHE_PAGE_MASK		(PAGE_COUNT - 1)
+#define CACHE_SECTOR_MASK	(SECTOR_COUNT - 1)
+#define CACHE_SIZE		(PAGE_COUNT << PAGE_SHIFT)
+#define CACHE_MASK		(CACHE_SIZE - 1)
+#define SECTOR_SHIFT		(9)
+#define SECTOR_SIZE		(1 << SECTOR_SHIFT)
+#define SECTOR_MASK		(~(SECTOR_SIZE-1))
+#define PG_SECTOR_MASK		((PAGE_SIZE >> 9) - 1)
+
+struct bcon_bio {
+	struct bio bio;
+	struct bio_vec bvec;
+	void *sector;
+	int in_flight;
+};
+
+struct blockconsole {
+	char devname[32];
+	struct spinlock end_io_lock;
+	struct timer_list pad_timer;
+	int error_count;
+	struct kref kref;
+	u64 console_bytes;
+	u64 write_bytes;
+	u64 max_bytes;
+	u32 round;
+	u32 uuid;
+	struct bcon_bio bio_array[SECTOR_COUNT];
+	struct page *pages;
+	struct bcon_bio zero_bios[PAGE_COUNT];
+	struct page *zero_page;
+	struct block_device *bdev;
+	struct console console;
+	struct work_struct unregister_work;
+	struct task_struct *writeback_thread;
+	struct notifier_block panic_block;
+};
+
+static void bcon_get(struct blockconsole *bc)
+{
+	kref_get(&bc->kref);
+}
+
+static void bcon_release(struct kref *kref)
+{
+	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+
+	__free_pages(bc->zero_page, 0);
+	__free_pages(bc->pages, 8);
+	invalidate_mapping_pages(bc->bdev->bd_inode->i_mapping, 0, -1);
+	blkdev_put(bc->bdev, FMODE_READ|FMODE_WRITE);
+	kfree(bc);
+}
+
+static void bcon_put(struct blockconsole *bc)
+{
+	kref_put(&bc->kref, bcon_release);
+}
+
+static int __bcon_console_ofs(u64 console_bytes)
+{
+	return console_bytes & ~SECTOR_MASK;
+}
+
+static int bcon_console_ofs(struct blockconsole *bc)
+{
+	return __bcon_console_ofs(bc->console_bytes);
+}
+
+static int __bcon_console_sector(u64 console_bytes)
+{
+	return (console_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static int bcon_console_sector(struct blockconsole *bc)
+{
+	return __bcon_console_sector(bc->console_bytes);
+}
+
+static int bcon_write_sector(struct blockconsole *bc)
+{
+	return (bc->write_bytes >> SECTOR_SHIFT) & CACHE_SECTOR_MASK;
+}
+
+static void clear_sector(void *sector)
+{
+	memset(sector, ' ', 511);
+	memset(sector + 511, 10, 1);
+}
+
+static void bcon_init_first_page(struct blockconsole *bc)
+{
+	char *buf = page_address(bc->pages);
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+	u32 tile = bc->console_bytes >> 20; /* We overflow after 4TB - fine */
+
+	clear_sector(buf);
+	memcpy(buf, BLOCKCONSOLE_MAGIC, len);
+	sprintf(buf + BCON_UUID_OFS, "%08x", bc->uuid);
+	sprintf(buf + BCON_ROUND_OFS, "%08x", bc->round);
+	sprintf(buf + BCON_TILE_OFS, "%08x", tile);
+	/* replace NUL with newline */
+	buf[BCON_UUID_OFS + 8] = 10;
+	buf[BCON_ROUND_OFS + 8] = 10;
+	buf[BCON_TILE_OFS + 8] = 10;
+}
+
+static void bcon_advance_console_bytes(struct blockconsole *bc, int bytes)
+{
+	u64 old, new;
+
+	do {
+		old = bc->console_bytes;
+		new = old + bytes;
+		if (new >= bc->max_bytes)
+			new = 0;
+		if ((new & CACHE_MASK) == 0) {
+			bcon_init_first_page(bc);
+			new += BCON_LONG_HEADERSIZE;
+		}
+	} while (cmpxchg64(&bc->console_bytes, old, new) != old);
+}
+
+static void request_complete(struct bio *bio, int err)
+{
+	complete((struct completion *)bio->bi_private);
+}
+
+static int sync_read(struct blockconsole *bc, u64 ofs)
+{
+	struct bio bio;
+	struct bio_vec bio_vec;
+	struct completion complete;
+
+	bio_init(&bio);
+	bio.bi_io_vec = &bio_vec;
+	bio_vec.bv_page = bc->pages;
+	bio_vec.bv_len = SECTOR_SIZE;
+	bio_vec.bv_offset = 0;
+	bio.bi_vcnt = 1;
+	bio.bi_idx = 0;
+	bio.bi_size = SECTOR_SIZE;
+	bio.bi_bdev = bc->bdev;
+	bio.bi_sector = ofs >> SECTOR_SHIFT;
+	init_completion(&complete);
+	bio.bi_private = &complete;
+	bio.bi_end_io = request_complete;
+
+	submit_bio(READ, &bio);
+	wait_for_completion(&complete);
+	return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO;
+}
+
+static void bcon_erase_segment(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio *bio = &bcon_bio->bio;
+
+		/*
+		 * If the last erase hasn't finished yet, just skip it.  The log will
+		 * look messy, but that's all.
+		 */
+		rmb();
+		if (bcon_bio->in_flight)
+			continue;
+		bio_init(bio);
+		bio->bi_io_vec = &bcon_bio->bvec;
+		bio->bi_vcnt = 1;
+		bio->bi_size = PAGE_SIZE;
+		bio->bi_bdev = bc->bdev;
+		bio->bi_private = bc;
+		bio->bi_idx = 0;
+		bio->bi_sector = (bc->write_bytes + i * PAGE_SIZE) >> 9;
+		bcon_bio->in_flight = 1;
+		wmb();
+		/* We want the erase to go to the device first somehow */
+		submit_bio(WRITE | REQ_SOFTBARRIER, bio);
+	}
+}
+
+static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
+{
+	bc->write_bytes += bytes;
+	if (bc->write_bytes >= bc->max_bytes) {
+		bc->write_bytes = 0;
+		bcon_init_first_page(bc);
+		bc->round++;
+	}
+}
+
+static int bcon_convert_old_format(struct blockconsole *bc)
+{
+	bc->uuid = get_random_int();
+	bc->round = 0;
+	bc->console_bytes = bc->write_bytes = 0;
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	printk(KERN_INFO"blockconsole: converted %s from old format\n",
+			bc->devname);
+	return 0;
+}
+
+static int bcon_find_end_of_log(struct blockconsole *bc)
+{
+	u64 start = 0, end = bc->max_bytes, middle;
+	void *sec0 = bc->bio_array[0].sector;
+	void *sec1 = bc->bio_array[1].sector;
+	int err, version;
+
+	err = sync_read(bc, 0);
+	if (err)
+		return err;
+	/* Second sanity check, out of sheer paranoia */
+	version = bcon_magic_present(sec0);
+	if (version == 10)
+		return bcon_convert_old_format(bc);
+	bc->uuid = simple_strtoull(sec0 + BCON_UUID_OFS, NULL, 16);
+	bc->round = simple_strtoull(sec0 + BCON_ROUND_OFS, NULL, 16);
+
+	memcpy(sec1, sec0, BCON_HEADERSIZE);
+	for (;;) {
+		middle = (start + end) / 2;
+		middle &= ~CACHE_MASK;
+		if (middle == start)
+			break;
+		err = sync_read(bc, middle);
+		if (err)
+			return err;
+		if (memcmp(sec1, sec0, BCON_HEADERSIZE)) {
+			/* If the two differ, we haven't written that far yet */
+			end = middle;
+		} else {
+			start = middle;
+		}
+	}
+	bc->console_bytes = bc->write_bytes = end;
+	bcon_advance_console_bytes(bc, 0); /* To skip the header */
+	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
+	bcon_erase_segment(bc);
+	return 0;
+}
+
+static void bcon_unregister(struct work_struct *work)
+{
+	struct blockconsole *bc = container_of(work, struct blockconsole,
+			unregister_work);
+
+	atomic_notifier_chain_unregister(&panic_notifier_list, &bc->panic_block);
+	unregister_console(&bc->console);
+	del_timer_sync(&bc->pad_timer);
+	kthread_stop(bc->writeback_thread);
+	/* No new io will be scheduled anymore now */
+	bcon_put(bc);
+}
+
+#define BCON_MAX_ERRORS	10
+static void bcon_end_io(struct bio *bio, int err)
+{
+	struct bcon_bio *bcon_bio = container_of(bio, struct bcon_bio, bio);
+	struct blockconsole *bc = bio->bi_private;
+	unsigned long flags;
+
+	/*
+	 * We want to assume the device broken and free this console if
+	 * we accumulate too many errors.  But if errors are transient,
+	 * we also want to forget about them once writes succeed again.
+	 * Oh, and we only want to reset the counter if it hasn't reached
+	 * the limit yet, so we don't bcon_put() twice from here.
+	 */
+	spin_lock_irqsave(&bc->end_io_lock, flags);
+	if (err) {
+		if (bc->error_count++ == BCON_MAX_ERRORS) {
+			printk(KERN_INFO"blockconsole: no longer logging to %s\n", bc->devname);
+			schedule_work(&bc->unregister_work);
+		}
+	} else {
+		if (bc->error_count && bc->error_count < BCON_MAX_ERRORS)
+			bc->error_count = 0;
+	}
+	/*
+	 * Add padding (a bunch of spaces and a newline) early so bcon_pad
+	 * only has to advance a pointer.
+	 */
+	clear_sector(bcon_bio->sector);
+	bcon_bio->in_flight = 0;
+	spin_unlock_irqrestore(&bc->end_io_lock, flags);
+	bcon_put(bc);
+}
+
+static void bcon_writesector(struct blockconsole *bc, int index)
+{
+	struct bcon_bio *bcon_bio = bc->bio_array + index;
+	struct bio *bio = &bcon_bio->bio;
+
+	rmb();
+	if (bcon_bio->in_flight)
+		return;
+	bcon_get(bc);
+
+	bio_init(bio);
+	bio->bi_io_vec = &bcon_bio->bvec;
+	bio->bi_vcnt = 1;
+	bio->bi_size = SECTOR_SIZE;
+	bio->bi_bdev = bc->bdev;
+	bio->bi_private = bc;
+	bio->bi_end_io = bcon_end_io;
+
+	bio->bi_idx = 0;
+	bio->bi_sector = bc->write_bytes >> 9;
+	bcon_bio->in_flight = 1;
+	wmb();
+	submit_bio(WRITE, bio);
+}
+
+static int bcon_writeback(void *_bc)
+{
+	struct blockconsole *bc = _bc;
+	struct sched_param sp;
+
+	sp.sched_priority = MAX_RT_PRIO - 1; /* Highest realtime prio */
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+		if (kthread_should_stop())
+			break;
+		while (bcon_write_sector(bc) != bcon_console_sector(bc)) {
+			bcon_writesector(bc, bcon_write_sector(bc));
+			bcon_advance_write_bytes(bc, SECTOR_SIZE);
+			if (bcon_write_sector(bc) == 0) {
+				bcon_erase_segment(bc);
+			}
+		}
+	}
+	return 0;
+}
+
+static void bcon_pad(unsigned long data)
+{
+	struct blockconsole *bc = (void *)data;
+	unsigned int n;
+
+	/*
+	 * We deliberately race against bcon_write here.  If we lose the race,
+	 * our padding is no longer where we expected it to be, i.e. it is
+	 * no longer a bunch of spaces with a newline at the end.  There could
+	 * not be a newline at all or it could be somewhere in the middle.
+	 * Either way, the log corruption is fairly obvious to spot and ignore
+	 * for human readers.
+	 */
+	n = SECTOR_SIZE - bcon_console_ofs(bc);
+	if (n != SECTOR_SIZE) {
+		bcon_advance_console_bytes(bc, n);
+		wake_up_process(bc->writeback_thread);
+	}
+}
+
+static void bcon_write(struct console *console, const char *msg,
+		unsigned int len)
+{
+	struct blockconsole *bc = container_of(console, struct blockconsole,
+			console);
+	unsigned int n;
+	u64 console_bytes;
+	int i;
+
+	while (len) {
+		console_bytes = bc->console_bytes;
+		i = __bcon_console_sector(console_bytes);
+		rmb();
+		if (bc->bio_array[i].in_flight)
+			break;
+		n = min_t(int, len, SECTOR_SIZE -
+				__bcon_console_ofs(console_bytes));
+		memcpy(bc->bio_array[i].sector +
+				__bcon_console_ofs(console_bytes), msg, n);
+		len -= n;
+		msg += n;
+		bcon_advance_console_bytes(bc, n);
+	}
+	wake_up_process(bc->writeback_thread);
+	mod_timer(&bc->pad_timer, jiffies + HZ);
+}
+
+static void bcon_init_bios(struct blockconsole *bc)
+{
+	int i;
+
+	for (i = 0; i < SECTOR_COUNT; i++) {
+		int page_index = i >> (PAGE_SHIFT - SECTOR_SHIFT);
+		struct page *page = bc->pages + page_index;
+		struct bcon_bio *bcon_bio = bc->bio_array + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bcon_bio->sector = page_address(bc->pages + page_index)
+			+ SECTOR_SIZE * (i & PG_SECTOR_MASK);
+		clear_sector(bcon_bio->sector);
+		bvec->bv_page = page;
+		bvec->bv_len = SECTOR_SIZE;
+		bvec->bv_offset = SECTOR_SIZE * (i & PG_SECTOR_MASK);
+	}
+}
+
+static void bcon_init_zero_bio(struct blockconsole *bc)
+{
+	int i;
+
+	memset(page_address(bc->zero_page), 0, PAGE_SIZE);
+	for (i = 0; i < PAGE_COUNT; i++) {
+		struct bcon_bio *bcon_bio = bc->zero_bios + i;
+		struct bio_vec *bvec = &bcon_bio->bvec;
+
+		bcon_bio->in_flight = 0;
+		bvec->bv_page = bc->zero_page;
+		bvec->bv_len = PAGE_SIZE;
+		bvec->bv_offset = 0;
+	}
+}
+
+static int blockconsole_panic(struct notifier_block *this, unsigned long event,
+		void *ptr)
+{
+	struct blockconsole *bc = container_of(this, struct blockconsole,
+			panic_block);
+	unsigned int n;
+
+	n = SECTOR_SIZE - bcon_console_ofs(bc);
+	if (n != SECTOR_SIZE)
+		bcon_advance_console_bytes(bc, n);
+	bcon_writeback(bc);
+	return NOTIFY_DONE;
+}
+
+static int bcon_create(const char *devname)
+{
+	const fmode_t mode = FMODE_READ | FMODE_WRITE;
+	struct blockconsole *bc;
+	int err;
+
+	bc = kzalloc(sizeof(*bc), GFP_KERNEL);
+	if (!bc)
+		return -ENOMEM;
+	memset(bc->devname, ' ', sizeof(bc->devname));
+	strlcpy(bc->devname, devname, sizeof(bc->devname));
+	spin_lock_init(&bc->end_io_lock);
+	strcpy(bc->console.name, "bcon");
+	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED; /* FIXME: document flags */
+	bc->console.write = bcon_write;
+	bc->bdev = blkdev_get_by_path(devname, mode, NULL);
+#ifndef MODULE
+	if (IS_ERR(bc->bdev)) {
+		dev_t devt = name_to_dev_t(devname);
+		if (devt)
+			bc->bdev = blkdev_get_by_dev(devt, mode, NULL);
+	}
+#endif
+	if (IS_ERR(bc->bdev))
+		goto out;
+	bc->pages = alloc_pages(GFP_KERNEL, 8);
+	if (!bc->pages)
+		goto out;
+	bc->zero_page = alloc_pages(GFP_KERNEL, 0);
+	if (!bc->zero_page)
+		goto out1;
+	bcon_init_bios(bc);
+	bcon_init_zero_bio(bc);
+	setup_timer(&bc->pad_timer, bcon_pad, (unsigned long)bc);
+	bc->max_bytes = bc->bdev->bd_inode->i_size & ~CACHE_MASK;
+	err = bcon_find_end_of_log(bc);
+	if (err)
+		goto out2;
+	kref_init(&bc->kref); /* This reference gets freed on errors */
+	bc->writeback_thread = kthread_run(bcon_writeback, bc, "bcon_%s",
+			devname);
+	if (IS_ERR(bc->writeback_thread))
+		goto out2;
+	INIT_WORK(&bc->unregister_work, bcon_unregister);
+	register_console(&bc->console);
+	bc->panic_block.notifier_call = blockconsole_panic;
+	bc->panic_block.priority = INT_MAX;
+	atomic_notifier_chain_register(&panic_notifier_list, &bc->panic_block);
+	printk(KERN_INFO"blockconsole: now logging to %s at %llx\n", devname,
+			bc->console_bytes >> 20);
+	return 0;
+
+out2:
+	__free_pages(bc->zero_page, 0);
+out1:
+	__free_pages(bc->pages, 8);
+out:
+	kfree(bc);
+	/* Not strictly correct, but the caller doesn't care */
+	return -ENOMEM;
+}
+
+static void bcon_create_fuzzy(const char *name)
+{
+	char *longname;
+	int err;
+
+	err = bcon_create(name);
+	if (err) {
+		longname = kzalloc(strlen(name) + 6, GFP_KERNEL);
+		if (!longname)
+			return;
+		strcpy(longname, "/dev/");
+		strcat(longname, name);
+		bcon_create(longname);
+		kfree(longname);
+	}
+}
+
+static DEFINE_SPINLOCK(device_lock);
+static char scanned_devices[80];
+
+static void bcon_do_add(struct work_struct *work)
+{
+	char local_devices[80], *name, *remainder = local_devices;
+
+	spin_lock(&device_lock);
+	memcpy(local_devices, scanned_devices, sizeof(local_devices));
+	memset(scanned_devices, 0, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+
+	while (remainder && remainder[0]) {
+		name = strsep(&remainder, ",");
+		bcon_create_fuzzy(name);
+	}
+}
+
+DECLARE_WORK(bcon_add_work, bcon_do_add);
+
+void bcon_add(const char *name)
+{
+	/*
+	 * We add each name to a small static buffer and ask for a workqueue
+	 * to go pick it up asap.  Once it is picked up, the buffer is empty
+	 * again, so hopefully it will suffice for all sane users.
+	 */
+	spin_lock(&device_lock);
+	if (scanned_devices[0])
+		strlcat(scanned_devices, ",", sizeof(scanned_devices));
+	strlcat(scanned_devices, name, sizeof(scanned_devices));
+	spin_unlock(&device_lock);
+	schedule_work(&bcon_add_work);
+}
+
+static int isnum(const void *data)
+{
+	unsigned long long num;
+	char *end;
+
+	/* Must be an 8-digit hex number followed by newline */
+	num = simple_strtoull(data, &end, 16);
+	if (end != data + 8)
+		return 0;
+	if (*end != 10)
+		return 0;
+	if (num > 0xffffffffull)
+		return 0;
+	return 1;
+}
+
+int bcon_magic_present(const void *data)
+{
+	size_t len = strlen(BLOCKCONSOLE_MAGIC);
+
+	if (!memcmp(data, BLOCKCONSOLE_MAGIC_OLD, len))
+		return 10;
+	if (memcmp(data, BLOCKCONSOLE_MAGIC, len))
+		return 0;
+	if (!isnum(data + BCON_UUID_OFS))
+		return 0;
+	if (!isnum(data + BCON_ROUND_OFS))
+		return 0;
+	if (!isnum(data + BCON_TILE_OFS))
+		return 0;
+	return 11;
+}
diff --git a/include/linux/blockconsole.h b/include/linux/blockconsole.h
new file mode 100644
index 0000000..114f7c5
--- /dev/null
+++ b/include/linux/blockconsole.h
@@ -0,0 +1,7 @@
+#ifndef LINUX_BLOCKCONSOLE_H
+#define LINUX_BLOCKCONSOLE_H
+
+int bcon_magic_present(const void *data);
+void bcon_add(const char *name);
+
+#endif
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..6b5fa77 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -74,6 +74,6 @@ extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);
 
-extern dev_t name_to_dev_t(char *name);
+extern dev_t name_to_dev_t(const char *name);
 
 #endif /* _LINUX_MOUNT_H */
diff --git a/init/do_mounts.c b/init/do_mounts.c
index d3f0aee..a6d9bcb 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -106,7 +106,7 @@ no_match:
  *
  * Returns the matching dev_t on success or 0 on failure.
  */
-static dev_t devt_from_partuuid(char *uuid_str)
+static dev_t devt_from_partuuid(const char *uuid_str)
 {
 	dev_t res = 0;
 	struct device *dev = NULL;
@@ -183,7 +183,7 @@ done:
  *	bangs.
  */
 
-dev_t name_to_dev_t(char *name)
+dev_t name_to_dev_t(const char *name)
 {
 	char s[32];
 	char *p;
-- 
1.7.10.4
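
For readers who want to check a device by hand, the header layout described
in blockconsole.txt in the patch above can be parsed roughly as follows.
This is a minimal userspace sketch, not code from the patch: struct
bcon_header, parse_field() and bcon_parse_header() are hypothetical names,
and only the magic string and the offsets 32/41/50/59 are taken from
drivers/block/blockconsole.c.  Unlike the kernel's isnum(), which relies on
simple_strtoull(), the sketch accepts lowercase hex digits only, as the
documentation describes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic string and offsets as defined in the patch. */
#define BCON_MAGIC		"\nLinux blockconsole version 1.1\n"
#define BCON_UUID_OFS		32
#define BCON_ROUND_OFS		41
#define BCON_TILE_OFS		50
#define BCON_LONG_HEADERSIZE	59

/* Hypothetical container for the three header numbers. */
struct bcon_header {
	uint32_t uuid;
	uint32_t round;	/* wrap counter */
	uint32_t tile;	/* tile (megabyte) counter */
};

/* Parse 8 lowercase hex digits followed by a newline; 0 on success. */
static int parse_field(const char *p, uint32_t *out)
{
	uint32_t v = 0;
	int i;

	for (i = 0; i < 8; i++) {
		char c = p[i];
		uint32_t d;

		if (c >= '0' && c <= '9')
			d = (uint32_t)(c - '0');
		else if (c >= 'a' && c <= 'f')
			d = (uint32_t)(c - 'a') + 10;
		else
			return -1;
		v = (v << 4) | d;
	}
	if (p[8] != '\n')
		return -1;
	*out = v;
	return 0;
}

/* Validate and decode a version 1.1 header; 0 on success, -1 otherwise. */
int bcon_parse_header(const char *buf, size_t len, struct bcon_header *h)
{
	if (len < BCON_LONG_HEADERSIZE)
		return -1;
	if (memcmp(buf, BCON_MAGIC, BCON_UUID_OFS) != 0)
		return -1;
	if (parse_field(buf + BCON_UUID_OFS, &h->uuid) ||
	    parse_field(buf + BCON_ROUND_OFS, &h->round) ||
	    parse_field(buf + BCON_TILE_OFS, &h->tile))
		return -1;
	return 0;
}
```

Feeding it the first 59 bytes of a prepared device should yield the uuid
that mkblockconsole picked, with both counters at zero.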


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] add blockconsole version 1.1
  2012-07-23 20:02           ` Jörn Engel
@ 2012-07-24  8:01             ` Tvrtko Ursulin
  2012-07-24 14:38               ` Jörn Engel
  0 siblings, 1 reply; 27+ messages in thread
From: Tvrtko Ursulin @ 2012-07-24  8:01 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Monday 23 Jul 2012 21:02:30 Jörn Engel wrote:
> On Mon, 23 July 2012 15:33:16 +0100, Tvrtko Ursulin wrote:
> > On Thursday 12 Jul 2012 18:46:34 Jörn Engel wrote:
> > > Console driver similar to netconsole, except it writes to a block
> > > device.  Can be useful in a setup where netconsole, for whatever
> > > reasons, is impractical.
> > 
> > Perhaps you need to add a word or two about limitations compared to
> > netconsole in documentation because it is quite significant difference
> > in reliability? I mean so it is not assumed it is analogous to
> > netconsole but just a different underlying media. I don't know if
> > someone would expect it, but better said than not.
> 
> Given that I don't even know the limitations, that's a bit tough.  As
> a general rule, I would always prefer netconsole.  It appears to be
> more reliable than blockconsole and beats serial console by half a
> lightyear.  But as a fallback when netconsole is not realistic,
> blockconsole has proven useful.

At the very least block console does not work from interrupt context while 
netconsole does, right? Also netconsole does things to try and work around low 
memory situations. Things like that I think would be useful additions to 
documentation.
 
> > I second the notion that logging to partitions would be useful.
> 
> Below is a compile-tested patch to do that.  Feel free to give it a
> spin and fix any bugs.

I can't promise to do that in the very near future, but in principle idea 
could be interesting to me, at least to evaluate how reliable mechanism is 
with different storage interfaces and controllers.

> > Also, and I haven't checked what the swap format is, if it could somehow
> > be integrated together that could be useful.
> 
> That appears to be slightly less likely than crossbreeding a rabbit
> with a chicken.  Is there something obvious I have missed?

I was thinking how swap space is always there and is potentially much faster 
to write to than a random USB stick - which could translate to more reliable. 
Then it's a question of which storage subsystem (libata vs. usb-storage) would 
work better in different oops/panic situations. Again I tend to have less hope 
in USB based solutions - maybe it's my bias from working in that area many 
years ago. So the idea of swap space was that _if_ swap format could be 
extended to allocate a number of blocks to use other than swap, then that area 
could be used by blockconsole. Seemed like a convenient and potentially more 
reliable solution to me, but as I said the latter may depend.

Tvrtko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24  8:01             ` Tvrtko Ursulin
@ 2012-07-24 14:38               ` Jörn Engel
  2012-07-25  8:17                 ` Tvrtko Ursulin
  0 siblings, 1 reply; 27+ messages in thread
From: Jörn Engel @ 2012-07-24 14:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Tue, 24 July 2012 09:01:16 +0100, Tvrtko Ursulin wrote:
> On Monday 23 Jul 2012 21:02:30 Jörn Engel wrote:
> > On Mon, 23 July 2012 15:33:16 +0100, Tvrtko Ursulin wrote:
> > > On Thursday 12 Jul 2012 18:46:34 Jörn Engel wrote:
> 
> At the very least block console does not work from interrupt context while 
> netconsole does, right? Also netconsole does things to try and work around low 
> memory situations. Things like that I think would be useful additions to 
> documentation.

Blockconsole does work from interrupt context.  It has buffers for 1MB
worth of data.  Until those fill up, it only does a memcpy and
schedules a workqueue for writeback.  If you panic, it will do the
writeback immediately.  While I wouldn't believe this to always work,
I have yet to see a confirmed failure case.

Blockconsole itself has no allocations in the write path, so it should
be unaffected by low memory situation.  The underlying driver and
block layer code may well be.
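
The memcpy path described above comes down to a little mask arithmetic on a
running byte counter.  The following userspace sketch illustrates it,
assuming 4 KiB pages, so that the 256-page cache holds 2048 sectors of 512
bytes.  The function names are made up for illustration; the masks mirror
those in the patch.

```c
#include <stdint.h>

#define SECTOR_SHIFT	9
#define SECTOR_SIZE	(1u << SECTOR_SHIFT)
#define SECTOR_COUNT	2048u	/* 1 MiB cache / 512-byte sectors (assumes 4 KiB pages) */

/* Which sector slot of the ring a byte position falls into. */
unsigned ring_sector(uint64_t bytes)
{
	return (unsigned)(bytes >> SECTOR_SHIFT) & (SECTOR_COUNT - 1);
}

/* Offset of a byte position within its sector. */
unsigned ring_offset(uint64_t bytes)
{
	return (unsigned)bytes & (SECTOR_SIZE - 1);
}

/* How many of len bytes fit into the current sector in one memcpy. */
unsigned ring_chunk(uint64_t bytes, unsigned len)
{
	unsigned room = SECTOR_SIZE - ring_offset(bytes);

	return len < room ? len : room;
}

/* Complete sectors the console has filled but writeback has not sent yet. */
unsigned ring_pending(uint64_t console_bytes, uint64_t write_bytes)
{
	return (ring_sector(console_bytes) - ring_sector(write_bytes)) &
		(SECTOR_COUNT - 1);
}
```

Since every step is a shift or a mask, nothing in this path needs to
allocate or sleep, which is the property the interrupt-context argument
rests on.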

> > > I second the notion that logging to partitions would be useful.
> > 
> > Below is a compile-tested patch to do that.  Feel free to give it a
> > spin and fix any bugs.
> 
> I can't promise to do that in the very near future, but in principle idea 
> could be interesting to me, at least to evaluate how reliable mechanism is 
> with different storage interfaces and controllers.

Fair enough.  In the meantime I will leave this code out.  Adding a
new interface that no one has tested would be pretty bad style.

> > > Also, and I haven't checked what the swap format is, if it could somehow
> > > be integrated together that could be useful.
> > 
> > That appears to be slightly less likely than crossbreeding a rabbit
> > with a chicken.  Is there something obvious I have missed?
> 
> I was thinking how swap space is always there and is potentially much faster 
> to write to than a random USB stick - which could translate to more reliable. 
> Then it's a question of which storage subsystem (libata vs. usb-storage) would 
> work better in different oops/panic situations. Again I tend to have less hope 
> in USB based solutions - maybe it's my bias from working in that area many 
> years ago. So the idea of swap space was that _if_ swap format could be 
> extended to allocate a number of blocks to use other than swap, then that area 
> could be used by blockconsole. Seemed like a convenient and potentially more 
> reliable solution to me, but as I said the latter may depend.

In my systems swap is often absent.  Plus, setting a few blocks of swap
aside is in the end just partitioning in a new dress.  So the argument
appears to boil down to using partitions again.  The equivalent of
swap files might be interesting, but can also be somewhat scary.  So I
would leave it to others to actually write the code - if they care.

Libata is fine, blockconsole can work on any block device.

Jörn

--
Anything that can go wrong, will.
-- Finagle's Law

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24 15:42                     ` Borislav Petkov
@ 2012-07-24 14:53                       ` Jörn Engel
  2012-07-24 16:25                         ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Jörn Engel @ 2012-07-24 14:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Tue, 24 July 2012 17:42:19 +0200, Borislav Petkov wrote:
> 
> Just a minor nuisance: I have this in the log:
> 
> ...
> [   10.498422] console [bcon0] enabled
> [   10.499899] blockconsole: now logging to /dev/sdc at 1
> [   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
>                                                                                                                                 
> [   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
> [   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
> ...
> 
> which doesn't have the empty line with a bunch of '\s' chars:
> 
> ...
> [   10.498422] console [bcon0] enabled
> [   10.499899] blockconsole: now logging to /dev/sdc at 1
> [   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
> [   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
> [   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
> ...
> 
> Do you know perchance why that happens? I have a couple more lines like
> that further in the log file which bcon_tail generated.

If there is no logging for a second, blockconsole will flush the
current sector.  So however much of it is empty will be filled with
spaces and a newline at the end.  The result is those empty lines.

The advantage should be better robustness, in particular when dealing
with cheap flash devices.  Disadvantage is the wasted real estate on
your monitor - although sometimes I have found it nice to have syntax
highlighting (in a way) for pauses in the logging.
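
The padding described above can be sketched in userspace C roughly as
follows.  This is only an illustration, not the driver's code: the
name pad_sector() is hypothetical, and the 512-byte sector size is
taken from the numbers mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512 /* sector size assumed from this thread */

/*
 * Pad the unused tail of a sector buffer with spaces and end it with a
 * newline.  `used` is the number of bytes already holding log text.
 * Returns the number of padding bytes written (0 if the sector was full).
 * pad_sector() is a hypothetical name, not a function from the patch.
 */
static size_t pad_sector(char *sector, size_t used)
{
	size_t n;

	assert(used <= SECTOR_SIZE);
	n = SECTOR_SIZE - used;
	if (n == 0)
		return 0;
	/* n - 1 spaces, then the newline in the last byte of the sector */
	memset(sector + used, ' ', n - 1);
	sector[SECTOR_SIZE - 1] = '\n';
	return n;
}
```

A reader such as bcon_tail then sees the padding as one long line of
spaces, which is the "empty" line in the log.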

Jörn

--
It does not require a majority to prevail, but rather an irate,
tireless minority keen to set brush fires in people's minds.
-- Samuel Adams

* Re: [PATCH] add blockconsole version 1.1
  2012-07-23 20:04                   ` Jörn Engel
@ 2012-07-24 15:42                     ` Borislav Petkov
  2012-07-24 14:53                       ` Jörn Engel
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2012-07-24 15:42 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Mon, Jul 23, 2012 at 04:04:59PM -0400, Jörn Engel wrote:
> On Wed, 18 July 2012 23:45:21 +0200, Borislav Petkov wrote:
> > 
> > Something like the above, just slap it at the beginning of
> > Documentation/block/blockconsole.txt for impatient people like me and
> > that's it :-).
> 
> And below is an updated patch with your changes folded in.  I did a
> few minor tweaks, so there is every possibility I may have messed it
> up.

Ok, everything seems to build and boot fine, logging works too.

Just a minor nuisance: I have this in the log:

...
[   10.498422] console [bcon0] enabled
[   10.499899] blockconsole: now logging to /dev/sdc at 1
[   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
                                                                                                                                
[   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
[   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
...

which doesn't have the empty line with a bunch of '\s' chars:

...
[   10.498422] console [bcon0] enabled
[   10.499899] blockconsole: now logging to /dev/sdc at 1
[   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
[   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
[   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
...

Do you know perchance why that happens? I have a couple more lines like
that further in the log file which bcon_tail generated.

Thanks.

-- 
Regards/Gruss,
Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24 14:53                       ` Jörn Engel
@ 2012-07-24 16:25                         ` Borislav Petkov
  2012-07-24 17:52                           ` Jörn Engel
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2012-07-24 16:25 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Tue, Jul 24, 2012 at 10:53:35AM -0400, Jörn Engel wrote:
> On Tue, 24 July 2012 17:42:19 +0200, Borislav Petkov wrote:
> > 
> > Just a minor nuisance: I have this in the log:
> > 
> > ...
> > [   10.498422] console [bcon0] enabled
> > [   10.499899] blockconsole: now logging to /dev/sdc at 1
> > [   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
> >                                                                                                                                 
> > [   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
> > [   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
> > ...
> > 
> > which doesn't have the empty line with a bunch of '\s' chars:
> > 
> > ...
> > [   10.498422] console [bcon0] enabled
> > [   10.499899] blockconsole: now logging to /dev/sdc at 1
> > [   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
> > [   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
> > [   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
> > ...
> > 
> > Do you know perchance why that happens? I have a couple more lines like
> > that further in the log file which bcon_tail generated.
> 
> If there is no logging for a second, blockconsole will flush the
> current sector.  So however much of it is empty will be filled with
> spaces and a newline at the end.  The result is those empty lines.

... and this is consistent with the printk timestamps above: 10.5 to
12.6 seconds = one empty line.

> The advantage should be better robustness, in particular when dealing
> with cheap flash devices.

In the sense that we flush the current sector after one second at the
latest, so that we lose as little data as possible if the system
crashes right at that point?

And, at the same time, writes to cheap devices get flushed for sure?

> Disadvantage is the wasted real estate on your monitor - although
> sometimes I have found it nice to have syntax highlighting (in a way)
> for pauses in the logging.

Ok, I see what you mean. I see a red line in vim here. Ok, good to know,
maybe this feature with the empty lines could be in the docs too so
people don't ask that question again?

Or you issue a tag instead of an empty line like so:

[   10.498422] console [bcon0] enabled
[   10.499899] blockconsole: now logging to /dev/sdc at 1
[   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
<<LOG timeout of 1sec>>
[   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
[   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6

which explains everything.

Thanks.

-- 
Regards/Gruss,
Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24 16:25                         ` Borislav Petkov
@ 2012-07-24 17:52                           ` Jörn Engel
  2012-07-24 20:28                             ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Jörn Engel @ 2012-07-24 17:52 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Tue, 24 July 2012 18:25:47 +0200, Borislav Petkov wrote:
> > The advantage should be better robustness, in particular when dealing
> > with cheap flash devices.
> 
> In the sense that we flush the current sector after one second at the
> latest, so that we lose as little data as possible if the system
> crashes right at that point?

In the sense that cheap devices don't always handle rewrites of the
same sector well.  Often this results in the entire erase block being
rewritten, causing bad performance and wear-out.  Many cheap devices
aren't real block devices.  They are barely good enough to support
FAT and may die near-instantly with a different write pattern.
Blockconsole assumes utter crap as an underlying device.

The timer mainly ensures that, on a quiet system, those two lines of
output from half an hour ago actually make it to the device
eventually.  In the case of a crash, the panic notifier is supposed to
do the same for those messages you _really_ care about.

> Ok, I see what you mean. I see a red line in vim here. Ok, good to know,
> maybe this feature with the empty lines could be in the docs too so
> people don't ask that question again?

Last paragraph. ;)

> Or you issue a tag instead of an empty line like so:
> 
> [   10.498422] console [bcon0] enabled
> [   10.499899] blockconsole: now logging to /dev/sdc at 1
> [   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
> <<LOG timeout of 1sec>>
> [   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
> [   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
> 
> which explains everything.

That would only work if you have at least 26 bytes to pad.  Bunch of
spaces with a newline works for any value between 0 and 512.
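
To make the size constraint concrete, here is a small userspace sketch.
It is not taken from the patch: fill_gap() is a hypothetical helper,
and the minimum gap is computed from the tag string itself rather than
hardcoded, since the exact threshold depends on the tag text used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag text from the suggestion quoted above. */
static const char TAG[] = "<<LOG timeout of 1sec>>";

/*
 * Fill `gap` bytes at `dst` so that the gap always ends in a newline.
 * The tag is used only when it fits together with that newline;
 * otherwise the gap is plain spaces.  Returns 1 if the tag was written,
 * 0 if not.  fill_gap() is a hypothetical helper.
 */
static int fill_gap(char *dst, size_t gap)
{
	size_t tag_len = strlen(TAG);

	if (gap == 0)
		return 0;
	/* Spaces plus trailing newline work for every non-zero gap. */
	memset(dst, ' ', gap - 1);
	dst[gap - 1] = '\n';
	if (gap < tag_len + 1)
		return 0;
	/* Enough room: overwrite the leading spaces with the tag. */
	memcpy(dst, TAG, tag_len);
	return 1;
}
```

With a short gap the function silently falls back to spaces, which is
the case a tag-only scheme could not handle.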

Jörn

--
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24 17:52                           ` Jörn Engel
@ 2012-07-24 20:28                             ` Borislav Petkov
  2012-12-19 10:20                               ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2012-07-24 20:28 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Tue, Jul 24, 2012 at 01:52:32PM -0400, Jörn Engel wrote:
> On Tue, 24 July 2012 18:25:47 +0200, Borislav Petkov wrote:
> > > The advantage should be better robustness, in particular when dealing
> > > with cheap flash devices.
> > 
> > In the sense that we flush the current sector after one second at the
> > latest, so that we lose as little data as possible if the system
> > crashes right at that point?
> 
> In the sense that cheap devices don't always handle rewrites of the
> same sector well.  Often this results in the entire erase block being
> rewritten, causing bad performance and wear-out.  Many cheap devices
> aren't real block devices.  They are barely good enough to support
> FAT and may die near-instantly with a different write pattern.
> Blockconsole assumes utter crap as an underlying device.
> 
> The timer mainly ensures that, on a quiet system, those two lines of
> output from half an hour ago actually make it to the device
> eventually.  In the case of a crash, the panic notifier is supposed to
> do the same for those messages you _really_ care about.
> 
> > Ok, I see what you mean. I see a red line in vim here. Ok, good to know,
> > maybe this feature with the empty lines could be in the docs too so
> > people don't ask that question again?
> 
> Last paragraph. ;)
> 
> > Or you issue a tag instead of an empty line like so:
> > 
> > [   10.498422] console [bcon0] enabled
> > [   10.499899] blockconsole: now logging to /dev/sdc at 1
> > [   10.594791] usb 5-2: new full-speed USB device number 3 using ohci_hcd
> > <<LOG timeout of 1sec>>
> > [   12.665911] xhci_hcd 0000:00:10.0: xHCI Host Controller
> > [   12.668469] xhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 6
> > 
> > which explains everything.
> 
> That would only work if you have at least 26 bytes to pad.  Bunch of
> spaces with a newline works for any value between 0 and 512.

Ok, thanks for taking the time to explain this - very interesting stuff.

So, as far as I'm concerned blockconsole is ready for shipping! 8-)

-- 
Regards/Gruss,
    Boris.

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24 14:38               ` Jörn Engel
@ 2012-07-25  8:17                 ` Tvrtko Ursulin
  2012-07-25 16:39                   ` Jörn Engel
  0 siblings, 1 reply; 27+ messages in thread
From: Tvrtko Ursulin @ 2012-07-25  8:17 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Tuesday 24 Jul 2012 15:38:22 Jörn Engel wrote:
> On Tue, 24 July 2012 09:01:16 +0100, Tvrtko Ursulin wrote:
> > On Monday 23 Jul 2012 21:02:30 Jörn Engel wrote:
> > > On Mon, 23 July 2012 15:33:16 +0100, Tvrtko Ursulin wrote:
> > > > On Thursday 12 Jul 2012 18:46:34 Jörn Engel wrote:
> > At the very least block console does not work from interrupt context
> > while netconsole does, right? Also netconsole does things to try and
> > work around low memory situations. Things like that I think would be
> > useful additions to documentation.
> 
> Blockconsole does work from interrupt context.  It has buffers for 1MB
> worth of data.  Until those fill up, it only does a memcpy and
> schedules a workqueue for writeback.  If you panic, it will do the
> writeback immediately.  While I wouldn't believe this to always work,
> I have yet to see a confirmed failure case.

As far as I know there is nothing like netpoll in the block layer so it has to 
be a lot less reliable than netconsole. Especially with delaying write out to 
a workqueue. Anyway, I am not arguing, just saying in my opinion those caveats 
are worth documenting.
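
For readers following the exchange, the mechanism quoted above (a
preallocated buffer, a memcpy in the write path, and writeback deferred
to a later context) might look roughly like this in userspace C.  All
names here are hypothetical, the wb_pending flag merely stands in for
"a workqueue has been scheduled", and dropping data when the buffer is
full is an assumption of this sketch, not something stated in the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE (1024 * 1024) /* "buffers for 1MB worth of data" */

struct log_buf {
	char data[LOG_BUF_SIZE];
	size_t head;    /* total bytes accepted by log_write() */
	size_t tail;    /* total bytes handed to writeback */
	int wb_pending; /* stand-in for "writeback work scheduled" */
};

/*
 * Hot path: no allocation and no I/O, only a bounded memcpy.
 * Returns the number of bytes accepted; anything that does not fit
 * is dropped (an assumption of this sketch).
 */
static size_t log_write(struct log_buf *lb, const char *msg, size_t len)
{
	size_t free_bytes = LOG_BUF_SIZE - (lb->head - lb->tail);
	size_t ofs = lb->head % LOG_BUF_SIZE;
	size_t first;

	assert(lb->head - lb->tail <= LOG_BUF_SIZE);
	if (len > free_bytes)
		len = free_bytes;
	/* Copy in up to two pieces to handle wrap-around. */
	first = LOG_BUF_SIZE - ofs;
	if (first > len)
		first = len;
	memcpy(lb->data + ofs, msg, first);
	memcpy(lb->data, msg + first, len - first);
	lb->head += len;
	if (len)
		lb->wb_pending = 1;
	return len;
}

/* Deferred path: pretend to write everything out, return bytes drained. */
static size_t log_writeback(struct log_buf *lb)
{
	size_t n = lb->head - lb->tail;

	lb->tail = lb->head;
	lb->wb_pending = 0;
	return n;
}
```

The sketch only covers the part blockconsole controls itself.  Whether
the deferred write actually reaches the device still depends on the
block layer and the underlying driver, which is the caveat raised here.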
 
> Blockconsole itself has no allocations in the write path, so it should
> be unaffected by low memory situation.  The underlying driver and
> block layer code may well be.

Same thing.
 
> > > > Also, and I haven't checked what the swap format is, if it could
> > > > somehow be integrated together that could be useful.
> > > 
> > > That appears to be slightly less likely than crossbreeding a rabbit
> > > with a chicken.  Is there something obvious I have missed?
> > 
> > I was thinking how swap space is always there and is potentially much
> > faster to write to than a random USB stick - which could translate to
> > more reliable. Then it's a question of which storage subsystem (libata
> > vs. usb-storage) would work better in different oops/panic situations.
> > Again I tend to have less hope in USB based solutions - maybe it's my
> > bias from working in that area many years ago. So the idea of swap space
> > was that _if_ swap format could be extended to allocate a number of
> > blocks to use other than swap, then that area could be used by
> > blockconsole. Seemed like a convenient and potentially more reliable
> > solution to me, but as I said the latter may depend.
> 
> In my systems swap is often absent.  Plus, taking a few blocks of swap
> aside is in the end just partitioning in a new dress.  So the argument
> appears to boil down to using partitions again.  The equivalent of
> swap files might be interesting, but can also be somewhat scary.  So I
> would leave it to others to actually write the code - if they care.

I knew you'd pick me up on a new partitioning scheme. :) I just see it as 
convenience. Whereas it is often not possible (or at least too much effort) to 
create new partitions, swap is often around and potentially more reliable than 
a random USB stick (considering the whole data path).
 
> Libata is fine, blockconsole can work on any block device.

My point was that its reliability will differ depending on the block device 
in use, which is unlike netconsole. Again I am not arguing against the 
feature, but if you don't see that things like these are worth documenting I 
give up.

Tvrtko

* Re: [PATCH] add blockconsole version 1.1
  2012-07-25  8:17                 ` Tvrtko Ursulin
@ 2012-07-25 16:39                   ` Jörn Engel
  0 siblings, 0 replies; 27+ messages in thread
From: Jörn Engel @ 2012-07-25 16:39 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

On Wed, 25 July 2012 09:17:09 +0100, Tvrtko Ursulin wrote:
> 
> As far as I know there is nothing like netpoll in the block layer so it has to 
> be a lot less reliable than netconsole. Especially with delaying write out to 
> a workqueue. Anyway, I am not arguing, just saying in my opinion those caveats 
> are worth documenting.
...
> My point was that its reliability will differ depending on the block device 
> in use, which is unlike netconsole. Again I am not arguing against the 
> feature, but if you don't see that things like these are worth documenting I 
> give up.

I have nothing against documenting things.  Can you suggest something
better than "reliability of blockconsole will depend on the
reliability of the underlying storage layer", which sounds rather
obvious?

Jörn

--
I don't understand it. Nobody does.
-- Richard P. Feynman

* Re: [PATCH] add blockconsole version 1.1
  2012-07-18 18:53               ` Jörn Engel
  2012-07-18 21:45                 ` Borislav Petkov
@ 2012-08-14 11:54                 ` Jan Engelhardt
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Engelhardt @ 2012-08-14 11:54 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Borislav Petkov, Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson


On Wednesday 2012-07-18 20:53, Jörn Engel wrote:
>
>> With the include added, it builds fine. Then I took a USB stick and I
>> did:
>> 
>> $ ./mkblockconsole /dev/sdc
>> 
>> <reboot>
>
>You can also run hdparm -z <dev> instead.

We have too many ways of doing some things.
util-linux conveniently has `blockdev --rereadpt /dev/sdc`.
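
Both tools ask the kernel to re-read the partition table, which is the
point where blockconsole hooks in.  To my knowledge they do so through
the BLKRRPART ioctl, so a minimal C equivalent might look like the
sketch below.  The helper name reread_partitions() is hypothetical, and
the call needs sufficient privileges on a real block device.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKRRPART */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel to re-read the partition table of the block device at
 * `path`.  Returns 0 on success, -1 on failure with errno set.
 * reread_partitions() is a hypothetical helper name.
 */
static int reread_partitions(const char *path)
{
	int fd, ret, saved_errno;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, BLKRRPART);
	saved_errno = errno;    /* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

Calling reread_partitions("/dev/sdc") after mkblockconsole would then
take the place of the reboot in the steps quoted above.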

* Re: [PATCH] add blockconsole version 1.1
  2012-07-24 20:28                             ` Borislav Petkov
@ 2012-12-19 10:20                               ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2012-12-19 10:20 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-kernel, Jeff Moyer, Steve Hodgson

Hey Jörn,

On Tue, Jul 24, 2012 at 10:28:20PM +0200, Borislav Petkov wrote:
> Ok, thanks for taking the time to explain this - very interesting
> stuff.
>
> So, as far as I'm concerned blockconsole is ready for shipping! 8-)

you're probably very busy so I'll be quick: I still think that
blockconsole is a pretty useful thing and I (and I'm sure others too)
would like to restart the discussion and see it upstream eventually.

If there's anything I can do to help, pls let me know.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

end of thread, other threads:[~2012-12-19 10:20 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-24 20:59 [RFC][PATCH] add blockconsole Jörn Engel
2012-04-25 13:42 ` Jeff Moyer
2012-04-25 13:25   ` Jörn Engel
2012-04-25 15:52     ` Jeff Moyer
2012-07-12 17:46       ` [PATCH] add blockconsole version 1.1 Jörn Engel
2012-07-13 13:03         ` Borislav Petkov
2012-07-13 16:20           ` Jörn Engel
2012-07-13 21:14             ` Borislav Petkov
2012-07-16 12:46             ` Borislav Petkov
2012-07-18 18:53               ` Jörn Engel
2012-07-18 21:45                 ` Borislav Petkov
2012-07-18 21:08                   ` Jörn Engel
2012-07-19  9:26                     ` Borislav Petkov
2012-07-23 20:04                   ` Jörn Engel
2012-07-24 15:42                     ` Borislav Petkov
2012-07-24 14:53                       ` Jörn Engel
2012-07-24 16:25                         ` Borislav Petkov
2012-07-24 17:52                           ` Jörn Engel
2012-07-24 20:28                             ` Borislav Petkov
2012-12-19 10:20                               ` Borislav Petkov
2012-08-14 11:54                 ` Jan Engelhardt
2012-07-23 14:33         ` Tvrtko Ursulin
2012-07-23 20:02           ` Jörn Engel
2012-07-24  8:01             ` Tvrtko Ursulin
2012-07-24 14:38               ` Jörn Engel
2012-07-25  8:17                 ` Tvrtko Ursulin
2012-07-25 16:39                   ` Jörn Engel
