All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] printk: support structured and multi-facility log messages
@ 2012-04-04 19:59 Kay Sievers
  2012-04-04 21:05 ` Greg Kroah-Hartmann
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Kay Sievers @ 2012-04-04 19:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartmann

From: Kay Sievers <kay@vrfy.org>
Subject: printk: support structured and multi-facility log messages

Kernel log messages are the primary source of information about the overall
state of the system and connected devices. Traditional kernel messages are
mostly human language, targeted at a human reading them. This part of the
picture works very well since a very long time.

However, most machines run unattended almost all of their time, and
software, and not humans, need to process the kernel messages. Having
a machine making sense out of human language messages is inefficient,
unreliable, and sometimes plain impossible to get right. With human
language messages all useful information about their context,
available at the time of creation of the messages, is just thrown
away. Later, software consumers of the messages will need to apply
magic to reconstruct what the context might have been, to be able to
interpret the messages.

This patch extends printk() to be able to attach arbitrary key/value
pairs to logged messages, to carry machine-readable data which
describes the context of the log message at time of its
creation. Users of the log can retrieve, along with the human-readable
message, a key/value dictionary to reliably identify specific devices,
drivers, subsystems, classes and types of messages.

Various features of this patch:

- Record-based stream instead of the traditional byte stream
  buffer. All records carry a 64 bit timestamp, the syslog facility
  and priority in the record header.

- Records consume almost the same amount, sometimes less memory than
  the traditional byte stream buffer (if printk_time is enabled). The record
  header is 16 bytes long, plus some padding bytes at the end if needed.
  The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
  the timestamp and a newline.

- Buffer management is based on message sequence numbers. When records
  need to be discarded, the reading heads move on to the next full
  record. Unlike the byte-stream buffer, no old logged lines get
  truncated or partly overwritten by new ones. Sequence numbers also
  allow consumers of the log stream to get notified if any message in
  the stream they are about to read gets discarded during the time
  of reading.

- Better buffered IO support for KERN_CONT continuation lines, when printk()
  is called multiple times for a single line. The use of KERN_CONT is now
  mandatory to use continuation; a few places in the kernel need trivial fixes
  here. The buffering could possibly be extended to per-cpu variables to allow
  better thread-safety for multiple printk() invocations for a single line.

- Full-featured syslog facility value support. Different facilities
  can tag their messages. All userspace-injected messages enforce a
  facility value > 0 now, to be able to reliably distinguish them from
  the kernel-generated messages. Independent subsystems like a
  baseband processor running its own firmware, or a kernel-related
  userspace process can use their own unique facility values. Multiple
  independent log streams can co-exist that way in the same
  buffer. All share the same global sequence number counter to ensure
  proper ordering (and interleaving) and to allow the consumers of the
  log to reliably correlate the events from different facilities.

- Output of dev_printk() is reliably machine-readable now. In addition
  to the printed plain text message, it creates a log dictionary with the
  following properties:
    SUBSYSTEM=     - the driver-core subsytem name
    DEVICE=
      b12:8        - block dev_t
      c127:3       - char dev_t
      n8           - netdev ifindex
      +sound:card0 - subsystem:devname

- Support for multiple concurrent readers of /dev/kmsg, with read(),
  seek(), poll() support. Output of message sequence numbers, to allow
  userspace log consumers to reliably reconnect and reconstruct their
  state at any given time. After open("/dev/kmsg"), read() always
  returns *all* buffered records. If only future messages should be
  read, SEEK_END can be used. In case records get overwritten while
  /dev/kmsg is held open, or records get faster overwritten than they
  are read, the next read() will return -EPIPE and the current reading
  position gets updated to the next available record. The passed
  sequence numbers allow the log consumer to calculate the amount of
  lost messages.

  $ cat /dev/kmsg
  PRIORITY=5
  SEQNUM=0
  TIMESTAMP=0
  MESSAGE=Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...

  ...

  PRIORITY=7
  SEQNUM=268
  TIMESTAMP=399682
  MESSAGE=pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
  SUBSYSTEM=acpi
  DEVICE=+acpi:PNP0A03:00

  ...

  SYSLOG_FACILITY=3
  PRIORITY=6
  SEQNUM=863
  TIMESTAMP=2479024
  MESSAGE=udevd[71]: starting version 182

  ...

  PRIORITY=6
  SEQNUM=1012
  TIMESTAMP=4069447
  MESSAGE=usb 2-1.4: MAC-Address: 02:80:37:e6:12:00
  SUBSYSTEM=usb
  DEVICE=c189:130

Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
---
 drivers/base/core.c    |   49 +
 drivers/char/mem.c     |   40 -
 include/linux/printk.h |   13 
 kernel/printk.c        | 1391 +++++++++++++++++++++++++++++++++----------------
 4 files changed, 1024 insertions(+), 469 deletions(-)

--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
 #include <linux/mutex.h>
 #include <linux/async.h>
 #include <linux/pm_runtime.h>
+#include <linux/netdevice.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -1843,15 +1844,57 @@ void device_shutdown(void)
  */
 
 #ifdef CONFIG_PRINTK
-
 int __dev_printk(const char *level, const struct device *dev,
 		 struct va_format *vaf)
 {
+	char dict[128];
+	size_t dictlen = 0;
+	const char *subsys;
+
 	if (!dev)
 		return printk("%s(NULL device *): %pV", level, vaf);
 
-	return printk("%s%s %s: %pV",
-		      level, dev_driver_string(dev), dev_name(dev), vaf);
+	if (dev->class)
+		subsys = dev->class->name;
+	else if (dev->bus)
+		subsys = dev->bus->name;
+	else
+		subsys = "(NULL subsystem)";
+
+	dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+			    "SUBSYSTEM=%s\n", subsys);
+
+	/*
+	 * Add device identifier DEVICE=:
+	 *   b12:8         block dev_t
+	 *   c127:3        char dev_t
+	 *   n8            netdev ifindex
+	 *   +sound:card0  subsystem:devname
+	 */
+	if (MAJOR(dev->devt)) {
+		char c;
+
+		if (strcmp(subsys, "block") == 0)
+			c = 'b';
+		else
+			c = 'c';
+		dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+				   "DEVICE=%c%u:%u\n",
+				   c, MAJOR(dev->devt), MINOR(dev->devt));
+	} else if (strcmp(subsys, "net") == 0) {
+		struct net_device *net = to_net_dev(dev);
+
+		dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+				    "DEVICE=n%u\n", net->ifindex);
+	} else {
+		dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+				    "DEVICE=+%s:%s\n", subsys, dev_name(dev));
+	}
+
+	return printk_emit(0, level[1] - '0',
+			   dict, dictlen,
+			   "%s %s: %pV",
+			   dev_driver_string(dev), dev_name(dev), vaf);
 }
 EXPORT_SYMBOL(__dev_printk);
 
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -807,44 +807,6 @@ static const struct file_operations oldm
 };
 #endif
 
-static ssize_t kmsg_writev(struct kiocb *iocb, const struct iovec *iv,
-			   unsigned long count, loff_t pos)
-{
-	char *line, *p;
-	int i;
-	ssize_t ret = -EFAULT;
-	size_t len = iov_length(iv, count);
-
-	line = kmalloc(len + 1, GFP_KERNEL);
-	if (line == NULL)
-		return -ENOMEM;
-
-	/*
-	 * copy all vectors into a single string, to ensure we do
-	 * not interleave our log line with other printk calls
-	 */
-	p = line;
-	for (i = 0; i < count; i++) {
-		if (copy_from_user(p, iv[i].iov_base, iv[i].iov_len))
-			goto out;
-		p += iv[i].iov_len;
-	}
-	p[0] = '\0';
-
-	ret = printk("%s", line);
-	/* printk can add a prefix */
-	if (ret > len)
-		ret = len;
-out:
-	kfree(line);
-	return ret;
-}
-
-static const struct file_operations kmsg_fops = {
-	.aio_write = kmsg_writev,
-	.llseek = noop_llseek,
-};
-
 static const struct memdev {
 	const char *name;
 	umode_t mode;
@@ -863,7 +825,7 @@ static const struct memdev {
 	 [7] = { "full", 0666, &full_fops, NULL },
 	 [8] = { "random", 0666, &random_fops, NULL },
 	 [9] = { "urandom", 0666, &urandom_fops, NULL },
-	[11] = { "kmsg", 0, &kmsg_fops, NULL },
+	[11] = { "kmsg", 0644, &kmsg_fops, NULL },
 #ifdef CONFIG_CRASH_DUMP
 	[12] = { "oldmem", 0, &oldmem_fops, NULL },
 #endif
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -95,8 +95,19 @@ extern int printk_needs_cpu(int cpu);
 extern void printk_tick(void);
 
 #ifdef CONFIG_PRINTK
+asmlinkage __printf(5, 0)
+int vprintk_emit(int facility, int level,
+		 const char *dict, size_t dictlen,
+		 const char *fmt, va_list args);
+
 asmlinkage __printf(1, 0)
 int vprintk(const char *fmt, va_list args);
+
+asmlinkage __printf(5, 6) __cold
+asmlinkage int printk_emit(int facility, int level,
+			   const char *dict, size_t dictlen,
+			   const char *fmt, ...);
+
 asmlinkage __printf(1, 2) __cold
 int printk(const char *fmt, ...);
 
@@ -289,6 +300,8 @@ extern void dump_stack(void) __cold;
 	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #endif
 
+extern const struct file_operations kmsg_fops;
+
 enum {
 	DUMP_PREFIX_NONE,
 	DUMP_PREFIX_ADDRESS,
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -41,6 +41,7 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/rculist.h>
+#include <linux/poll.h>
 
 #include <asm/uaccess.h>
 
@@ -54,8 +55,6 @@ void asmlinkage __attribute__((weak)) ea
 {
 }
 
-#define __LOG_BUF_LEN	(1 << CONFIG_LOG_BUF_SHIFT)
-
 /* printk's without a loglevel use this.. */
 #define DEFAULT_MESSAGE_LOGLEVEL CONFIG_DEFAULT_MESSAGE_LOGLEVEL
 
@@ -99,24 +98,6 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static int console_locked, console_suspended;
 
 /*
- * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
- * It is also used in interesting ways to provide interlocking in
- * console_unlock();.
- */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-#define LOG_BUF_MASK (log_buf_len-1)
-#define LOG_BUF(idx) (log_buf[(idx) & LOG_BUF_MASK])
-
-/*
- * The indices into log_buf are not constrained to log_buf_len - they
- * must be masked before subscripting
- */
-static unsigned log_start;	/* Index into log_buf: next char to be read by syslog() */
-static unsigned con_start;	/* Index into log_buf: next char to be sent to consoles */
-static unsigned log_end;	/* Index into log_buf: most-recently-written-char + 1 */
-
-/*
  * If exclusive_console is non-NULL then only this console is to be printed to.
  */
 static struct console *exclusive_console;
@@ -146,12 +127,546 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 static int console_may_schedule;
 
 #ifdef CONFIG_PRINTK
+/*
+ * The printk log buffer consists of a chain of concatenated variable
+ * length records. Every record starts with a record header, containing
+ * the overall length of the record.
+ *
+ * The heads to the first and last entry in the buffer, as well as the
+ * sequence numbers of these both entries are maintained.
+ *
+ * If the heads indicate available messages, a length == 0 for the next
+ * message indicates a wrap-around to the beginning of the buffer.
+ *
+ * Every record carries the monotonic timestamp in nanoseconds, as well as
+ * the standard userspace syslog level and syslog facility. The usual
+ * kernel messages use LOG_KERN; userspace-injected messages always carry
+ * a matching syslog facility, by default LOG_USER. The origin of every
+ * message can be reliably determined that way.
+ *
+ * The human readable log message directly follows the message header. The
+ * length of the message text is stored in the header, the message is _not_
+ * terminated by a '\n' or '\0' character.
+ *
+ * Optionally, a message can carry a dictionary of properties (key/value pairs).
+ * The content of the dictionary is opaque to the printk code, it is stored
+ * only to provide userspace with a machine-readable message context.
+ *
+ * Examples for well-defined, commonly used property names are:
+ *   DEVICE=b12:8               device identifier
+ *                                b12:8         block dev_t
+ *                                c127:3        char dev_t
+ *                                n8            netdev ifindex
+ *                                +sound:card0  subsystem:devname
+ *   SUBSYSTEM=pci              driver-core subsystem name
+ *
+ * Valid characters in property names are [a-zA-Z0-9.-_]. The plain text value
+ * follows directly the after a '=' character. Every property is terminated by
+ * a '\n' character.
+ *
+ * Any value that can possibly carry non-printable characters must be
+ * explicitly encoded as a binary value.
+ *
+ * A binary value is encoded by not using a '=' character as a delimiter,
+ * but a '\n' character followed directly by a non-aligned inlined little-endian
+ * uint64_t, which specifies the number of binary bytes to read as value.
+ * Example for a dictionary containing a 4 byte long binary value:
+ *  "DEVICE=b12:8\nFIRMWARE_DUMP\n\x04\x00\x00\x00\x00\x00\x00\x00\x11\x12\x13\x14\nDRIVER=ahci\n"
+ *
+ * Userspace must be able to trust the generated key/value output stream to be
+ * valid regarding the key and newline sequence. Values containing newlines
+ * could be interpreted as keys, and would be a potential security risk. Any
+ * properties copied verbatim from hardware, or other untrusted sources, into
+ * the dictionary must be encoded as binary, to avoid such problems.
+ *
+ * Example of a message structure:
+ *   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
+ *   0008  38 00                        record is 56 bytes long
+ *   000a        0b 00                  text is 11 bytes long
+ *   000c              1f 00            dictionary is 23 bytes long
+ *   000e                    03         LOG_ERR (level)
+ *   000f                       00      LOG_KERN (facility)
+ *   0010  69 74 27 73 20 61 20 6c      "it's a l"
+ *         69 6e 65                     "ine"
+ *   001b           44 45 56 49 43      "DEVIC"
+ *         45 3d 62 38 3a 32 0a 44      "E=b8:2\nD"
+ *         52 49 56 45 52 3d 62 75      "RIVER=bu"
+ *         67 0a                        "g\n"
+ *   0032        00 00 00 00 00 00      padding to next message header
+ *
+ * The 'struct log' buffer header must never be directly exported to
+ * userspace, it is a kernel-private implementation detail.
+ *
+ * The classic syslog() syscall format export must only export the human
+ * readable text message, with the syslog level and facility prefixed
+ * as <>. Adding unconverted binary data would break the syslog() syscall
+ * export format.
+ *
+ * Interfaces that export the structured dictionary data, should follow the
+ * binary record format definition of the dictionary and use the following
+ * standard property names:
+ *   SYSLOG_FACILITY=           syslog facility number
+ *   PRIORITY=                  message priority number
+ *   SEQNUM=                    log record sequence number
+ *   TIMESTAMP=                 monotonic time in microseconds
+ *   MESSAGE=                   human readable message
+ *
+ * The optional dictionary properties attached to the record should be copied
+ * in the same output format directly after the standard message properties.
+ *
+ * In a flat text stream of records, every record should be terminated
+ * by "\n\n", which separates the records from each other.
+ */
+
+struct log {
+	u64 ts_nsec;		/* timestamp in nanoseconds */
+	u16 len;		/* length of entire record */
+	u16 text_len;		/* length of text buffer */
+	u16 dict_len;		/* length of dictionary buffer */
+	u8 level;		/* syslog level (LOG_WARN, LOG_ERR, ...) */
+	u8 facility;		/* syslog facility (LOG_KERN == 0) */
+};
 
+/*
+ * The logbuf_lock protects kmsg buffer, indices, counters. It is also
+ * used in interesting ways to provide interlocking in console_unlock();
+ */
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
+
+/* cpu currently holding logbuf_lock */
+static volatile unsigned int logbuf_cpu = UINT_MAX;
+
+#define LOG_LINE_MAX 1024
+
+/* record buffer */
+#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN];
 static char *log_buf = __log_buf;
-static int log_buf_len = __LOG_BUF_LEN;
-static unsigned logged_chars; /* Number of chars produced since last read+clear operation */
-static int saved_console_loglevel = -1;
+static u32 log_buf_len = __LOG_BUF_LEN;
+
+/* index and sequence number of the first record stored in the buffer */
+static u64 log_first_seq;
+static u32 log_first_idx;
+
+/* index and sequence number of the next record to store in the buffer */
+static u64 log_next_seq;
+static u32 log_next_idx;
+
+/* the next printk record to read after the last 'clear' command */
+static u64 clear_seq;
+static u32 clear_idx;
+
+/* the next printk record to read by syslog(READ) or /proc/kmsg */
+static u64 syslog_seq;
+static u32 syslog_idx;
+
+/* human readable text of the record */
+static char *log_text(const struct log *msg)
+{
+	return (char *)msg + sizeof(struct log);
+}
+
+/* optional key/value pair dictionary attached to the record */
+static char *log_dict(const struct log *msg)
+{
+	return (char *)msg + sizeof(struct log) + msg->text_len;
+}
+
+/* get record by index; idx must point to valid msg */
+static struct log *log_from_idx(u32 idx)
+{
+	struct log *msg = (struct log *)(log_buf + idx);
+
+	/*
+	 * A length == 0 record is the end of buffer marker. Wrap around and
+	 * read the message at the start of the buffer.
+	 */
+	if (!msg->len)
+		return (struct log *)log_buf;
+	return msg;
+}
+
+/* get next record; idx must point to valid msg */
+static u32 log_next(u32 idx)
+{
+	struct log *msg = (struct log *)(log_buf + idx);
+
+	/* length == 0 indicates the end of the buffer; wrap */
+	/*
+	 * A length == 0 record is the end of buffer marker. Wrap around and
+	 * read the message at the start of the buffer as *this* one, and
+	 * return the one after that.
+	 */
+	if (!msg->len) {
+		msg = (struct log *)log_buf;
+		return msg->len;
+	}
+	return idx + msg->len;
+}
+
+#if !defined(CONFIG_64BIT) || defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+#define LOG_ALIGN 4
+#else
+#define LOG_ALIGN 8
+#endif
+
+/* insert record into the buffer, discard old ones, update heads */
+static void log_store(int facility, int level,
+		      const char *dict, u16 dict_len,
+		      const char *text, u16 text_len)
+{
+	struct log *msg;
+	u32 size, pad_len;
+
+	/* number of '\0' padding bytes to next message */
+	size = sizeof(struct log) + text_len + dict_len;
+	pad_len = (-size) & (LOG_ALIGN - 1);
+	size += pad_len;
+
+	while (log_first_seq < log_next_seq) {
+		u32 free;
+
+		if (log_next_idx > log_first_idx)
+			free = max(log_buf_len - log_next_idx, log_first_idx);
+		else
+			free = log_first_idx - log_next_idx;
+
+		if (free > size + sizeof(struct log))
+			break;
+
+		/* drop old messages until we have enough contiuous space */
+		log_first_idx = log_next(log_first_idx);
+		log_first_seq++;
+	}
+
+	if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
+		/*
+		 * This message + an additional empty header does not fit
+		 * at the end of the buffer. Add an empty header with len == 0
+		 * to signify a wrap around.
+		 */
+		memset(log_buf + log_next_idx, 0, sizeof(struct log));
+		log_next_idx = 0;
+	}
+
+	/* fill message */
+	msg = (struct log *)(log_buf + log_next_idx);
+	memcpy(log_text(msg), text, text_len);
+	msg->text_len = text_len;
+	memcpy(log_dict(msg), dict, dict_len);
+	msg->dict_len = dict_len;
+	msg->level = level;
+	msg->facility = facility;
+	msg->ts_nsec = local_clock();
+	memset(log_dict(msg) + dict_len, 0, pad_len);
+	msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
+
+	/* insert message */
+	log_next_idx += msg->len;
+	log_next_seq++;
+}
+
+/* /dev/kmsg - userspace message inject/listen interface */
+struct devkmsg_user {
+	u64 seq;
+	u32 idx;
+	struct mutex lock;
+	char buf[8192];
+};
+
+static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
+			      unsigned long count, loff_t pos)
+{
+	char *buf, *line;
+	int i;
+	int level = default_message_loglevel;
+	int facility = 1;	/* LOG_USER */
+	size_t len = iov_length(iv, count);
+	ssize_t ret = len;
+
+	if (len > LOG_LINE_MAX)
+		return -EINVAL;
+	buf = kmalloc(len+1, GFP_KERNEL);
+	if (buf == NULL)
+		return -ENOMEM;
+
+	line = buf;
+	for (i = 0; i < count; i++) {
+		if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len))
+			goto out;
+		line += iv[i].iov_len;
+	}
+
+	/*
+	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
+	 * the decimal value represents 32bit, the lower 3 bit are the log
+	 * level, the rest are the log facility.
+	 *
+	 * If no prefix or no userspace facility is specified, we
+	 * enforce LOG_USER, to be able to reliably distinguish
+	 * kernel-generated messages from userspace-injected ones.
+	 */
+	line = buf;
+	if (line[0] == '<') {
+		char *endp = NULL;
+
+		i = simple_strtoul(line+1, &endp, 10);
+		if (endp && endp[0] == '>') {
+			level = i & 7;
+			if (i >> 3)
+				facility = i >> 3;
+			endp++;
+			len -= endp - line;
+			line = endp;
+		}
+	}
+	line[len] = '\0';
+
+	printk_emit(facility, level, NULL, 0, "%s", line);
+out:
+	kfree(buf);
+	return ret;
+}
+
+static ssize_t devkmsg_read(struct file *file, char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	struct devkmsg_user *user = file->private_data;
+	struct log *msg;
+	bool binary = false;
+	size_t i;
+	size_t len;
+	ssize_t ret;
+
+	if (!user)
+		return -EBADF;
+
+	mutex_lock(&user->lock);
+	raw_spin_lock(&logbuf_lock);
+	while (user->seq == log_next_seq) {
+		if (file->f_flags & O_NONBLOCK) {
+			ret = -EAGAIN;
+			raw_spin_unlock(&logbuf_lock);
+			goto out;
+		}
+
+		raw_spin_unlock(&logbuf_lock);
+		ret = wait_event_interruptible(log_wait,
+					       user->seq != log_next_seq);
+		if (ret)
+			goto out;
+		raw_spin_lock(&logbuf_lock);
+	}
+
+	if (user->seq < log_first_seq) {
+		/* our last seen message is gone, return error and reset */
+		user->idx = log_first_idx;
+		user->seq = log_first_seq;
+		ret = -EPIPE;
+		raw_spin_unlock(&logbuf_lock);
+		goto out;
+	}
+
+	msg = log_from_idx(user->idx);
+	len = 0;
+
+	if (msg->facility) {
+		memcpy(user->buf + len, "SYSLOG_FACILITY=", 16);
+		len += 16;
+		len += sprintf(user->buf + len, "%u", msg->facility);
+		user->buf[len++] = '\n';
+	}
+
+	memcpy(user->buf + len, "PRIORITY=", 9);
+	len += 9;
+	user->buf[len++] = '0' + msg->level;
+	user->buf[len++] = '\n';
+
+	memcpy(user->buf + len, "SEQNUM=", 7);
+	len += 7;
+	len += sprintf(user->buf + len, "%llu", user->seq);
+	user->buf[len++] = '\n';
+
+	if (msg->ts_nsec) {
+		memcpy(user->buf + len, "TIMESTAMP=", 10);
+		len += 10;
+		len += sprintf(user->buf + len, "%llu", msg->ts_nsec / 1000);
+		user->buf[len++] = '\n';
+	}
+
+	memcpy(user->buf + len, "MESSAGE", 7);
+	len += 7;
+	if (len + msg->text_len + msg->dict_len + 2 > sizeof(user->buf)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* any non-printable characters switch to binary property output */
+	for (i = 0; i < msg->text_len; i++) {
+		char c = log_text(msg)[i];
+
+		if (c < ' ' || c >= 128) {
+			binary = true;
+			break;
+		}
+	}
+	if (binary) {
+		u64 len_le64;
+
+		/* add uint64_t little-endian length of binary data */
+		user->buf[len++] = '\n';
+		len_le64 = cpu_to_le64(msg->text_len+1);
+		memcpy(user->buf + len, &len_le64, sizeof(len_le64));
+		len += sizeof(len_le64);
+	} else {
+		/* plain text value */
+		user->buf[len++] = '=';
+	}
+
+	memcpy(user->buf + len, log_text(msg), msg->text_len);
+	len += msg->text_len;
+	user->buf[len++] = '\n';
+
+	/*
+	 * Copy dictionary; all keys, including the last one, are expected to
+	 * be terminated by '\n'.
+	 */
+	memcpy(user->buf + len, log_dict(msg), msg->dict_len);
+	len += msg->dict_len;
+
+	/*
+	 * Terminate the record by a second '\n' to make the /dev/kmgs'
+	 * output parsable as a text stream and human readable.
+	 */
+	user->buf[len++] = '\n';
+
+	user->idx = log_next(user->idx);
+	user->seq++;
+	raw_spin_unlock(&logbuf_lock);
+
+	if (len > count) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (copy_to_user(buf, user->buf, len)) {
+		ret = -EFAULT;
+		goto out;
+	}
+	ret = len;
+out:
+	mutex_unlock(&user->lock);
+	return ret;
+}
+
+static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct devkmsg_user *user = file->private_data;
+	loff_t ret = 0;
+
+	if (!user)
+		return -EBADF;
+	if (offset)
+		return -ESPIPE;
+
+	raw_spin_lock(&logbuf_lock);
+	switch (whence) {
+	case SEEK_SET:
+		/* the first record */
+		user->idx = log_first_idx;
+		user->seq = log_first_seq;
+		break;
+	case SEEK_DATA:
+		/*
+		 * The first record after the last SYSLOG_ACTION_CLEAR,
+		 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
+		 * changes no global state, and does not clear anything.
+		 */
+		user->idx = clear_idx;
+		user->seq = clear_seq;
+		break;
+	case SEEK_END:
+		/* after the last record */
+		user->idx = log_next_idx;
+		user->seq = log_next_seq;
+		break;
+	default:
+		ret = -EINVAL;
+	}
+	raw_spin_unlock(&logbuf_lock);
+	return ret;
+}
+
+static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
+{
+	struct devkmsg_user *user = file->private_data;
+	int ret = 0;
+
+	if (!user)
+		return POLLERR|POLLNVAL;
+
+	poll_wait(file, &log_wait, wait);
+
+	raw_spin_lock(&logbuf_lock);
+	if (user->seq < log_next_seq) {
+		/* return error when data has vanished underneath us */
+		if (user->seq < log_first_seq)
+			ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
+		ret = POLLIN|POLLRDNORM;
+	}
+	raw_spin_unlock(&logbuf_lock);
+
+	return ret;
+}
+
+static int devkmsg_open(struct inode *inode, struct file *file)
+{
+	struct devkmsg_user *user;
+	int err;
+
+	/* write-only does not need any file context */
+	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+		return 0;
+
+	err = security_syslog(SYSLOG_ACTION_READ_ALL);
+	if (err)
+		return err;
+
+	user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
+	if (!user)
+		return -ENOMEM;
+
+	mutex_init(&user->lock);
+
+	raw_spin_lock(&logbuf_lock);
+	user->idx = log_first_idx;
+	user->seq = log_first_seq;
+	raw_spin_unlock(&logbuf_lock);
+
+	file->private_data = user;
+	return 0;
+}
+
+static int devkmsg_release(struct inode *inode, struct file *file)
+{
+	struct devkmsg_user *user = file->private_data;
+
+	if (!user)
+		return 0;
+
+	mutex_destroy(&user->lock);
+	kfree(user);
+	return 0;
+}
+
+const struct file_operations kmsg_fops = {
+	.open = devkmsg_open,
+	.read = devkmsg_read,
+	.aio_write = devkmsg_writev,
+	.llseek = devkmsg_llseek,
+	.poll = devkmsg_poll,
+	.release = devkmsg_release,
+};
 
 #ifdef CONFIG_KEXEC
 /*
@@ -165,9 +680,9 @@ static int saved_console_loglevel = -1;
 void log_buf_kexec_setup(void)
 {
 	VMCOREINFO_SYMBOL(log_buf);
-	VMCOREINFO_SYMBOL(log_end);
 	VMCOREINFO_SYMBOL(log_buf_len);
-	VMCOREINFO_SYMBOL(logged_chars);
+	VMCOREINFO_SYMBOL(log_first_idx);
+	VMCOREINFO_SYMBOL(log_next_idx);
 }
 #endif
 
@@ -191,7 +706,6 @@ early_param("log_buf_len", log_buf_len_s
 void __init setup_log_buf(int early)
 {
 	unsigned long flags;
-	unsigned start, dest_idx, offset;
 	char *new_log_buf;
 	int free;
 
@@ -219,20 +733,8 @@ void __init setup_log_buf(int early)
 	log_buf_len = new_log_buf_len;
 	log_buf = new_log_buf;
 	new_log_buf_len = 0;
-	free = __LOG_BUF_LEN - log_end;
-
-	offset = start = min(con_start, log_start);
-	dest_idx = 0;
-	while (start != log_end) {
-		unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
-
-		log_buf[dest_idx] = __log_buf[log_idx_mask];
-		start++;
-		dest_idx++;
-	}
-	log_start -= offset;
-	con_start -= offset;
-	log_end -= offset;
+	free = __LOG_BUF_LEN - log_next_idx;
+	memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	pr_info("log_buf_len: %d\n", log_buf_len);
@@ -332,11 +834,165 @@ static int check_syslog_permissions(int
 	return 0;
 }
 
+#if defined(CONFIG_PRINTK_TIME)
+static bool printk_time = 1;
+#else
+static bool printk_time;
+#endif
+module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
+
+static int syslog_print_line(u32 idx, char *text, size_t size)
+{
+	struct log *msg;
+	size_t len;
+
+	msg = log_from_idx(idx);
+	if (!text) {
+		/* calculate length only */
+		len = 3;
+
+		if (msg->facility)
+			len++;
+		if (msg->facility > 12)
+			len++;
+
+		if (printk_time)
+			len += 15;
+
+		len += msg->text_len;
+		len++;
+		return len;
+	}
+
+	len = sprintf(text, "<%u>", (msg->facility << 3) | msg->level);
+
+	if (printk_time) {
+		unsigned long long t = msg->ts_nsec;
+		unsigned long rem_ns = do_div(t, 1000000000);
+
+		len += sprintf(text + len, "[%5lu.%06lu] ",
+				   (unsigned long) t, rem_ns / 1000);
+	}
+
+	if (len + msg->text_len > size)
+		return -EINVAL;
+	memcpy(text + len, log_text(msg), msg->text_len);
+	len += msg->text_len;
+	text[len++] = '\n';
+	return len;
+}
+
+static int syslog_print(char __user *buf, int size)
+{
+	char *text;
+	int len;
+
+	text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+	if (!text)
+		return -ENOMEM;
+
+	raw_spin_lock_irq(&logbuf_lock);
+	if (syslog_seq < log_first_seq) {
+		/* messages are gone, move to first one */
+		syslog_seq = log_first_seq;
+		syslog_idx = log_first_idx;
+	}
+	len = syslog_print_line(syslog_idx, text, LOG_LINE_MAX);
+	syslog_idx = log_next(syslog_idx);
+	syslog_seq++;
+	raw_spin_unlock_irq(&logbuf_lock);
+
+	if (len > 0 && copy_to_user(buf, text, len))
+		len = -EFAULT;
+
+	kfree(text);
+	return len;
+}
+
+static int syslog_print_all(char __user *buf, int size, bool clear)
+{
+	char *text;
+	int len = 0;
+
+	text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+	if (!text)
+		return -ENOMEM;
+
+	raw_spin_lock_irq(&logbuf_lock);
+	if (buf) {
+		u64 next_seq;
+		u64 seq;
+		u32 idx;
+
+		if (clear_seq < log_first_seq) {
+			/* messages are gone, move to first available one */
+			clear_seq = log_first_seq;
+			clear_idx = log_first_idx;
+		}
+
+		/*
+		 * Find first record that fits, including all following records,
+		 * into the user-provided buffer for this dump.
+		*/
+		seq = clear_seq;
+		idx = clear_idx;
+		while (seq < log_next_seq) {
+			len += syslog_print_line(idx, NULL, 0);
+			idx = log_next(idx);
+			seq++;
+		}
+		seq = clear_seq;
+		idx = clear_idx;
+		while (len > size && seq < log_next_seq) {
+			len -= syslog_print_line(idx, NULL, 0);
+			idx = log_next(idx);
+			seq++;
+		}
+
+		/* last message in this dump */
+		next_seq = log_next_seq;
+
+		len = 0;
+		while (len >= 0 && seq < next_seq) {
+			int textlen;
+
+			textlen = syslog_print_line(idx, text, LOG_LINE_MAX);
+			if (textlen < 0) {
+				len = textlen;
+				break;
+			}
+			idx = log_next(idx);
+			seq++;
+
+			raw_spin_unlock_irq(&logbuf_lock);
+			if (copy_to_user(buf + len, text, textlen))
+				len = -EFAULT;
+			else
+				len += textlen;
+			raw_spin_lock_irq(&logbuf_lock);
+
+			if (seq < log_first_seq) {
+				/* messages are gone, move to next one */
+				seq = log_first_seq;
+				idx = log_first_idx;
+			}
+		}
+	}
+
+	if (clear) {
+		clear_seq = log_next_seq;
+		clear_idx = log_next_idx;
+	}
+	raw_spin_unlock_irq(&logbuf_lock);
+
+	kfree(text);
+	return len;
+}
+
 int do_syslog(int type, char __user *buf, int len, bool from_file)
 {
-	unsigned i, j, limit, count;
-	int do_clear = 0;
-	char c;
+	bool clear = false;
+	static int saved_console_loglevel = -1;
 	int error;
 
 	error = check_syslog_permissions(type, from_file);
@@ -364,28 +1020,14 @@ int do_syslog(int type, char __user *buf
 			goto out;
 		}
 		error = wait_event_interruptible(log_wait,
-							(log_start - log_end));
+						 syslog_seq != log_next_seq);
 		if (error)
 			goto out;
-		i = 0;
-		raw_spin_lock_irq(&logbuf_lock);
-		while (!error && (log_start != log_end) && i < len) {
-			c = LOG_BUF(log_start);
-			log_start++;
-			raw_spin_unlock_irq(&logbuf_lock);
-			error = __put_user(c,buf);
-			buf++;
-			i++;
-			cond_resched();
-			raw_spin_lock_irq(&logbuf_lock);
-		}
-		raw_spin_unlock_irq(&logbuf_lock);
-		if (!error)
-			error = i;
+		error = syslog_print(buf, len);
 		break;
 	/* Read/clear last kernel messages */
 	case SYSLOG_ACTION_READ_CLEAR:
-		do_clear = 1;
+		clear = true;
 		/* FALL THRU */
 	/* Read last kernel messages */
 	case SYSLOG_ACTION_READ_ALL:
@@ -399,52 +1041,11 @@ int do_syslog(int type, char __user *buf
 			error = -EFAULT;
 			goto out;
 		}
-		count = len;
-		if (count > log_buf_len)
-			count = log_buf_len;
-		raw_spin_lock_irq(&logbuf_lock);
-		if (count > logged_chars)
-			count = logged_chars;
-		if (do_clear)
-			logged_chars = 0;
-		limit = log_end;
-		/*
-		 * __put_user() could sleep, and while we sleep
-		 * printk() could overwrite the messages
-		 * we try to copy to user space. Therefore
-		 * the messages are copied in reverse. <manfreds>
-		 */
-		for (i = 0; i < count && !error; i++) {
-			j = limit-1-i;
-			if (j + log_buf_len < log_end)
-				break;
-			c = LOG_BUF(j);
-			raw_spin_unlock_irq(&logbuf_lock);
-			error = __put_user(c,&buf[count-1-i]);
-			cond_resched();
-			raw_spin_lock_irq(&logbuf_lock);
-		}
-		raw_spin_unlock_irq(&logbuf_lock);
-		if (error)
-			break;
-		error = i;
-		if (i != count) {
-			int offset = count-error;
-			/* buffer overflow during copy, correct user buffer. */
-			for (i = 0; i < error; i++) {
-				if (__get_user(c,&buf[i+offset]) ||
-				    __put_user(c,&buf[i])) {
-					error = -EFAULT;
-					break;
-				}
-				cond_resched();
-			}
-		}
+		error = syslog_print_all(buf, len, clear);
 		break;
 	/* Clear ring buffer */
 	case SYSLOG_ACTION_CLEAR:
-		logged_chars = 0;
-		break;
+		syslog_print_all(NULL, 0, true);
 	/* Disable logging to console */
 	case SYSLOG_ACTION_CONSOLE_OFF:
 		if (saved_console_loglevel == -1)
@@ -472,7 +1073,33 @@ int do_syslog(int type, char __user *buf
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		error = log_end - log_start;
+		raw_spin_lock_irq(&logbuf_lock);
+		if (syslog_seq < log_first_seq) {
+			/* messages are gone, move to first one */
+			syslog_seq = log_first_seq;
+			syslog_idx = log_first_idx;
+		}
+		if (from_file) {
+			/*
+			 * Short-cut for poll(/"proc/kmsg") which simply checks
+			 * for pending data, not the size; return the count of
+			 * records, not the length.
+			 */
+			error = log_next_idx - syslog_idx;
+		} else {
+			u64 seq;
+			u32 idx;
+
+			error = 0;
+			seq = syslog_seq;
+			idx = syslog_idx;
+			while (seq < log_next_seq) {
+				error += syslog_print_line(idx, NULL, 0);
+				idx = log_next(idx);
+				seq++;
+			}
+		}
+		raw_spin_unlock_irq(&logbuf_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -501,29 +1128,11 @@ void kdb_syslog_data(char *syslog_data[4
 {
 	syslog_data[0] = log_buf;
 	syslog_data[1] = log_buf + log_buf_len;
-	syslog_data[2] = log_buf + log_end -
-		(logged_chars < log_buf_len ? logged_chars : log_buf_len);
-	syslog_data[3] = log_buf + log_end;
+	syslog_data[2] = log_buf + log_first_idx;
+	syslog_data[3] = log_buf + log_next_idx;
 }
 #endif	/* CONFIG_KGDB_KDB */
 
-/*
- * Call the console drivers on a range of log_buf
- */
-static void __call_console_drivers(unsigned start, unsigned end)
-{
-	struct console *con;
-
-	for_each_console(con) {
-		if (exclusive_console && con != exclusive_console)
-			continue;
-		if ((con->flags & CON_ENABLED) && con->write &&
-				(cpu_online(smp_processor_id()) ||
-				(con->flags & CON_ANYTIME)))
-			con->write(con, &LOG_BUF(start), end - start);
-	}
-}
-
 static bool __read_mostly ignore_loglevel;
 
 static int __init ignore_loglevel_setup(char *str)
@@ -540,142 +1149,33 @@ MODULE_PARM_DESC(ignore_loglevel, "ignor
 	"print all kernel messages to the console.");
 
 /*
- * Write out chars from start to end - 1 inclusive
- */
-static void _call_console_drivers(unsigned start,
-				unsigned end, int msg_log_level)
-{
-	trace_console(&LOG_BUF(0), start, end, log_buf_len);
-
-	if ((msg_log_level < console_loglevel || ignore_loglevel) &&
-			console_drivers && start != end) {
-		if ((start & LOG_BUF_MASK) > (end & LOG_BUF_MASK)) {
-			/* wrapped write */
-			__call_console_drivers(start & LOG_BUF_MASK,
-						log_buf_len);
-			__call_console_drivers(0, end & LOG_BUF_MASK);
-		} else {
-			__call_console_drivers(start, end);
-		}
-	}
-}
-
-/*
- * Parse the syslog header <[0-9]*>. The decimal value represents 32bit, the
- * lower 3 bit are the log level, the rest are the log facility. In case
- * userspace passes usual userspace syslog messages to /dev/kmsg or
- * /dev/ttyprintk, the log prefix might contain the facility. Printk needs
- * to extract the correct log level for in-kernel processing, and not mangle
- * the original value.
- *
- * If a prefix is found, the length of the prefix is returned. If 'level' is
- * passed, it will be filled in with the log level without a possible facility
- * value. If 'special' is passed, the special printk prefix chars are accepted
- * and returned. If no valid header is found, 0 is returned and the passed
- * variables are not touched.
- */
-static size_t log_prefix(const char *p, unsigned int *level, char *special)
-{
-	unsigned int lev = 0;
-	char sp = '\0';
-	size_t len;
-
-	if (p[0] != '<' || !p[1])
-		return 0;
-	if (p[2] == '>') {
-		/* usual single digit level number or special char */
-		switch (p[1]) {
-		case '0' ... '7':
-			lev = p[1] - '0';
-			break;
-		case 'c': /* KERN_CONT */
-		case 'd': /* KERN_DEFAULT */
-			sp = p[1];
-			break;
-		default:
-			return 0;
-		}
-		len = 3;
-	} else {
-		/* multi digit including the level and facility number */
-		char *endp = NULL;
-
-		lev = (simple_strtoul(&p[1], &endp, 10) & 7);
-		if (endp == NULL || endp[0] != '>')
-			return 0;
-		len = (endp + 1) - p;
-	}
-
-	/* do not accept special char if not asked for */
-	if (sp && !special)
-		return 0;
-
-	if (special) {
-		*special = sp;
-		/* return special char, do not touch level */
-		if (sp)
-			return len;
-	}
-
-	if (level)
-		*level = lev;
-	return len;
-}
-
-/*
  * Call the console drivers, asking them to write out
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(unsigned start, unsigned end)
+static void call_console_drivers(int level, const char *text, size_t len)
 {
-	unsigned cur_index, start_print;
-	static int msg_level = -1;
+	struct console *con;
 
-	BUG_ON(((int)(start - end)) > 0);
+	trace_console(text, 0, len, len);
 
-	cur_index = start;
-	start_print = start;
-	while (cur_index != end) {
-		if (msg_level < 0 && ((end - cur_index) > 2)) {
-			/* strip log prefix */
-			cur_index += log_prefix(&LOG_BUF(cur_index), &msg_level, NULL);
-			start_print = cur_index;
-		}
-		while (cur_index != end) {
-			char c = LOG_BUF(cur_index);
-
-			cur_index++;
-			if (c == '\n') {
-				if (msg_level < 0) {
-					/*
-					 * printk() has already given us loglevel tags in
-					 * the buffer.  This code is here in case the
-					 * log buffer has wrapped right round and scribbled
-					 * on those tags
-					 */
-					msg_level = default_message_loglevel;
-				}
-				_call_console_drivers(start_print, cur_index, msg_level);
-				msg_level = -1;
-				start_print = cur_index;
-				break;
-			}
-		}
-	}
-	_call_console_drivers(start_print, end, msg_level);
-}
+	if (level >= console_loglevel && !ignore_loglevel)
+		return;
+	if (!console_drivers)
+		return;
 
-static void emit_log_char(char c)
-{
-	LOG_BUF(log_end) = c;
-	log_end++;
-	if (log_end - log_start > log_buf_len)
-		log_start = log_end - log_buf_len;
-	if (log_end - con_start > log_buf_len)
-		con_start = log_end - log_buf_len;
-	if (logged_chars < log_buf_len)
-		logged_chars++;
+	for_each_console(con) {
+		if (exclusive_console && con != exclusive_console)
+			continue;
+		if (!(con->flags & CON_ENABLED))
+			continue;
+		if (!con->write)
+			continue;
+		if (!cpu_online(smp_processor_id()) &&
+		    !(con->flags & CON_ANYTIME))
+			continue;
+		con->write(con, text, len);
+	}
 }
 
 /*
@@ -700,16 +1200,6 @@ static void zap_locks(void)
 	sema_init(&console_sem, 1);
 }
 
-#if defined(CONFIG_PRINTK_TIME)
-static bool printk_time = 1;
-#else
-static bool printk_time = 0;
-#endif
-module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
-
-static bool always_kmsg_dump;
-module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
-
 /* Check if we have any console registered that can be called early in boot. */
 static int have_callable_console(void)
 {
@@ -722,51 +1212,6 @@ static int have_callable_console(void)
 	return 0;
 }
 
-/**
- * printk - print a kernel message
- * @fmt: format string
- *
- * This is printk().  It can be called from any context.  We want it to work.
- *
- * We try to grab the console_lock.  If we succeed, it's easy - we log the output and
- * call the console drivers.  If we fail to get the semaphore we place the output
- * into the log buffer and return.  The current holder of the console_sem will
- * notice the new output in console_unlock(); and will send it to the
- * consoles before releasing the lock.
- *
- * One effect of this deferred printing is that code which calls printk() and
- * then changes console_loglevel may break. This is because console_loglevel
- * is inspected when the actual printing occurs.
- *
- * See also:
- * printf(3)
- *
- * See the vsnprintf() documentation for format string extensions over C99.
- */
-
-asmlinkage int printk(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-#ifdef CONFIG_KGDB_KDB
-	if (unlikely(kdb_trap_printk)) {
-		va_start(args, fmt);
-		r = vkdb_printf(fmt, args);
-		va_end(args);
-		return r;
-	}
-#endif
-	va_start(args, fmt);
-	r = vprintk(fmt, args);
-	va_end(args);
-
-	return r;
-}
-
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int printk_cpu = UINT_MAX;
-
 /*
  * Can we actually use the console at this time on this cpu?
  *
@@ -810,17 +1255,12 @@ static int console_trylock_for_printk(un
 			retval = 0;
 		}
 	}
-	printk_cpu = UINT_MAX;
+	logbuf_cpu = UINT_MAX;
 	if (wake)
 		up(&console_sem);
 	raw_spin_unlock(&logbuf_lock);
 	return retval;
 }
-static const char recursion_bug_msg [] =
-		KERN_CRIT "BUG: recent printk recursion!\n";
-static int recursion_bug;
-static int new_text_line = 1;
-static char printk_buf[1024];
 
 int printk_delay_msec __read_mostly;
 
@@ -836,15 +1276,22 @@ static inline void printk_delay(void)
 	}
 }
 
-asmlinkage int vprintk(const char *fmt, va_list args)
-{
-	int printed_len = 0;
-	int current_log_level = default_message_loglevel;
+asmlinkage int vprintk_emit(int facility, int level,
+			    const char *dict, size_t dictlen,
+			    const char *fmt, va_list args)
+{
+	static int recursion_bug;
+	static char buf[LOG_LINE_MAX];
+	static size_t buflen;
+	static int buflevel;
+	static char textbuf[LOG_LINE_MAX];
+	char *text = textbuf;
+	size_t textlen;
 	unsigned long flags;
 	int this_cpu;
-	char *p;
-	size_t plen;
-	char special;
+	bool newline = false;
+	bool cont = false;
+	int printed_len = 0;
 
 	boot_delay_msec();
 	printk_delay();
@@ -856,7 +1303,7 @@ asmlinkage int vprintk(const char *fmt,
 	/*
 	 * Ouch, printk recursed into itself!
 	 */
-	if (unlikely(printk_cpu == this_cpu)) {
+	if (unlikely(logbuf_cpu == this_cpu)) {
 		/*
 		 * If a crash is occurring during printk() on this CPU,
 		 * then try to get the crash message out but make sure
@@ -873,97 +1320,92 @@ asmlinkage int vprintk(const char *fmt,
 
 	lockdep_off();
 	raw_spin_lock(&logbuf_lock);
-	printk_cpu = this_cpu;
+	logbuf_cpu = this_cpu;
 
 	if (recursion_bug) {
+		static const char recursion_msg[] =
+			"BUG: recent printk recursion!";
+
 		recursion_bug = 0;
-		strcpy(printk_buf, recursion_bug_msg);
-		printed_len = strlen(recursion_bug_msg);
-	}
-	/* Emit the output into the temporary buffer */
-	printed_len += vscnprintf(printk_buf + printed_len,
-				  sizeof(printk_buf) - printed_len, fmt, args);
-
-	p = printk_buf;
-
-	/* Read log level and handle special printk prefix */
-	plen = log_prefix(p, &current_log_level, &special);
-	if (plen) {
-		p += plen;
-
-		switch (special) {
-		case 'c': /* Strip <c> KERN_CONT, continue line */
-			plen = 0;
-			break;
-		case 'd': /* Strip <d> KERN_DEFAULT, start new line */
-			plen = 0;
-		default:
-			if (!new_text_line) {
-				emit_log_char('\n');
-				new_text_line = 1;
-			}
-		}
+		printed_len += strlen(recursion_msg);
+		/* emit KERN_CRIT message */
+		log_store(0, 2, NULL, 0, recursion_msg, printed_len);
 	}
 
 	/*
-	 * Copy the output into log_buf. If the caller didn't provide
-	 * the appropriate log prefix, we insert them here
+	 * The printf needs to come first; we need the syslog
+	 * prefix which might be passed-in as a parameter.
 	 */
-	for (; *p; p++) {
-		if (new_text_line) {
-			new_text_line = 0;
-
-			if (plen) {
-				/* Copy original log prefix */
-				int i;
-
-				for (i = 0; i < plen; i++)
-					emit_log_char(printk_buf[i]);
-				printed_len += plen;
-			} else {
-				/* Add log prefix */
-				emit_log_char('<');
-				emit_log_char(current_log_level + '0');
-				emit_log_char('>');
-				printed_len += 3;
-			}
+	textlen = vscnprintf(text, sizeof(textbuf), fmt, args);
 
-			if (printk_time) {
-				/* Add the current time stamp */
-				char tbuf[50], *tp;
-				unsigned tlen;
-				unsigned long long t;
-				unsigned long nanosec_rem;
-
-				t = cpu_clock(printk_cpu);
-				nanosec_rem = do_div(t, 1000000000);
-				tlen = sprintf(tbuf, "[%5lu.%06lu] ",
-						(unsigned long) t,
-						nanosec_rem / 1000);
-
-				for (tp = tbuf; tp < tbuf + tlen; tp++)
-					emit_log_char(*tp);
-				printed_len += tlen;
-			}
+	while (textlen && text[textlen-1] == '\n') {
+		/* mark and strip a trailing newline */
+		textlen--;
+		newline = true;
+	}
 
-			if (!*p)
-				break;
+	/* strip syslog prefix and extract log level or flags */
+	if (text[0] == '<' && text[1] && text[2] == '>') {
+		switch (text[1]) {
+		case '0' ... '7':
+			if (level == -1)
+				level = text[1] - '0';
+			text += 3;
+			textlen -= 3;
+			break;
+		case 'c':	/* KERN_CONT */
+			cont = true;
+		case 'd':	/* KERN_DEFAULT */
+			text += 3;
+			textlen -= 3;
+			break;
 		}
+	}
 
-		emit_log_char(*p);
-		if (*p == '\n')
-			new_text_line = 1;
+	if (buflen && (!cont || dict)) {
+		/* no continuation; flush existing buffer */
+		log_store(facility, buflevel, NULL, 0, buf, buflen);
+		printed_len += buflen;
+		buflen = 0;
+	}
+
+	if (buflen == 0) {
+		/* remember level for first message in the buffer */
+		if (level == -1)
+			buflevel = default_message_loglevel;
+		else
+			buflevel = level;
+	}
+
+	if (buflen || !newline) {
+		/* append to existing buffer, or buffer until next message */
+		if (buflen + textlen > sizeof(buf))
+			textlen = sizeof(buf) - buflen;
+		memcpy(buf + buflen, text, textlen);
+		buflen += textlen;
+	}
+
+	if (newline) {
+		/* end of line; flush buffer */
+		if (buflen) {
+			log_store(facility, buflevel,
+				  dict, dictlen, buf, buflen);
+			printed_len += buflen;
+			buflen = 0;
+		} else {
+			log_store(facility, buflevel,
+				  dict, dictlen, text, textlen);
+			printed_len += textlen;
+		}
 	}
 
 	/*
-	 * Try to acquire and then immediately release the
-	 * console semaphore. The release will do all the
-	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
+	 * Try to acquire and then immediately release the console semaphore.
+	 * The release will print out buffers and wake up /dev/kmsg and syslog()
+	 * users.
 	 *
-	 * The console_trylock_for_printk() function
-	 * will release 'logbuf_lock' regardless of whether it
-	 * actually gets the semaphore or not.
+	 * The console_trylock_for_printk() function will release 'logbuf_lock'
+	 * regardless of whether it actually gets the console semaphore or not.
 	 */
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
@@ -974,12 +1416,73 @@ out_restore_irqs:
 
 	return printed_len;
 }
-EXPORT_SYMBOL(printk);
+EXPORT_SYMBOL(vprintk_emit);
+
+asmlinkage int vprintk(const char *fmt, va_list args)
+{
+	return vprintk_emit(0, -1, NULL, 0, fmt, args);
+}
 EXPORT_SYMBOL(vprintk);
 
+asmlinkage int printk_emit(int facility, int level,
+			   const char *dict, size_t dictlen,
+			   const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(facility, level, dict, dictlen, fmt, args);
+	va_end(args);
+
+	return r;
+}
+EXPORT_SYMBOL(printk_emit);
+
+/**
+ * printk - print a kernel message
+ * @fmt: format string
+ *
+ * This is printk(). It can be called from any context. We want it to work.
+ *
+ * We try to grab the console_lock. If we succeed, it's easy - we log the
+ * output and call the console drivers.  If we fail to get the semaphore, we
+ * place the output into the log buffer and return. The current holder of
+ * the console_sem will notice the new output in console_unlock(); and will
+ * send it to the consoles before releasing the lock.
+ *
+ * One effect of this deferred printing is that code which calls printk() and
+ * then changes console_loglevel may break. This is because console_loglevel
+ * is inspected when the actual printing occurs.
+ *
+ * See also:
+ * printf(3)
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ */
+asmlinkage int printk(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+#ifdef CONFIG_KGDB_KDB
+	if (unlikely(kdb_trap_printk)) {
+		va_start(args, fmt);
+		r = vkdb_printf(fmt, args);
+		va_end(args);
+		return r;
+	}
+#endif
+	va_start(args, fmt);
+	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+EXPORT_SYMBOL(printk);
 #else
 
-static void call_console_drivers(unsigned start, unsigned end)
+static void call_console_drivers(int level, const char *text, size_t len)
 {
 }
 
@@ -1217,7 +1720,7 @@ int is_console_locked(void)
 }
 
 /*
- * Delayed printk facility, for scheduler-internal messages:
+ * Delayed printk version, for scheduler-internal messages:
  */
 #define PRINTK_BUF_SIZE		512
 
@@ -1253,6 +1756,10 @@ void wake_up_klogd(void)
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
 }
 
+/* the next printk record to write to the console */
+static u64 console_seq;
+static u32 console_idx;
+
 /**
  * console_unlock - unlock the console system
  *
@@ -1263,15 +1770,16 @@ void wake_up_klogd(void)
  * by printk().  If this is the case, console_unlock(); emits
  * the output prior to releasing the lock.
  *
- * If there is output waiting for klogd, we wake it up.
+ * If there is output waiting, we wake it /dev/kmsg and syslog() users.
  *
  * console_unlock(); may be called from any context.
  */
 void console_unlock(void)
 {
+	static u64 seen_seq;
 	unsigned long flags;
-	unsigned _con_start, _log_end;
-	unsigned wake_klogd = 0, retry = 0;
+	bool wake_klogd = false;
+	bool retry;
 
 	if (console_suspended) {
 		up(&console_sem);
@@ -1281,17 +1789,41 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 again:
-	for ( ; ; ) {
+	for (;;) {
+		struct log *msg;
+		static char text[LOG_LINE_MAX];
+		size_t len;
+		int level;
+
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		wake_klogd |= log_start - log_end;
-		if (con_start == log_end)
-			break;			/* Nothing to print */
-		_con_start = con_start;
-		_log_end = log_end;
-		con_start = log_end;		/* Flush */
+		if (seen_seq != log_next_seq) {
+			wake_klogd = true;
+			seen_seq = log_next_seq;
+		}
+
+		if (console_seq < log_first_seq) {
+			/* messages are gone, move to first one */
+			console_seq = log_first_seq;
+			console_idx = log_first_idx;
+		}
+
+		if (console_seq == log_next_seq)
+			break;
+
+		msg = log_from_idx(console_idx);
+		level = msg->level;
+		len = msg->text_len;
+		if (len+1 >= sizeof(text))
+			len = sizeof(text)-1;
+		memcpy(text, log_text(msg), len);
+		text[len++] = '\n';
+
+		console_idx = log_next(console_idx);
+		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
+
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(_con_start, _log_end);
+		call_console_drivers(level, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
 	}
@@ -1312,8 +1844,7 @@ again:
 	 * flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
-	if (con_start != log_end)
-		retry = 1;
+	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	if (retry && console_trylock())
@@ -1549,7 +2080,8 @@ void register_console(struct console *ne
 		 * for us.
 		 */
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		con_start = log_start;
+		console_seq = syslog_seq;
+		console_idx = syslog_idx;
 		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 		/*
 		 * We're about to replay the log buffer.  Only do this to the
@@ -1758,6 +2290,9 @@ int kmsg_dump_unregister(struct kmsg_dum
 }
 EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
 
+static bool always_kmsg_dump;
+module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
+
 /**
  * kmsg_dump - dump kernel log to kernel message dumpers.
  * @reason: the reason (oops, panic etc) for dumping
@@ -1767,8 +2302,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
  */
 void kmsg_dump(enum kmsg_dump_reason reason)
 {
-	unsigned long end;
-	unsigned chars;
+	u64 idx;
 	struct kmsg_dumper *dumper;
 	const char *s1, *s2;
 	unsigned long l1, l2;
@@ -1780,24 +2314,27 @@ void kmsg_dump(enum kmsg_dump_reason rea
 	/* Theoretically, the log could move on after we do this, but
 	   there's not a lot we can do about that. The new messages
 	   will overwrite the start of what we dump. */
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	end = log_end & LOG_BUF_MASK;
-	chars = logged_chars;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (chars > end) {
-		s1 = log_buf + log_buf_len - chars + end;
-		l1 = chars - end;
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+	if (syslog_seq < log_first_seq)
+		idx = syslog_idx;
+	else
+		idx = log_first_idx;
+
+	if (idx > log_next_idx) {
+		s1 = log_buf;
+		l1 = log_next_idx;
 
-		s2 = log_buf;
-		l2 = end;
+		s2 = log_buf + idx;
+		l2 = log_buf_len - idx;
 	} else {
 		s1 = "";
 		l1 = 0;
 
-		s2 = log_buf + end - chars;
-		l2 = chars;
+		s2 = log_buf + idx;
+		l2 = log_next_idx - idx;
 	}
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(dumper, &dump_list, list)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
@ 2012-04-04 21:05 ` Greg Kroah-Hartmann
  2012-04-04 21:14   ` Kay Sievers
  2012-04-04 21:16   ` richard -rw- weinberger
  2012-04-04 23:51 ` Joe Perches
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 34+ messages in thread
From: Greg Kroah-Hartmann @ 2012-04-04 21:05 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel

On Wed, Apr 04, 2012 at 09:59:14PM +0200, Kay Sievers wrote:
> - Full-featured syslog facility value support. Different facilities
>   can tag their messages. All userspace-injected messages enforce a
>   facility value > 0 now, to be able to reliably distinguish them from
>   the kernel-generated messages. Independent subsystems like a
>   baseband processor running its own firmware, or a kernel-related
>   userspace process can use their own unique facility values. Multiple
>   independent log streams can co-exist that way in the same
>   buffer. All share the same global sequence number counter to ensure
>   proper ordering (and interleaving) and to allow the consumers of the
>   log to reliably correlate the events from different facilities.
> 
> - Output of dev_printk() is reliably machine-readable now. In addition
>   to the printed plain text message, it creates a log dictionary with the
>   following properties:
>     SUBSYSTEM=     - the driver-core subsytem name
>     DEVICE=
>       b12:8        - block dev_t
>       c127:3       - char dev_t
>       n8           - netdev ifindex
>       +sound:card0 - subsystem:devname

I like this a lot, thanks for doing this.

Is there somewhere in Documentation/ABI that we can document this
interface so that people know what it is, what is defined, and how to
use it?

> - Support for multiple concurrent readers of /dev/kmsg, with read(),
>   seek(), poll() support. Output of message sequence numbers, to allow
>   userspace log consumers to reliably reconnect and reconstruct their
>   state at any given time. After open("/dev/kmsg"), read() always
>   returns *all* buffered records. If only future messages should be
>   read, SEEK_END can be used. In case records get overwritten while
>   /dev/kmsg is held open, or records get faster overwritten than they
>   are read, the next read() will return -EPIPE and the current reading
>   position gets updated to the next available record. The passed
>   sequence numbers allow the log consumer to calculate the amount of
>   lost messages.

I just noticed that 'tail -f' doesn't seem to work on /dev/kmsg, should
it?  Or does it need to do something else to get "just the new ones"?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 21:05 ` Greg Kroah-Hartmann
@ 2012-04-04 21:14   ` Kay Sievers
  2012-04-05  0:31     ` Greg Kroah-Hartmann
  2012-04-04 21:16   ` richard -rw- weinberger
  1 sibling, 1 reply; 34+ messages in thread
From: Kay Sievers @ 2012-04-04 21:14 UTC (permalink / raw)
  To: Greg Kroah-Hartmann; +Cc: linux-kernel

On Wed, Apr 4, 2012 at 23:05, Greg Kroah-Hartmann <greg@kroah.com> wrote:
> On Wed, Apr 04, 2012 at 09:59:14PM +0200, Kay Sievers wrote:

>> - Output of dev_printk() is reliably machine-readable now. In addition
>>   to the printed plain text message, it creates a log dictionary with the
>>   following properties:
>>     SUBSYSTEM=     - the driver-core subsytem name
>>     DEVICE=
>>       b12:8        - block dev_t
>>       c127:3       - char dev_t
>>       n8           - netdev ifindex
>>       +sound:card0 - subsystem:devname
>
> I like this a lot, thanks for doing this.
>
> Is there somewhere in Documentation/ABI that we can document this
> interface so that people know what it is, what is defined, and how to
> use it?

It's the notation udev uses to identify its devices internally. I just
added the above description to the source code so far. If we agree on
that, or some other scheme, we should definitely copy that into the
ABI docs. Along with a description of the semantics of the chardev
regarding open() poll() and seek().

>> - Support for multiple concurrent readers of /dev/kmsg, with read(),
>>   seek(), poll() support. Output of message sequence numbers, to allow
>>   userspace log consumers to reliably reconnect and reconstruct their
>>   state at any given time. After open("/dev/kmsg"), read() always
>>   returns *all* buffered records. If only future messages should be
>>   read, SEEK_END can be used. In case records get overwritten while
>>   /dev/kmsg is held open, or records get faster overwritten than they
>>   are read, the next read() will return -EPIPE and the current reading
>>   position gets updated to the next available record. The passed
>>   sequence numbers allow the log consumer to calculate the amount of
>>   lost messages.
>
> I just noticed that 'tail -f' doesn't seem to work on /dev/kmsg, should
> it?  Or does it need to do something else to get "just the new ones"?

Yeah, 'tail' might not work, it expects a regular file. This is a
chardev and it has not the regular file semantics which 'tail'
expects; 'cat' should work fine. 'Real' tools will just use poll() to
know when to get new messages out of the file descriptor.

Kay

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 21:05 ` Greg Kroah-Hartmann
  2012-04-04 21:14   ` Kay Sievers
@ 2012-04-04 21:16   ` richard -rw- weinberger
  2012-04-04 21:20     ` Kay Sievers
  1 sibling, 1 reply; 34+ messages in thread
From: richard -rw- weinberger @ 2012-04-04 21:16 UTC (permalink / raw)
  To: Greg Kroah-Hartmann; +Cc: Kay Sievers, linux-kernel

On Wed, Apr 4, 2012 at 11:05 PM, Greg Kroah-Hartmann <greg@kroah.com> wrote:
> I just noticed that 'tail -f' doesn't seem to work on /dev/kmsg, should
> it?  Or does it need to do something else to get "just the new ones"?
>

AFAIK tail looks at the file size/timestamp to detect modifications...

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 21:16   ` richard -rw- weinberger
@ 2012-04-04 21:20     ` Kay Sievers
  0 siblings, 0 replies; 34+ messages in thread
From: Kay Sievers @ 2012-04-04 21:20 UTC (permalink / raw)
  To: richard -rw- weinberger; +Cc: Greg Kroah-Hartmann, linux-kernel

On Wed, Apr 4, 2012 at 23:16, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 11:05 PM, Greg Kroah-Hartmann <greg@kroah.com> wrote:
>> I just noticed that 'tail -f' doesn't seem to work on /dev/kmsg, should
>> it?  Or does it need to do something else to get "just the new ones"?
>>
>
> AFAIK tail looks at the file size/timestamp to detect modifications...

It uses inotify when available, these days.

Kay

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
  2012-04-04 21:05 ` Greg Kroah-Hartmann
@ 2012-04-04 23:51 ` Joe Perches
  2012-04-05  0:33   ` Greg Kroah-Hartmann
  2012-04-05  8:38 ` Joe Perches
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Joe Perches @ 2012-04-04 23:51 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, Greg Kroah-Hartmann

On Wed, 2012-04-04 at 21:59 +0200, Kay Sievers wrote:
> From: Kay Sievers <kay@vrfy.org>
> Subject: printk: support structured and multi-facility log messages
[]
> This patch extends printk() to be able to attach arbitrary key/value
> pairs to logged messages,

How does it do that?

[]

> - Records consume almost the same amount, sometimes less memory than
>   the traditional byte stream buffer (if printk_time is enabled). The record
>   header is 16 bytes long, plus some padding bytes at the end if needed.
>   The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
>   the timestamp and a newline.

I suggest using lzo on the string portion.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 21:14   ` Kay Sievers
@ 2012-04-05  0:31     ` Greg Kroah-Hartmann
  0 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartmann @ 2012-04-05  0:31 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel

On Wed, Apr 04, 2012 at 11:14:25PM +0200, Kay Sievers wrote:
> On Wed, Apr 4, 2012 at 23:05, Greg Kroah-Hartmann <greg@kroah.com> wrote:
> > On Wed, Apr 04, 2012 at 09:59:14PM +0200, Kay Sievers wrote:
> 
> >> - Output of dev_printk() is reliably machine-readable now. In addition
> >>   to the printed plain text message, it creates a log dictionary with the
> >>   following properties:
> >>     SUBSYSTEM=     - the driver-core subsytem name
> >>     DEVICE=
> >>       b12:8        - block dev_t
> >>       c127:3       - char dev_t
> >>       n8           - netdev ifindex
> >>       +sound:card0 - subsystem:devname
> >
> > I like this a lot, thanks for doing this.
> >
> > Is there somewhere in Documentation/ABI that we can document this
> > interface so that people know what it is, what is defined, and how to
> > use it?
> 
> It's the notation udev uses to identify its devices internally. I just
> added the above description to the source code so far. If we agree on
> that, or some other scheme, we should definitely copy that into the
> ABI docs. Along with a description of the semantics of the chardev
> regarding open() poll() and seek().

Ok, that sounds good.  I like the description, and implementation, so I
have no objections to this patch, I'd be glad to queue it up through my
tree to get testing in linux-next now if you want.

> >> - Support for multiple concurrent readers of /dev/kmsg, with read(),
> >>   seek(), poll() support. Output of message sequence numbers, to allow
> >>   userspace log consumers to reliably reconnect and reconstruct their
> >>   state at any given time. After open("/dev/kmsg"), read() always
> >>   returns *all* buffered records. If only future messages should be
> >>   read, SEEK_END can be used. In case records get overwritten while
> >>   /dev/kmsg is held open, or records get faster overwritten than they
> >>   are read, the next read() will return -EPIPE and the current reading
> >>   position gets updated to the next available record. The passed
> >>   sequence numbers allow the log consumer to calculate the amount of
> >>   lost messages.
> >
> > I just noticed that 'tail -f' doesn't seem to work on /dev/kmsg, should
> > it?  Or does it need to do something else to get "just the new ones"?
> 
> Yeah, 'tail' might not work, it expects a regular file. This is a
> chardev and it has not the regular file semantics which 'tail'
> expects; 'cat' should work fine. 'Real' tools will just use poll() to
> know when to get new messages out of the file descriptor.

Ah, you are right, I forgot about the "real file" stuff with tail, the
device node never gets updated with a timestamp.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 23:51 ` Joe Perches
@ 2012-04-05  0:33   ` Greg Kroah-Hartmann
  2012-04-05  0:40     ` Joe Perches
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartmann @ 2012-04-05  0:33 UTC (permalink / raw)
  To: Joe Perches; +Cc: Kay Sievers, linux-kernel

On Wed, Apr 04, 2012 at 04:51:20PM -0700, Joe Perches wrote:
> On Wed, 2012-04-04 at 21:59 +0200, Kay Sievers wrote:
> > From: Kay Sievers <kay@vrfy.org>
> > Subject: printk: support structured and multi-facility log messages
> []
> > This patch extends printk() to be able to attach arbitrary key/value
> > pairs to logged messages,
> 
> How does it do that?

The implementation doesn't show that?

> []
> 
> > - Records consume almost the same amount, sometimes less memory than
> >   the traditional byte stream buffer (if printk_time is enabled). The record
> >   header is 16 bytes long, plus some padding bytes at the end if needed.
> >   The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
> >   the timestamp and a newline.
> 
> I suggest using lzo on the string portion.

As the used overall size is pretty much the same, that doesn't really
seem necessary to me.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05  0:33   ` Greg Kroah-Hartmann
@ 2012-04-05  0:40     ` Joe Perches
  2012-04-05  7:46       ` Kay Sievers
  0 siblings, 1 reply; 34+ messages in thread
From: Joe Perches @ 2012-04-05  0:40 UTC (permalink / raw)
  To: Greg Kroah-Hartmann; +Cc: Kay Sievers, linux-kernel

On Wed, 2012-04-04 at 17:33 -0700, Greg Kroah-Hartmann wrote:
> On Wed, Apr 04, 2012 at 04:51:20PM -0700, Joe Perches wrote:
> > On Wed, 2012-04-04 at 21:59 +0200, Kay Sievers wrote:
> > > From: Kay Sievers <kay@vrfy.org>
> > > Subject: printk: support structured and multi-facility log messages
> > []
> > > This patch extends printk() to be able to attach arbitrary key/value
> > > pairs to logged messages,
> > How does it do that?
> The implementation doesn't show that?

Not so far as I can tell.

How are arbitrary key/value pairs attached
in a printk?

> > []
> > 
> > > - Records consume almost the same amount, sometimes less memory than
> > >   the traditional byte stream buffer (if printk_time is enabled). The record
> > >   header is 16 bytes long, plus some padding bytes at the end if needed.
> > >   The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
> > >   the timestamp and a newline.
> > 
> > I suggest using lzo on the string portion.
> 
> As the used overall size is pretty much the same, that doesn't really
> seem necessary to me.

As dmesg output would still exist, compression could reduce
the size of the duplicated logged output buffers significantly.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05  0:40     ` Joe Perches
@ 2012-04-05  7:46       ` Kay Sievers
  2012-04-05  8:08         ` Sam Ravnborg
  0 siblings, 1 reply; 34+ messages in thread
From: Kay Sievers @ 2012-04-05  7:46 UTC (permalink / raw)
  To: Joe Perches; +Cc: Greg Kroah-Hartmann, linux-kernel

On Thu, Apr 5, 2012 at 02:40, Joe Perches <joe@perches.com> wrote:
> On Wed, 2012-04-04 at 17:33 -0700, Greg Kroah-Hartmann wrote:
>> On Wed, Apr 04, 2012 at 04:51:20PM -0700, Joe Perches wrote:
>> > On Wed, 2012-04-04 at 21:59 +0200, Kay Sievers wrote:
>> > > From: Kay Sievers <kay@vrfy.org>
>> > > Subject: printk: support structured and multi-facility log messages
>> > []
>> > > This patch extends printk() to be able to attach arbitrary key/value
>> > > pairs to logged messages,
>> > How does it do that?
>> The implementation doesn't show that?
>
> Not so far as I can tell.
>
> How are arbitrary key/value pairs attached
> in a printk?

There are 120 lines in the changelog, and a 100 lines large comment
block in the source code, and another 50 lines spread over the code,
and the hexdump examples in it, and all that? Even without looking at
a single line of code, it should be explained from different
viewpoints more than once.

>> > > - Records consume almost the same amount, sometimes less memory than
>> > >   the traditional byte stream buffer (if printk_time is enabled). The record
>> > >   header is 16 bytes long, plus some padding bytes at the end if needed.
>> > >   The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
>> > >   the timestamp and a newline.
>> >
>> > I suggest using lzo on the string portion.
>>
>> As the used overall size is pretty much the same, that doesn't really
>> seem necessary to me.
>
> As dmesg output would still exist, compression could reduce
> the size of the duplicated logged output buffers significantly.

There are no other or duplicated buffers than the primary log buffer,
which is exactly the same buffer as today's printk() uses. When this
buffer is exported, the data is converted to a stream on-the-fly, and
it's a feature, that such stream is human readable and not compressed
output. I don't think there is much justification for a compression
overhead and the problems it would introduce.

Kay

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05  7:46       ` Kay Sievers
@ 2012-04-05  8:08         ` Sam Ravnborg
  2012-04-05  8:35           ` Kay Sievers
  0 siblings, 1 reply; 34+ messages in thread
From: Sam Ravnborg @ 2012-04-05  8:08 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Joe Perches, Greg Kroah-Hartmann, linux-kernel

> >
> > How are arbitrary key/value pairs attached
> > in a printk?
> 
> There are 120 lines in the changelog, and a 100 lines large comment
> block in the source code, and another 50 lines spread over the code,
> and the hexdump examples in it, and all that? Even without looking at
> a single line of code, it should be explained from different
> viewpoints more than once.

color me blind - but I could not find any example of how I specify
key/value pairs when I use printk().

I could find lots of internal information - that parts looks to
be extensively documented - good.

But how I as a user specify this I failed to find.
Today I do:

    printk(KERN_ERR "foobar is bad");

But how is that extended to include the key/value pairs?


	Sam

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05  8:08         ` Sam Ravnborg
@ 2012-04-05  8:35           ` Kay Sievers
  2012-04-05 11:44             ` Joe Perches
  0 siblings, 1 reply; 34+ messages in thread
From: Kay Sievers @ 2012-04-05  8:35 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Joe Perches, Greg Kroah-Hartmann, linux-kernel

On Thu, Apr 5, 2012 at 10:08, Sam Ravnborg <sam@ravnborg.org> wrote:
>> >
>> > How are arbitrary key/value pairs attached
>> > in a printk?
>>
>> There are 120 lines in the changelog, and a 100 lines large comment
>> block in the source code, and another 50 lines spread over the code,
>> and the hexdump examples in it, and all that? Even without looking at
>> a single line of code, it should be explained from different
>> viewpoints more than once.
>
> color me blind - but I could not find any example of how I specify
> key/value pairs when I use printk().
>
> I could find lots of internal information - that parts looks to
> be extensively documented - good.
>
> But how I as a user specify this I failed to find.
> Today I do:
>
>    printk(KERN_ERR "foobar is bad");
>
> But how is that extended to include the key/value pairs?

The use of structured data is mainly focused on subsystem specific
users. Like the block layer or the network stack could invent a bunch
of well-defined set of properties, and wrap the emission of these up
in some common subsystem logging functions for their drivers to use.
They would reflect common log events, meaningful in the context of a
specific subsystem.

Think of it more like the model how we handle properties in uevents,
it's a similar approach.

This is not so much intended to convert the (almost random) language
in slightly better (random) key/value pairs. Userpace, which is the
only reason to add that stuff in the first place, will need to know
how to make sense out of the passed values, so in the future, we will
need some kind of coordination per subsystem here.

For the current simple example in the patch, look where DEVICE= and
SUBSYSTEM= are passed through dev_printk() -> vprintk_emit() ->
log_store(), and then read through /dev/kmsg.

The output looks like something like this:
  http://people.freedesktop.org/~kay/dev-kmsg.txt

Kay

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
  2012-04-04 21:05 ` Greg Kroah-Hartmann
  2012-04-04 23:51 ` Joe Perches
@ 2012-04-05  8:38 ` Joe Perches
  2012-04-05  8:44   ` Kay Sievers
  2012-04-05 15:05 ` Ingo Molnar
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Joe Perches @ 2012-04-05  8:38 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, Greg Kroah-Hartmann

On Wed, 2012-04-04 at 21:59 +0200, Kay Sievers wrote:
> From: Kay Sievers <kay@vrfy.org>
> + * In a flat text stream of records, every record should be terminated
> + * by "\n\n", which separates the records from each other.

What do you do about "\n\n" in existing printks?

$ grep -rP --include=*.[ch] '\bprintk\b.*\\n\\n' *


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05  8:38 ` Joe Perches
@ 2012-04-05  8:44   ` Kay Sievers
  0 siblings, 0 replies; 34+ messages in thread
From: Kay Sievers @ 2012-04-05  8:44 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, Greg Kroah-Hartmann

On Thu, Apr 5, 2012 at 10:38, Joe Perches <joe@perches.com> wrote:
> On Wed, 2012-04-04 at 21:59 +0200, Kay Sievers wrote:
>> From: Kay Sievers <kay@vrfy.org>
>> + * In a flat text stream of records, every record should be terminated
>> + * by "\n\n", which separates the records from each other.
>
> What do you do about "\n\n" in existing printks?
>
> $ grep -rP --include=*.[ch] '\bprintk\b.*\\n\\n' *

Messages with non-printable characters are exported to the syslog()
syscall or /proc/kmsg as they are, like it always was, but the export
to /dev/kmsg encodes the MESSAGE= property as a binary property then.

Kay

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05  8:35           ` Kay Sievers
@ 2012-04-05 11:44             ` Joe Perches
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Perches @ 2012-04-05 11:44 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Sam Ravnborg, Greg Kroah-Hartmann, linux-kernel

On Thu, 2012-04-05 at 10:35 +0200, Kay Sievers wrote:
> The use of structured data is mainly focused on subsystem specific
> users. Like the block layer or the network stack could invent a bunch
> of well-defined set of properties, and wrap the emission of these up
> in some common subsystem logging functions for their drivers to use.
> They would reflect common log events, meaningful in the context of a
> specific subsystem.

I think this is generally less useful that you do.

> Think of it more like the model how we handle properties in uevents,
> it's a similar approach.
> 
> This is not so much intended to convert the (almost random) language
> in slightly better (random) key/value pairs.

I think you should highlight this in the changelog.
People _will_ use it that way anyway.

> Userpace, which is the
> only reason to add that stuff in the first place, will need to know
> how to make sense out of the passed values, so in the future, we will
> need some kind of coordination per subsystem here.

I think you should separate your patch a bit.

Don't call it an extension to printk(), it's
hardly that, printk() is a function call that emits
stuff.  The printk subsystem does a lot more.

perhaps:

changes for printk_emit()
changes for dev_printk
changes for devkmsg_



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
                   ` (2 preceding siblings ...)
  2012-04-05  8:38 ` Joe Perches
@ 2012-04-05 15:05 ` Ingo Molnar
  2012-04-05 15:25   ` Kay Sievers
  2012-04-05 17:09 ` Linus Torvalds
  2012-04-07  0:26 ` Jiri Kosina
  5 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-04-05 15:05 UTC (permalink / raw)
  To: Kay Sievers
  Cc: linux-kernel, Greg Kroah-Hartmann, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra


* Kay Sievers <kay@vrfy.org> wrote:

> From: Kay Sievers <kay@vrfy.org>
>
> Subject: printk: support structured and multi-facility log messages
> 
> Kernel log messages are the primary source of information about the overall
> state of the system and connected devices. Traditional kernel messages are
> mostly human language, targeted at a human reading them. This part of the
> picture works very well since a very long time.
> 
> However, most machines run unattended almost all of their time, and
> software, and not humans, need to process the kernel messages. Having
> a machine making sense out of human language messages is inefficient,
> unreliable, and sometimes plain impossible to get right. With human
> language messages all useful information about their context,
> available at the time of creation of the messages, is just thrown
> away. Later, software consumers of the messages will need to apply
> magic to reconstruct what the context might have been, to be able to
> interpret the messages.
> 
> This patch extends printk() to be able to attach arbitrary key/value
> pairs to logged messages, to carry machine-readable data which
> describes the context of the log message at time of its
> creation. Users of the log can retrieve, along with the human-readable
> message, a key/value dictionary to reliably identify specific devices,
> drivers, subsystems, classes and types of messages.
> 
> Various features of this patch:
> 
> - Record-based stream instead of the traditional byte stream
>   buffer. All records carry a 64 bit timestamp, the syslog facility
>   and priority in the record header.
> 
> - Records consume almost the same amount, sometimes less memory than
>   the traditional byte stream buffer (if printk_time is enabled). The record
>   header is 16 bytes long, plus some padding bytes at the end if needed.
>   The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
>   the timestamp and a newline.
> 
> - Buffer management is based on message sequence numbers. When records
>   need to be discarded, the reading heads move on to the next full
>   record. Unlike the byte-stream buffer, no old logged lines get
>   truncated or partly overwritten by new ones. Sequence numbers also
>   allow consumers of the log stream to get notified if any message in
>   the stream they are about to read gets discarded during the time
>   of reading.
> 
> - Better buffered IO support for KERN_CONT continuation lines, when printk()
>   is called multiple times for a single line. The use of KERN_CONT is now
>   mandatory to use continuation; a few places in the kernel need trivial fixes
>   here. The buffering could possibly be extended to per-cpu variables to allow
>   better thread-safety for multiple printk() invocations for a single line.
> 
> - Full-featured syslog facility value support. Different facilities
>   can tag their messages. All userspace-injected messages enforce a
>   facility value > 0 now, to be able to reliably distinguish them from
>   the kernel-generated messages. Independent subsystems like a
>   baseband processor running its own firmware, or a kernel-related
>   userspace process can use their own unique facility values. Multiple
>   independent log streams can co-exist that way in the same
>   buffer. All share the same global sequence number counter to ensure
>   proper ordering (and interleaving) and to allow the consumers of the
>   log to reliably correlate the events from different facilities.
> 
> - Output of dev_printk() is reliably machine-readable now. In addition
>   to the printed plain text message, it creates a log dictionary with the
>   following properties:
>     SUBSYSTEM=     - the driver-core subsytem name
>     DEVICE=
>       b12:8        - block dev_t
>       c127:3       - char dev_t
>       n8           - netdev ifindex
>       +sound:card0 - subsystem:devname
> 
> - Support for multiple concurrent readers of /dev/kmsg, with read(),
>   seek(), poll() support. Output of message sequence numbers, to allow
>   userspace log consumers to reliably reconnect and reconstruct their
>   state at any given time. After open("/dev/kmsg"), read() always
>   returns *all* buffered records. If only future messages should be
>   read, SEEK_END can be used. In case records get overwritten while
>   /dev/kmsg is held open, or records get faster overwritten than they
>   are read, the next read() will return -EPIPE and the current reading
>   position gets updated to the next available record. The passed
>   sequence numbers allow the log consumer to calculate the amount of
>   lost messages.
> 
>   $ cat /dev/kmsg
>   PRIORITY=5
>   SEQNUM=0
>   TIMESTAMP=0
>   MESSAGE=Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
> 
>   ...
> 
>   PRIORITY=7
>   SEQNUM=268
>   TIMESTAMP=399682
>   MESSAGE=pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
>   SUBSYSTEM=acpi
>   DEVICE=+acpi:PNP0A03:00
> 
>   ...
> 
>   SYSLOG_FACILITY=3
>   PRIORITY=6
>   SEQNUM=863
>   TIMESTAMP=2479024
>   MESSAGE=udevd[71]: starting version 182
> 
>   ...
> 
>   PRIORITY=6
>   SEQNUM=1012
>   TIMESTAMP=4069447
>   MESSAGE=usb 2-1.4: MAC-Address: 02:80:37:e6:12:00
>   SUBSYSTEM=usb
>   DEVICE=c189:130

Hm, this seems to be a conceptual duplication of the console 
tracepoint that include/trace/events/printk.h already offers.

Have you looked at using that and extending it with a new 
tracepoint and new fields (with log priority level) where 
needed?

If you use it via a perf fd then you'll have structured logging, 
a record formatted ring-buffer, multiple readers support, 
filtering support, etc. - and of course you'll have access to a 
lot more tracepoints as well, should the system logging facility 
decide to log MCE events, etc.

I.e. it will provide (vastly) more capabilities, it would be 
much more extensible - and the patch from you would shrink as 
well, drastically.

IMHO the last thing we need is Yet Another Logging Facility.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 15:05 ` Ingo Molnar
@ 2012-04-05 15:25   ` Kay Sievers
  2012-04-05 17:18     ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: Kay Sievers @ 2012-04-05 15:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Greg Kroah-Hartmann, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra

On Thu, Apr 5, 2012 at 17:05, Ingo Molnar <mingo@kernel.org> wrote:
> * Kay Sievers <kay@vrfy.org> wrote:

>> - Record-based stream instead of the traditional byte stream
>>   buffer. All records carry a 64 bit timestamp, the syslog facility
>>   and priority in the record header.

> Hm, this seems to be a conceptual duplication of the console
> tracepoint that include/trace/events/printk.h already offers.

General purpose kernel logging should not depend on _optional_
facilities mainly intended for debugging and profiling. This is stuff
that should be used on every box out there, very much much unlike
tracing.

I can understand that you propose the trace things, but I'm not
convinced that tracing is the same as system logging, or it will ever
become it.

Tracing is either duplicating everything in the kmsg buffer, or it
will possibly provide dangling pointers into it. Both doesn't sound
too convincing. If printk_time is used, you can even store more
messages in the same buffer as today.

The added code is less than 100 lines, if you remove the export to
/dev/kmsg. We get what we are looking for basically for free, without
to mandate any additional subsystem to built into the kernel/.

I doubt we can here talk about "yet another logging"; it is exactly
the intention _not _ to have another copy of the same data, like
tracing would introduce. This is just an improved version of the good
old kmsg buffer without really changing its interface.

Kay

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
                   ` (3 preceding siblings ...)
  2012-04-05 15:05 ` Ingo Molnar
@ 2012-04-05 17:09 ` Linus Torvalds
  2012-04-05 17:53   ` Linus Torvalds
  2012-04-05 19:47   ` Kay Sievers
  2012-04-07  0:26 ` Jiri Kosina
  5 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-04-05 17:09 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, Greg Kroah-Hartmann

On Wed, Apr 4, 2012 at 12:59 PM, Kay Sievers <kay@vrfy.org> wrote:
> From: Kay Sievers <kay@vrfy.org>
> Subject: printk: support structured and multi-facility log messages

Much nicer than I was afraid of. Gone are the insane UUID ideas.

However, some concerns remain:

>  $ cat /dev/kmsg
>  PRIORITY=5
>  SEQNUM=0
>  TIMESTAMP=0
>  MESSAGE=Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...

Why has this been explicitly designed to be as ugly as humanly
possible, and to be so inconvenient to parse too?

IOW, what's the advantage of the insanity? Especially since PRIORITY,
SEQNUM and TIMESTAMP are fundamental, marking them and separating them
from each other seems totally pointless. It's ugly for humans, and
it's pointlessly harder for automation too, as far as I can tell.

I assume these is some reason for this, but I would want to know what
that reason is. The above is hard to use with grep or any of the
standard unix line-at-a-time tools, so it seems misdesigned. It looks
to be designed for some hacky one-off special parsing tool.

So here's a serious suggestion: aim for "line oriented". And there's a
really nice continuation character that people are getting used to,
and that is widely used in another line-oriented record message model
- SMTP headers. It's "if the first character on a line is a space,
it's a continuation of the previous record".

So if the format was

  priority,seqnum,timestamp,message
  [ possible continuation for multi-line things, and now only the
continuation-lines are tagged about what they are ]

it would be much more easily greppable etc, it would be more readable,
and it would be *easier* to actually parse than that horrible format.

> +/*
> + * The printk log buffer consists of a chain of concatenated variable
> + * length records. Every record starts with a record header, containing
> + * the overall length of the record.

Please also make it very explicit that records have to be "strings".
All your examples are strings, all our history is strings, and I want
to make it clear that we don't do binary data in this thing.

You made the statement that you want to get away from 30 years of
history, but the fact is, "30 years of history" is a damn good
argument for "that thing worked".

Strings work. Embedded random binary crap? Not so much.

No "we're going to hide binary UUID's or firmware images here" crap.

> + * The human readable log message directly follows the message header. The
> + * length of the message text is stored in the header, the message is _not_
> + * terminated by a '\n' or '\0' character.

I agree with having a length specifier, however, you seem to have some
crazy definition where "\n\n" has special meaning.

That's stupid. Don't do it. Make it clear that things are strings, and
the only thing that has special meaning is the NUL character. It's
true in your examples, and it's just a good idea. But the reason I
want to point it out is that from the code and comments, it's clear
that you actually still think that "binary is acceptable" is true.
It's not true, and it needs to not be true.

So this part looks fine, and we do the key/value pairs for other things:

> + * Optionally, a message can carry a dictionary of properties (key/value pairs).
> + * The content of the dictionary is opaque to the printk code, it is stored
> + * only to provide userspace with a machine-readable message context.

But again, "opaque" had better still be "text", not random binary
blobs. It's how environment variables work, it's how we're used to
things, it's just how things work.

So make it opaque extra information, yes, but make sure it's still text.

Because this is just pure and utter *shit*:

> + * A binary value is encoded by not using a '=' character as a delimiter,
> + * but a '\n' character followed directly by a non-aligned inlined little-endian
> + * uint64_t, which specifies the number of binary bytes to read as value.
> + * Example for a dictionary containing a 4 byte long binary value:
> + *  "DEVICE=b12:8\nFIRMWARE_DUMP\n\x04\x00\x00\x00\x00\x00\x00\x00\x11\x12\x13\x14\nDRIVER=ahci\n"

Why? No thank you.

This part gets a big NAK from me.

I don't see the point of this. It's pure crap. There's no advantage.
And the "use an uint64_t" is a classic case of just being a f*cking
moron. There's no way we can *ever* support that kind of sizes for log
messages in the kernel anyway. It's not like we can even support
32-bit random blobs. In fact,  you use a 16-bit value for the
dict-buffer size.

This is the kind of thinking and code that just makes me go "No way,
not today, not ever". It's *stupid*.

So make the rule simple:
 - it's text.
 - it's not binary
 - it's not arbitrarily sized
 - it doesn't have crazy idiotic 64-bit binary numbers encoded in it.
 - '\n\n' isn't special, NUL is. Because that's how we encode things.
If you want to escape something, you escape NUL and special characters
using the standard *text* escaping mechanisms, not using some random
hodge-podge that you just made up.

Other than this issue, I actually don't have any problems with the
code. Most of it seems fairly reasonable. But this one just made me go
"wtf, how can something so reasonable then be so completely crazy in
this regard".

                       Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 15:25   ` Kay Sievers
@ 2012-04-05 17:18     ` Ingo Molnar
  0 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2012-04-05 17:18 UTC (permalink / raw)
  To: Kay Sievers
  Cc: linux-kernel, Greg Kroah-Hartmann, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra


* Kay Sievers <kay@vrfy.org> wrote:

> On Thu, Apr 5, 2012 at 17:05, Ingo Molnar <mingo@kernel.org> wrote:
> > * Kay Sievers <kay@vrfy.org> wrote:
> 
> >> - Record-based stream instead of the traditional byte stream
> >>   buffer. All records carry a 64 bit timestamp, the syslog facility
> >>   and priority in the record header.
> 
> > Hm, this seems to be a conceptual duplication of the console
> > tracepoint that include/trace/events/printk.h already offers.
> 
> General purpose kernel logging should not depend on _optional_ 
> facilities mainly intended for debugging and profiling. This 
> is stuff that should be used on every box out there, very much 
> much unlike tracing.

Firstly, you modify printk() and make it less simple and break 
its ASCII oriented nature - which is my primary objection: if 
you want to emit binary data then use existing binary logging 
facilities.

If you want to extend the ASCII output then work in the ASCII, 
human readable space and improve it.

If you want binary record logging (I'm not convinced printk() 
should do that, btw. - we are keeping it ASCII for damn good 
reasons), use perf software events - check your kernel, those 
are enabled more widely than even CONFIG_PRINTK is enabled - so 
your whole "but I don't want to rely on a non-generic facility" 
objection is bogus: perf events, if you decide to use them, are 
as generic as it gets.

So unless you have better justification for it than what you 
listed so far, or unless you have a better implementation that 
either goes into the better binary logging or into the better 
ASCII logging direction:

NAcked-by: Ingo Molnar <mingo@kernel.org>

Because this one is neither.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 17:09 ` Linus Torvalds
@ 2012-04-05 17:53   ` Linus Torvalds
  2012-04-13 13:42     ` Stijn Devriendt
  2012-04-05 19:47   ` Kay Sievers
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-04-05 17:53 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, Greg Kroah-Hartmann

On Thu, Apr 5, 2012 at 10:09 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you want to escape something, you escape NUL and special characters
> using the standard *text* escaping mechanisms, not using some random
> hodge-podge that you just made up.

Btw, we could certainly add more support for escaping into vsnprintf() itself.

For example, we already do all those nice helpers for converting
different forms of common binary data into ascii: "%pU" for uuid's,
'%pS' for symbol addresses etc.

If there really are cases where people want small blobs of data and it
would make sense, we could easily do "%.*pQ" for "quoted blob" or
whatever, where the precision is obviously the size of the blob (so
"%.4pQ" would be a quoted four-byte entity.

We could also extend on the current '%pm', which was really designed
for mac-addresses, but the format is actually just a hex sequence. We
could extend that too to just take the precision into account, and
just default to a six-byte mac address. But we could trivially make
"%.16pm" print out a 16-byte hex string instead.

I would also be very open to having some auto-quoting support for
'%s': it might even be something we could enable by default. So that
if you print out strings that contain special characters (including
'\n' etc), we could make sure that vsnprintf() actually quotes them.
We don't have to follow the POSIX rules for things, after all.

So I'm very open to making it easier to use ASCII.

I'm *very* closed to making any kernel logging facility use binary
data. If you want binary data, go somewhere else. Seriously.

                  Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 17:09 ` Linus Torvalds
  2012-04-05 17:53   ` Linus Torvalds
@ 2012-04-05 19:47   ` Kay Sievers
  2012-04-06  1:12     ` Joe Perches
  2012-04-11 11:39     ` Eric W. Biederman
  1 sibling, 2 replies; 34+ messages in thread
From: Kay Sievers @ 2012-04-05 19:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Greg Kroah-Hartmann

On Thu, 2012-04-05 at 10:09 -0700, Linus Torvalds wrote:
> On Wed, Apr 4, 2012 at 12:59 PM, Kay Sievers <kay@vrfy.org> wrote:
> > From: Kay Sievers <kay@vrfy.org>
> > Subject: printk: support structured and multi-facility log messages

> Other than this issue, I actually don't have any problems with the
> code. Most of it seems fairly reasonable. But this one just made me go
> "wtf, how can something so reasonable then be so completely crazy in
> this regard".

Binary support is just what we are required to support in userspace
logging. We can surely live without that in the kernel.

What we are looking for is mainly the 'context' of the message, and not
possible binary data; and that will still be provided the same way with
the now simpler and more line-based output.

Better?

  [root@mop ~]# cat /dev/kmsg
  5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
  6,1,0;Command line: root=/dev/sda1 console=tty0 console=ttyS0
  6,2,0;BIOS-provided physical RAM map:
  6,3,0; BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
  6,4,0; BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
  6,5,0; BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
  6,6,0; BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
  6 ,7,0; BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
  6,8,0; BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
  6,9,0; BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
  6,10,0;NX (Execute Disable) protection: active
  6,11,0;DMI 2.4 present.
  [...]
  4,382,6189353;sr0: scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
  6,383,6190667;cdrom: Uniform CD-ROM driver Revision: 3.20
  7,384,6193747;sr 1:0:0:0: Attached scsi CD-ROM sr0
   SUBSYSTEM=scsi
   DEVICE=+scsi:1:0:0:0
  6,385,9202973;EXT4-fs (sda1): re-mounted. Opts: (null)
  6,386,10929285;Bluetooth: Core ver 2.16
  6,387,10930852;NET: Registered protocol family 31
  6,388,10932686;Bluetooth: HCI device and connection manager initialized
  6,389,10935723;Bluetooth: HCI socket layer initialized
  6,390,10937847;Bluetooth: L2CAP socket layer initialized
  ...


Thanks,
Kay



From: Kay Sievers <kay@vrfy.org>
Subject: printk: support structured and multi-facility log messages

Kernel log messages are the primary source of information about the overall
state of the system and connected devices. Traditional kernel messages are
mostly human language, targeted at a human reading them. This part of the
picture works very well since a very long time.

However, most machines run unattended almost all of their time, and
software, and not humans, need to process the kernel messages. Having
a machine making sense out of human language messages is inefficient,
unreliable, and sometimes plain impossible to get right. With human
language messages all useful information about their context,
available at the time of creation of the messages, is just thrown
away. Later, software consumers of the messages will need to apply
magic to reconstruct what the context might have been, to be able to
interpret the messages.

This patch extends printk() to be able to attach arbitrary key/value
pairs to logged messages, to carry machine-readable data which
describes the context of the log message at time of its
creation. Users of the log can retrieve, along with the human-readable
message, a key/value dictionary to reliably identify specific devices,
drivers, subsystems, classes and types of messages.

Various features of this patch:

- Record-based stream instead of the traditional byte stream
  buffer. All records carry a 64 bit timestamp, the syslog facility
  and priority in the record header.

- Records consume almost the same amount, sometimes less memory than
  the traditional byte stream buffer (if printk_time is enabled). The record
  header is 16 bytes long, plus some padding bytes at the end if needed.
  The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
  the timestamp and a newline.

- Buffer management is based on message sequence numbers. When records
  need to be discarded, the reading heads move on to the next full
  record. Unlike the byte-stream buffer, no old logged lines get
  truncated or partly overwritten by new ones. Sequence numbers also
  allow consumers of the log stream to get notified if any message in
  the stream they are about to read gets discarded during the time
  of reading.

- Better buffered IO support for KERN_CONT continuation lines, when printk()
  is called multiple times for a single line. The use of KERN_CONT is now
  mandatory to use continuation; a few places in the kernel need trivial fixes
  here. The buffering could possibly be extended to per-cpu variables to allow
  better thread-safety for multiple printk() invocations for a single line.

- Full-featured syslog facility value support. Different facilities
  can tag their messages. All userspace-injected messages enforce a
  facility value > 0 now, to be able to reliably distinguish them from
  the kernel-generated messages. Independent subsystems like a
  baseband processor running its own firmware, or a kernel-related
  userspace process can use their own unique facility values. Multiple
  independent log streams can co-exist that way in the same
  buffer. All share the same global sequence number counter to ensure
  proper ordering (and interleaving) and to allow the consumers of the
  log to reliably correlate the events from different facilities.

- Output of dev_printk() is reliably machine-readable now. In addition
  to the printed plain text message, it creates a log dictionary with the
  following properties:
    SUBSYSTEM=     - the driver-core subsytem name
    DEVICE=
      b12:8        - block dev_t
      c127:3       - char dev_t
      n8           - netdev ifindex
      +sound:card0 - subsystem:devname

- Support for multiple concurrent readers of /dev/kmsg, with read(),
  seek(), poll() support. Output of message sequence numbers, to allow
  userspace log consumers to reliably reconnect and reconstruct their
  state at any given time. After open("/dev/kmsg"), read() always
  returns *all* buffered records. If only future messages should be
  read, SEEK_END can be used. In case records get overwritten while
  /dev/kmsg is held open, or records get faster overwritten than they
  are read, the next read() will return -EPIPE and the current reading
  position gets updated to the next available record. The passed
  sequence numbers allow the log consumer to calculate the amount of
  lost messages.

  [root@mop ~]# cat /dev/kmsg
  5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
  6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
  7,160,424069;pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
   SUBSYSTEM=acpi
   DEVICE=+acpi:PNP0A03:00
  6,339,5140900;NET: Registered protocol family 10
  30,340,5690716;udevd[80]: starting version 181
  6,341,6081421;FDC 0 is a S82078B
  6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
  7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
   SUBSYSTEM=scsi
   DEVICE=+scsi:1:0:0:0
  6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0

Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
---
 drivers/base/core.c    |   52 +
 drivers/char/mem.c     |   40 -
 include/linux/printk.h |   13 
 kernel/printk.c        | 1333 +++++++++++++++++++++++++++++++++----------------
 4 files changed, 969 insertions(+), 469 deletions(-)

--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
 #include <linux/mutex.h>
 #include <linux/async.h>
 #include <linux/pm_runtime.h>
+#include <linux/netdevice.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -1843,15 +1844,60 @@ void device_shutdown(void)
  */
 
 #ifdef CONFIG_PRINTK
-
 int __dev_printk(const char *level, const struct device *dev,
 		 struct va_format *vaf)
 {
+	char dict[128];
+	size_t dictlen = 0;
+	const char *subsys;
+
 	if (!dev)
 		return printk("%s(NULL device *): %pV", level, vaf);
 
-	return printk("%s%s %s: %pV",
-		      level, dev_driver_string(dev), dev_name(dev), vaf);
+	if (dev->class)
+		subsys = dev->class->name;
+	else if (dev->bus)
+		subsys = dev->bus->name;
+	else
+		subsys = "(NULL subsystem)";
+
+	dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+			    "SUBSYSTEM=%s", subsys);
+
+	/*
+	 * Add device identifier DEVICE=:
+	 *   b12:8         block dev_t
+	 *   c127:3        char dev_t
+	 *   n8            netdev ifindex
+	 *   +sound:card0  subsystem:devname
+	 */
+	if (MAJOR(dev->devt)) {
+		char c;
+
+		if (strcmp(subsys, "block") == 0)
+			c = 'b';
+		else
+			c = 'c';
+		dictlen++;
+		dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+				   "DEVICE=%c%u:%u",
+				   c, MAJOR(dev->devt), MINOR(dev->devt));
+	} else if (strcmp(subsys, "net") == 0) {
+		struct net_device *net = to_net_dev(dev);
+
+		dictlen++;
+		dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+				    "DEVICE=n%u", net->ifindex);
+	} else {
+		dictlen++;
+		dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
+				    "DEVICE=+%s:%s", subsys, dev_name(dev));
+	}
+
+	return printk_emit(0, level[1] - '0',
+			   dict, dictlen,
+			   "%s %s: %pV",
+			   dev_driver_string(dev), dev_name(dev), vaf);
 }
 EXPORT_SYMBOL(__dev_printk);
 
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -807,44 +807,6 @@ static const struct file_operations oldm
 };
 #endif
 
-static ssize_t kmsg_writev(struct kiocb *iocb, const struct iovec *iv,
-			   unsigned long count, loff_t pos)
-{
-	char *line, *p;
-	int i;
-	ssize_t ret = -EFAULT;
-	size_t len = iov_length(iv, count);
-
-	line = kmalloc(len + 1, GFP_KERNEL);
-	if (line == NULL)
-		return -ENOMEM;
-
-	/*
-	 * copy all vectors into a single string, to ensure we do
-	 * not interleave our log line with other printk calls
-	 */
-	p = line;
-	for (i = 0; i < count; i++) {
-		if (copy_from_user(p, iv[i].iov_base, iv[i].iov_len))
-			goto out;
-		p += iv[i].iov_len;
-	}
-	p[0] = '\0';
-
-	ret = printk("%s", line);
-	/* printk can add a prefix */
-	if (ret > len)
-		ret = len;
-out:
-	kfree(line);
-	return ret;
-}
-
-static const struct file_operations kmsg_fops = {
-	.aio_write = kmsg_writev,
-	.llseek = noop_llseek,
-};
-
 static const struct memdev {
 	const char *name;
 	umode_t mode;
@@ -863,7 +825,7 @@ static const struct memdev {
 	 [7] = { "full", 0666, &full_fops, NULL },
 	 [8] = { "random", 0666, &random_fops, NULL },
 	 [9] = { "urandom", 0666, &urandom_fops, NULL },
-	[11] = { "kmsg", 0, &kmsg_fops, NULL },
+	[11] = { "kmsg", 0644, &kmsg_fops, NULL },
 #ifdef CONFIG_CRASH_DUMP
 	[12] = { "oldmem", 0, &oldmem_fops, NULL },
 #endif
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -95,8 +95,19 @@ extern int printk_needs_cpu(int cpu);
 extern void printk_tick(void);
 
 #ifdef CONFIG_PRINTK
+asmlinkage __printf(5, 0)
+int vprintk_emit(int facility, int level,
+		 const char *dict, size_t dictlen,
+		 const char *fmt, va_list args);
+
 asmlinkage __printf(1, 0)
 int vprintk(const char *fmt, va_list args);
+
+asmlinkage __printf(5, 6) __cold
+asmlinkage int printk_emit(int facility, int level,
+			   const char *dict, size_t dictlen,
+			   const char *fmt, ...);
+
 asmlinkage __printf(1, 2) __cold
 int printk(const char *fmt, ...);
 
@@ -289,6 +300,8 @@ extern void dump_stack(void) __cold;
 	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #endif
 
+extern const struct file_operations kmsg_fops;
+
 enum {
 	DUMP_PREFIX_NONE,
 	DUMP_PREFIX_ADDRESS,
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -41,6 +41,7 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/rculist.h>
+#include <linux/poll.h>
 
 #include <asm/uaccess.h>
 
@@ -54,8 +55,6 @@ void asmlinkage __attribute__((weak)) ea
 {
 }
 
-#define __LOG_BUF_LEN	(1 << CONFIG_LOG_BUF_SHIFT)
-
 /* printk's without a loglevel use this.. */
 #define DEFAULT_MESSAGE_LOGLEVEL CONFIG_DEFAULT_MESSAGE_LOGLEVEL
 
@@ -99,24 +98,6 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static int console_locked, console_suspended;
 
 /*
- * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
- * It is also used in interesting ways to provide interlocking in
- * console_unlock();.
- */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-#define LOG_BUF_MASK (log_buf_len-1)
-#define LOG_BUF(idx) (log_buf[(idx) & LOG_BUF_MASK])
-
-/*
- * The indices into log_buf are not constrained to log_buf_len - they
- * must be masked before subscripting
- */
-static unsigned log_start;	/* Index into log_buf: next char to be read by syslog() */
-static unsigned con_start;	/* Index into log_buf: next char to be sent to consoles */
-static unsigned log_end;	/* Index into log_buf: most-recently-written-char + 1 */
-
-/*
  * If exclusive_console is non-NULL then only this console is to be printed to.
  */
 static struct console *exclusive_console;
@@ -146,12 +127,488 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 static int console_may_schedule;
 
 #ifdef CONFIG_PRINTK
+/*
+ * The printk log buffer consists of a chain of concatenated variable
+ * length records. Every record starts with a record header, containing
+ * the overall length of the record.
+ *
+ * The heads to the first and last entry in the buffer, as well as the
+ * sequence numbers of these both entries are maintained when messages
+ * are stored..
+ *
+ * If the heads indicate available messages, the length in the header
+ * tells the start next message. A length == 0 for the next message
+ * indicates a wrap-around to the beginning of the buffer.
+ *
+ * Every record carries the monotonic timestamp in microseconds, as well as
+ * the standard userspace syslog level and syslog facility. The usual
+ * kernel messages use LOG_KERN; userspace-injected messages always carry
+ * a matching syslog facility, by default LOG_USER. The origin of every
+ * message can be reliably determined that way.
+ *
+ * The human readable log message directly follows the message header. The
+ * length of the message text is stored in the header, the stored message
+ * is not terminated.
+ *
+ * Optionally, a message can carry a dictionary of properties (key/value pairs),
+ * to provide userspace with a machine-readable message context.
+ *
+ * Examples for well-defined, commonly used property names are:
+ *   DEVICE=b12:8               device identifier
+ *                                b12:8         block dev_t
+ *                                c127:3        char dev_t
+ *                                n8            netdev ifindex
+ *                                +sound:card0  subsystem:devname
+ *   SUBSYSTEM=pci              driver-core subsystem name
+ *
+ * Valid characters in property names are [a-zA-Z0-9.-_]. The plain text value
+ * follows directly after a '=' character. Every property is terminated by
+ * a '\0' character. The last property is not terminated.
+ *
+ * Example of a message structure:
+ *   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
+ *   0008  34 00                        record is 52 bytes long
+ *   000a        0b 00                  text is 11 bytes long
+ *   000c              1f 00            dictionary is 23 bytes long
+ *   000e                    03 00      LOG_KERN (facility) LOG_ERR (level)
+ *   0010  69 74 27 73 20 61 20 6c      "it's a l"
+ *         69 6e 65                     "ine"
+ *   001b           44 45 56 49 43      "DEVIC"
+ *         45 3d 62 38 3a 32 00 44      "E=b8:2\0D"
+ *         52 49 56 45 52 3d 62 75      "RIVER=bu"
+ *         67                           "g"
+ *   0032     00 00 00                  padding to next message header
+ *
+ * The 'struct log' buffer header must never be directly exported to
+ * userspace, it is a kernel-private implementation detail that might
+ * need to be changed in the future, when the requirements change.
+ *
+ * /dev/kmsg exports the structured data in the following line format:
+ *   "level,sequnum,timestamp;<message text>\n"
+ *
+ * The optional key/value pairs are attached as continuation lines starting
+ * with a space character and terminated by a newline. All possible
+ * non-printable characters are escaped in the "\xff" notation.
+ *
+ * Users of the export format should ignore possible additional values
+ * separated by ',', and find the message after the ';' character.
+ */
+
+struct log {
+	u64 ts_nsec;		/* timestamp in nanoseconds */
+	u16 len;		/* length of entire record */
+	u16 text_len;		/* length of text buffer */
+	u16 dict_len;		/* length of dictionary buffer */
+	u16 level;		/* syslog level + facility */
+};
 
+/*
+ * The logbuf_lock protects kmsg buffer, indices, counters. It is also
+ * used in interesting ways to provide interlocking in console_unlock();
+ */
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
+
+/* cpu currently holding logbuf_lock */
+static volatile unsigned int logbuf_cpu = UINT_MAX;
+
+#define LOG_LINE_MAX 1024
+
+/* record buffer */
+#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN];
 static char *log_buf = __log_buf;
-static int log_buf_len = __LOG_BUF_LEN;
-static unsigned logged_chars; /* Number of chars produced since last read+clear operation */
-static int saved_console_loglevel = -1;
+static u32 log_buf_len = __LOG_BUF_LEN;
+
+/* index and sequence number of the first record stored in the buffer */
+static u64 log_first_seq;
+static u32 log_first_idx;
+
+/* index and sequence number of the next record to store in the buffer */
+static u64 log_next_seq;
+static u32 log_next_idx;
+
+/* the next printk record to read after the last 'clear' command */
+static u64 clear_seq;
+static u32 clear_idx;
+
+/* the next printk record to read by syslog(READ) or /proc/kmsg */
+static u64 syslog_seq;
+static u32 syslog_idx;
+
+/* human readable text of the record */
+static char *log_text(const struct log *msg)
+{
+	return (char *)msg + sizeof(struct log);
+}
+
+/* optional key/value pair dictionary attached to the record */
+static char *log_dict(const struct log *msg)
+{
+	return (char *)msg + sizeof(struct log) + msg->text_len;
+}
+
+/* get record by index; idx must point to valid msg */
+static struct log *log_from_idx(u32 idx)
+{
+	struct log *msg = (struct log *)(log_buf + idx);
+
+	/*
+	 * A length == 0 record is the end of buffer marker. Wrap around and
+	 * read the message at the start of the buffer.
+	 */
+	if (!msg->len)
+		return (struct log *)log_buf;
+	return msg;
+}
+
+/* get next record; idx must point to valid msg */
+static u32 log_next(u32 idx)
+{
+	struct log *msg = (struct log *)(log_buf + idx);
+
+	/* length == 0 indicates the end of the buffer; wrap */
+	/*
+	 * A length == 0 record is the end of buffer marker. Wrap around and
+	 * read the message at the start of the buffer as *this* one, and
+	 * return the one after that.
+	 */
+	if (!msg->len) {
+		msg = (struct log *)log_buf;
+		return msg->len;
+	}
+	return idx + msg->len;
+}
+
+#if !defined(CONFIG_64BIT) || defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+#define LOG_ALIGN 4
+#else
+#define LOG_ALIGN 8
+#endif
+
+/* insert record into the buffer, discard old ones, update heads */
+static void log_store(int facility, int level,
+		      const char *dict, u16 dict_len,
+		      const char *text, u16 text_len)
+{
+	struct log *msg;
+	u32 size, pad_len;
+
+	/* number of '\0' padding bytes to next message */
+	size = sizeof(struct log) + text_len + dict_len;
+	pad_len = (-size) & (LOG_ALIGN - 1);
+	size += pad_len;
+
+	while (log_first_seq < log_next_seq) {
+		u32 free;
+
+		if (log_next_idx > log_first_idx)
+			free = max(log_buf_len - log_next_idx, log_first_idx);
+		else
+			free = log_first_idx - log_next_idx;
+
+		if (free > size + sizeof(struct log))
+			break;
+
+		/* drop old messages until we have enough contiuous space */
+		log_first_idx = log_next(log_first_idx);
+		log_first_seq++;
+	}
+
+	if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
+		/*
+		 * This message + an additional empty header does not fit
+		 * at the end of the buffer. Add an empty header with len == 0
+		 * to signify a wrap around.
+		 */
+		memset(log_buf + log_next_idx, 0, sizeof(struct log));
+		log_next_idx = 0;
+	}
+
+	/* fill message */
+	msg = (struct log *)(log_buf + log_next_idx);
+	memcpy(log_text(msg), text, text_len);
+	msg->text_len = text_len;
+	memcpy(log_dict(msg), dict, dict_len);
+	msg->dict_len = dict_len;
+	msg->level = (facility << 3) | (level & 7);
+	msg->ts_nsec = local_clock();
+	memset(log_dict(msg) + dict_len, 0, pad_len);
+	msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
+
+	/* insert message */
+	log_next_idx += msg->len;
+	log_next_seq++;
+}
+
+/* /dev/kmsg - userspace message inject/listen interface */
+struct devkmsg_user {
+	u64 seq;
+	u32 idx;
+	struct mutex lock;
+	char buf[8192];
+};
+
+static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
+			      unsigned long count, loff_t pos)
+{
+	char *buf, *line;
+	int i;
+	int level = default_message_loglevel;
+	int facility = 1;	/* LOG_USER */
+	size_t len = iov_length(iv, count);
+	ssize_t ret = len;
+
+	if (len > LOG_LINE_MAX)
+		return -EINVAL;
+	buf = kmalloc(len+1, GFP_KERNEL);
+	if (buf == NULL)
+		return -ENOMEM;
+
+	line = buf;
+	for (i = 0; i < count; i++) {
+		if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len))
+			goto out;
+		line += iv[i].iov_len;
+	}
+
+	/*
+	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
+	 * the decimal value represents 32bit, the lower 3 bit are the log
+	 * level, the rest are the log facility.
+	 *
+	 * If no prefix or no userspace facility is specified, we
+	 * enforce LOG_USER, to be able to reliably distinguish
+	 * kernel-generated messages from userspace-injected ones.
+	 */
+	line = buf;
+	if (line[0] == '<') {
+		char *endp = NULL;
+
+		i = simple_strtoul(line+1, &endp, 10);
+		if (endp && endp[0] == '>') {
+			level = i & 7;
+			if (i >> 3)
+				facility = i >> 3;
+			endp++;
+			len -= endp - line;
+			line = endp;
+		}
+	}
+	line[len] = '\0';
+
+	printk_emit(facility, level, NULL, 0, "%s", line);
+out:
+	kfree(buf);
+	return ret;
+}
+
+static ssize_t devkmsg_read(struct file *file, char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	struct devkmsg_user *user = file->private_data;
+	struct log *msg;
+	size_t i;
+	size_t len;
+	ssize_t ret;
+
+	if (!user)
+		return -EBADF;
+
+	mutex_lock(&user->lock);
+	raw_spin_lock(&logbuf_lock);
+	while (user->seq == log_next_seq) {
+		if (file->f_flags & O_NONBLOCK) {
+			ret = -EAGAIN;
+			raw_spin_unlock(&logbuf_lock);
+			goto out;
+		}
+
+		raw_spin_unlock(&logbuf_lock);
+		ret = wait_event_interruptible(log_wait,
+					       user->seq != log_next_seq);
+		if (ret)
+			goto out;
+		raw_spin_lock(&logbuf_lock);
+	}
+
+	if (user->seq < log_first_seq) {
+		/* our last seen message is gone, return error and reset */
+		user->idx = log_first_idx;
+		user->seq = log_first_seq;
+		ret = -EPIPE;
+		raw_spin_unlock(&logbuf_lock);
+		goto out;
+	}
+
+	msg = log_from_idx(user->idx);
+	len = sprintf(user->buf, "%u,%llu,%llu;",
+		      msg->level, user->seq, msg->ts_nsec / 1000);
+
+	/* escape non-printable characters */
+	for (i = 0; i < msg->text_len; i++) {
+		char c = log_text(msg)[i];
+
+		if (c < ' ' || c >= 128)
+			len += sprintf(user->buf + len, "\\x%02x", c);
+		else
+			user->buf[len++] = c;
+	}
+	user->buf[len++] = '\n';
+
+	if (msg->dict_len) {
+		bool line = true;
+
+		for (i = 0; i < msg->dict_len; i++) {
+			char c = log_dict(msg)[i];
+
+			if (line) {
+				user->buf[len++] = ' ';
+				line = false;
+			}
+
+			if (c == '\0') {
+				user->buf[len++] = '\n';
+				line = true;
+				continue;
+			}
+
+			if (c < ' ' || c >= 128) {
+				len += sprintf(user->buf + len, "\\x%02x", c);
+				continue;
+			}
+
+			user->buf[len++] = c;
+		}
+		user->buf[len++] = '\n';
+	}
+
+	user->idx = log_next(user->idx);
+	user->seq++;
+	raw_spin_unlock(&logbuf_lock);
+
+	if (len > count) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (copy_to_user(buf, user->buf, len)) {
+		ret = -EFAULT;
+		goto out;
+	}
+	ret = len;
+out:
+	mutex_unlock(&user->lock);
+	return ret;
+}
+
+static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct devkmsg_user *user = file->private_data;
+	loff_t ret = 0;
+
+	if (!user)
+		return -EBADF;
+	if (offset)
+		return -ESPIPE;
+
+	raw_spin_lock(&logbuf_lock);
+	switch (whence) {
+	case SEEK_SET:
+		/* the first record */
+		user->idx = log_first_idx;
+		user->seq = log_first_seq;
+		break;
+	case SEEK_DATA:
+		/*
+		 * The first record after the last SYSLOG_ACTION_CLEAR,
+		 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
+		 * changes no global state, and does not clear anything.
+		 */
+		user->idx = clear_idx;
+		user->seq = clear_seq;
+		break;
+	case SEEK_END:
+		/* after the last record */
+		user->idx = log_next_idx;
+		user->seq = log_next_seq;
+		break;
+	default:
+		ret = -EINVAL;
+	}
+	raw_spin_unlock(&logbuf_lock);
+	return ret;
+}
+
+static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
+{
+	struct devkmsg_user *user = file->private_data;
+	int ret = 0;
+
+	if (!user)
+		return POLLERR|POLLNVAL;
+
+	poll_wait(file, &log_wait, wait);
+
+	raw_spin_lock(&logbuf_lock);
+	if (user->seq < log_next_seq) {
+		/* return error when data has vanished underneath us */
+		if (user->seq < log_first_seq)
+			ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
+		ret = POLLIN|POLLRDNORM;
+	}
+	raw_spin_unlock(&logbuf_lock);
+
+	return ret;
+}
+
+static int devkmsg_open(struct inode *inode, struct file *file)
+{
+	struct devkmsg_user *user;
+	int err;
+
+	/* write-only does not need any file context */
+	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+		return 0;
+
+	err = security_syslog(SYSLOG_ACTION_READ_ALL);
+	if (err)
+		return err;
+
+	user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
+	if (!user)
+		return -ENOMEM;
+
+	mutex_init(&user->lock);
+
+	raw_spin_lock(&logbuf_lock);
+	user->idx = log_first_idx;
+	user->seq = log_first_seq;
+	raw_spin_unlock(&logbuf_lock);
+
+	file->private_data = user;
+	return 0;
+}
+
+static int devkmsg_release(struct inode *inode, struct file *file)
+{
+	struct devkmsg_user *user = file->private_data;
+
+	if (!user)
+		return 0;
+
+	mutex_destroy(&user->lock);
+	kfree(user);
+	return 0;
+}
+
+const struct file_operations kmsg_fops = {
+	.open = devkmsg_open,
+	.read = devkmsg_read,
+	.aio_write = devkmsg_writev,
+	.llseek = devkmsg_llseek,
+	.poll = devkmsg_poll,
+	.release = devkmsg_release,
+};
 
 #ifdef CONFIG_KEXEC
 /*
@@ -165,9 +622,9 @@ static int saved_console_loglevel = -1;
 void log_buf_kexec_setup(void)
 {
 	VMCOREINFO_SYMBOL(log_buf);
-	VMCOREINFO_SYMBOL(log_end);
 	VMCOREINFO_SYMBOL(log_buf_len);
-	VMCOREINFO_SYMBOL(logged_chars);
+	VMCOREINFO_SYMBOL(log_first_idx);
+	VMCOREINFO_SYMBOL(log_next_idx);
 }
 #endif
 
@@ -191,7 +648,6 @@ early_param("log_buf_len", log_buf_len_s
 void __init setup_log_buf(int early)
 {
 	unsigned long flags;
-	unsigned start, dest_idx, offset;
 	char *new_log_buf;
 	int free;
 
@@ -219,20 +675,8 @@ void __init setup_log_buf(int early)
 	log_buf_len = new_log_buf_len;
 	log_buf = new_log_buf;
 	new_log_buf_len = 0;
-	free = __LOG_BUF_LEN - log_end;
-
-	offset = start = min(con_start, log_start);
-	dest_idx = 0;
-	while (start != log_end) {
-		unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
-
-		log_buf[dest_idx] = __log_buf[log_idx_mask];
-		start++;
-		dest_idx++;
-	}
-	log_start -= offset;
-	con_start -= offset;
-	log_end -= offset;
+	free = __LOG_BUF_LEN - log_next_idx;
+	memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	pr_info("log_buf_len: %d\n", log_buf_len);
@@ -332,11 +776,165 @@ static int check_syslog_permissions(int
 	return 0;
 }
 
+#if defined(CONFIG_PRINTK_TIME)
+static bool printk_time = 1;
+#else
+static bool printk_time;
+#endif
+module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
+
+static int syslog_print_line(u32 idx, char *text, size_t size)
+{
+	struct log *msg;
+	size_t len;
+
+	msg = log_from_idx(idx);
+	if (!text) {
+		/* calculate length only */
+		len = 3;
+
+		if (msg->level > 9)
+			len++;
+		if (msg->level > 99)
+			len++;
+
+		if (printk_time)
+			len += 15;
+
+		len += msg->text_len;
+		len++;
+		return len;
+	}
+
+	len = sprintf(text, "<%u>", msg->level);
+
+	if (printk_time) {
+		unsigned long long t = msg->ts_nsec;
+		unsigned long rem_ns = do_div(t, 1000000000);
+
+		len += sprintf(text + len, "[%5lu.%06lu] ",
+				   (unsigned long) t, rem_ns / 1000);
+	}
+
+	if (len + msg->text_len > size)
+		return -EINVAL;
+	memcpy(text + len, log_text(msg), msg->text_len);
+	len += msg->text_len;
+	text[len++] = '\n';
+	return len;
+}
+
+static int syslog_print(char __user *buf, int size)
+{
+	char *text;
+	int len;
+
+	text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+	if (!text)
+		return -ENOMEM;
+
+	raw_spin_lock_irq(&logbuf_lock);
+	if (syslog_seq < log_first_seq) {
+		/* messages are gone, move to first one */
+		syslog_seq = log_first_seq;
+		syslog_idx = log_first_idx;
+	}
+	len = syslog_print_line(syslog_idx, text, LOG_LINE_MAX);
+	syslog_idx = log_next(syslog_idx);
+	syslog_seq++;
+	raw_spin_unlock_irq(&logbuf_lock);
+
+	if (len > 0 && copy_to_user(buf, text, len))
+		len = -EFAULT;
+
+	kfree(text);
+	return len;
+}
+
+static int syslog_print_all(char __user *buf, int size, bool clear)
+{
+	char *text;
+	int len = 0;
+
+	text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+	if (!text)
+		return -ENOMEM;
+
+	raw_spin_lock_irq(&logbuf_lock);
+	if (buf) {
+		u64 next_seq;
+		u64 seq;
+		u32 idx;
+
+		if (clear_seq < log_first_seq) {
+			/* messages are gone, move to first available one */
+			clear_seq = log_first_seq;
+			clear_idx = log_first_idx;
+		}
+
+		/*
+		 * Find first record that fits, including all following records,
+		 * into the user-provided buffer for this dump.
+		*/
+		seq = clear_seq;
+		idx = clear_idx;
+		while (seq < log_next_seq) {
+			len += syslog_print_line(idx, NULL, 0);
+			idx = log_next(idx);
+			seq++;
+		}
+		seq = clear_seq;
+		idx = clear_idx;
+		while (len > size && seq < log_next_seq) {
+			len -= syslog_print_line(idx, NULL, 0);
+			idx = log_next(idx);
+			seq++;
+		}
+
+		/* last message in this dump */
+		next_seq = log_next_seq;
+
+		len = 0;
+		while (len >= 0 && seq < next_seq) {
+			int textlen;
+
+			textlen = syslog_print_line(idx, text, LOG_LINE_MAX);
+			if (textlen < 0) {
+				len = textlen;
+				break;
+			}
+			idx = log_next(idx);
+			seq++;
+
+			raw_spin_unlock_irq(&logbuf_lock);
+			if (copy_to_user(buf + len, text, textlen))
+				len = -EFAULT;
+			else
+				len += textlen;
+			raw_spin_lock_irq(&logbuf_lock);
+
+			if (seq < log_first_seq) {
+				/* messages are gone, move to next one */
+				seq = log_first_seq;
+				idx = log_first_idx;
+			}
+		}
+	}
+
+	if (clear) {
+		clear_seq = log_next_seq;
+		clear_idx = log_next_idx;
+	}
+	raw_spin_unlock_irq(&logbuf_lock);
+
+	kfree(text);
+	return len;
+}
+
 int do_syslog(int type, char __user *buf, int len, bool from_file)
 {
-	unsigned i, j, limit, count;
-	int do_clear = 0;
-	char c;
+	bool clear = false;
+	static int saved_console_loglevel = -1;
 	int error;
 
 	error = check_syslog_permissions(type, from_file);
@@ -364,28 +962,14 @@ int do_syslog(int type, char __user *buf
 			goto out;
 		}
 		error = wait_event_interruptible(log_wait,
-							(log_start - log_end));
+						 syslog_seq != log_next_seq);
 		if (error)
 			goto out;
-		i = 0;
-		raw_spin_lock_irq(&logbuf_lock);
-		while (!error && (log_start != log_end) && i < len) {
-			c = LOG_BUF(log_start);
-			log_start++;
-			raw_spin_unlock_irq(&logbuf_lock);
-			error = __put_user(c,buf);
-			buf++;
-			i++;
-			cond_resched();
-			raw_spin_lock_irq(&logbuf_lock);
-		}
-		raw_spin_unlock_irq(&logbuf_lock);
-		if (!error)
-			error = i;
+		error = syslog_print(buf, len);
 		break;
 	/* Read/clear last kernel messages */
 	case SYSLOG_ACTION_READ_CLEAR:
-		do_clear = 1;
+		clear = true;
 		/* FALL THRU */
 	/* Read last kernel messages */
 	case SYSLOG_ACTION_READ_ALL:
@@ -399,52 +983,11 @@ int do_syslog(int type, char __user *buf
 			error = -EFAULT;
 			goto out;
 		}
-		count = len;
-		if (count > log_buf_len)
-			count = log_buf_len;
-		raw_spin_lock_irq(&logbuf_lock);
-		if (count > logged_chars)
-			count = logged_chars;
-		if (do_clear)
-			logged_chars = 0;
-		limit = log_end;
-		/*
-		 * __put_user() could sleep, and while we sleep
-		 * printk() could overwrite the messages
-		 * we try to copy to user space. Therefore
-		 * the messages are copied in reverse. <manfreds>
-		 */
-		for (i = 0; i < count && !error; i++) {
-			j = limit-1-i;
-			if (j + log_buf_len < log_end)
-				break;
-			c = LOG_BUF(j);
-			raw_spin_unlock_irq(&logbuf_lock);
-			error = __put_user(c,&buf[count-1-i]);
-			cond_resched();
-			raw_spin_lock_irq(&logbuf_lock);
-		}
-		raw_spin_unlock_irq(&logbuf_lock);
-		if (error)
-			break;
-		error = i;
-		if (i != count) {
-			int offset = count-error;
-			/* buffer overflow during copy, correct user buffer. */
-			for (i = 0; i < error; i++) {
-				if (__get_user(c,&buf[i+offset]) ||
-				    __put_user(c,&buf[i])) {
-					error = -EFAULT;
-					break;
-				}
-				cond_resched();
-			}
-		}
+		error = syslog_print_all(buf, len, clear);
 		break;
 	/* Clear ring buffer */
 	case SYSLOG_ACTION_CLEAR:
-		logged_chars = 0;
-		break;
+		syslog_print_all(NULL, 0, true);
 	/* Disable logging to console */
 	case SYSLOG_ACTION_CONSOLE_OFF:
 		if (saved_console_loglevel == -1)
@@ -472,7 +1015,33 @@ int do_syslog(int type, char __user *buf
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		error = log_end - log_start;
+		raw_spin_lock_irq(&logbuf_lock);
+		if (syslog_seq < log_first_seq) {
+			/* messages are gone, move to first one */
+			syslog_seq = log_first_seq;
+			syslog_idx = log_first_idx;
+		}
+		if (from_file) {
+			/*
+			 * Short-cut for poll(/"proc/kmsg") which simply checks
+			 * for pending data, not the size; return the count of
+			 * records, not the length.
+			 */
+			error = log_next_idx - syslog_idx;
+		} else {
+			u64 seq;
+			u32 idx;
+
+			error = 0;
+			seq = syslog_seq;
+			idx = syslog_idx;
+			while (seq < log_next_seq) {
+				error += syslog_print_line(idx, NULL, 0);
+				idx = log_next(idx);
+				seq++;
+			}
+		}
+		raw_spin_unlock_irq(&logbuf_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -501,29 +1070,11 @@ void kdb_syslog_data(char *syslog_data[4
 {
 	syslog_data[0] = log_buf;
 	syslog_data[1] = log_buf + log_buf_len;
-	syslog_data[2] = log_buf + log_end -
-		(logged_chars < log_buf_len ? logged_chars : log_buf_len);
-	syslog_data[3] = log_buf + log_end;
+	syslog_data[2] = log_buf + log_first_idx;
+	syslog_data[3] = log_buf + log_next_idx;
 }
 #endif	/* CONFIG_KGDB_KDB */
 
-/*
- * Call the console drivers on a range of log_buf
- */
-static void __call_console_drivers(unsigned start, unsigned end)
-{
-	struct console *con;
-
-	for_each_console(con) {
-		if (exclusive_console && con != exclusive_console)
-			continue;
-		if ((con->flags & CON_ENABLED) && con->write &&
-				(cpu_online(smp_processor_id()) ||
-				(con->flags & CON_ANYTIME)))
-			con->write(con, &LOG_BUF(start), end - start);
-	}
-}
-
 static bool __read_mostly ignore_loglevel;
 
 static int __init ignore_loglevel_setup(char *str)
@@ -540,142 +1091,33 @@ MODULE_PARM_DESC(ignore_loglevel, "ignor
 	"print all kernel messages to the console.");
 
 /*
- * Write out chars from start to end - 1 inclusive
- */
-static void _call_console_drivers(unsigned start,
-				unsigned end, int msg_log_level)
-{
-	trace_console(&LOG_BUF(0), start, end, log_buf_len);
-
-	if ((msg_log_level < console_loglevel || ignore_loglevel) &&
-			console_drivers && start != end) {
-		if ((start & LOG_BUF_MASK) > (end & LOG_BUF_MASK)) {
-			/* wrapped write */
-			__call_console_drivers(start & LOG_BUF_MASK,
-						log_buf_len);
-			__call_console_drivers(0, end & LOG_BUF_MASK);
-		} else {
-			__call_console_drivers(start, end);
-		}
-	}
-}
-
-/*
- * Parse the syslog header <[0-9]*>. The decimal value represents 32bit, the
- * lower 3 bit are the log level, the rest are the log facility. In case
- * userspace passes usual userspace syslog messages to /dev/kmsg or
- * /dev/ttyprintk, the log prefix might contain the facility. Printk needs
- * to extract the correct log level for in-kernel processing, and not mangle
- * the original value.
- *
- * If a prefix is found, the length of the prefix is returned. If 'level' is
- * passed, it will be filled in with the log level without a possible facility
- * value. If 'special' is passed, the special printk prefix chars are accepted
- * and returned. If no valid header is found, 0 is returned and the passed
- * variables are not touched.
- */
-static size_t log_prefix(const char *p, unsigned int *level, char *special)
-{
-	unsigned int lev = 0;
-	char sp = '\0';
-	size_t len;
-
-	if (p[0] != '<' || !p[1])
-		return 0;
-	if (p[2] == '>') {
-		/* usual single digit level number or special char */
-		switch (p[1]) {
-		case '0' ... '7':
-			lev = p[1] - '0';
-			break;
-		case 'c': /* KERN_CONT */
-		case 'd': /* KERN_DEFAULT */
-			sp = p[1];
-			break;
-		default:
-			return 0;
-		}
-		len = 3;
-	} else {
-		/* multi digit including the level and facility number */
-		char *endp = NULL;
-
-		lev = (simple_strtoul(&p[1], &endp, 10) & 7);
-		if (endp == NULL || endp[0] != '>')
-			return 0;
-		len = (endp + 1) - p;
-	}
-
-	/* do not accept special char if not asked for */
-	if (sp && !special)
-		return 0;
-
-	if (special) {
-		*special = sp;
-		/* return special char, do not touch level */
-		if (sp)
-			return len;
-	}
-
-	if (level)
-		*level = lev;
-	return len;
-}
-
-/*
  * Call the console drivers, asking them to write out
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(unsigned start, unsigned end)
+static void call_console_drivers(int level, const char *text, size_t len)
 {
-	unsigned cur_index, start_print;
-	static int msg_level = -1;
+	struct console *con;
 
-	BUG_ON(((int)(start - end)) > 0);
+	trace_console(text, 0, len, len);
 
-	cur_index = start;
-	start_print = start;
-	while (cur_index != end) {
-		if (msg_level < 0 && ((end - cur_index) > 2)) {
-			/* strip log prefix */
-			cur_index += log_prefix(&LOG_BUF(cur_index), &msg_level, NULL);
-			start_print = cur_index;
-		}
-		while (cur_index != end) {
-			char c = LOG_BUF(cur_index);
-
-			cur_index++;
-			if (c == '\n') {
-				if (msg_level < 0) {
-					/*
-					 * printk() has already given us loglevel tags in
-					 * the buffer.  This code is here in case the
-					 * log buffer has wrapped right round and scribbled
-					 * on those tags
-					 */
-					msg_level = default_message_loglevel;
-				}
-				_call_console_drivers(start_print, cur_index, msg_level);
-				msg_level = -1;
-				start_print = cur_index;
-				break;
-			}
-		}
-	}
-	_call_console_drivers(start_print, end, msg_level);
-}
+	if (level >= console_loglevel && !ignore_loglevel)
+		return;
+	if (!console_drivers)
+		return;
 
-static void emit_log_char(char c)
-{
-	LOG_BUF(log_end) = c;
-	log_end++;
-	if (log_end - log_start > log_buf_len)
-		log_start = log_end - log_buf_len;
-	if (log_end - con_start > log_buf_len)
-		con_start = log_end - log_buf_len;
-	if (logged_chars < log_buf_len)
-		logged_chars++;
+	for_each_console(con) {
+		if (exclusive_console && con != exclusive_console)
+			continue;
+		if (!(con->flags & CON_ENABLED))
+			continue;
+		if (!con->write)
+			continue;
+		if (!cpu_online(smp_processor_id()) &&
+		    !(con->flags & CON_ANYTIME))
+			continue;
+		con->write(con, text, len);
+	}
 }
 
 /*
@@ -700,16 +1142,6 @@ static void zap_locks(void)
 	sema_init(&console_sem, 1);
 }
 
-#if defined(CONFIG_PRINTK_TIME)
-static bool printk_time = 1;
-#else
-static bool printk_time = 0;
-#endif
-module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
-
-static bool always_kmsg_dump;
-module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
-
 /* Check if we have any console registered that can be called early in boot. */
 static int have_callable_console(void)
 {
@@ -722,51 +1154,6 @@ static int have_callable_console(void)
 	return 0;
 }
 
-/**
- * printk - print a kernel message
- * @fmt: format string
- *
- * This is printk().  It can be called from any context.  We want it to work.
- *
- * We try to grab the console_lock.  If we succeed, it's easy - we log the output and
- * call the console drivers.  If we fail to get the semaphore we place the output
- * into the log buffer and return.  The current holder of the console_sem will
- * notice the new output in console_unlock(); and will send it to the
- * consoles before releasing the lock.
- *
- * One effect of this deferred printing is that code which calls printk() and
- * then changes console_loglevel may break. This is because console_loglevel
- * is inspected when the actual printing occurs.
- *
- * See also:
- * printf(3)
- *
- * See the vsnprintf() documentation for format string extensions over C99.
- */
-
-asmlinkage int printk(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-#ifdef CONFIG_KGDB_KDB
-	if (unlikely(kdb_trap_printk)) {
-		va_start(args, fmt);
-		r = vkdb_printf(fmt, args);
-		va_end(args);
-		return r;
-	}
-#endif
-	va_start(args, fmt);
-	r = vprintk(fmt, args);
-	va_end(args);
-
-	return r;
-}
-
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int printk_cpu = UINT_MAX;
-
 /*
  * Can we actually use the console at this time on this cpu?
  *
@@ -810,17 +1197,12 @@ static int console_trylock_for_printk(un
 			retval = 0;
 		}
 	}
-	printk_cpu = UINT_MAX;
+	logbuf_cpu = UINT_MAX;
 	if (wake)
 		up(&console_sem);
 	raw_spin_unlock(&logbuf_lock);
 	return retval;
 }
-static const char recursion_bug_msg [] =
-		KERN_CRIT "BUG: recent printk recursion!\n";
-static int recursion_bug;
-static int new_text_line = 1;
-static char printk_buf[1024];
 
 int printk_delay_msec __read_mostly;
 
@@ -836,15 +1218,22 @@ static inline void printk_delay(void)
 	}
 }
 
-asmlinkage int vprintk(const char *fmt, va_list args)
-{
-	int printed_len = 0;
-	int current_log_level = default_message_loglevel;
+asmlinkage int vprintk_emit(int facility, int level,
+			    const char *dict, size_t dictlen,
+			    const char *fmt, va_list args)
+{
+	static int recursion_bug;
+	static char buf[LOG_LINE_MAX];
+	static size_t buflen;
+	static int buflevel;
+	static char textbuf[LOG_LINE_MAX];
+	char *text = textbuf;
+	size_t textlen;
 	unsigned long flags;
 	int this_cpu;
-	char *p;
-	size_t plen;
-	char special;
+	bool newline = false;
+	bool cont = false;
+	int printed_len = 0;
 
 	boot_delay_msec();
 	printk_delay();
@@ -856,7 +1245,7 @@ asmlinkage int vprintk(const char *fmt,
 	/*
 	 * Ouch, printk recursed into itself!
 	 */
-	if (unlikely(printk_cpu == this_cpu)) {
+	if (unlikely(logbuf_cpu == this_cpu)) {
 		/*
 		 * If a crash is occurring during printk() on this CPU,
 		 * then try to get the crash message out but make sure
@@ -873,97 +1262,92 @@ asmlinkage int vprintk(const char *fmt,
 
 	lockdep_off();
 	raw_spin_lock(&logbuf_lock);
-	printk_cpu = this_cpu;
+	logbuf_cpu = this_cpu;
 
 	if (recursion_bug) {
+		static const char recursion_msg[] =
+			"BUG: recent printk recursion!";
+
 		recursion_bug = 0;
-		strcpy(printk_buf, recursion_bug_msg);
-		printed_len = strlen(recursion_bug_msg);
-	}
-	/* Emit the output into the temporary buffer */
-	printed_len += vscnprintf(printk_buf + printed_len,
-				  sizeof(printk_buf) - printed_len, fmt, args);
-
-	p = printk_buf;
-
-	/* Read log level and handle special printk prefix */
-	plen = log_prefix(p, &current_log_level, &special);
-	if (plen) {
-		p += plen;
-
-		switch (special) {
-		case 'c': /* Strip <c> KERN_CONT, continue line */
-			plen = 0;
-			break;
-		case 'd': /* Strip <d> KERN_DEFAULT, start new line */
-			plen = 0;
-		default:
-			if (!new_text_line) {
-				emit_log_char('\n');
-				new_text_line = 1;
-			}
-		}
+		printed_len += strlen(recursion_msg);
+		/* emit KERN_CRIT message */
+		log_store(0, 2, NULL, 0, recursion_msg, printed_len);
 	}
 
 	/*
-	 * Copy the output into log_buf. If the caller didn't provide
-	 * the appropriate log prefix, we insert them here
+	 * The printf needs to come first; we need the syslog
+	 * prefix which might be passed-in as a parameter.
 	 */
-	for (; *p; p++) {
-		if (new_text_line) {
-			new_text_line = 0;
-
-			if (plen) {
-				/* Copy original log prefix */
-				int i;
-
-				for (i = 0; i < plen; i++)
-					emit_log_char(printk_buf[i]);
-				printed_len += plen;
-			} else {
-				/* Add log prefix */
-				emit_log_char('<');
-				emit_log_char(current_log_level + '0');
-				emit_log_char('>');
-				printed_len += 3;
-			}
+	textlen = vscnprintf(text, sizeof(textbuf), fmt, args);
 
-			if (printk_time) {
-				/* Add the current time stamp */
-				char tbuf[50], *tp;
-				unsigned tlen;
-				unsigned long long t;
-				unsigned long nanosec_rem;
-
-				t = cpu_clock(printk_cpu);
-				nanosec_rem = do_div(t, 1000000000);
-				tlen = sprintf(tbuf, "[%5lu.%06lu] ",
-						(unsigned long) t,
-						nanosec_rem / 1000);
-
-				for (tp = tbuf; tp < tbuf + tlen; tp++)
-					emit_log_char(*tp);
-				printed_len += tlen;
-			}
+	/* mark and strip a trailing newline */
+	if (textlen && text[textlen-1] == '\n') {
+		textlen--;
+		newline = true;
+	}
 
-			if (!*p)
-				break;
+	/* strip syslog prefix and extract log level or flags */
+	if (text[0] == '<' && text[1] && text[2] == '>') {
+		switch (text[1]) {
+		case '0' ... '7':
+			if (level == -1)
+				level = text[1] - '0';
+			text += 3;
+			textlen -= 3;
+			break;
+		case 'c':	/* KERN_CONT */
+			cont = true;
+		case 'd':	/* KERN_DEFAULT */
+			text += 3;
+			textlen -= 3;
+			break;
 		}
+	}
 
-		emit_log_char(*p);
-		if (*p == '\n')
-			new_text_line = 1;
+	if (buflen && (!cont || dict)) {
+		/* no continuation; flush existing buffer */
+		log_store(facility, buflevel, NULL, 0, buf, buflen);
+		printed_len += buflen;
+		buflen = 0;
+	}
+
+	if (buflen == 0) {
+		/* remember level for first message in the buffer */
+		if (level == -1)
+			buflevel = default_message_loglevel;
+		else
+			buflevel = level;
+	}
+
+	if (buflen || !newline) {
+		/* append to existing buffer, or buffer until next message */
+		if (buflen + textlen > sizeof(buf))
+			textlen = sizeof(buf) - buflen;
+		memcpy(buf + buflen, text, textlen);
+		buflen += textlen;
+	}
+
+	if (newline) {
+		/* end of line; flush buffer */
+		if (buflen) {
+			log_store(facility, buflevel,
+				  dict, dictlen, buf, buflen);
+			printed_len += buflen;
+			buflen = 0;
+		} else {
+			log_store(facility, buflevel,
+				  dict, dictlen, text, textlen);
+			printed_len += textlen;
+		}
 	}
 
 	/*
-	 * Try to acquire and then immediately release the
-	 * console semaphore. The release will do all the
-	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
+	 * Try to acquire and then immediately release the console semaphore.
+	 * The release will print out buffers and wake up /dev/kmsg and syslog()
+	 * users.
 	 *
-	 * The console_trylock_for_printk() function
-	 * will release 'logbuf_lock' regardless of whether it
-	 * actually gets the semaphore or not.
+	 * The console_trylock_for_printk() function will release 'logbuf_lock'
+	 * regardless of whether it actually gets the console semaphore or not.
 	 */
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
@@ -974,12 +1358,73 @@ out_restore_irqs:
 
 	return printed_len;
 }
-EXPORT_SYMBOL(printk);
+EXPORT_SYMBOL(vprintk_emit);
+
+asmlinkage int vprintk(const char *fmt, va_list args)
+{
+	return vprintk_emit(0, -1, NULL, 0, fmt, args);
+}
 EXPORT_SYMBOL(vprintk);
 
+asmlinkage int printk_emit(int facility, int level,
+			   const char *dict, size_t dictlen,
+			   const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(facility, level, dict, dictlen, fmt, args);
+	va_end(args);
+
+	return r;
+}
+EXPORT_SYMBOL(printk_emit);
+
+/**
+ * printk - print a kernel message
+ * @fmt: format string
+ *
+ * This is printk(). It can be called from any context. We want it to work.
+ *
+ * We try to grab the console_lock. If we succeed, it's easy - we log the
+ * output and call the console drivers.  If we fail to get the semaphore, we
+ * place the output into the log buffer and return. The current holder of
+ * the console_sem will notice the new output in console_unlock(); and will
+ * send it to the consoles before releasing the lock.
+ *
+ * One effect of this deferred printing is that code which calls printk() and
+ * then changes console_loglevel may break. This is because console_loglevel
+ * is inspected when the actual printing occurs.
+ *
+ * See also:
+ * printf(3)
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ */
+asmlinkage int printk(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+#ifdef CONFIG_KGDB_KDB
+	if (unlikely(kdb_trap_printk)) {
+		va_start(args, fmt);
+		r = vkdb_printf(fmt, args);
+		va_end(args);
+		return r;
+	}
+#endif
+	va_start(args, fmt);
+	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+EXPORT_SYMBOL(printk);
 #else
 
-static void call_console_drivers(unsigned start, unsigned end)
+static void call_console_drivers(int level, const char *text, size_t len)
 {
 }
 
@@ -1217,7 +1662,7 @@ int is_console_locked(void)
 }
 
 /*
- * Delayed printk facility, for scheduler-internal messages:
+ * Delayed printk version, for scheduler-internal messages:
  */
 #define PRINTK_BUF_SIZE		512
 
@@ -1253,6 +1698,10 @@ void wake_up_klogd(void)
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
 }
 
+/* the next printk record to write to the console */
+static u64 console_seq;
+static u32 console_idx;
+
 /**
  * console_unlock - unlock the console system
  *
@@ -1263,15 +1712,16 @@ void wake_up_klogd(void)
  * by printk().  If this is the case, console_unlock(); emits
  * the output prior to releasing the lock.
  *
- * If there is output waiting for klogd, we wake it up.
+ * If there is output waiting, we wake it /dev/kmsg and syslog() users.
  *
  * console_unlock(); may be called from any context.
  */
 void console_unlock(void)
 {
+	static u64 seen_seq;
 	unsigned long flags;
-	unsigned _con_start, _log_end;
-	unsigned wake_klogd = 0, retry = 0;
+	bool wake_klogd = false;
+	bool retry;
 
 	if (console_suspended) {
 		up(&console_sem);
@@ -1281,17 +1731,41 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 again:
-	for ( ; ; ) {
+	for (;;) {
+		struct log *msg;
+		static char text[LOG_LINE_MAX];
+		size_t len;
+		int level;
+
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		wake_klogd |= log_start - log_end;
-		if (con_start == log_end)
-			break;			/* Nothing to print */
-		_con_start = con_start;
-		_log_end = log_end;
-		con_start = log_end;		/* Flush */
+		if (seen_seq != log_next_seq) {
+			wake_klogd = true;
+			seen_seq = log_next_seq;
+		}
+
+		if (console_seq < log_first_seq) {
+			/* messages are gone, move to first one */
+			console_seq = log_first_seq;
+			console_idx = log_first_idx;
+		}
+
+		if (console_seq == log_next_seq)
+			break;
+
+		msg = log_from_idx(console_idx);
+		level = msg->level & 7;
+		len = msg->text_len;
+		if (len+1 >= sizeof(text))
+			len = sizeof(text)-1;
+		memcpy(text, log_text(msg), len);
+		text[len++] = '\n';
+
+		console_idx = log_next(console_idx);
+		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
+
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(_con_start, _log_end);
+		call_console_drivers(level, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
 	}
@@ -1312,8 +1786,7 @@ again:
 	 * flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
-	if (con_start != log_end)
-		retry = 1;
+	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	if (retry && console_trylock())
@@ -1549,7 +2022,8 @@ void register_console(struct console *ne
 		 * for us.
 		 */
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		con_start = log_start;
+		console_seq = syslog_seq;
+		console_idx = syslog_idx;
 		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 		/*
 		 * We're about to replay the log buffer.  Only do this to the
@@ -1758,6 +2232,9 @@ int kmsg_dump_unregister(struct kmsg_dum
 }
 EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
 
+static bool always_kmsg_dump;
+module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
+
 /**
  * kmsg_dump - dump kernel log to kernel message dumpers.
  * @reason: the reason (oops, panic etc) for dumping
@@ -1767,8 +2244,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
  */
 void kmsg_dump(enum kmsg_dump_reason reason)
 {
-	unsigned long end;
-	unsigned chars;
+	u64 idx;
 	struct kmsg_dumper *dumper;
 	const char *s1, *s2;
 	unsigned long l1, l2;
@@ -1780,24 +2256,27 @@ void kmsg_dump(enum kmsg_dump_reason rea
 	/* Theoretically, the log could move on after we do this, but
 	   there's not a lot we can do about that. The new messages
 	   will overwrite the start of what we dump. */
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	end = log_end & LOG_BUF_MASK;
-	chars = logged_chars;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (chars > end) {
-		s1 = log_buf + log_buf_len - chars + end;
-		l1 = chars - end;
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+	if (syslog_seq < log_first_seq)
+		idx = syslog_idx;
+	else
+		idx = log_first_idx;
+
+	if (idx > log_next_idx) {
+		s1 = log_buf;
+		l1 = log_next_idx;
 
-		s2 = log_buf;
-		l2 = end;
+		s2 = log_buf + idx;
+		l2 = log_buf_len - idx;
 	} else {
 		s1 = "";
 		l1 = 0;
 
-		s2 = log_buf + end - chars;
-		l2 = chars;
+		s2 = log_buf + idx;
+		l2 = log_next_idx - idx;
 	}
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(dumper, &dump_list, list)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 19:47   ` Kay Sievers
@ 2012-04-06  1:12     ` Joe Perches
  2012-04-06  1:31       ` Linus Torvalds
  2012-04-11 11:39     ` Eric W. Biederman
  1 sibling, 1 reply; 34+ messages in thread
From: Joe Perches @ 2012-04-06  1:12 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartmann

On Thu, 2012-04-05 at 21:47 +0200, Kay Sievers wrote:

> Better?
> 
>   [root@mop ~]# cat /dev/kmsg
[]
>   6,5,0; BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
>   6,6,0; BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
>   6 ,7,0; BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)

Not really, why the space between the 6 and the comma?

> This patch extends printk() to be able to attach arbitrary key/value
> pairs to logged messages,

Bad commit changelog.

It does not extend printk().  printk() is a function call.

It extends the message logging subsystem to include the
printk_emit() and vprintk_emit() functions, which can
add decidedly not arbitrary key/value pairs, but some
additional subsystem specific information.

> - Output of dev_printk() is reliably machine-readable now.

No, it's not.

An output header simply contains additional
structured information.  The message portion
can still be arbitrary gobbledygook.

I still think this should be 3 patches not 1.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-06  1:12     ` Joe Perches
@ 2012-04-06  1:31       ` Linus Torvalds
  2012-04-06  3:43         ` Joe Perches
  2012-04-06 18:35         ` Kay Sievers
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-04-06  1:31 UTC (permalink / raw)
  To: Joe Perches; +Cc: Kay Sievers, linux-kernel, Greg Kroah-Hartmann

On Thu, Apr 5, 2012 at 6:12 PM, Joe Perches <joe@perches.com> wrote:
>
> I still think this should be 3 patches not 1.

I do agree that that would be lovely. That patch is really hard to
read, and it would be helpful if some of it was split up a bit. For
example, separating out the "move log buffer to be packetized"
(without actually doing the new things) from for the "introduce new
interfaces" would be quite nice.

In fact, I think I could merge something that does the packetization
early, if it is fairly obvious that it doesn't do anything else. Just
making the dmesg ASCII timestamp be something that is a "read buffer"
time thing rather than "generate printk" time thing would be nice, and
I think the sequence numbers can be useful even with all the old
interfaces.

So if somebody wants to help with things like that, and try to turn
this into a more gradual thing, I think we could merge at least some
initial parts of this patch even for 3.4.

Anybody interested?

                  Linus

PS looking some more at the patch, it has some more obvious insanities
in it. Doing "SUBSYSTEM=(NULL subsystem)" is clearly just insane. If
you don't have a subsystem, you shouldn't have that key/value pair -
saying you have a NULL subsystem is just crazy. Details like this
would possibly stand out a bit more if it wasn't one big patch.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-06  1:31       ` Linus Torvalds
@ 2012-04-06  3:43         ` Joe Perches
  2012-04-06 18:35         ` Kay Sievers
  1 sibling, 0 replies; 34+ messages in thread
From: Joe Perches @ 2012-04-06  3:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kay Sievers, linux-kernel, Greg Kroah-Hartmann

On Thu, 2012-04-05 at 18:31 -0700, Linus Torvalds wrote:
> On Thu, Apr 5, 2012 at 6:12 PM, Joe Perches <joe@perches.com> wrote:
> >
> > I still think this should be 3 patches not 1.
> 
> I do agree that that would be lovely. That patch is really hard to
> read, and it would be helpful if some of it was split up a bit. For
> example, separating out the "move log buffer to be packetized"
> (without actually doing the new things) from for the "introduce new
> interfaces" would be quite nice.
> 
> In fact, I think I could merge something that does the packetization
> early, if it is fairly obvious that it doesn't do anything else. Just
> making the dmesg ASCII timestamp be something that is a "read buffer"
> time thing rather than "generate printk" time thing would be nice, and
> I think the sequence numbers can be useful even with all the old
> interfaces.
> 
> So if somebody wants to help with things like that, and try to turn
> this into a more gradual thing, I think we could merge at least some
> initial parts of this patch even for 3.4.
> 
> Anybody interested?

I think it'll be a lot more sensible to
integrate this into a more granular printk
subsystem so I've just done a move and
split up of the printk bits.

kernel/printk.c -> kernel/printk/printk.c
and a breakup of printk.c into 4 parts

kernel/printk/log_buf.[ch]
kernel/printk/console.[ch]
kernel/printk/kmsg.[ch]
kernek/printk/printk.c

I think that's a lot more readable/extensible
and will make adding a new printk_emit and
devkmsg a bit simpler to understand as well.

Maybe it'd be useful to then add some compression
to the log_buf message string bits.

I'll submit that move/separate tomorrow but not
any compression after a few more compilation tests
w/ & w/o the appropriate CONFIG_ options.

Kay would have to respin though.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-06  1:31       ` Linus Torvalds
  2012-04-06  3:43         ` Joe Perches
@ 2012-04-06 18:35         ` Kay Sievers
  2012-04-08  0:47           ` Joe Perches
  2012-04-08  1:02           ` Joe Perches
  1 sibling, 2 replies; 34+ messages in thread
From: Kay Sievers @ 2012-04-06 18:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joe Perches, linux-kernel, Greg Kroah-Hartmann

On Thu, 2012-04-05 at 18:31 -0700, Linus Torvalds wrote:
> On Thu, Apr 5, 2012 at 6:12 PM, Joe Perches <joe@perches.com> wrote:
> >
> > I still think this should be 3 patches not 1.
> 
> I do agree that that would be lovely. That patch is really hard to
> read, and it would be helpful if some of it was split up a bit. For
> example, separating out the "move log buffer to be packetized"
> (without actually doing the new things) from for the "introduce new
> interfaces" would be quite nice.
> 
> In fact, I think I could merge something that does the packetization
> early, if it is fairly obvious that it doesn't do anything else. Just
> making the dmesg ASCII timestamp be something that is a "read buffer"
> time thing rather than "generate printk" time thing would be nice, and
> I think the sequence numbers can be useful even with all the old
> interfaces.
> 
> So if somebody wants to help with things like that, and try to turn
> this into a more gradual thing, I think we could merge at least some
> initial parts of this patch even for 3.4.

Here is only the conversion of the byte- to the record-buffer split out
of the big patch.

Unfortunately, it's still pretty big, and the patch not easy to read;
the code itself looks better than the patch. At this moment I have no
convincing idea how to split that into more pieces.

Note: The output of a few continuation lines look wrong with the patch
applied. There are two pending patches to add a couple of missing
KERN_CONT prefixes for these users.

The crash dump symbol exports, and the printk trace point needs to be
checked by people who know how they expect from them, and to adapt to
the new buffer type; and if they can possibly make use of the new
seqnums to find out if the data behind their pointers which they pass
along is still valid when the buffer gets filled behind their back.

Kay


From: Kay Sievers <kay@vrfy.org>
Subject: printk: convert byte-buffer to variable-length record buffer

- Record-based stream instead of the traditional byte stream
  buffer. All records carry a 64 bit timestamp, the syslog facility
  and priority in the record header.

- Records consume almost the same amount, sometimes less memory than
  the traditional byte stream buffer (if printk_time is enabled). The record
  header is 16 bytes long, plus some padding bytes at the end if needed.
  The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
  the timestamp and a newline.

- Buffer management is based on message sequence numbers. When records
  need to be discarded, the reading heads move on to the next full
  record. Unlike the byte-stream buffer, no old logged lines get
  truncated or partly overwritten by new ones. Sequence numbers also
  allow consumers of the log stream to get notified if any message in
  the stream they are about to read gets discarded during the time
  of reading.

- Better buffered IO support for KERN_CONT continuation lines, when printk()
  is called multiple times for a single line. The use of KERN_CONT is now
  mandatory to use continuation; a few places in the kernel need trivial fixes
  here. The buffering could possibly be extended to per-cpu variables to allow
  better thread-safety for multiple printk() invocations for a single line.

- Full-featured syslog facility value support. Different facilities
  can tag their messages. All userspace-injected messages enforce a
  facility value > 0 now, to be able to reliably distinguish them from
  the kernel-generated messages. Independent subsystems like a
  baseband processor running its own firmware, or a kernel-related
  userspace process can use their own unique facility values. Multiple
  independent log streams can co-exist that way in the same
  buffer. All share the same global sequence number counter to ensure
  proper ordering (and interleaving) and to allow the consumers of the
  log to reliably correlate the events from different facilities.

Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
---
 drivers/char/mem.c     |   55 +-
 include/linux/printk.h |   11 
 kernel/printk.c        | 1020 ++++++++++++++++++++++++++++---------------------
 3 files changed, 642 insertions(+), 444 deletions(-)

--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -810,33 +810,54 @@ static const struct file_operations oldm
 static ssize_t kmsg_writev(struct kiocb *iocb, const struct iovec *iv,
 			   unsigned long count, loff_t pos)
 {
-	char *line, *p;
+	char *buf, *line;
 	int i;
-	ssize_t ret = -EFAULT;
+	int level = default_message_loglevel;
+	int facility = 1;	/* LOG_USER */
 	size_t len = iov_length(iv, count);
+	ssize_t ret = len;
 
-	line = kmalloc(len + 1, GFP_KERNEL);
-	if (line == NULL)
+	if (len > 1024)
+		return -EINVAL;
+	buf = kmalloc(len+1, GFP_KERNEL);
+	if (buf == NULL)
 		return -ENOMEM;
 
-	/*
-	 * copy all vectors into a single string, to ensure we do
-	 * not interleave our log line with other printk calls
-	 */
-	p = line;
+	line = buf;
 	for (i = 0; i < count; i++) {
-		if (copy_from_user(p, iv[i].iov_base, iv[i].iov_len))
+		if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len))
 			goto out;
-		p += iv[i].iov_len;
+		line += iv[i].iov_len;
+	}
+
+	/*
+	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
+	 * the decimal value represents 32bit, the lower 3 bit are the log
+	 * level, the rest are the log facility.
+	 *
+	 * If no prefix or no userspace facility is specified, we
+	 * enforce LOG_USER, to be able to reliably distinguish
+	 * kernel-generated messages from userspace-injected ones.
+	 */
+	line = buf;
+	if (line[0] == '<') {
+		char *endp = NULL;
+
+		i = simple_strtoul(line+1, &endp, 10);
+		if (endp && endp[0] == '>') {
+			level = i & 7;
+			if (i >> 3)
+				facility = i >> 3;
+			endp++;
+			len -= endp - line;
+			line = endp;
+		}
 	}
-	p[0] = '\0';
+	line[len] = '\0';
 
-	ret = printk("%s", line);
-	/* printk can add a prefix */
-	if (ret > len)
-		ret = len;
+	printk_emit(facility, level, NULL, 0, "%s", line);
 out:
-	kfree(line);
+	kfree(buf);
 	return ret;
 }
 
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -95,8 +95,19 @@ extern int printk_needs_cpu(int cpu);
 extern void printk_tick(void);
 
 #ifdef CONFIG_PRINTK
+asmlinkage __printf(5, 0)
+int vprintk_emit(int facility, int level,
+		 const char *dict, size_t dictlen,
+		 const char *fmt, va_list args);
+
 asmlinkage __printf(1, 0)
 int vprintk(const char *fmt, va_list args);
+
+asmlinkage __printf(5, 6) __cold
+asmlinkage int printk_emit(int facility, int level,
+			   const char *dict, size_t dictlen,
+			   const char *fmt, ...);
+
 asmlinkage __printf(1, 2) __cold
 int printk(const char *fmt, ...);
 
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -54,8 +54,6 @@ void asmlinkage __attribute__((weak)) ea
 {
 }
 
-#define __LOG_BUF_LEN	(1 << CONFIG_LOG_BUF_SHIFT)
-
 /* printk's without a loglevel use this.. */
 #define DEFAULT_MESSAGE_LOGLEVEL CONFIG_DEFAULT_MESSAGE_LOGLEVEL
 
@@ -99,24 +97,6 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static int console_locked, console_suspended;
 
 /*
- * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
- * It is also used in interesting ways to provide interlocking in
- * console_unlock();.
- */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-#define LOG_BUF_MASK (log_buf_len-1)
-#define LOG_BUF(idx) (log_buf[(idx) & LOG_BUF_MASK])
-
-/*
- * The indices into log_buf are not constrained to log_buf_len - they
- * must be masked before subscripting
- */
-static unsigned log_start;	/* Index into log_buf: next char to be read by syslog() */
-static unsigned con_start;	/* Index into log_buf: next char to be sent to consoles */
-static unsigned log_end;	/* Index into log_buf: most-recently-written-char + 1 */
-
-/*
  * If exclusive_console is non-NULL then only this console is to be printed to.
  */
 static struct console *exclusive_console;
@@ -146,12 +126,176 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 static int console_may_schedule;
 
 #ifdef CONFIG_PRINTK
+/*
+ * The printk log buffer consists of a chain of concatenated variable
+ * length records. Every record starts with a record header, containing
+ * the overall length of the record.
+ *
+ * The heads to the first and last entry in the buffer, as well as the
+ * sequence numbers of these both entries are maintained when messages
+ * are stored..
+ *
+ * If the heads indicate available messages, the length in the header
+ * tells the start next message. A length == 0 for the next message
+ * indicates a wrap-around to the beginning of the buffer.
+ *
+ * Every record carries the monotonic timestamp in microseconds, as well as
+ * the standard userspace syslog level and syslog facility. The usual
+ * kernel messages use LOG_KERN; userspace-injected messages always carry
+ * a matching syslog facility, by default LOG_USER. The origin of every
+ * message can be reliably determined that way.
+ *
+ * The human readable log message directly follows the message header. The
+ * length of the message text is stored in the header, the stored message
+ * is not terminated.
+ *
+ */
+
+struct log {
+	u64 ts_nsec;		/* timestamp in nanoseconds */
+	u16 len;		/* length of entire record */
+	u16 text_len;		/* length of text buffer */
+	u16 dict_len;		/* length of dictionary buffer */
+	u16 level;		/* syslog level + facility */
+};
+
+/*
+ * The logbuf_lock protects kmsg buffer, indices, counters. It is also
+ * used in interesting ways to provide interlocking in console_unlock();
+ */
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
+
+/* cpu currently holding logbuf_lock */
+static volatile unsigned int logbuf_cpu = UINT_MAX;
+
+#define LOG_LINE_MAX 1024
 
+/* record buffer */
+#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN];
 static char *log_buf = __log_buf;
-static int log_buf_len = __LOG_BUF_LEN;
-static unsigned logged_chars; /* Number of chars produced since last read+clear operation */
-static int saved_console_loglevel = -1;
+static u32 log_buf_len = __LOG_BUF_LEN;
+
+/* index and sequence number of the first record stored in the buffer */
+static u64 log_first_seq;
+static u32 log_first_idx;
+
+/* index and sequence number of the next record to store in the buffer */
+static u64 log_next_seq;
+static u32 log_next_idx;
+
+/* the next printk record to read after the last 'clear' command */
+static u64 clear_seq;
+static u32 clear_idx;
+
+/* the next printk record to read by syslog(READ) or /proc/kmsg */
+static u64 syslog_seq;
+static u32 syslog_idx;
+
+/* human readable text of the record */
+static char *log_text(const struct log *msg)
+{
+	return (char *)msg + sizeof(struct log);
+}
+
+/* optional key/value pair dictionary attached to the record */
+static char *log_dict(const struct log *msg)
+{
+	return (char *)msg + sizeof(struct log) + msg->text_len;
+}
+
+/* get record by index; idx must point to valid msg */
+static struct log *log_from_idx(u32 idx)
+{
+	struct log *msg = (struct log *)(log_buf + idx);
+
+	/*
+	 * A length == 0 record is the end of buffer marker. Wrap around and
+	 * read the message at the start of the buffer.
+	 */
+	if (!msg->len)
+		return (struct log *)log_buf;
+	return msg;
+}
+
+/* get next record; idx must point to valid msg */
+static u32 log_next(u32 idx)
+{
+	struct log *msg = (struct log *)(log_buf + idx);
+
+	/* length == 0 indicates the end of the buffer; wrap */
+	/*
+	 * A length == 0 record is the end of buffer marker. Wrap around and
+	 * read the message at the start of the buffer as *this* one, and
+	 * return the one after that.
+	 */
+	if (!msg->len) {
+		msg = (struct log *)log_buf;
+		return msg->len;
+	}
+	return idx + msg->len;
+}
+
+#if !defined(CONFIG_64BIT) || defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+#define LOG_ALIGN 4
+#else
+#define LOG_ALIGN 8
+#endif
+
+/* insert record into the buffer, discard old ones, update heads */
+static void log_store(int facility, int level,
+		      const char *dict, u16 dict_len,
+		      const char *text, u16 text_len)
+{
+	struct log *msg;
+	u32 size, pad_len;
+
+	/* number of '\0' padding bytes to next message */
+	size = sizeof(struct log) + text_len + dict_len;
+	pad_len = (-size) & (LOG_ALIGN - 1);
+	size += pad_len;
+
+	while (log_first_seq < log_next_seq) {
+		u32 free;
+
+		if (log_next_idx > log_first_idx)
+			free = max(log_buf_len - log_next_idx, log_first_idx);
+		else
+			free = log_first_idx - log_next_idx;
+
+		if (free > size + sizeof(struct log))
+			break;
+
+		/* drop old messages until we have enough contiuous space */
+		log_first_idx = log_next(log_first_idx);
+		log_first_seq++;
+	}
+
+	if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
+		/*
+		 * This message + an additional empty header does not fit
+		 * at the end of the buffer. Add an empty header with len == 0
+		 * to signify a wrap around.
+		 */
+		memset(log_buf + log_next_idx, 0, sizeof(struct log));
+		log_next_idx = 0;
+	}
+
+	/* fill message */
+	msg = (struct log *)(log_buf + log_next_idx);
+	memcpy(log_text(msg), text, text_len);
+	msg->text_len = text_len;
+	memcpy(log_dict(msg), dict, dict_len);
+	msg->dict_len = dict_len;
+	msg->level = (facility << 3) | (level & 7);
+	msg->ts_nsec = local_clock();
+	memset(log_dict(msg) + dict_len, 0, pad_len);
+	msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
+
+	/* insert message */
+	log_next_idx += msg->len;
+	log_next_seq++;
+}
 
 #ifdef CONFIG_KEXEC
 /*
@@ -165,9 +309,9 @@ static int saved_console_loglevel = -1;
 void log_buf_kexec_setup(void)
 {
 	VMCOREINFO_SYMBOL(log_buf);
-	VMCOREINFO_SYMBOL(log_end);
 	VMCOREINFO_SYMBOL(log_buf_len);
-	VMCOREINFO_SYMBOL(logged_chars);
+	VMCOREINFO_SYMBOL(log_first_idx);
+	VMCOREINFO_SYMBOL(log_next_idx);
 }
 #endif
 
@@ -191,7 +335,6 @@ early_param("log_buf_len", log_buf_len_s
 void __init setup_log_buf(int early)
 {
 	unsigned long flags;
-	unsigned start, dest_idx, offset;
 	char *new_log_buf;
 	int free;
 
@@ -219,20 +362,8 @@ void __init setup_log_buf(int early)
 	log_buf_len = new_log_buf_len;
 	log_buf = new_log_buf;
 	new_log_buf_len = 0;
-	free = __LOG_BUF_LEN - log_end;
-
-	offset = start = min(con_start, log_start);
-	dest_idx = 0;
-	while (start != log_end) {
-		unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
-
-		log_buf[dest_idx] = __log_buf[log_idx_mask];
-		start++;
-		dest_idx++;
-	}
-	log_start -= offset;
-	con_start -= offset;
-	log_end -= offset;
+	free = __LOG_BUF_LEN - log_next_idx;
+	memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	pr_info("log_buf_len: %d\n", log_buf_len);
@@ -332,11 +463,165 @@ static int check_syslog_permissions(int
 	return 0;
 }
 
+#if defined(CONFIG_PRINTK_TIME)
+static bool printk_time = 1;
+#else
+static bool printk_time;
+#endif
+module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
+
+static int syslog_print_line(u32 idx, char *text, size_t size)
+{
+	struct log *msg;
+	size_t len;
+
+	msg = log_from_idx(idx);
+	if (!text) {
+		/* calculate length only */
+		len = 3;
+
+		if (msg->level > 9)
+			len++;
+		if (msg->level > 99)
+			len++;
+
+		if (printk_time)
+			len += 15;
+
+		len += msg->text_len;
+		len++;
+		return len;
+	}
+
+	len = sprintf(text, "<%u>", msg->level);
+
+	if (printk_time) {
+		unsigned long long t = msg->ts_nsec;
+		unsigned long rem_ns = do_div(t, 1000000000);
+
+		len += sprintf(text + len, "[%5lu.%06lu] ",
+				   (unsigned long) t, rem_ns / 1000);
+	}
+
+	if (len + msg->text_len > size)
+		return -EINVAL;
+	memcpy(text + len, log_text(msg), msg->text_len);
+	len += msg->text_len;
+	text[len++] = '\n';
+	return len;
+}
+
+static int syslog_print(char __user *buf, int size)
+{
+	char *text;
+	int len;
+
+	text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+	if (!text)
+		return -ENOMEM;
+
+	raw_spin_lock_irq(&logbuf_lock);
+	if (syslog_seq < log_first_seq) {
+		/* messages are gone, move to first one */
+		syslog_seq = log_first_seq;
+		syslog_idx = log_first_idx;
+	}
+	len = syslog_print_line(syslog_idx, text, LOG_LINE_MAX);
+	syslog_idx = log_next(syslog_idx);
+	syslog_seq++;
+	raw_spin_unlock_irq(&logbuf_lock);
+
+	if (len > 0 && copy_to_user(buf, text, len))
+		len = -EFAULT;
+
+	kfree(text);
+	return len;
+}
+
+static int syslog_print_all(char __user *buf, int size, bool clear)
+{
+	char *text;
+	int len = 0;
+
+	text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+	if (!text)
+		return -ENOMEM;
+
+	raw_spin_lock_irq(&logbuf_lock);
+	if (buf) {
+		u64 next_seq;
+		u64 seq;
+		u32 idx;
+
+		if (clear_seq < log_first_seq) {
+			/* messages are gone, move to first available one */
+			clear_seq = log_first_seq;
+			clear_idx = log_first_idx;
+		}
+
+		/*
+		 * Find first record that fits, including all following records,
+		 * into the user-provided buffer for this dump.
+		*/
+		seq = clear_seq;
+		idx = clear_idx;
+		while (seq < log_next_seq) {
+			len += syslog_print_line(idx, NULL, 0);
+			idx = log_next(idx);
+			seq++;
+		}
+		seq = clear_seq;
+		idx = clear_idx;
+		while (len > size && seq < log_next_seq) {
+			len -= syslog_print_line(idx, NULL, 0);
+			idx = log_next(idx);
+			seq++;
+		}
+
+		/* last message in this dump */
+		next_seq = log_next_seq;
+
+		len = 0;
+		while (len >= 0 && seq < next_seq) {
+			int textlen;
+
+			textlen = syslog_print_line(idx, text, LOG_LINE_MAX);
+			if (textlen < 0) {
+				len = textlen;
+				break;
+			}
+			idx = log_next(idx);
+			seq++;
+
+			raw_spin_unlock_irq(&logbuf_lock);
+			if (copy_to_user(buf + len, text, textlen))
+				len = -EFAULT;
+			else
+				len += textlen;
+			raw_spin_lock_irq(&logbuf_lock);
+
+			if (seq < log_first_seq) {
+				/* messages are gone, move to next one */
+				seq = log_first_seq;
+				idx = log_first_idx;
+			}
+		}
+	}
+
+	if (clear) {
+		clear_seq = log_next_seq;
+		clear_idx = log_next_idx;
+	}
+	raw_spin_unlock_irq(&logbuf_lock);
+
+	kfree(text);
+	return len;
+}
+
 int do_syslog(int type, char __user *buf, int len, bool from_file)
 {
-	unsigned i, j, limit, count;
-	int do_clear = 0;
-	char c;
+	bool clear = false;
+	static int saved_console_loglevel = -1;
 	int error;
 
 	error = check_syslog_permissions(type, from_file);
@@ -364,28 +649,14 @@ int do_syslog(int type, char __user *buf
 			goto out;
 		}
 		error = wait_event_interruptible(log_wait,
-							(log_start - log_end));
+						 syslog_seq != log_next_seq);
 		if (error)
 			goto out;
-		i = 0;
-		raw_spin_lock_irq(&logbuf_lock);
-		while (!error && (log_start != log_end) && i < len) {
-			c = LOG_BUF(log_start);
-			log_start++;
-			raw_spin_unlock_irq(&logbuf_lock);
-			error = __put_user(c,buf);
-			buf++;
-			i++;
-			cond_resched();
-			raw_spin_lock_irq(&logbuf_lock);
-		}
-		raw_spin_unlock_irq(&logbuf_lock);
-		if (!error)
-			error = i;
+		error = syslog_print(buf, len);
 		break;
 	/* Read/clear last kernel messages */
 	case SYSLOG_ACTION_READ_CLEAR:
-		do_clear = 1;
+		clear = true;
 		/* FALL THRU */
 	/* Read last kernel messages */
 	case SYSLOG_ACTION_READ_ALL:
@@ -399,52 +670,11 @@ int do_syslog(int type, char __user *buf
 			error = -EFAULT;
 			goto out;
 		}
-		count = len;
-		if (count > log_buf_len)
-			count = log_buf_len;
-		raw_spin_lock_irq(&logbuf_lock);
-		if (count > logged_chars)
-			count = logged_chars;
-		if (do_clear)
-			logged_chars = 0;
-		limit = log_end;
-		/*
-		 * __put_user() could sleep, and while we sleep
-		 * printk() could overwrite the messages
-		 * we try to copy to user space. Therefore
-		 * the messages are copied in reverse. <manfreds>
-		 */
-		for (i = 0; i < count && !error; i++) {
-			j = limit-1-i;
-			if (j + log_buf_len < log_end)
-				break;
-			c = LOG_BUF(j);
-			raw_spin_unlock_irq(&logbuf_lock);
-			error = __put_user(c,&buf[count-1-i]);
-			cond_resched();
-			raw_spin_lock_irq(&logbuf_lock);
-		}
-		raw_spin_unlock_irq(&logbuf_lock);
-		if (error)
-			break;
-		error = i;
-		if (i != count) {
-			int offset = count-error;
-			/* buffer overflow during copy, correct user buffer. */
-			for (i = 0; i < error; i++) {
-				if (__get_user(c,&buf[i+offset]) ||
-				    __put_user(c,&buf[i])) {
-					error = -EFAULT;
-					break;
-				}
-				cond_resched();
-			}
-		}
+		error = syslog_print_all(buf, len, clear);
 		break;
 	/* Clear ring buffer */
 	case SYSLOG_ACTION_CLEAR:
-		logged_chars = 0;
-		break;
+		syslog_print_all(NULL, 0, true);
 	/* Disable logging to console */
 	case SYSLOG_ACTION_CONSOLE_OFF:
 		if (saved_console_loglevel == -1)
@@ -472,7 +702,33 @@ int do_syslog(int type, char __user *buf
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
-		error = log_end - log_start;
+		raw_spin_lock_irq(&logbuf_lock);
+		if (syslog_seq < log_first_seq) {
+			/* messages are gone, move to first one */
+			syslog_seq = log_first_seq;
+			syslog_idx = log_first_idx;
+		}
+		if (from_file) {
+			/*
+			 * Short-cut for poll(/"proc/kmsg") which simply checks
+			 * for pending data, not the size; return the count of
+			 * records, not the length.
+			 */
+			error = log_next_idx - syslog_idx;
+		} else {
+			u64 seq;
+			u32 idx;
+
+			error = 0;
+			seq = syslog_seq;
+			idx = syslog_idx;
+			while (seq < log_next_seq) {
+				error += syslog_print_line(idx, NULL, 0);
+				idx = log_next(idx);
+				seq++;
+			}
+		}
+		raw_spin_unlock_irq(&logbuf_lock);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -501,29 +757,11 @@ void kdb_syslog_data(char *syslog_data[4
 {
 	syslog_data[0] = log_buf;
 	syslog_data[1] = log_buf + log_buf_len;
-	syslog_data[2] = log_buf + log_end -
-		(logged_chars < log_buf_len ? logged_chars : log_buf_len);
-	syslog_data[3] = log_buf + log_end;
+	syslog_data[2] = log_buf + log_first_idx;
+	syslog_data[3] = log_buf + log_next_idx;
 }
 #endif	/* CONFIG_KGDB_KDB */
 
-/*
- * Call the console drivers on a range of log_buf
- */
-static void __call_console_drivers(unsigned start, unsigned end)
-{
-	struct console *con;
-
-	for_each_console(con) {
-		if (exclusive_console && con != exclusive_console)
-			continue;
-		if ((con->flags & CON_ENABLED) && con->write &&
-				(cpu_online(smp_processor_id()) ||
-				(con->flags & CON_ANYTIME)))
-			con->write(con, &LOG_BUF(start), end - start);
-	}
-}
-
 static bool __read_mostly ignore_loglevel;
 
 static int __init ignore_loglevel_setup(char *str)
@@ -540,142 +778,33 @@ MODULE_PARM_DESC(ignore_loglevel, "ignor
 	"print all kernel messages to the console.");
 
 /*
- * Write out chars from start to end - 1 inclusive
- */
-static void _call_console_drivers(unsigned start,
-				unsigned end, int msg_log_level)
-{
-	trace_console(&LOG_BUF(0), start, end, log_buf_len);
-
-	if ((msg_log_level < console_loglevel || ignore_loglevel) &&
-			console_drivers && start != end) {
-		if ((start & LOG_BUF_MASK) > (end & LOG_BUF_MASK)) {
-			/* wrapped write */
-			__call_console_drivers(start & LOG_BUF_MASK,
-						log_buf_len);
-			__call_console_drivers(0, end & LOG_BUF_MASK);
-		} else {
-			__call_console_drivers(start, end);
-		}
-	}
-}
-
-/*
- * Parse the syslog header <[0-9]*>. The decimal value represents 32bit, the
- * lower 3 bit are the log level, the rest are the log facility. In case
- * userspace passes usual userspace syslog messages to /dev/kmsg or
- * /dev/ttyprintk, the log prefix might contain the facility. Printk needs
- * to extract the correct log level for in-kernel processing, and not mangle
- * the original value.
- *
- * If a prefix is found, the length of the prefix is returned. If 'level' is
- * passed, it will be filled in with the log level without a possible facility
- * value. If 'special' is passed, the special printk prefix chars are accepted
- * and returned. If no valid header is found, 0 is returned and the passed
- * variables are not touched.
- */
-static size_t log_prefix(const char *p, unsigned int *level, char *special)
-{
-	unsigned int lev = 0;
-	char sp = '\0';
-	size_t len;
-
-	if (p[0] != '<' || !p[1])
-		return 0;
-	if (p[2] == '>') {
-		/* usual single digit level number or special char */
-		switch (p[1]) {
-		case '0' ... '7':
-			lev = p[1] - '0';
-			break;
-		case 'c': /* KERN_CONT */
-		case 'd': /* KERN_DEFAULT */
-			sp = p[1];
-			break;
-		default:
-			return 0;
-		}
-		len = 3;
-	} else {
-		/* multi digit including the level and facility number */
-		char *endp = NULL;
-
-		lev = (simple_strtoul(&p[1], &endp, 10) & 7);
-		if (endp == NULL || endp[0] != '>')
-			return 0;
-		len = (endp + 1) - p;
-	}
-
-	/* do not accept special char if not asked for */
-	if (sp && !special)
-		return 0;
-
-	if (special) {
-		*special = sp;
-		/* return special char, do not touch level */
-		if (sp)
-			return len;
-	}
-
-	if (level)
-		*level = lev;
-	return len;
-}
-
-/*
  * Call the console drivers, asking them to write out
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(unsigned start, unsigned end)
+static void call_console_drivers(int level, const char *text, size_t len)
 {
-	unsigned cur_index, start_print;
-	static int msg_level = -1;
+	struct console *con;
 
-	BUG_ON(((int)(start - end)) > 0);
+	trace_console(text, 0, len, len);
 
-	cur_index = start;
-	start_print = start;
-	while (cur_index != end) {
-		if (msg_level < 0 && ((end - cur_index) > 2)) {
-			/* strip log prefix */
-			cur_index += log_prefix(&LOG_BUF(cur_index), &msg_level, NULL);
-			start_print = cur_index;
-		}
-		while (cur_index != end) {
-			char c = LOG_BUF(cur_index);
-
-			cur_index++;
-			if (c == '\n') {
-				if (msg_level < 0) {
-					/*
-					 * printk() has already given us loglevel tags in
-					 * the buffer.  This code is here in case the
-					 * log buffer has wrapped right round and scribbled
-					 * on those tags
-					 */
-					msg_level = default_message_loglevel;
-				}
-				_call_console_drivers(start_print, cur_index, msg_level);
-				msg_level = -1;
-				start_print = cur_index;
-				break;
-			}
-		}
-	}
-	_call_console_drivers(start_print, end, msg_level);
-}
+	if (level >= console_loglevel && !ignore_loglevel)
+		return;
+	if (!console_drivers)
+		return;
 
-static void emit_log_char(char c)
-{
-	LOG_BUF(log_end) = c;
-	log_end++;
-	if (log_end - log_start > log_buf_len)
-		log_start = log_end - log_buf_len;
-	if (log_end - con_start > log_buf_len)
-		con_start = log_end - log_buf_len;
-	if (logged_chars < log_buf_len)
-		logged_chars++;
+	for_each_console(con) {
+		if (exclusive_console && con != exclusive_console)
+			continue;
+		if (!(con->flags & CON_ENABLED))
+			continue;
+		if (!con->write)
+			continue;
+		if (!cpu_online(smp_processor_id()) &&
+		    !(con->flags & CON_ANYTIME))
+			continue;
+		con->write(con, text, len);
+	}
 }
 
 /*
@@ -700,16 +829,6 @@ static void zap_locks(void)
 	sema_init(&console_sem, 1);
 }
 
-#if defined(CONFIG_PRINTK_TIME)
-static bool printk_time = 1;
-#else
-static bool printk_time = 0;
-#endif
-module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
-
-static bool always_kmsg_dump;
-module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
-
 /* Check if we have any console registered that can be called early in boot. */
 static int have_callable_console(void)
 {
@@ -722,51 +841,6 @@ static int have_callable_console(void)
 	return 0;
 }
 
-/**
- * printk - print a kernel message
- * @fmt: format string
- *
- * This is printk().  It can be called from any context.  We want it to work.
- *
- * We try to grab the console_lock.  If we succeed, it's easy - we log the output and
- * call the console drivers.  If we fail to get the semaphore we place the output
- * into the log buffer and return.  The current holder of the console_sem will
- * notice the new output in console_unlock(); and will send it to the
- * consoles before releasing the lock.
- *
- * One effect of this deferred printing is that code which calls printk() and
- * then changes console_loglevel may break. This is because console_loglevel
- * is inspected when the actual printing occurs.
- *
- * See also:
- * printf(3)
- *
- * See the vsnprintf() documentation for format string extensions over C99.
- */
-
-asmlinkage int printk(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-#ifdef CONFIG_KGDB_KDB
-	if (unlikely(kdb_trap_printk)) {
-		va_start(args, fmt);
-		r = vkdb_printf(fmt, args);
-		va_end(args);
-		return r;
-	}
-#endif
-	va_start(args, fmt);
-	r = vprintk(fmt, args);
-	va_end(args);
-
-	return r;
-}
-
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int printk_cpu = UINT_MAX;
-
 /*
  * Can we actually use the console at this time on this cpu?
  *
@@ -810,17 +884,12 @@ static int console_trylock_for_printk(un
 			retval = 0;
 		}
 	}
-	printk_cpu = UINT_MAX;
+	logbuf_cpu = UINT_MAX;
 	if (wake)
 		up(&console_sem);
 	raw_spin_unlock(&logbuf_lock);
 	return retval;
 }
-static const char recursion_bug_msg [] =
-		KERN_CRIT "BUG: recent printk recursion!\n";
-static int recursion_bug;
-static int new_text_line = 1;
-static char printk_buf[1024];
 
 int printk_delay_msec __read_mostly;
 
@@ -836,15 +905,22 @@ static inline void printk_delay(void)
 	}
 }
 
-asmlinkage int vprintk(const char *fmt, va_list args)
-{
-	int printed_len = 0;
-	int current_log_level = default_message_loglevel;
+asmlinkage int vprintk_emit(int facility, int level,
+			    const char *dict, size_t dictlen,
+			    const char *fmt, va_list args)
+{
+	static int recursion_bug;
+	static char buf[LOG_LINE_MAX];
+	static size_t buflen;
+	static int buflevel;
+	static char textbuf[LOG_LINE_MAX];
+	char *text = textbuf;
+	size_t textlen;
 	unsigned long flags;
 	int this_cpu;
-	char *p;
-	size_t plen;
-	char special;
+	bool newline = false;
+	bool cont = false;
+	int printed_len = 0;
 
 	boot_delay_msec();
 	printk_delay();
@@ -856,7 +932,7 @@ asmlinkage int vprintk(const char *fmt,
 	/*
 	 * Ouch, printk recursed into itself!
 	 */
-	if (unlikely(printk_cpu == this_cpu)) {
+	if (unlikely(logbuf_cpu == this_cpu)) {
 		/*
 		 * If a crash is occurring during printk() on this CPU,
 		 * then try to get the crash message out but make sure
@@ -873,97 +949,92 @@ asmlinkage int vprintk(const char *fmt,
 
 	lockdep_off();
 	raw_spin_lock(&logbuf_lock);
-	printk_cpu = this_cpu;
+	logbuf_cpu = this_cpu;
 
 	if (recursion_bug) {
+		static const char recursion_msg[] =
+			"BUG: recent printk recursion!";
+
 		recursion_bug = 0;
-		strcpy(printk_buf, recursion_bug_msg);
-		printed_len = strlen(recursion_bug_msg);
-	}
-	/* Emit the output into the temporary buffer */
-	printed_len += vscnprintf(printk_buf + printed_len,
-				  sizeof(printk_buf) - printed_len, fmt, args);
-
-	p = printk_buf;
-
-	/* Read log level and handle special printk prefix */
-	plen = log_prefix(p, &current_log_level, &special);
-	if (plen) {
-		p += plen;
-
-		switch (special) {
-		case 'c': /* Strip <c> KERN_CONT, continue line */
-			plen = 0;
-			break;
-		case 'd': /* Strip <d> KERN_DEFAULT, start new line */
-			plen = 0;
-		default:
-			if (!new_text_line) {
-				emit_log_char('\n');
-				new_text_line = 1;
-			}
-		}
+		printed_len += strlen(recursion_msg);
+		/* emit KERN_CRIT message */
+		log_store(0, 2, NULL, 0, recursion_msg, printed_len);
 	}
 
 	/*
-	 * Copy the output into log_buf. If the caller didn't provide
-	 * the appropriate log prefix, we insert them here
+	 * The printf needs to come first; we need the syslog
+	 * prefix which might be passed-in as a parameter.
 	 */
-	for (; *p; p++) {
-		if (new_text_line) {
-			new_text_line = 0;
-
-			if (plen) {
-				/* Copy original log prefix */
-				int i;
-
-				for (i = 0; i < plen; i++)
-					emit_log_char(printk_buf[i]);
-				printed_len += plen;
-			} else {
-				/* Add log prefix */
-				emit_log_char('<');
-				emit_log_char(current_log_level + '0');
-				emit_log_char('>');
-				printed_len += 3;
-			}
+	textlen = vscnprintf(text, sizeof(textbuf), fmt, args);
 
-			if (printk_time) {
-				/* Add the current time stamp */
-				char tbuf[50], *tp;
-				unsigned tlen;
-				unsigned long long t;
-				unsigned long nanosec_rem;
-
-				t = cpu_clock(printk_cpu);
-				nanosec_rem = do_div(t, 1000000000);
-				tlen = sprintf(tbuf, "[%5lu.%06lu] ",
-						(unsigned long) t,
-						nanosec_rem / 1000);
-
-				for (tp = tbuf; tp < tbuf + tlen; tp++)
-					emit_log_char(*tp);
-				printed_len += tlen;
-			}
+	/* mark and strip a trailing newline */
+	if (textlen && text[textlen-1] == '\n') {
+		textlen--;
+		newline = true;
+	}
 
-			if (!*p)
-				break;
+	/* strip syslog prefix and extract log level or flags */
+	if (text[0] == '<' && text[1] && text[2] == '>') {
+		switch (text[1]) {
+		case '0' ... '7':
+			if (level == -1)
+				level = text[1] - '0';
+			text += 3;
+			textlen -= 3;
+			break;
+		case 'c':	/* KERN_CONT */
+			cont = true;
+		case 'd':	/* KERN_DEFAULT */
+			text += 3;
+			textlen -= 3;
+			break;
 		}
+	}
 
-		emit_log_char(*p);
-		if (*p == '\n')
-			new_text_line = 1;
+	if (buflen && (!cont || dict)) {
+		/* no continuation; flush existing buffer */
+		log_store(facility, buflevel, NULL, 0, buf, buflen);
+		printed_len += buflen;
+		buflen = 0;
+	}
+
+	if (buflen == 0) {
+		/* remember level for first message in the buffer */
+		if (level == -1)
+			buflevel = default_message_loglevel;
+		else
+			buflevel = level;
+	}
+
+	if (buflen || !newline) {
+		/* append to existing buffer, or buffer until next message */
+		if (buflen + textlen > sizeof(buf))
+			textlen = sizeof(buf) - buflen;
+		memcpy(buf + buflen, text, textlen);
+		buflen += textlen;
+	}
+
+	if (newline) {
+		/* end of line; flush buffer */
+		if (buflen) {
+			log_store(facility, buflevel,
+				  dict, dictlen, buf, buflen);
+			printed_len += buflen;
+			buflen = 0;
+		} else {
+			log_store(facility, buflevel,
+				  dict, dictlen, text, textlen);
+			printed_len += textlen;
+		}
 	}
 
 	/*
-	 * Try to acquire and then immediately release the
-	 * console semaphore. The release will do all the
-	 * actual magic (print out buffers, wake up klogd,
-	 * etc). 
+	 * Try to acquire and then immediately release the console semaphore.
+	 * The release will print out buffers and wake up /dev/kmsg and syslog()
+	 * users.
 	 *
-	 * The console_trylock_for_printk() function
-	 * will release 'logbuf_lock' regardless of whether it
-	 * actually gets the semaphore or not.
+	 * The console_trylock_for_printk() function will release 'logbuf_lock'
+	 * regardless of whether it actually gets the console semaphore or not.
 	 */
 	if (console_trylock_for_printk(this_cpu))
 		console_unlock();
@@ -974,12 +1045,73 @@ out_restore_irqs:
 
 	return printed_len;
 }
-EXPORT_SYMBOL(printk);
+EXPORT_SYMBOL(vprintk_emit);
+
+asmlinkage int vprintk(const char *fmt, va_list args)
+{
+	return vprintk_emit(0, -1, NULL, 0, fmt, args);
+}
 EXPORT_SYMBOL(vprintk);
 
+asmlinkage int printk_emit(int facility, int level,
+			   const char *dict, size_t dictlen,
+			   const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(facility, level, dict, dictlen, fmt, args);
+	va_end(args);
+
+	return r;
+}
+EXPORT_SYMBOL(printk_emit);
+
+/**
+ * printk - print a kernel message
+ * @fmt: format string
+ *
+ * This is printk(). It can be called from any context. We want it to work.
+ *
+ * We try to grab the console_lock. If we succeed, it's easy - we log the
+ * output and call the console drivers.  If we fail to get the semaphore, we
+ * place the output into the log buffer and return. The current holder of
+ * the console_sem will notice the new output in console_unlock(); and will
+ * send it to the consoles before releasing the lock.
+ *
+ * One effect of this deferred printing is that code which calls printk() and
+ * then changes console_loglevel may break. This is because console_loglevel
+ * is inspected when the actual printing occurs.
+ *
+ * See also:
+ * printf(3)
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ */
+asmlinkage int printk(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+#ifdef CONFIG_KGDB_KDB
+	if (unlikely(kdb_trap_printk)) {
+		va_start(args, fmt);
+		r = vkdb_printf(fmt, args);
+		va_end(args);
+		return r;
+	}
+#endif
+	va_start(args, fmt);
+	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+EXPORT_SYMBOL(printk);
 #else
 
-static void call_console_drivers(unsigned start, unsigned end)
+static void call_console_drivers(int level, const char *text, size_t len)
 {
 }
 
@@ -1217,7 +1349,7 @@ int is_console_locked(void)
 }
 
 /*
- * Delayed printk facility, for scheduler-internal messages:
+ * Delayed printk version, for scheduler-internal messages:
  */
 #define PRINTK_BUF_SIZE		512
 
@@ -1253,6 +1385,10 @@ void wake_up_klogd(void)
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
 }
 
+/* the next printk record to write to the console */
+static u64 console_seq;
+static u32 console_idx;
+
 /**
  * console_unlock - unlock the console system
  *
@@ -1263,15 +1399,16 @@ void wake_up_klogd(void)
  * by printk().  If this is the case, console_unlock(); emits
  * the output prior to releasing the lock.
  *
- * If there is output waiting for klogd, we wake it up.
+ * If there is output waiting, we wake it /dev/kmsg and syslog() users.
  *
  * console_unlock(); may be called from any context.
  */
 void console_unlock(void)
 {
+	static u64 seen_seq;
 	unsigned long flags;
-	unsigned _con_start, _log_end;
-	unsigned wake_klogd = 0, retry = 0;
+	bool wake_klogd = false;
+	bool retry;
 
 	if (console_suspended) {
 		up(&console_sem);
@@ -1281,17 +1418,41 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 again:
-	for ( ; ; ) {
+	for (;;) {
+		struct log *msg;
+		static char text[LOG_LINE_MAX];
+		size_t len;
+		int level;
+
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		wake_klogd |= log_start - log_end;
-		if (con_start == log_end)
-			break;			/* Nothing to print */
-		_con_start = con_start;
-		_log_end = log_end;
-		con_start = log_end;		/* Flush */
+		if (seen_seq != log_next_seq) {
+			wake_klogd = true;
+			seen_seq = log_next_seq;
+		}
+
+		if (console_seq < log_first_seq) {
+			/* messages are gone, move to first one */
+			console_seq = log_first_seq;
+			console_idx = log_first_idx;
+		}
+
+		if (console_seq == log_next_seq)
+			break;
+
+		msg = log_from_idx(console_idx);
+		level = msg->level & 7;
+		len = msg->text_len;
+		if (len+1 >= sizeof(text))
+			len = sizeof(text)-1;
+		memcpy(text, log_text(msg), len);
+		text[len++] = '\n';
+
+		console_idx = log_next(console_idx);
+		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
+
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(_con_start, _log_end);
+		call_console_drivers(level, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
 	}
@@ -1312,8 +1473,7 @@ again:
 	 * flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
-	if (con_start != log_end)
-		retry = 1;
+	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	if (retry && console_trylock())
@@ -1549,7 +1709,8 @@ void register_console(struct console *ne
 		 * for us.
 		 */
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		con_start = log_start;
+		console_seq = syslog_seq;
+		console_idx = syslog_idx;
 		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 		/*
 		 * We're about to replay the log buffer.  Only do this to the
@@ -1758,6 +1919,9 @@ int kmsg_dump_unregister(struct kmsg_dum
 }
 EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
 
+static bool always_kmsg_dump;
+module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
+
 /**
  * kmsg_dump - dump kernel log to kernel message dumpers.
  * @reason: the reason (oops, panic etc) for dumping
@@ -1767,8 +1931,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
  */
 void kmsg_dump(enum kmsg_dump_reason reason)
 {
-	unsigned long end;
-	unsigned chars;
+	u64 idx;
 	struct kmsg_dumper *dumper;
 	const char *s1, *s2;
 	unsigned long l1, l2;
@@ -1780,24 +1943,27 @@ void kmsg_dump(enum kmsg_dump_reason rea
 	/* Theoretically, the log could move on after we do this, but
 	   there's not a lot we can do about that. The new messages
 	   will overwrite the start of what we dump. */
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	end = log_end & LOG_BUF_MASK;
-	chars = logged_chars;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (chars > end) {
-		s1 = log_buf + log_buf_len - chars + end;
-		l1 = chars - end;
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+	if (syslog_seq < log_first_seq)
+		idx = syslog_idx;
+	else
+		idx = log_first_idx;
+
+	if (idx > log_next_idx) {
+		s1 = log_buf;
+		l1 = log_next_idx;
 
-		s2 = log_buf;
-		l2 = end;
+		s2 = log_buf + idx;
+		l2 = log_buf_len - idx;
 	} else {
 		s1 = "";
 		l1 = 0;
 
-		s2 = log_buf + end - chars;
-		l2 = chars;
+		s2 = log_buf + idx;
+		l2 = log_next_idx - idx;
 	}
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(dumper, &dump_list, list)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
                   ` (4 preceding siblings ...)
  2012-04-05 17:09 ` Linus Torvalds
@ 2012-04-07  0:26 ` Jiri Kosina
  2012-04-07  0:59   ` Joe Perches
  5 siblings, 1 reply; 34+ messages in thread
From: Jiri Kosina @ 2012-04-07  0:26 UTC (permalink / raw)
  To: Kay Sievers
  Cc: linux-kernel, Greg Kroah-Hartmann, Linus Torvalds, Ingo Molnar

On Wed, 4 Apr 2012, Kay Sievers wrote:

> - Output of dev_printk() is reliably machine-readable now. In addition
>   to the printed plain text message, it creates a log dictionary with the
>   following properties:
>     SUBSYSTEM=     - the driver-core subsytem name
>     DEVICE=
>       b12:8        - block dev_t
>       c127:3       - char dev_t
>       n8           - netdev ifindex
>       +sound:card0 - subsystem:devname

One of the questions that hasn't been raised yet, and which I personally 
consider crucial -- are we making printk() interface part of userspace ABI 
now by this?

Today, we are free to change any printk()s and not feel guilty about it at 
all. With this, we are making the whole thing much more systematic and 
friendly for automatic userspace consumption. Nnot really different to, 
for example uevents, which I believe everyone considers userspace ABI now.

Do we really want that?

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-07  0:26 ` Jiri Kosina
@ 2012-04-07  0:59   ` Joe Perches
  2012-04-07  1:14     ` Jiri Kosina
  0 siblings, 1 reply; 34+ messages in thread
From: Joe Perches @ 2012-04-07  0:59 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Kay Sievers, linux-kernel, Greg Kroah-Hartmann, Linus Torvalds,
	Ingo Molnar

On Sat, 2012-04-07 at 02:26 +0200, Jiri Kosina wrote:
> On Wed, 4 Apr 2012, Kay Sievers wrote:
> 
> > - Output of dev_printk() is reliably machine-readable now. In addition
> >   to the printed plain text message, it creates a log dictionary with the
> >   following properties:
> >     SUBSYSTEM=     - the driver-core subsytem name
> >     DEVICE=
> >       b12:8        - block dev_t
> >       c127:3       - char dev_t
> >       n8           - netdev ifindex
> >       +sound:card0 - subsystem:devname
> 
> One of the questions that hasn't been raised yet, and which I personally 
> consider crucial -- are we making printk() interface part of userspace ABI 
> now by this?

I hope not.

> Today, we are free to change any printk()s and not feel guilty about it at 
> all. With this, we are making the whole thing much more systematic and 
> friendly for automatic userspace consumption.

It _may_ be that new KEYWORD=VALUE combinations may become
systematic and an ABI, but the content of the message is
still arbitrary and should be designated as changeable
without notice.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-07  0:59   ` Joe Perches
@ 2012-04-07  1:14     ` Jiri Kosina
  2012-04-07  1:51       ` Joe Perches
  0 siblings, 1 reply; 34+ messages in thread
From: Jiri Kosina @ 2012-04-07  1:14 UTC (permalink / raw)
  To: Joe Perches
  Cc: Kay Sievers, linux-kernel, Greg Kroah-Hartmann, Linus Torvalds,
	Ingo Molnar

On Fri, 6 Apr 2012, Joe Perches wrote:

> > > - Output of dev_printk() is reliably machine-readable now. In addition
> > >   to the printed plain text message, it creates a log dictionary with the
> > >   following properties:
> > >     SUBSYSTEM=     - the driver-core subsytem name
> > >     DEVICE=
> > >       b12:8        - block dev_t
> > >       c127:3       - char dev_t
> > >       n8           - netdev ifindex
> > >       +sound:card0 - subsystem:devname
> > 
> > One of the questions that hasn't been raised yet, and which I personally 
> > consider crucial -- are we making printk() interface part of userspace ABI 
> > now by this?
> 
> I hope not.

And how exactly do we prevent that? It's a systematic and well-defined 
interface between kernel and userspace, and as such it's hard to not 
consider it to be ABI.

> > Today, we are free to change any printk()s and not feel guilty about it at 
> > all. With this, we are making the whole thing much more systematic and 
> > friendly for automatic userspace consumption.
> 
> It _may_ be that new KEYWORD=VALUE combinations may become systematic 
> and an ABI, but the content of the message is still arbitrary and should 
> be designated as changeable without notice.

I bet there would be "you broke my userspace app because it was depending 
on the printk() you just removed" bugreport coming from someone very soon. 

We can then either start explaining why this structured and well-defined 
printk() is not ABI, or just don't allow for that to happen in the first 
place by keeping printk() what it has always been -- unstructured linear 
flow of log entries destined only to be read by humans.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-07  1:14     ` Jiri Kosina
@ 2012-04-07  1:51       ` Joe Perches
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Perches @ 2012-04-07  1:51 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Kay Sievers, linux-kernel, Greg Kroah-Hartmann, Linus Torvalds,
	Ingo Molnar

On Sat, 2012-04-07 at 03:14 +0200, Jiri Kosina wrote:
> On Fri, 6 Apr 2012, Joe Perches wrote:
> 
> > > > - Output of dev_printk() is reliably machine-readable now. In addition
> > > >   to the printed plain text message, it creates a log dictionary with the
> > > >   following properties:
> > > >     SUBSYSTEM=     - the driver-core subsytem name
> > > >     DEVICE=
> > > >       b12:8        - block dev_t
> > > >       c127:3       - char dev_t
> > > >       n8           - netdev ifindex
> > > >       +sound:card0 - subsystem:devname
> > > 
> > > One of the questions that hasn't been raised yet, and which I personally 
> > > consider crucial -- are we making printk() interface part of userspace ABI 
> > > now by this?
> > 
> > I hope not.
> 
> And how exactly do we prevent that? It's a systematic and well-defined 
> interface between kernel and userspace, and as such it's hard to not 
> consider it to be ABI.
> 
> > > Today, we are free to change any printk()s and not feel guilty about it at 
> > > all. With this, we are making the whole thing much more systematic and 
> > > friendly for automatic userspace consumption.
> > 
> > It _may_ be that new KEYWORD=VALUE combinations may become systematic 
> > and an ABI, but the content of the message is still arbitrary and should 
> > be designated as changeable without notice.
> 
> I bet there would be "you broke my userspace app because it was depending 
> on the printk() you just removed" bugreport coming from someone very soon. 
> 
> We can then either start explaining why this structured and well-defined 
> printk() is not ABI, or just don't allow for that to happen in the first 
> place by keeping printk() what it has always been -- unstructured linear 
> flow of log entries destined only to be read by humans.

I think the commit log is poor.

printk() is effectively unmodified.

Generic helpers like dev_printk() can have an extra bit
prepended in the form of device/class/name.

All dev_printk device/class/name does is show what device
emitted a message.  It doesn't limit the message content.
The content of the message after that device/class/name
is still arbitrary.

No other use has been proposed, that's why I think it's
not all that valuable (yet?).

I do think that printk_emit, which I think is a bad name,
( maybe printk_tva() or printk_desc() ) has some value.

The record oriented change also has value as it can more
easily be optionally extended to compress/decompress the
text message to save memory space per record.  Of course
that'd only be valuable when CONFIG_LOG_BUF_SHIFT is
large enough to gain something by using compression.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-06 18:35         ` Kay Sievers
@ 2012-04-08  0:47           ` Joe Perches
  2012-04-08  1:02           ` Joe Perches
  1 sibling, 0 replies; 34+ messages in thread
From: Joe Perches @ 2012-04-08  0:47 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartmann

On Fri, 2012-04-06 at 20:35 +0200, Kay Sievers wrote:
> On Thu, 2012-04-05 at 18:31 -0700, Linus Torvalds wrote:
> > On Thu, Apr 5, 2012 at 6:12 PM, Joe Perches <joe@perches.com> wrote:
> > >
> > > I still think this should be 3 patches not 1.
> > 
> > I do agree that that would be lovely. That patch is really hard to
> > read, and it would be helpful if some of it was split up a bit. For
> > example, separating out the "move log buffer to be packetized"
> > (without actually doing the new things) from for the "introduce new
> > interfaces" would be quite nice.
> > 
> > In fact, I think I could merge something that does the packetization
> > early, if it is fairly obvious that it doesn't do anything else. Just
> > making the dmesg ASCII timestamp be something that is a "read buffer"
> > time thing rather than "generate printk" time thing would be nice, and
> > I think the sequence numbers can be useful even with all the old
> > interfaces.
> > 
> > So if somebody wants to help with things like that, and try to turn
> > this into a more gradual thing, I think we could merge at least some
> > initial parts of this patch even for 3.4.
> 
> Here is only the conversion of the byte- to the record-buffer split out
> of the big patch.
> 
> Unfortunately, it's still pretty big, and the patch not easy to read;
> the code itself looks better than the patch. At this moment I have no
> convincing idea how to split that into more pieces.

I'll spend some time on breaking up Kay's
patch in the next couple of weeks.

I'll try to get my printk restructuring patch
out by Monday.

I think Kay's still relatively large patch can
be broken into a sequence of:

1.  log_buf record oriented patch only
2.  S
 what I think is an
intelligible sequence of small concept pieces
and submit it end of April.  I'll be more or
less unavailable until then though.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-06 18:35         ` Kay Sievers
  2012-04-08  0:47           ` Joe Perches
@ 2012-04-08  1:02           ` Joe Perches
  2012-04-10 17:21             ` Joe Perches
  1 sibling, 1 reply; 34+ messages in thread
From: Joe Perches @ 2012-04-08  1:02 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartmann

(grumble, evolution is a stupid email client, resending)

On Fri, 2012-04-06 at 20:35 +0200, Kay Sievers wrote:
> On Thu, 2012-04-05 at 18:31 -0700, Linus Torvalds wrote:
> > On Thu, Apr 5, 2012 at 6:12 PM, Joe Perches <joe@perches.com> wrote:
> > >
> > > I still think this should be 3 patches not 1.
> > 
> > I do agree that that would be lovely. That patch is really hard to
> > read, and it would be helpful if some of it was split up a bit. For
> > example, separating out the "move log buffer to be packetized"
> > (without actually doing the new things) from for the "introduce new
> > interfaces" would be quite nice.
> > 
> > In fact, I think I could merge something that does the packetization
> > early, if it is fairly obvious that it doesn't do anything else. Just
> > making the dmesg ASCII timestamp be something that is a "read buffer"
> > time thing rather than "generate printk" time thing would be nice, and
> > I think the sequence numbers can be useful even with all the old
> > interfaces.
> > 
> > So if somebody wants to help with things like that, and try to turn
> > this into a more gradual thing, I think we could merge at least some
> > initial parts of this patch even for 3.4.
> 
> Here is only the conversion of the byte- to the record-buffer split out
> of the big patch.
> 
> Unfortunately, it's still pretty big, and the patch not easy to read;
> the code itself looks better than the patch. At this moment I have no
> convincing idea how to split that into more pieces.

I'll spend some time on breaking up Kay's
patch in the next couple of weeks.

I'll try to get my printk restructuring patch
out by Monday.

I think Kay's still relatively large patch can
be broken into a smaller sequence of:

1.  Make log_buf ascii "record" oriented
2.  Add struct for timestamp/sequence and
    integrate it into log_buf with syslog_print
3.  Add printk_emit and key/value support
    into log_buf

After that, I'll try adding lzo compression to
the message bits and see if that is any benefit.

After Monday, I'll be effectively unavailable
for a few weeks though.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-08  1:02           ` Joe Perches
@ 2012-04-10 17:21             ` Joe Perches
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Perches @ 2012-04-10 17:21 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartmann, Andrew Morton

On Sat, 2012-04-07 at 18:02 -0700, Joe Perches wrote:
> I'll spend some time on breaking up Kay's
> patch in the next couple of weeks.
> 
> I'll try to get my printk restructuring patch
> out by Monday.

I didn't get it done, and I won't even try to
look at lkml again for a couple of weeks.

If anyone else does, great, otherwise I should
have what I think is an acceptable patch set
shortly after that.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 19:47   ` Kay Sievers
  2012-04-06  1:12     ` Joe Perches
@ 2012-04-11 11:39     ` Eric W. Biederman
  1 sibling, 0 replies; 34+ messages in thread
From: Eric W. Biederman @ 2012-04-11 11:39 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartmann

Kay Sievers <kay@vrfy.org> writes:

> On Thu, 2012-04-05 at 10:09 -0700, Linus Torvalds wrote:
>> On Wed, Apr 4, 2012 at 12:59 PM, Kay Sievers <kay@vrfy.org> wrote:
>> > From: Kay Sievers <kay@vrfy.org>
>> > Subject: printk: support structured and multi-facility log messages
>
>> Other than this issue, I actually don't have any problems with the
>> code. Most of it seems fairly reasonable. But this one just made me go
>> "wtf, how can something so reasonable then be so completely crazy in
>> this regard".
>
> Binary support is just what we are required to support in userspace
> logging. We can surely live without that in the kernel.
>
> What we are looking for is mainly the 'context' of the message, and not
> possible binary data; and that will still be provided the same way with
> the now simpler and more line-based output.
>
> Better?
>
>   [root@mop ~]# cat /dev/kmsg
>   5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
>   6,1,0;Command line: root=/dev/sda1 console=tty0 console=ttyS0
>   6,2,0;BIOS-provided physical RAM map:
>   6,3,0; BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
>   6,4,0; BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
>   6,5,0; BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
>   6,6,0; BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
>   6 ,7,0; BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
>   6,8,0; BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
>   6,9,0; BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
>   6,10,0;NX (Execute Disable) protection: active
>   6,11,0;DMI 2.4 present.
>   [...]
>   4,382,6189353;sr0: scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
>   6,383,6190667;cdrom: Uniform CD-ROM driver Revision: 3.20
>   7,384,6193747;sr 1:0:0:0: Attached scsi CD-ROM sr0
>    SUBSYSTEM=scsi
>    DEVICE=+scsi:1:0:0:0

These aren't shell variables we are creating so why put the keys in
uppercase?  All that shouting hurts my eyes.

>   6,385,9202973;EXT4-fs (sda1): re-mounted. Opts: (null)
>   6,386,10929285;Bluetooth: Core ver 2.16
>   6,387,10930852;NET: Registered protocol family 31
>   6,388,10932686;Bluetooth: HCI device and connection manager initialized
>   6,389,10935723;Bluetooth: HCI socket layer initialized
>   6,390,10937847;Bluetooth: L2CAP socket layer initialized
>   ...

So I look at this output I look at how much more bloated my screen feels
as single lines turn into multiple lines and I ask.  Is there any reason
for not encoding structured syslog data the way structured data is
encoded in the syslog rfc RFC5424?  Especially as recently my screens
like to get shorter and wider.

Aka preceding the message with a string like:
[dev subsys="name" dev="major:minor]

aka
[dev subsys="scsi" dev_name="1:0:0:0"] sr 1:0:0:0: Attached scsi CD-ROM sr0

I can see arguments for going different ways but having multiple
sections for different kinds of values seems like a life saver for
long term maintenance.

Wrapping key/value pairs in [] seems compatible with what we are doing
with timestamps already.  So it feels more evolutionary.

Making the printk buffer record oriented might be totally reasonable,
however I have to point out that you can move to RFC5424 style
structured output in printks like dev_printk without changes to
printk at all.

Maybe I am wrong.  Maybe this is all good cleanup.  But it certainly
looks like doing everything the hard way.

>  #ifdef CONFIG_KEXEC
>  /*
> @@ -165,9 +622,9 @@ static int saved_console_loglevel = -1;
>  void log_buf_kexec_setup(void)
>  {
>  	VMCOREINFO_SYMBOL(log_buf);
> -	VMCOREINFO_SYMBOL(log_end);
>  	VMCOREINFO_SYMBOL(log_buf_len);
> -	VMCOREINFO_SYMBOL(logged_chars);
> +	VMCOREINFO_SYMBOL(log_first_idx);
> +	VMCOREINFO_SYMBOL(log_next_idx);
>  }
>  #endif

Is their an accompanying patch to update everyone's crash dump analysis
tools to extract the new format of the syslog ring buffer?

Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] printk: support structured and multi-facility log messages
  2012-04-05 17:53   ` Linus Torvalds
@ 2012-04-13 13:42     ` Stijn Devriendt
  0 siblings, 0 replies; 34+ messages in thread
From: Stijn Devriendt @ 2012-04-13 13:42 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, Greg Kroah-Hartmann, Linus Torvalds

Now that we're talking vsnprintf helpers... Wouldn't it be nice if those
key/value pairs could be used in the message?

const char* kvpairs = "key=value\n";
printk_emit(..., kvpairs, ... "Something went wrong with %{key}s")

... with a saner choice of specifiers, of course.

Stijn

On Thu, Apr 5, 2012 at 7:53 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Apr 5, 2012 at 10:09 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> If you want to escape something, you escape NUL and special characters
>> using the standard *text* escaping mechanisms, not using some random
>> hodge-podge that you just made up.
>
> Btw, we could certainly add more support for escaping into vsnprintf() itself.
>
> For example, we already do all those nice helpers for converting
> different forms of common binary data into ascii: "%pU" for uuid's,
> '%pS' for symbol addresses etc.
>
> If there really are cases where people want small blobs of data and it
> would make sense, we could easily do "%.*pQ" for "quoted blob" or
> whatever, where the precision is obviously the size of the blob (so
> "%.4pQ" would be a quoted four-byte entity.
>
> We could also extend on the current '%pm', which was really designed
> for mac-addresses, but the format is actually just a hex sequence. We
> could extend that too to just take the precision into account, and
> just default to a six-byte mac address. But we could trivially make
> "%.16pm" print out a 16-byte hex string instead.
>
> I would also be very open to having some auto-quoting support for
> '%s': it might even be something we could enable by default. So that
> if you print out strings that contain special characters (including
> '\n' etc), we could make sure that vsnprintf() actually quotes them.
> We don't have to follow the POSIX rules for things, after all.
>
> So I'm very open to making it easier to use ASCII.
>
> I'm *very* closed to making any kernel logging facility use binary
> data. If you want binary data, go somewhere else. Seriously.
>
>                  Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2012-04-13 13:42 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-04 19:59 [PATCH] printk: support structured and multi-facility log messages Kay Sievers
2012-04-04 21:05 ` Greg Kroah-Hartmann
2012-04-04 21:14   ` Kay Sievers
2012-04-05  0:31     ` Greg Kroah-Hartmann
2012-04-04 21:16   ` richard -rw- weinberger
2012-04-04 21:20     ` Kay Sievers
2012-04-04 23:51 ` Joe Perches
2012-04-05  0:33   ` Greg Kroah-Hartmann
2012-04-05  0:40     ` Joe Perches
2012-04-05  7:46       ` Kay Sievers
2012-04-05  8:08         ` Sam Ravnborg
2012-04-05  8:35           ` Kay Sievers
2012-04-05 11:44             ` Joe Perches
2012-04-05  8:38 ` Joe Perches
2012-04-05  8:44   ` Kay Sievers
2012-04-05 15:05 ` Ingo Molnar
2012-04-05 15:25   ` Kay Sievers
2012-04-05 17:18     ` Ingo Molnar
2012-04-05 17:09 ` Linus Torvalds
2012-04-05 17:53   ` Linus Torvalds
2012-04-13 13:42     ` Stijn Devriendt
2012-04-05 19:47   ` Kay Sievers
2012-04-06  1:12     ` Joe Perches
2012-04-06  1:31       ` Linus Torvalds
2012-04-06  3:43         ` Joe Perches
2012-04-06 18:35         ` Kay Sievers
2012-04-08  0:47           ` Joe Perches
2012-04-08  1:02           ` Joe Perches
2012-04-10 17:21             ` Joe Perches
2012-04-11 11:39     ` Eric W. Biederman
2012-04-07  0:26 ` Jiri Kosina
2012-04-07  0:59   ` Joe Perches
2012-04-07  1:14     ` Jiri Kosina
2012-04-07  1:51       ` Joe Perches

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.