netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET] printk, netconsole: implement reliable netconsole
@ 2015-04-16 23:03 Tejun Heo
  2015-04-16 23:03 ` [PATCH 01/16] printk: guard the amount written per line by devkmsg_read() Tejun Heo
                   ` (17 more replies)
  0 siblings, 18 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev

In a lot of configurations, netconsole is a useful way to collect
system logs; however, all netconsole does is simply emitting UDP
packets for the raw messages and there's no way for the receiver to
find out whether the packets were lost and/or reordered in flight.

printk already keeps log metadata which contains enough information to
make netconsole reliable.  This patchset does the followings.

* Make printk metadata available to console drivers.  A console driver
  can request this mode by setting CON_EXTENDED.  The metadata is
  emitted in the same format as /dev/kmsg.  This also makes all
  logging metadata including facility, loglevel and dictionary
  available to console receivers.

* Implement extended mode support in netconsole.  When enabled,
  netconsole transmits messages with extended header which is enough
  for the receiver to detect missing messages.

* Implement netconsole retransmission support.  Matching rx socket on
  the source port is automatically created for extended targets and
  the log receiver can request retransmission by sending reponse
  packets.  This is completely decoupled from the main write path and
  doesn't make netconsole less robust when things start go south.

* Implement netconsole ack support.  The response packet can
  optionally contain ack which enables emergency transmission timer.
  If acked sequence lags the current sequence for over 10s, netconsole
  repeatedly re-sends unacked messages with increasing interval.  This
  ensures that the receiver has the latest messages and also that all
  messages are transferred even while the kernel is failing as long as
  timer and netpoll are operational.  This too is completely decoupled
  from the main write path and doesn't make netconsole less robust.

* Implement the receiver library and simple receiver using it
  respectively in tools/lib/netconsole/libncrx.a and tools/ncrx/ncrx.
  In a simulated test with heavy packet loss (50%), ncrx logs all
  messages reliably and handle exceptional conditions including
  reboots as expected.

An obvious alternative for reliable loggin would be using a separate
TCP connection in addition to the UDP packets; however, I decided for
UDP based retransmission and ack mechanism for the following reasons.

* kernel side doesn't get simpler by using TCP.  It'd still need to
  transmit extended format messages, which BTW are useful regardless
  of reliable transmission, to match up UDP and TCP messages and
  detect missing ones from TCP send buffer filling up.  Also, the
  timeout and emergency transmission support would still be necessary
  to ensure that messages are transmitted in case of, e.g., network
  stack faiure.  It'd at least be about the same amount of code as the
  UDP based implementation.

* Receiver side might be a bit simpler but not by much.  It'd still
  need to keep track of the UDP based messages and then match them up
  with TCP messages and put messages from both sources in order (each
  stream may miss different ones) and would have to deal with
  reestablishing connections after reboots.  The only part which can
  completely go away would be the actual ack and retransmission part
  and that isn't a lot of logic.

* When the network condition is good, the only thing the UDP based
  implementation adds is occassional ack messages.  TCP based
  implementation would end up transmitting all messages twice which
  still isn't much but kinda silly given that using TCP doesn't lower
  the complexity in meaningful ways.

This patchset contains the following 16 patches.

 0001-printk-guard-the-amount-written-per-line-by-devkmsg_.patch
 0002-printk-factor-out-message-formatting-from-devkmsg_re.patch
 0003-printk-move-LOG_NOCONS-skipping-into-call_console_dr.patch
 0004-printk-implement-support-for-extended-console-driver.patch
 0005-printk-implement-log_seq_range-and-ext_log_from_seq.patch
 0006-netconsole-make-netconsole_target-enabled-a-bool.patch
 0007-netconsole-factor-out-alloc_netconsole_target.patch
 0008-netconsole-punt-disabling-to-workqueue-from-netdevic.patch
 0009-netconsole-replace-target_list_lock-with-console_loc.patch
 0010-netconsole-introduce-netconsole_mutex.patch
 0011-netconsole-consolidate-enable-disable-and-create-des.patch
 0012-netconsole-implement-extended-console-support.patch
 0013-netconsole-implement-retransmission-support-for-exte.patch
 0014-netconsole-implement-ack-handling-and-emergency-tran.patch
 0015-netconsole-implement-netconsole-receiver-library.patch
 0016-netconsole-update-documentation-for-extended-netcons.patch

0001-0005 implement extended console support in printk.

0006-0011 are prep patches for netconsole.

0012-0014 implement extended mode, retransmission and ack support.

0015 implements receiver library, libncrx, and a simple receiver using
the library, ncrx.

0016 updates documentation.

As the patchset touches both printk and netconsole, I'm not sure how
these patches should be routed once acked.  Either -mm or net should
work, I think.

This patchset is on top of linus#master[1] and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-netconsole-ext

diffstat follows.  Thanks.

 Documentation/networking/netconsole.txt |   95 +++
 drivers/net/netconsole.c                |  800 +++++++++++++++++++++++-----
 include/linux/console.h                 |    1 
 include/linux/printk.h                  |   16 
 kernel/printk/printk.c                  |  411 +++++++++++---
 tools/Makefile                          |   16 
 tools/lib/netconsole/Makefile           |   36 +
 tools/lib/netconsole/ncrx.c             |  906 ++++++++++++++++++++++++++++++++
 tools/lib/netconsole/ncrx.h             |  204 +++++++
 tools/ncrx/Makefile                     |   14 
 tools/ncrx/ncrx.c                       |  143 +++++
 11 files changed, 2419 insertions(+), 223 deletions(-)

--
tejun

[1] 497a5df7bf6f ("Merge tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip")

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/16] printk: guard the amount written per line by devkmsg_read()
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-20 12:11   ` Petr Mladek
  2015-04-16 23:03 ` [PATCH 02/16] printk: factor out message formatting from devkmsg_read() Tejun Heo
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo, Kay Sievers, Petr Mladek

devkmsg_read() uses 8k buffer and assumes that the formatted output
message won't overrun which seems safe given LOG_LINE_MAX, the current
use of dict and the escaping method being used; however, we're
planning to use devkmsg formatting wider and accounting for the buffer
size properly isn't that complicated.

This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates
devkmsg_read() so that it limits output accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
---
 include/linux/printk.h |  2 ++
 kernel/printk/printk.c | 35 +++++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9b30871..58b1fec 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -30,6 +30,8 @@ static inline const char *printk_skip_level(const char *buffer)
 	return buffer;
 }
 
+#define CONSOLE_EXT_LOG_MAX	8192
+
 /* printk's without a loglevel use this.. */
 #define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 879edfc..b6e24af 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -512,7 +512,7 @@ struct devkmsg_user {
 	u32 idx;
 	enum log_flags prev;
 	struct mutex lock;
-	char buf[8192];
+	char buf[CONSOLE_EXT_LOG_MAX];
 };
 
 static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
@@ -565,11 +565,18 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }
 
+static void append_char(char **pp, char *e, char c)
+{
+	if (*pp < e)
+		*(*pp)++ = c;
+}
+
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
 	struct devkmsg_user *user = file->private_data;
 	struct printk_log *msg;
+	char *p, *e;
 	u64 ts_usec;
 	size_t i;
 	char cont = '-';
@@ -579,6 +586,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (!user)
 		return -EBADF;
 
+	p = user->buf;
+	e = user->buf + sizeof(user->buf);
+
 	ret = mutex_lock_interruptible(&user->lock);
 	if (ret)
 		return ret;
@@ -625,9 +635,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 		 ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
 		cont = '+';
 
-	len = sprintf(user->buf, "%u,%llu,%llu,%c;",
-		      (msg->facility << 3) | msg->level,
-		      user->seq, ts_usec, cont);
+	p += scnprintf(p, e - p, "%u,%llu,%llu,%c;",
+		       (msg->facility << 3) | msg->level,
+		       user->seq, ts_usec, cont);
 	user->prev = msg->flags;
 
 	/* escape non-printable characters */
@@ -635,11 +645,11 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 		unsigned char c = log_text(msg)[i];
 
 		if (c < ' ' || c >= 127 || c == '\\')
-			len += sprintf(user->buf + len, "\\x%02x", c);
+			p += scnprintf(p, e - p, "\\x%02x", c);
 		else
-			user->buf[len++] = c;
+			append_char(&p, e, c);
 	}
-	user->buf[len++] = '\n';
+	append_char(&p, e, '\n');
 
 	if (msg->dict_len) {
 		bool line = true;
@@ -648,30 +658,31 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			unsigned char c = log_dict(msg)[i];
 
 			if (line) {
-				user->buf[len++] = ' ';
+				append_char(&p, e, ' ');
 				line = false;
 			}
 
 			if (c == '\0') {
-				user->buf[len++] = '\n';
+				append_char(&p, e, '\n');
 				line = true;
 				continue;
 			}
 
 			if (c < ' ' || c >= 127 || c == '\\') {
-				len += sprintf(user->buf + len, "\\x%02x", c);
+				p += scnprintf(p, e - p, "\\x%02x", c);
 				continue;
 			}
 
-			user->buf[len++] = c;
+			append_char(&p, e, c);
 		}
-		user->buf[len++] = '\n';
+		append_char(&p, e, '\n');
 	}
 
 	user->idx = log_next(user->idx);
 	user->seq++;
 	raw_spin_unlock_irq(&logbuf_lock);
 
+	len = p - user->buf;
 	if (len > count) {
 		ret = -EINVAL;
 		goto out;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/16] printk: factor out message formatting from devkmsg_read()
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
  2015-04-16 23:03 ` [PATCH 01/16] printk: guard the amount written per line by devkmsg_read() Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-20 12:30   ` Petr Mladek
  2015-04-16 23:03 ` [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers() Tejun Heo
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo, Kay Sievers, Petr Mladek

The extended message formatting used for /dev/kmsg will be used
implement extended consoles.  Factor out msg_print_ext_header() and
msg_print_ext_body() from devkmsg_read().

This is pure restructuring.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 157 ++++++++++++++++++++++++++-----------------------
 1 file changed, 85 insertions(+), 72 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b6e24af..5ea6709 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -505,6 +505,86 @@ int check_syslog_permissions(int type, bool from_file)
 	return security_syslog(type);
 }
 
+static void append_char(char **pp, char *e, char c)
+{
+	if (*pp < e)
+		*(*pp)++ = c;
+}
+
+static ssize_t msg_print_ext_header(char *buf, size_t size,
+				    struct printk_log *msg, u64 seq,
+				    enum log_flags prev_flags)
+{
+	u64 ts_usec = msg->ts_nsec;
+	char cont = '-';
+
+	do_div(ts_usec, 1000);
+
+	/*
+	 * If we couldn't merge continuation line fragments during the print,
+	 * export the stored flags to allow an optional external merge of the
+	 * records. Merging the records isn't always neccessarily correct, like
+	 * when we hit a race during printing. In most cases though, it produces
+	 * better readable output. 'c' in the record flags mark the first
+	 * fragment of a line, '+' the following.
+	 */
+	if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT))
+		cont = 'c';
+	else if ((msg->flags & LOG_CONT) ||
+		 ((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
+		cont = '+';
+
+	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
+		       (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+}
+
+static ssize_t msg_print_ext_body(char *buf, size_t size,
+				  char *dict, size_t dict_len,
+				  char *text, size_t text_len)
+{
+	char *p = buf, *e = buf + size;
+	size_t i;
+
+	/* escape non-printable characters */
+	for (i = 0; i < text_len; i++) {
+		unsigned char c = text[i];
+
+		if (c < ' ' || c >= 127 || c == '\\')
+			p += scnprintf(p, e - p, "\\x%02x", c);
+		else
+			append_char(&p, e, c);
+	}
+	append_char(&p, e, '\n');
+
+	if (dict_len) {
+		bool line = true;
+
+		for (i = 0; i < dict_len; i++) {
+			unsigned char c = dict[i];
+
+			if (line) {
+				append_char(&p, e, ' ');
+				line = false;
+			}
+
+			if (c == '\0') {
+				append_char(&p, e, '\n');
+				line = true;
+				continue;
+			}
+
+			if (c < ' ' || c >= 127 || c == '\\') {
+				p += scnprintf(p, e - p, "\\x%02x", c);
+				continue;
+			}
+
+			append_char(&p, e, c);
+		}
+		append_char(&p, e, '\n');
+	}
+
+	return p - buf;
+}
 
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
@@ -565,30 +645,17 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }
 
-static void append_char(char **pp, char *e, char c)
-{
-	if (*pp < e)
-		*(*pp)++ = c;
-}
-
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
 	struct devkmsg_user *user = file->private_data;
 	struct printk_log *msg;
-	char *p, *e;
-	u64 ts_usec;
-	size_t i;
-	char cont = '-';
 	size_t len;
 	ssize_t ret;
 
 	if (!user)
 		return -EBADF;
 
-	p = user->buf;
-	e = user->buf + sizeof(user->buf);
-
 	ret = mutex_lock_interruptible(&user->lock);
 	if (ret)
 		return ret;
@@ -618,71 +685,17 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	}
 
 	msg = log_from_idx(user->idx);
-	ts_usec = msg->ts_nsec;
-	do_div(ts_usec, 1000);
-
-	/*
-	 * If we couldn't merge continuation line fragments during the print,
-	 * export the stored flags to allow an optional external merge of the
-	 * records. Merging the records isn't always neccessarily correct, like
-	 * when we hit a race during printing. In most cases though, it produces
-	 * better readable output. 'c' in the record flags mark the first
-	 * fragment of a line, '+' the following.
-	 */
-	if (msg->flags & LOG_CONT && !(user->prev & LOG_CONT))
-		cont = 'c';
-	else if ((msg->flags & LOG_CONT) ||
-		 ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
-		cont = '+';
+	len = msg_print_ext_header(user->buf, sizeof(user->buf),
+				   msg, user->seq, user->prev);
+	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
+				  log_dict(msg), msg->dict_len,
+				  log_text(msg), msg->text_len);
 
-	p += scnprintf(p, e - p, "%u,%llu,%llu,%c;",
-		       (msg->facility << 3) | msg->level,
-		       user->seq, ts_usec, cont);
 	user->prev = msg->flags;
-
-	/* escape non-printable characters */
-	for (i = 0; i < msg->text_len; i++) {
-		unsigned char c = log_text(msg)[i];
-
-		if (c < ' ' || c >= 127 || c == '\\')
-			p += scnprintf(p, e - p, "\\x%02x", c);
-		else
-			append_char(&p, e, c);
-	}
-	append_char(&p, e, '\n');
-
-	if (msg->dict_len) {
-		bool line = true;
-
-		for (i = 0; i < msg->dict_len; i++) {
-			unsigned char c = log_dict(msg)[i];
-
-			if (line) {
-				append_char(&p, e, ' ');
-				line = false;
-			}
-
-			if (c == '\0') {
-				append_char(&p, e, '\n');
-				line = true;
-				continue;
-			}
-
-			if (c < ' ' || c >= 127 || c == '\\') {
-				p += scnprintf(p, e - p, "\\x%02x", c);
-				continue;
-			}
-
-			append_char(&p, e, c);
-		}
-		append_char(&p, e, '\n');
-	}
-
 	user->idx = log_next(user->idx);
 	user->seq++;
 	raw_spin_unlock_irq(&logbuf_lock);
 
-	len = p - user->buf;
 	if (len > count) {
 		ret = -EINVAL;
 		goto out;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers()
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
  2015-04-16 23:03 ` [PATCH 01/16] printk: guard the amount written per line by devkmsg_read() Tejun Heo
  2015-04-16 23:03 ` [PATCH 02/16] printk: factor out message formatting from devkmsg_read() Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-20 12:50   ` Petr Mladek
  2015-04-16 23:03 ` [PATCH 04/16] printk: implement support for extended console drivers Tejun Heo
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo, Kay Sievers, Petr Mladek

When a line is printed by multiple printk invocations, each chunk is
directly sent out to console drivers so that they don't get lost.
When the line is completed and stored in the log buffer, the line is
suppressed from going out to consoles as that'd lead to duplicate
outputs.  This is tracked with LOG_NOCONS flag.

The suppression is currently implemented in console_unlock() which
skips invoking call_console_drivers() for LOG_NOCONS messages.  This
patch moves the filtering into call_console_drivers() in preparation
of the planned extended console drivers which will deal with the
duplicate messages themselves.

While this makes call_console_drivers() iterate over LOG_NOCONS
messages, this is extremely unlikely to matter especially given that
continuation lines aren't that common and also simplifies
console_unlock() a bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5ea6709..0175c46 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1417,7 +1417,8 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(int level, const char *text, size_t len)
+static void call_console_drivers(int level, bool nocons,
+				 const char *text, size_t len)
 {
 	struct console *con;
 
@@ -1438,6 +1439,13 @@ static void call_console_drivers(int level, const char *text, size_t len)
 		if (!cpu_online(smp_processor_id()) &&
 		    !(con->flags & CON_ANYTIME))
 			continue;
+		/*
+		 * Skip record we have buffered and already printed
+		 * directly to the console when we received it.
+		 */
+		if (nocons)
+			continue;
+
 		con->write(con, text, len);
 	}
 }
@@ -1919,7 +1927,8 @@ static struct cont {
 } cont;
 static struct printk_log *log_from_idx(u32 idx) { return NULL; }
 static u32 log_next(u32 idx) { return 0; }
-static void call_console_drivers(int level, const char *text, size_t len) {}
+static void call_console_drivers(int level, bool nocons,
+				 const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 			     bool syslog, char *buf, size_t size) { return 0; }
 static size_t cont_print_text(char *text, size_t size) { return 0; }
@@ -2190,7 +2199,7 @@ static void console_cont_flush(char *text, size_t size)
 	len = cont_print_text(text, size);
 	raw_spin_unlock(&logbuf_lock);
 	stop_critical_timings();
-	call_console_drivers(cont.level, text, len);
+	call_console_drivers(cont.level, false, text, len);
 	start_critical_timings();
 	local_irq_restore(flags);
 	return;
@@ -2234,6 +2243,7 @@ again:
 		struct printk_log *msg;
 		size_t len;
 		int level;
+		bool nocons;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
@@ -2252,38 +2262,30 @@ again:
 		} else {
 			len = 0;
 		}
-skip:
+
 		if (console_seq == log_next_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
-		if (msg->flags & LOG_NOCONS) {
-			/*
-			 * Skip record we have buffered and already printed
-			 * directly to the console when we received it.
-			 */
-			console_idx = log_next(console_idx);
-			console_seq++;
-			/*
-			 * We will get here again when we register a new
-			 * CON_PRINTBUFFER console. Clear the flag so we
-			 * will properly dump everything later.
-			 */
-			msg->flags &= ~LOG_NOCONS;
-			console_prev = msg->flags;
-			goto skip;
-		}
-
 		level = msg->level;
+		nocons = msg->flags & LOG_NOCONS;
 		len += msg_print_text(msg, console_prev, false,
 				      text + len, sizeof(text) - len);
 		console_idx = log_next(console_idx);
 		console_seq++;
 		console_prev = msg->flags;
+
+		/*
+		 * The log will be processed again when we register a new
+		 * CON_PRINTBUFFER console. Clear the flag so we will
+		 * properly dump everything later.
+		 */
+		msg->flags &= ~LOG_NOCONS;
+
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(level, text, len);
+		call_console_drivers(level, nocons, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
 	}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (2 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers() Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-20 15:43   ` Petr Mladek
  2015-04-16 23:03 ` [PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq() Tejun Heo
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo, Kay Sievers, Petr Mladek

printk log_buf keeps various metadata for each message including its
sequence number and timestamp.  The metadata is currently available
only through /dev/kmsg and stripped out before passed onto console
drivers.  We want this metadata to be available to console drivers
too.  Immediately, it's to implement reliable netconsole but may be
useful for other console devices too.

This patch implements support for extended console drivers.  Consoles
can indicate that they process extended messages by setting the new
CON_EXTENDED flag and they'll fed messages formatted the same way as
/dev/kmsg output as follows.

 <level>,<sequnum>,<timestamp>,<contflag>;<message text>

One special case is fragments.  Message fragments are output
immediately to consoles to avoid losing them in case of crashes.  For
normal consoles, this is handled by later suppressing the assembled
result and /dev/kmsg only shows fully assembled message; however,
extended consoles would need both the fragments, to avoid losing
message in case of a crash, and the assembled result, to tell how the
fragments are assembled and which sequence number got assigned to it.

To help matching up the fragments with the resulting message,
fragments are emitted in the following format.

 <level>,-,<timestamp>,-,fragid=<fragid>;<message fragment>

And later when the assembly is complete, the following is transmitted,

 <level>,<sequnum>,<timestamp>,<contflag>,fragid=<fragid>;<message text>

* Extended message formatting for console drivers is enabled iff there
  are registered extended consoles.

* Comment describing extended message formats updated to help
  distinguishing variable with verbatim terms.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
---
 include/linux/console.h |   1 +
 kernel/printk/printk.c  | 141 +++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 123 insertions(+), 19 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7571a16..04bbd09 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -115,6 +115,7 @@ static inline int con_debug_leave(void)
 #define CON_BOOT	(8)
 #define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
 #define CON_BRL		(32) /* Used for a braille device */
+#define CON_EXTENDED	(64) /* Use the extended output format a la /dev/kmsg */
 
 struct console {
 	char	name[16];
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0175c46..349a37b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -84,6 +84,8 @@ static struct lockdep_map console_lock_dep_map = {
 };
 #endif
 
+static int nr_ext_console_drivers;
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -195,14 +197,28 @@ static int console_may_schedule;
  * need to be changed in the future, when the requirements change.
  *
  * /dev/kmsg exports the structured data in the following line format:
- *   "level,sequnum,timestamp;<message text>\n"
+ *   "<level>,<sequnum>,<timestamp>,<contflag>;<message text>\n"
  *
  * The optional key/value pairs are attached as continuation lines starting
  * with a space character and terminated by a newline. All possible
  * non-prinatable characters are escaped in the "\xff" notation.
  *
  * Users of the export format should ignore possible additional values
- * separated by ',', and find the message after the ';' character.
+ * separated by ',', and find the message after the ';' character. All
+ * optional header fields should have the form "<key>=<value>".
+ *
+ * For consoles with CON_EXTENDED set, a message formatted like the
+ * following may also be printed. This is a continuation fragment which are
+ * being assembled and will be re-transmitted with a normal header once
+ * assembly finishes. The fragments are sent out immediately to avoid
+ * losing them over a crash.
+ *   "<level>,-,<timestamp>,-,fragid=<fragid>;<message fragment>\n"
+ *
+ * On completion of assembly, the following is transmitted.
+ *   "<level>,<sequnum>,<timestamp>,<contflag>,fragid=<fragid>;<message text>\n"
+ *
+ * Extended consoles should identify and handle duplicates by matching the
+ * fragids of the fragments and assembled messages.
  */
 
 enum log_flags {
@@ -210,6 +226,7 @@ enum log_flags {
 	LOG_NEWLINE	= 2,	/* text ended with a newline */
 	LOG_PREFIX	= 4,	/* text started with a prefix */
 	LOG_CONT	= 8,	/* text is a fragment of a continuation line */
+	LOG_DICT_META	= 16,	/* dict contains console meta information */
 };
 
 struct printk_log {
@@ -292,6 +309,12 @@ static char *log_dict(const struct printk_log *msg)
 	return (char *)msg + sizeof(struct printk_log) + msg->text_len;
 }
 
+/* if LOG_DICT_META is set, the dict buffer carries printk internal info */
+static size_t log_dict_len(const struct printk_log *msg)
+{
+	return (msg->flags & LOG_DICT_META) ? 0 : msg->dict_len;
+}
+
 /* get record by index; idx must point to valid msg */
 static struct printk_log *log_from_idx(u32 idx)
 {
@@ -516,7 +539,9 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
 				    enum log_flags prev_flags)
 {
 	u64 ts_usec = msg->ts_nsec;
+	u64 fragid;
 	char cont = '-';
+	char fragid_buf[32] = "";
 
 	do_div(ts_usec, 1000);
 
@@ -534,8 +559,14 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
 		 ((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
 		cont = '+';
 
-	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-		       (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+	/* see cont_flush() */
+	if ((msg->flags & LOG_DICT_META) && msg->dict_len &&
+	    sscanf(log_dict(msg), "FRAGID=%llu", &fragid))
+		scnprintf(fragid_buf, sizeof(fragid_buf),
+			  ",fragid=%llu", fragid);
+	return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
+			 (msg->facility << 3) | msg->level, seq, ts_usec, cont,
+			 fragid_buf);
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -688,7 +719,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	len = msg_print_ext_header(user->buf, sizeof(user->buf),
 				   msg, user->seq, user->prev);
 	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
-				  log_dict(msg), msg->dict_len,
+				  log_dict(msg), log_dict_len(msg),
 				  log_text(msg), msg->text_len);
 
 	user->prev = msg->flags;
@@ -1418,6 +1449,7 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
  * The console_lock must be held.
  */
 static void call_console_drivers(int level, bool nocons,
+				 const char *ext_text, size_t ext_len,
 				 const char *text, size_t len)
 {
 	struct console *con;
@@ -1440,13 +1472,16 @@ static void call_console_drivers(int level, bool nocons,
 		    !(con->flags & CON_ANYTIME))
 			continue;
 		/*
-		 * Skip record we have buffered and already printed
-		 * directly to the console when we received it.
+		 * For !ext consoles, skip record we have buffered and
+		 * already printed directly when we received it.  Ext
+		 * consoles are responsible for filtering on their ends.
 		 */
-		if (nocons)
-			continue;
-
-		con->write(con, text, len);
+		if (con->flags & CON_EXTENDED) {
+			if (ext_len)
+				con->write(con, ext_text, ext_len);
+		} else if (!nocons) {
+			con->write(con, text, len);
+		}
 	}
 }
 
@@ -1546,6 +1581,7 @@ static inline void printk_delay(void)
  */
 static struct cont {
 	char buf[LOG_LINE_MAX];
+	u64 fragid;			/* unique fragment id */
 	size_t len;			/* length == 0 means unused buffer */
 	size_t cons;			/* bytes written to console */
 	struct task_struct *owner;	/* task of first print*/
@@ -1564,13 +1600,23 @@ static void cont_flush(enum log_flags flags)
 		return;
 
 	if (cont.cons) {
+		char dict_buf[32];
+		size_t dict_len;
+
 		/*
 		 * If a fragment of this line was directly flushed to the
 		 * console; wait for the console to pick up the rest of the
-		 * line. LOG_NOCONS suppresses a duplicated output.
+		 * line. LOG_NOCONS suppresses a duplicated output for !ext
+		 * consoles. ext consoles handle the duplicates themselves
+		 * and expect fragid in the header of the duplicate
+		 * messages for that. Record the fragid in the dict buffer
+		 * which fragmented messages never use.
 		 */
-		log_store(cont.facility, cont.level, flags | LOG_NOCONS,
-			  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+		dict_len = scnprintf(dict_buf, sizeof(dict_buf), "FRAGID=%llu",
+				     cont.fragid);
+		log_store(cont.facility, cont.level,
+			  flags | LOG_NOCONS | LOG_DICT_META,
+			  cont.ts_nsec, dict_buf, dict_len, cont.buf, cont.len);
 		cont.flags = flags;
 		cont.flushed = true;
 	} else {
@@ -1596,6 +1642,9 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
 	}
 
 	if (!cont.len) {
+		static u64 fragid;
+
+		cont.fragid = fragid++;
 		cont.facility = facility;
 		cont.level = level;
 		cont.owner = current;
@@ -1920,14 +1969,28 @@ static u32 log_first_idx;
 static u64 log_next_seq;
 static enum log_flags console_prev;
 static struct cont {
+	char buf[1];
+	u64 fragid;
 	size_t len;
 	size_t cons;
+	u64 ts_nsec;
 	u8 level;
+	u8 facility;
 	bool flushed:1;
 } cont;
+static char *log_text(const struct printk_log *msg) { return NULL; }
+static char *log_dict(const struct printk_log *msg) { return NULL; }
+static size_t log_dict_len(const struct printk_log *msg) { return 0; }
 static struct printk_log *log_from_idx(u32 idx) { return NULL; }
 static u32 log_next(u32 idx) { return 0; }
+static ssize_t msg_print_ext_header(char *buf, size_t size,
+				    struct printk_log *msg, u64 seq,
+				    enum log_flags prev_flags) { return 0; }
+static ssize_t msg_print_ext_body(char *buf, size_t size,
+				  char *dict, size_t dict_len,
+				  char *text, size_t text_len) { return 0; }
 static void call_console_drivers(int level, bool nocons,
+				 const char *ext_text, size_t ext_len,
 				 const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 			     bool syslog, char *buf, size_t size) { return 0; }
@@ -2178,10 +2241,11 @@ int is_console_locked(void)
 	return console_locked;
 }
 
-static void console_cont_flush(char *text, size_t size)
+static void console_cont_flush(char *ext_text, size_t ext_size,
+			       char *text, size_t size)
 {
 	unsigned long flags;
-	size_t len;
+	size_t len, ext_len = 0;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
 
@@ -2196,10 +2260,30 @@ static void console_cont_flush(char *text, size_t size)
 	if (console_seq < log_next_seq && !cont.cons)
 		goto out;
 
+	/*
+	 * Fragment dump for consoles. Will be printed again when the
+	 * assembly is complete.
+	 */
+	if (nr_ext_console_drivers && cont.len > cont.cons) {
+		u64 ts_usec = cont.ts_nsec;
+
+		do_div(ts_usec, 1000);
+
+		ext_len = scnprintf(ext_text, ext_size,
+				    "%u,-,%llu,-,fragid=%llu;",
+				    (cont.facility << 3) | cont.level, ts_usec,
+				    cont.fragid);
+		ext_len += msg_print_ext_body(ext_text + ext_len,
+					      ext_size - ext_len, NULL, 0,
+					      cont.buf + cont.cons,
+					      cont.len - cont.cons);
+	}
+
 	len = cont_print_text(text, size);
+
 	raw_spin_unlock(&logbuf_lock);
 	stop_critical_timings();
-	call_console_drivers(cont.level, false, text, len);
+	call_console_drivers(cont.level, false, ext_text, ext_len, text, len);
 	start_critical_timings();
 	local_irq_restore(flags);
 	return;
@@ -2223,6 +2307,7 @@ out:
  */
 void console_unlock(void)
 {
+	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
@@ -2237,11 +2322,12 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 	/* flush buffered message fragment immediately to console */
-	console_cont_flush(text, sizeof(text));
+	console_cont_flush(ext_text, sizeof(ext_text), text, sizeof(text));
 again:
 	for (;;) {
 		struct printk_log *msg;
 		size_t len;
+		size_t ext_len = 0;
 		int level;
 		bool nocons;
 
@@ -2271,6 +2357,15 @@ again:
 		nocons = msg->flags & LOG_NOCONS;
 		len += msg_print_text(msg, console_prev, false,
 				      text + len, sizeof(text) - len);
+		if (nr_ext_console_drivers) {
+			ext_len = msg_print_ext_header(ext_text,
+						sizeof(ext_text),
+						msg, console_seq, console_prev);
+			ext_len += msg_print_ext_body(ext_text + ext_len,
+						sizeof(ext_text) - ext_len,
+						log_dict(msg), log_dict_len(msg),
+						log_text(msg), msg->text_len);
+		}
 		console_idx = log_next(console_idx);
 		console_seq++;
 		console_prev = msg->flags;
@@ -2285,7 +2380,8 @@ again:
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(level, nocons, text, len);
+		call_console_drivers(level, nocons,
+				     ext_text, ext_len, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
 	}
@@ -2540,6 +2636,10 @@ void register_console(struct console *newcon)
 		newcon->next = console_drivers->next;
 		console_drivers->next = newcon;
 	}
+
+	if (newcon->flags & CON_EXTENDED)
+		nr_ext_console_drivers++;
+
 	if (newcon->flags & CON_PRINTBUFFER) {
 		/*
 		 * console_unlock(); will print out the buffered messages
@@ -2612,6 +2712,9 @@ int unregister_console(struct console *console)
 		}
 	}
 
+	if (!res && (console->flags & CON_EXTENDED))
+		nr_ext_console_drivers--;
+
 	/*
 	 * If this isn't the last console and it has CON_CONSDEV set, we
 	 * need to set it on the next preferred console.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq()
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (3 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 04/16] printk: implement support for extended console drivers Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 06/16] netconsole: make netconsole_target->enabled a bool Tejun Heo
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo, Kay Sievers, Petr Mladek

Implement two functions to access log messages by their sequence
numbers.

* log_seq_range() determines the currently available sequence number
  range.

* ext_log_from_seq() outputs log message identified by a given
  sequence number in the extended format.

These will be used to implement reliable netconsole.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
---
 include/linux/printk.h | 14 ++++++++
 kernel/printk/printk.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 58b1fec..4fd3bb4 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -6,6 +6,7 @@
 #include <linux/kern_levels.h>
 #include <linux/linkage.h>
 #include <linux/cache.h>
+#include <linux/errno.h>
 
 extern const char linux_banner[];
 extern const char linux_proc_banner[];
@@ -169,6 +170,8 @@ void __init setup_log_buf(int early);
 void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
+void log_seq_range(u64 *beginp, u64 *endp);
+ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -228,6 +231,17 @@ static inline void dump_stack_print_info(const char *log_lvl)
 static inline void show_regs_print_info(const char *log_lvl)
 {
 }
+
+static inline void log_seq_range(u64 *begin_seqp, u64 *end_seqp)
+{
+	*begin_seqp = 0;
+	*end_seqp = 0;
+}
+
+static inline ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq)
+{
+	return -ERANGE;
+}
 #endif
 
 extern asmlinkage void dump_stack(void) __cold;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 349a37b..904413e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -617,6 +617,102 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 	return p - buf;
 }
 
+/**
+ * log_seq_range - get the range of available log sequence numbers
+ * @begin_seqp: output parameter for the begin of the range
+ * @end_seqp: output parameter for the end of the range
+ *
+ * Returns the log sequence number range [*@begin_seqp, *@ends_seqp) which
+ * is currently available to be retrieved using ext_log_from_seq().  Note
+ * that the range may shift anytime.
+ */
+void log_seq_range(u64 *begin_seqp, u64 *end_seqp)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+	*begin_seqp = log_first_seq;
+	*end_seqp = log_next_seq;
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+}
+EXPORT_SYMBOL_GPL(log_seq_range);
+
+/**
+ * ext_log_from_seq - obtain log message in extended format from its seq number
+ * @buf: output buffer
+ * @size: size of @buf
+ * @log_seq: target log sequence number
+ *
+ * Print the log message with @log_seq as its sequence number into @buf
+ * using the extended format.  On success, returns the length of formatted
+ * output excluding the terminating '\0'.  If @log_seq doesn't match any
+ * message, -ERANGE is returned.
+ *
+ * Due to the way messages are stored, accessing @log_seq requires seeking
+ * the log buffer sequentially.  As the last looked up position is cached,
+ * accessing messages in ascending sequence order is recommended.
+ */
+ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq)
+{
+	static u64 seq_hint;
+	static u32 idx_hint;
+	static enum log_flags prev_flags_hint;
+	struct printk_log *msg;
+	enum log_flags prev_flags = 0;
+	unsigned long flags;
+	ssize_t len;
+	u32 seq;
+	u32 prev_idx, idx;
+
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+
+	if (log_seq < log_first_seq || log_seq >= log_next_seq) {
+		len = -ERANGE;
+		goto out_unlock;
+	}
+
+	/*
+	 * Seek to @log_seq to determine its index. The last looked up seq
+	 * and index are cached to accelerate in-order accesses.  For now,
+	 * @prev_flags is best effort.  It'd be better to restructure it so
+	 * that the current entry carries all the relevant information
+	 * without depending on the previous one.
+	 */
+	seq = log_first_seq;
+	idx = log_first_idx;
+
+	if (seq < seq_hint && seq_hint <= log_seq) {
+		seq = seq_hint;
+		idx = idx_hint;
+		prev_flags = prev_flags_hint;
+	}
+
+	prev_idx = idx;
+	while (seq < log_seq) {
+		seq++;
+		prev_idx = idx;
+		idx = log_next(idx);
+	}
+
+	if (prev_idx != idx)
+		prev_flags = log_from_idx(prev_idx)->flags;
+	msg = log_from_idx(idx);
+
+	/* entry located, format it */
+	len = msg_print_ext_header(buf, size, msg, seq, prev_flags);
+	len += msg_print_ext_body(buf + len, size - len,
+				  log_dict(msg), msg->dict_len,
+				  log_text(msg), msg->text_len);
+
+	seq_hint = seq;
+	idx_hint = idx;
+	prev_flags_hint = prev_flags;
+out_unlock:
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	return len;
+}
+EXPORT_SYMBOL_GPL(ext_log_from_seq);
+
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
 	u64 seq;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/16] netconsole: make netconsole_target->enabled a bool
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (4 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq() Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 07/16] netconsole: factor out alloc_netconsole_target() Tejun Heo
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

We'll add more bool's to netconsole_target.  Let's convert ->enabled
to bool first so that things stay consistent.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 15731d1..09d4e12 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -104,7 +104,7 @@ struct netconsole_target {
 #ifdef	CONFIG_NETCONSOLE_DYNAMIC
 	struct config_item	item;
 #endif
-	int			enabled;
+	bool			enabled;
 	struct mutex		mutex;
 	struct netpoll		np;
 };
@@ -197,7 +197,7 @@ static struct netconsole_target *alloc_param_target(char *target_config)
 	if (err)
 		goto fail;
 
-	nt->enabled = 1;
+	nt->enabled = true;
 
 	return nt;
 
@@ -322,13 +322,13 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 		return err;
 	if (enabled < 0 || enabled > 1)
 		return -EINVAL;
-	if (enabled == nt->enabled) {
+	if ((bool)enabled == nt->enabled) {
 		pr_info("network logging has already %s\n",
 			nt->enabled ? "started" : "stopped");
 		return -EINVAL;
 	}
 
-	if (enabled) {	/* 1 */
+	if (enabled) {	/* true */
 		/*
 		 * Skip netpoll_parse_options() -- all the attributes are
 		 * already configured via configfs. Just print them out.
@@ -340,13 +340,13 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 			return err;
 
 		pr_info("netconsole: network logging started\n");
-	} else {	/* 0 */
+	} else {	/* false */
 		/* We need to disable the netconsole before cleaning it up
 		 * otherwise we might end up in write_msg() with
-		 * nt->np.dev == NULL and nt->enabled == 1
+		 * nt->np.dev == NULL and nt->enabled == true
 		 */
 		spin_lock_irqsave(&target_list_lock, flags);
-		nt->enabled = 0;
+		nt->enabled = false;
 		spin_unlock_irqrestore(&target_list_lock, flags);
 		netpoll_cleanup(&nt->np);
 	}
@@ -594,7 +594,7 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 
 	/*
 	 * Allocate and initialize with defaults.
-	 * Target is disabled at creation (enabled == 0).
+	 * Target is disabled at creation (!enabled).
 	 */
 	nt = kzalloc(sizeof(*nt), GFP_KERNEL);
 	if (!nt)
@@ -695,7 +695,7 @@ restart:
 				spin_lock_irqsave(&target_list_lock, flags);
 				dev_put(nt->np.dev);
 				nt->np.dev = NULL;
-				nt->enabled = 0;
+				nt->enabled = false;
 				stopped = true;
 				netconsole_target_put(nt);
 				goto restart;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/16] netconsole: factor out alloc_netconsole_target()
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (5 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 06/16] netconsole: make netconsole_target->enabled a bool Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier Tejun Heo
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

alloc_param_target() and make_netconsole_target() were duplicating the
same allocation and init logic.  Factor it out to
alloc_netconsole_target().

This is pure reorganization.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 09d4e12..17692b8 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -167,19 +167,17 @@ static void netconsole_target_put(struct netconsole_target *nt)
 
 #endif	/* CONFIG_NETCONSOLE_DYNAMIC */
 
-/* Allocate new target (from boot/module param) and setup netpoll for it */
-static struct netconsole_target *alloc_param_target(char *target_config)
+/*
+ * Allocate and initialize with defaults.  Note that these targets get
+ * their config_item fields zeroed-out.
+ */
+static struct netconsole_target *alloc_netconsole_target(void)
 {
-	int err = -ENOMEM;
 	struct netconsole_target *nt;
 
-	/*
-	 * Allocate and initialize with defaults.
-	 * Note that these targets get their config_item fields zeroed-out.
-	 */
 	nt = kzalloc(sizeof(*nt), GFP_KERNEL);
 	if (!nt)
-		goto fail;
+		return NULL;
 
 	nt->np.name = "netconsole";
 	strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
@@ -188,6 +186,19 @@ static struct netconsole_target *alloc_param_target(char *target_config)
 	mutex_init(&nt->mutex);
 	eth_broadcast_addr(nt->np.remote_mac);
 
+	return nt;
+}
+
+/* Allocate new target (from boot/module param) and setup netpoll for it */
+static struct netconsole_target *alloc_param_target(char *target_config)
+{
+	int err = -ENOMEM;
+	struct netconsole_target *nt;
+
+	nt = alloc_netconsole_target();
+	if (!nt)
+		goto fail;
+
 	/* Parse parameters and setup netpoll */
 	err = netpoll_parse_options(&nt->np, target_config);
 	if (err)
@@ -592,21 +603,10 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 	unsigned long flags;
 	struct netconsole_target *nt;
 
-	/*
-	 * Allocate and initialize with defaults.
-	 * Target is disabled at creation (!enabled).
-	 */
-	nt = kzalloc(sizeof(*nt), GFP_KERNEL);
+	nt = alloc_netconsole_target();
 	if (!nt)
 		return ERR_PTR(-ENOMEM);
 
-	nt->np.name = "netconsole";
-	strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
-	nt->np.local_port = 6665;
-	nt->np.remote_port = 6666;
-	mutex_init(&nt->mutex);
-	eth_broadcast_addr(nt->np.remote_mac);
-
 	/* Initialize the config_item member */
 	config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (6 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 07/16] netconsole: factor out alloc_netconsole_target() Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 09/16] netconsole: replace target_list_lock with console_lock Tejun Heo
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

The netdevice_notifier callback, netconsole_netdev_event(), needs to
perform netpoll_cleanup() for the affected targets; however, the
notifier is called with rtnl_lock held which the netpoll_cleanup()
path also grabs.  To avoid deadlock, the path uses __netpoll_cleanup()
instead and making the code path different from others.

The planned reliable netconsole support will add more logic to the
cleanup path making the slightly different paths painful.  Let's punt
netconsole_target disabling to workqueue so that it can later share
the same cleanup path.  This would also allow ditching
target_list_lock and depending on console_lock for synchronization.

Note that this introduces a narrow race window where the asynchronous
disabling may race against disabling from configfs ending up executing
netpoll_cleanup() more than once on the same instance.  The follow up
patches will fix it by introducing a mutex to protect overall enable /
disable operations; unfortunately, the locking update couldn't be
ordered before this change due to the locking order with rtnl_lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 58 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 17692b8..d355776 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -105,6 +105,7 @@ struct netconsole_target {
 	struct config_item	item;
 #endif
 	bool			enabled;
+	bool			disable_scheduled;
 	struct mutex		mutex;
 	struct netpoll		np;
 };
@@ -660,6 +661,39 @@ static struct configfs_subsystem netconsole_subsys = {
 
 #endif	/* CONFIG_NETCONSOLE_DYNAMIC */
 
+static void netconsole_deferred_disable_work_fn(struct work_struct *work)
+{
+	struct netconsole_target *nt, *to_disable;
+	unsigned long flags;
+
+repeat:
+	to_disable = NULL;
+	spin_lock_irqsave(&target_list_lock, flags);
+	list_for_each_entry(nt, &target_list, list) {
+		if (!nt->disable_scheduled)
+			continue;
+		nt->disable_scheduled = false;
+
+		if (!nt->enabled)
+			continue;
+
+		netconsole_target_get(nt);
+		nt->enabled = false;
+		to_disable = nt;
+		break;
+	}
+	spin_unlock_irqrestore(&target_list_lock, flags);
+
+	if (to_disable) {
+		netpoll_cleanup(&to_disable->np);
+		netconsole_target_put(to_disable);
+		goto repeat;
+	}
+}
+
+static DECLARE_WORK(netconsole_deferred_disable_work,
+		    netconsole_deferred_disable_work_fn);
+
 /* Handle network interface device notifications */
 static int netconsole_netdev_event(struct notifier_block *this,
 				   unsigned long event, void *ptr)
@@ -674,9 +708,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 		goto done;
 
 	spin_lock_irqsave(&target_list_lock, flags);
-restart:
 	list_for_each_entry(nt, &target_list, list) {
-		netconsole_target_get(nt);
 		if (nt->np.dev == dev) {
 			switch (event) {
 			case NETDEV_CHANGENAME:
@@ -685,23 +717,14 @@ restart:
 			case NETDEV_RELEASE:
 			case NETDEV_JOIN:
 			case NETDEV_UNREGISTER:
-				/* rtnl_lock already held
-				 * we might sleep in __netpoll_cleanup()
-				 */
-				spin_unlock_irqrestore(&target_list_lock, flags);
-
-				__netpoll_cleanup(&nt->np);
-
-				spin_lock_irqsave(&target_list_lock, flags);
-				dev_put(nt->np.dev);
-				nt->np.dev = NULL;
-				nt->enabled = false;
+				/* rtnl_lock already held, punt to workqueue */
+				if (nt->enabled && !nt->disable_scheduled) {
+					nt->disable_scheduled = true;
+					schedule_work(&netconsole_deferred_disable_work);
+				}
 				stopped = true;
-				netconsole_target_put(nt);
-				goto restart;
 			}
 		}
-		netconsole_target_put(nt);
 	}
 	spin_unlock_irqrestore(&target_list_lock, flags);
 	if (stopped) {
@@ -810,7 +833,7 @@ static int __init init_netconsole(void)
 
 undonotifier:
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
-
+	cancel_work_sync(&netconsole_deferred_disable_work);
 fail:
 	pr_err("cleaning up\n");
 
@@ -834,6 +857,7 @@ static void __exit cleanup_netconsole(void)
 	unregister_console(&netconsole);
 	dynamic_netconsole_exit();
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
+	cancel_work_sync(&netconsole_deferred_disable_work);
 
 	/*
 	 * Targets created via configfs pin references on our module
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/16] netconsole: replace target_list_lock with console_lock
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (7 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 10/16] netconsole: introduce netconsole_mutex Tejun Heo
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

netconsole has been using a spinlock - target_list_lock - to protect
the list of configured netconsole targets and their enable/disable
states.  With the disabling from netdevice_notifier moved off to a
workqueue by the previous patch and thus outside of rtnl_lock,
target_list_lock can be replaced with console_lock, which allows us to
avoid grabbing an extra lock in the log write path and can simplify
locking when involving other subsystems as console_lock is only
trylocked from printk path.

This patch replaces target_list_lock with console_lock.  The
conversion is one-to-one except for write_msg().  The function is
called with console_lock() already held so no further locking is
necessary; however, as netpoll_send_udp() expects irq to be disabled,
explicit irq save/restore pair is added around it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 50 ++++++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d355776..57c02ab 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -73,12 +73,12 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif	/* MODULE */
 
-/* Linked list of all configured targets */
+/*
+ * Linked list of all configured targets.  The list and each target's
+ * enable/disable state are protected by console_lock.
+ */
 static LIST_HEAD(target_list);
 
-/* This needs to be a spinlock because write_msg() cannot sleep */
-static DEFINE_SPINLOCK(target_list_lock);
-
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:	Links this target into the target_list.
@@ -325,7 +325,6 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 			     const char *buf,
 			     size_t count)
 {
-	unsigned long flags;
 	int enabled;
 	int err;
 
@@ -357,9 +356,9 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 		 * otherwise we might end up in write_msg() with
 		 * nt->np.dev == NULL and nt->enabled == true
 		 */
-		spin_lock_irqsave(&target_list_lock, flags);
+		console_lock();
 		nt->enabled = false;
-		spin_unlock_irqrestore(&target_list_lock, flags);
+		console_unlock();
 		netpoll_cleanup(&nt->np);
 	}
 
@@ -601,7 +600,6 @@ static struct config_item_type netconsole_target_type = {
 static struct config_item *make_netconsole_target(struct config_group *group,
 						  const char *name)
 {
-	unsigned long flags;
 	struct netconsole_target *nt;
 
 	nt = alloc_netconsole_target();
@@ -612,9 +610,9 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 	config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
 	/* Adding, but it is disabled */
-	spin_lock_irqsave(&target_list_lock, flags);
+	console_lock();
 	list_add(&nt->list, &target_list);
-	spin_unlock_irqrestore(&target_list_lock, flags);
+	console_unlock();
 
 	return &nt->item;
 }
@@ -622,12 +620,11 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 static void drop_netconsole_target(struct config_group *group,
 				   struct config_item *item)
 {
-	unsigned long flags;
 	struct netconsole_target *nt = to_target(item);
 
-	spin_lock_irqsave(&target_list_lock, flags);
+	console_lock();
 	list_del(&nt->list);
-	spin_unlock_irqrestore(&target_list_lock, flags);
+	console_unlock();
 
 	/*
 	 * The target may have never been enabled, or was manually disabled
@@ -664,11 +661,10 @@ static struct configfs_subsystem netconsole_subsys = {
 static void netconsole_deferred_disable_work_fn(struct work_struct *work)
 {
 	struct netconsole_target *nt, *to_disable;
-	unsigned long flags;
 
 repeat:
 	to_disable = NULL;
-	spin_lock_irqsave(&target_list_lock, flags);
+	console_lock();
 	list_for_each_entry(nt, &target_list, list) {
 		if (!nt->disable_scheduled)
 			continue;
@@ -682,7 +678,7 @@ repeat:
 		to_disable = nt;
 		break;
 	}
-	spin_unlock_irqrestore(&target_list_lock, flags);
+	console_unlock();
 
 	if (to_disable) {
 		netpoll_cleanup(&to_disable->np);
@@ -698,7 +694,6 @@ static DECLARE_WORK(netconsole_deferred_disable_work,
 static int netconsole_netdev_event(struct notifier_block *this,
 				   unsigned long event, void *ptr)
 {
-	unsigned long flags;
 	struct netconsole_target *nt;
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	bool stopped = false;
@@ -707,7 +702,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	      event == NETDEV_RELEASE || event == NETDEV_JOIN))
 		goto done;
 
-	spin_lock_irqsave(&target_list_lock, flags);
+	console_lock();
 	list_for_each_entry(nt, &target_list, list) {
 		if (nt->np.dev == dev) {
 			switch (event) {
@@ -726,7 +721,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 			}
 		}
 	}
-	spin_unlock_irqrestore(&target_list_lock, flags);
+	console_unlock();
 	if (stopped) {
 		const char *msg = "had an event";
 		switch (event) {
@@ -755,7 +750,6 @@ static struct notifier_block netconsole_netdev_notifier = {
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
 	int frag, left;
-	unsigned long flags;
 	struct netconsole_target *nt;
 	const char *tmp;
 
@@ -765,9 +759,7 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 	if (list_empty(&target_list))
 		return;
 
-	spin_lock_irqsave(&target_list_lock, flags);
 	list_for_each_entry(nt, &target_list, list) {
-		netconsole_target_get(nt);
 		if (nt->enabled && netif_running(nt->np.dev)) {
 			/*
 			 * We nest this inside the for-each-target loop above
@@ -783,9 +775,7 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 				left -= frag;
 			}
 		}
-		netconsole_target_put(nt);
 	}
-	spin_unlock_irqrestore(&target_list_lock, flags);
 }
 
 static struct console netconsole = {
@@ -798,7 +788,6 @@ static int __init init_netconsole(void)
 {
 	int err;
 	struct netconsole_target *nt, *tmp;
-	unsigned long flags;
 	char *target_config;
 	char *input = config;
 
@@ -812,9 +801,9 @@ static int __init init_netconsole(void)
 			/* Dump existing printks when we register */
 			netconsole.flags |= CON_PRINTBUFFER;
 
-			spin_lock_irqsave(&target_list_lock, flags);
+			console_lock();
 			list_add(&nt->list, &target_list);
-			spin_unlock_irqrestore(&target_list_lock, flags);
+			console_unlock();
 		}
 	}
 
@@ -839,8 +828,8 @@ fail:
 
 	/*
 	 * Remove all targets and destroy them (only targets created
-	 * from the boot/module option exist here). Skipping the list
-	 * lock is safe here, and netpoll_cleanup() will sleep.
+	 * from the boot/module option exist here). Skipping the console
+	 * lock is safe here.
 	 */
 	list_for_each_entry_safe(nt, tmp, &target_list, list) {
 		list_del(&nt->list);
@@ -864,8 +853,7 @@ static void __exit cleanup_netconsole(void)
 	 * and would first be rmdir(2)'ed from userspace. We reach
 	 * here only when they are already destroyed, and only those
 	 * created from the boot/module option are left, so remove and
-	 * destroy them. Skipping the list lock is safe here, and
-	 * netpoll_cleanup() will sleep.
+	 * destroy them. Skipping the console lock is safe here.
 	 */
 	list_for_each_entry_safe(nt, tmp, &target_list, list) {
 		list_del(&nt->list);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/16] netconsole: introduce netconsole_mutex
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (8 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 09/16] netconsole: replace target_list_lock with console_lock Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths Tejun Heo
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

console_lock protects the target_list itself and setting and clearing
of its ->enabled flag; however, nothing protects the overall
enable/disable operations which we'll need to make netconsole
management more dynamic for the scheduled reliable transmission
support.  Also, an earlier patch introduced a small race window where
dynamic console disable may compete against asynchronous disable
kicked off from netdevice_notifier.

This patch adds netconsole_mutex which protects all target
create/destroy and enable/disable operations.  It also replaces
netconsole_target->mutex used by dynamic consoles.  The above
mentioned race is removed by this change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 57c02ab..f0ac9f6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -75,10 +75,14 @@ __setup("netconsole=", option_setup);
 
 /*
  * Linked list of all configured targets.  The list and each target's
- * enable/disable state are protected by console_lock.
+ * enable/disable state are protected by both netconsole_mutex and
+ * console_lock.
  */
 static LIST_HEAD(target_list);
 
+/* protects target creation/destruction and enable/disable */
+static DEFINE_MUTEX(netconsole_mutex);
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:	Links this target into the target_list.
@@ -106,7 +110,6 @@ struct netconsole_target {
 #endif
 	bool			enabled;
 	bool			disable_scheduled;
-	struct mutex		mutex;
 	struct netpoll		np;
 };
 
@@ -184,7 +187,6 @@ static struct netconsole_target *alloc_netconsole_target(void)
 	strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
 	nt->np.local_port = 6665;
 	nt->np.remote_port = 6666;
-	mutex_init(&nt->mutex);
 	eth_broadcast_addr(nt->np.remote_mac);
 
 	return nt;
@@ -196,6 +198,8 @@ static struct netconsole_target *alloc_param_target(char *target_config)
 	int err = -ENOMEM;
 	struct netconsole_target *nt;
 
+	lockdep_assert_held(&netconsole_mutex);
+
 	nt = alloc_netconsole_target();
 	if (!nt)
 		goto fail;
@@ -573,10 +577,10 @@ static ssize_t netconsole_target_attr_store(struct config_item *item,
 	struct netconsole_target_attr *na =
 		container_of(attr, struct netconsole_target_attr, attr);
 
-	mutex_lock(&nt->mutex);
+	mutex_lock(&netconsole_mutex);
 	if (na->store)
 		ret = na->store(nt, buf, count);
-	mutex_unlock(&nt->mutex);
+	mutex_unlock(&netconsole_mutex);
 
 	return ret;
 }
@@ -610,9 +614,11 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 	config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
 	/* Adding, but it is disabled */
+	mutex_lock(&netconsole_mutex);
 	console_lock();
 	list_add(&nt->list, &target_list);
 	console_unlock();
+	mutex_unlock(&netconsole_mutex);
 
 	return &nt->item;
 }
@@ -622,6 +628,7 @@ static void drop_netconsole_target(struct config_group *group,
 {
 	struct netconsole_target *nt = to_target(item);
 
+	mutex_lock(&netconsole_mutex);
 	console_lock();
 	list_del(&nt->list);
 	console_unlock();
@@ -634,6 +641,7 @@ static void drop_netconsole_target(struct config_group *group,
 		netpoll_cleanup(&nt->np);
 
 	config_item_put(&nt->item);
+	mutex_unlock(&netconsole_mutex);
 }
 
 static struct configfs_group_operations netconsole_subsys_group_ops = {
@@ -662,6 +670,7 @@ static void netconsole_deferred_disable_work_fn(struct work_struct *work)
 {
 	struct netconsole_target *nt, *to_disable;
 
+	mutex_lock(&netconsole_mutex);
 repeat:
 	to_disable = NULL;
 	console_lock();
@@ -685,6 +694,8 @@ repeat:
 		netconsole_target_put(to_disable);
 		goto repeat;
 	}
+
+	mutex_unlock(&netconsole_mutex);
 }
 
 static DECLARE_WORK(netconsole_deferred_disable_work,
@@ -791,6 +802,8 @@ static int __init init_netconsole(void)
 	char *target_config;
 	char *input = config;
 
+	mutex_lock(&netconsole_mutex);
+
 	if (strnlen(input, MAX_PARAM_LENGTH)) {
 		while ((target_config = strsep(&input, ";"))) {
 			nt = alloc_param_target(target_config);
@@ -818,24 +831,21 @@ static int __init init_netconsole(void)
 	register_console(&netconsole);
 	pr_info("network logging started\n");
 
+	mutex_unlock(&netconsole_mutex);
 	return err;
 
 undonotifier:
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
-	cancel_work_sync(&netconsole_deferred_disable_work);
 fail:
 	pr_err("cleaning up\n");
 
-	/*
-	 * Remove all targets and destroy them (only targets created
-	 * from the boot/module option exist here). Skipping the console
-	 * lock is safe here.
-	 */
+	/* targets are already inactive, skipping the console lock is safe */
 	list_for_each_entry_safe(nt, tmp, &target_list, list) {
 		list_del(&nt->list);
 		free_param_target(nt);
 	}
-
+	mutex_unlock(&netconsole_mutex);
+	cancel_work_sync(&netconsole_deferred_disable_work);
 	return err;
 }
 
@@ -843,22 +853,20 @@ static void __exit cleanup_netconsole(void)
 {
 	struct netconsole_target *nt, *tmp;
 
+	mutex_lock(&netconsole_mutex);
+
 	unregister_console(&netconsole);
 	dynamic_netconsole_exit();
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
-	cancel_work_sync(&netconsole_deferred_disable_work);
 
-	/*
-	 * Targets created via configfs pin references on our module
-	 * and would first be rmdir(2)'ed from userspace. We reach
-	 * here only when they are already destroyed, and only those
-	 * created from the boot/module option are left, so remove and
-	 * destroy them. Skipping the console lock is safe here.
-	 */
+	/* targets are already inactive, skipping the console lock is safe */
 	list_for_each_entry_safe(nt, tmp, &target_list, list) {
 		list_del(&nt->list);
 		free_param_target(nt);
 	}
+
+	mutex_unlock(&netconsole_mutex);
+	cancel_work_sync(&netconsole_deferred_disable_work);
 }
 
 /*
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (9 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 10/16] netconsole: introduce netconsole_mutex Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 12/16] netconsole: implement extended console support Tejun Heo
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

netconsole management paths are scattered around in different places.
This patch reorganizes them so that

* All enable logic is in netconsole_enable() and disable in
  netconsole_disable().  Both should be called with netconsole_mutex
  held and netconsole_disable() may be invoked without intervening
  enable.

* alloc_param_target() now also handles linking the new target to
  target_list.  It's renamed to create_param_target() and now returns
  errno.

* store_enabled() now uses netconsole_enable/disable() instead of
  open-coding the logic.  This also fixes the missing synchronization
  against write path when manipulating ->enabled flag.

* drop_netconsole_target() and netconsole_deferred_disable_work_fn()
  updated to use netconsole_disable().

* init/cleanup_netconsole()'s destruction paths are consolidated into
  netconsole_destroy_all() which uses netconsole_disable().
  free_param_target() is no longer used and removed.

While the conversions aren't one-to-one equivalent, this patch
shouldn't cause any visible behavior differences and makes extension
of the enable/disable logic a lot easier.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 147 +++++++++++++++++++++++++----------------------
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index f0ac9f6..d72d902 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -83,6 +83,8 @@ static LIST_HEAD(target_list);
 /* protects target creation/destruction and enable/disable */
 static DEFINE_MUTEX(netconsole_mutex);
 
+static struct console netconsole;
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:	Links this target into the target_list.
@@ -192,8 +194,43 @@ static struct netconsole_target *alloc_netconsole_target(void)
 	return nt;
 }
 
+static int netconsole_enable(struct netconsole_target *nt)
+{
+	int err;
+
+	lockdep_assert_held(&netconsole_mutex);
+	WARN_ON_ONCE(nt->enabled);
+
+	err = netpoll_setup(&nt->np);
+	if (err)
+		return err;
+
+	console_lock();
+	nt->enabled = true;
+	console_unlock();
+	return 0;
+}
+
+static void netconsole_disable(struct netconsole_target *nt)
+{
+	lockdep_assert_held(&netconsole_mutex);
+
+	/*
+	 * We need to disable the netconsole before cleaning it up
+	 * otherwise we might end up in write_msg() with !nt->np.dev &&
+	 * nt->enabled.
+	 */
+	if (nt->enabled) {
+		console_lock();
+		nt->enabled = false;
+		console_unlock();
+
+		netpoll_cleanup(&nt->np);
+	}
+}
+
 /* Allocate new target (from boot/module param) and setup netpoll for it */
-static struct netconsole_target *alloc_param_target(char *target_config)
+static int create_param_target(char *target_config)
 {
 	int err = -ENOMEM;
 	struct netconsole_target *nt;
@@ -209,24 +246,26 @@ static struct netconsole_target *alloc_param_target(char *target_config)
 	if (err)
 		goto fail;
 
-	err = netpoll_setup(&nt->np);
+	console_lock();
+	list_add(&nt->list, &target_list);
+	console_unlock();
+
+	err = netconsole_enable(nt);
 	if (err)
-		goto fail;
+		goto fail_del;
 
-	nt->enabled = true;
+	/* Dump existing printks when we register */
+	netconsole.flags |= CON_PRINTBUFFER;
 
-	return nt;
+	return 0;
 
+fail_del:
+	console_lock();
+	list_del(&nt->list);
+	console_unlock();
 fail:
 	kfree(nt);
-	return ERR_PTR(err);
-}
-
-/* Cleanup netpoll for given target (from boot/module param) and free it */
-static void free_param_target(struct netconsole_target *nt)
-{
-	netpoll_cleanup(&nt->np);
-	kfree(nt);
+	return err;
 }
 
 #ifdef	CONFIG_NETCONSOLE_DYNAMIC
@@ -350,24 +389,15 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 		 */
 		netpoll_print_options(&nt->np);
 
-		err = netpoll_setup(&nt->np);
+		err = netconsole_enable(nt);
 		if (err)
 			return err;
 
 		pr_info("netconsole: network logging started\n");
 	} else {	/* false */
-		/* We need to disable the netconsole before cleaning it up
-		 * otherwise we might end up in write_msg() with
-		 * nt->np.dev == NULL and nt->enabled == true
-		 */
-		console_lock();
-		nt->enabled = false;
-		console_unlock();
-		netpoll_cleanup(&nt->np);
+		netconsole_disable(nt);
 	}
 
-	nt->enabled = enabled;
-
 	return strnlen(buf, count);
 }
 
@@ -633,13 +663,7 @@ static void drop_netconsole_target(struct config_group *group,
 	list_del(&nt->list);
 	console_unlock();
 
-	/*
-	 * The target may have never been enabled, or was manually disabled
-	 * before being removed so netpoll may have already been cleaned up.
-	 */
-	if (nt->enabled)
-		netpoll_cleanup(&nt->np);
-
+	netconsole_disable(nt);
 	config_item_put(&nt->item);
 	mutex_unlock(&netconsole_mutex);
 }
@@ -675,22 +699,16 @@ repeat:
 	to_disable = NULL;
 	console_lock();
 	list_for_each_entry(nt, &target_list, list) {
-		if (!nt->disable_scheduled)
-			continue;
-		nt->disable_scheduled = false;
-
-		if (!nt->enabled)
-			continue;
-
-		netconsole_target_get(nt);
-		nt->enabled = false;
-		to_disable = nt;
-		break;
+		if (nt->disable_scheduled) {
+			nt->disable_scheduled = false;
+			netconsole_target_get(nt);
+			to_disable = nt;
+		}
 	}
 	console_unlock();
 
 	if (to_disable) {
-		netpoll_cleanup(&to_disable->np);
+		netconsole_disable(to_disable);
 		netconsole_target_put(to_disable);
 		goto repeat;
 	}
@@ -795,10 +813,23 @@ static struct console netconsole = {
 	.write	= write_msg,
 };
 
+static void __init_or_module netconsole_destroy_all(void)
+{
+	struct netconsole_target *nt, *tmp;
+
+	lockdep_assert_held(&netconsole_mutex);
+
+	/* targets are already inactive, skipping the console lock is safe */
+	list_for_each_entry_safe(nt, tmp, &target_list, list) {
+		list_del(&nt->list);
+		netconsole_disable(nt);
+		kfree(nt);
+	}
+}
+
 static int __init init_netconsole(void)
 {
 	int err;
-	struct netconsole_target *nt, *tmp;
 	char *target_config;
 	char *input = config;
 
@@ -806,17 +837,9 @@ static int __init init_netconsole(void)
 
 	if (strnlen(input, MAX_PARAM_LENGTH)) {
 		while ((target_config = strsep(&input, ";"))) {
-			nt = alloc_param_target(target_config);
-			if (IS_ERR(nt)) {
-				err = PTR_ERR(nt);
+			err = create_param_target(target_config);
+			if (err)
 				goto fail;
-			}
-			/* Dump existing printks when we register */
-			netconsole.flags |= CON_PRINTBUFFER;
-
-			console_lock();
-			list_add(&nt->list, &target_list);
-			console_unlock();
 		}
 	}
 
@@ -838,12 +861,7 @@ undonotifier:
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
 fail:
 	pr_err("cleaning up\n");
-
-	/* targets are already inactive, skipping the console lock is safe */
-	list_for_each_entry_safe(nt, tmp, &target_list, list) {
-		list_del(&nt->list);
-		free_param_target(nt);
-	}
+	netconsole_destroy_all();
 	mutex_unlock(&netconsole_mutex);
 	cancel_work_sync(&netconsole_deferred_disable_work);
 	return err;
@@ -851,19 +869,12 @@ fail:
 
 static void __exit cleanup_netconsole(void)
 {
-	struct netconsole_target *nt, *tmp;
-
 	mutex_lock(&netconsole_mutex);
 
 	unregister_console(&netconsole);
 	dynamic_netconsole_exit();
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
-
-	/* targets are already inactive, skipping the console lock is safe */
-	list_for_each_entry_safe(nt, tmp, &target_list, list) {
-		list_del(&nt->list);
-		free_param_target(nt);
-	}
+	netconsole_destroy_all();
 
 	mutex_unlock(&netconsole_mutex);
 	cancel_work_sync(&netconsole_deferred_disable_work);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 12/16] netconsole: implement extended console support
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (10 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 13/16] netconsole: implement retransmission support for extended consoles Tejun Heo
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

netconsole transmits raw console messages using one or multiple UDP
packets and there's no way to find out whether the packets are lost in
transit or received out of order.  Depending on the setup, this can
make the logging significantly unreliable and untrustworthy.

With the new extended console support, printk now can be told to
expose log metadata including the message sequence number to console
drivers which can be used by log receivers to determine whether and
which messages are missing and reorder messages received out of order.

This patch implements extended console support for netconsole which
can be enabled by either prepending "+" to a netconsole boot param
entry or echoing 1 to "extended" file in configfs.  When enabled,
netconsole transmits extended log messages with headers identical to
/dev/kmsg output.

netconsole may have to split a single messages to multiple fragments.
In this case, if the extended mode is enabled, an optional header of
the form "ncfrag=OFF@IDX/NR" is added to each fragment where OFF is
the byte offset of the message body, IDX is the 0-based fragment index
and NR is the number of total fragments for this message.

To avoid unnecessarily forcing printk to format extended messages,
extended netconsole is registered with printk iff it's actually used.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d72d902..626d9f0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -83,6 +83,10 @@ static LIST_HEAD(target_list);
 /* protects target creation/destruction and enable/disable */
 static DEFINE_MUTEX(netconsole_mutex);
 
+static bool netconsole_ext_used_during_init;
+static bool netconsole_ext_registered;
+static struct console netconsole_ext;
+
 static struct console netconsole;
 
 /**
@@ -112,6 +116,7 @@ struct netconsole_target {
 #endif
 	bool			enabled;
 	bool			disable_scheduled;
+	bool			extended;
 	struct netpoll		np;
 };
 
@@ -194,6 +199,81 @@ static struct netconsole_target *alloc_netconsole_target(void)
 	return nt;
 }
 
+/**
+ * send_ext_msg_udp - send extended log message to target
+ * @nt: target to send message to
+ * @msg: extended log message to send
+ * @msg_len: length of message
+ *
+ * Transfer extended log @msg to @nt.  If @msg is too long, it'll be split
+ * and transmitted in multiple chunks with ncfrag header field added to
+ * enable correct reassembly.
+ */
+static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
+			     int msg_len)
+{
+	static char buf[MAX_PRINT_CHUNK];
+	const int max_extra_len = sizeof(",ncfrag=0000@00/00");
+	const char *header, *body;
+	int header_len = msg_len, body_len = 0;
+	int chunk_len, nr_chunks, i;
+
+	if (!nt->enabled || !netif_running(nt->np.dev))
+		return;
+
+	if (msg_len <= MAX_PRINT_CHUNK) {
+		netpoll_send_udp(&nt->np, msg, msg_len);
+		return;
+	}
+
+	/* need to insert extra header fields, detect header and body */
+	header = msg;
+	body = memchr(msg, ';', msg_len);
+	if (body) {
+		header_len = body - header;
+		body_len = msg_len - header_len - 1;
+		body++;
+	}
+
+	chunk_len = MAX_PRINT_CHUNK - header_len - max_extra_len;
+	if (WARN_ON_ONCE(chunk_len <= 0))
+		return;
+
+	/*
+	 * Transfer possibly multiple chunks with extra header fields.
+	 *
+	 * If @msg needs to be split to fit MAX_PRINT_CHUNK, add
+	 * "ncfrag=<byte-offset>@<0-based-chunk-index>/<total-chunks>" to
+	 * enable proper reassembly on receiver side.
+	 */
+	memcpy(buf, header, header_len);
+	nr_chunks = DIV_ROUND_UP(body_len, chunk_len);
+
+	for (i = 0; i < nr_chunks; i++) {
+		int this_header = header_len;
+		int this_chunk;
+
+		if (nr_chunks > 1)
+			this_header += scnprintf(buf + this_header,
+						 sizeof(buf) - this_header,
+						 ",ncfrag=%d@%d/%d",
+						 i * chunk_len, i, nr_chunks);
+		if (this_header < sizeof(buf))
+			buf[this_header++] = ';';
+
+		if (WARN_ON_ONCE(this_header + chunk_len > MAX_PRINT_CHUNK))
+			return;
+
+		this_chunk = min(body_len, chunk_len);
+		memcpy(buf + this_header, body, this_chunk);
+
+		netpoll_send_udp(&nt->np, buf, this_header + this_chunk);
+
+		body += this_chunk;
+		body_len -= this_chunk;
+	}
+}
+
 static int netconsole_enable(struct netconsole_target *nt)
 {
 	int err;
@@ -241,6 +321,11 @@ static int create_param_target(char *target_config)
 	if (!nt)
 		goto fail;
 
+	if (*target_config == '+') {
+		nt->extended = 1;
+		target_config++;
+	}
+
 	/* Parse parameters and setup netpoll */
 	err = netpoll_parse_options(&nt->np, target_config);
 	if (err)
@@ -255,7 +340,12 @@ static int create_param_target(char *target_config)
 		goto fail_del;
 
 	/* Dump existing printks when we register */
-	netconsole.flags |= CON_PRINTBUFFER;
+	if (nt->extended) {
+		netconsole_ext.flags |= CON_PRINTBUFFER;
+		netconsole_ext_used_during_init = true;
+	} else {
+		netconsole.flags |= CON_PRINTBUFFER;
+	}
 
 	return 0;
 
@@ -313,6 +403,11 @@ static ssize_t show_enabled(struct netconsole_target *nt, char *buf)
 	return snprintf(buf, PAGE_SIZE, "%d\n", nt->enabled);
 }
 
+static ssize_t show_extended(struct netconsole_target *nt, char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", nt->extended);
+}
+
 static ssize_t show_dev_name(struct netconsole_target *nt, char *buf)
 {
 	return snprintf(buf, PAGE_SIZE, "%s\n", nt->np.dev_name);
@@ -401,6 +496,30 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 	return strnlen(buf, count);
 }
 
+static ssize_t store_extended(struct netconsole_target *nt,
+			      const char *buf,
+			      size_t count)
+{
+	int extended;
+	int err;
+
+	if (nt->enabled) {
+		pr_err("target (%s) is enabled, disable to update parameters\n",
+		       config_item_name(&nt->item));
+		return -EINVAL;
+	}
+
+	err = kstrtoint(buf, 10, &extended);
+	if (err < 0)
+		return err;
+	if (extended < 0 || extended > 1)
+		return -EINVAL;
+
+	nt->extended = extended;
+
+	return strnlen(buf, count);
+}
+
 static ssize_t store_dev_name(struct netconsole_target *nt,
 			      const char *buf,
 			      size_t count)
@@ -553,6 +672,7 @@ static struct netconsole_target_attr netconsole_target_##_name =	\
 	__CONFIGFS_ATTR(_name, S_IRUGO | S_IWUSR, show_##_name, store_##_name)
 
 NETCONSOLE_TARGET_ATTR_RW(enabled);
+NETCONSOLE_TARGET_ATTR_RW(extended);
 NETCONSOLE_TARGET_ATTR_RW(dev_name);
 NETCONSOLE_TARGET_ATTR_RW(local_port);
 NETCONSOLE_TARGET_ATTR_RW(remote_port);
@@ -563,6 +683,7 @@ NETCONSOLE_TARGET_ATTR_RW(remote_mac);
 
 static struct configfs_attribute *netconsole_target_attrs[] = {
 	&netconsole_target_enabled.attr,
+	&netconsole_target_extended.attr,
 	&netconsole_target_dev_name.attr,
 	&netconsole_target_local_port.attr,
 	&netconsole_target_remote_port.attr,
@@ -645,9 +766,16 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 
 	/* Adding, but it is disabled */
 	mutex_lock(&netconsole_mutex);
+
 	console_lock();
 	list_add(&nt->list, &target_list);
 	console_unlock();
+
+	if (nt->extended && !netconsole_ext_registered) {
+		register_console(&netconsole_ext);
+		netconsole_ext_registered = true;
+	}
+
 	mutex_unlock(&netconsole_mutex);
 
 	return &nt->item;
@@ -776,6 +904,19 @@ static struct notifier_block netconsole_netdev_notifier = {
 	.notifier_call  = netconsole_netdev_event,
 };
 
+static void write_ext_msg(struct console *con, const char *msg,
+			  unsigned int len)
+{
+	struct netconsole_target *nt;
+
+	if ((oops_only && !oops_in_progress) || list_empty(&target_list))
+		return;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (nt->extended)
+			send_ext_msg_udp(nt, msg, len);
+}
+
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
 	int frag, left;
@@ -789,6 +930,8 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 		return;
 
 	list_for_each_entry(nt, &target_list, list) {
+		if (nt->extended)
+			continue;
 		if (nt->enabled && netif_running(nt->np.dev)) {
 			/*
 			 * We nest this inside the for-each-target loop above
@@ -807,6 +950,12 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 	}
 }
 
+static struct console netconsole_ext = {
+	.name	= "netcon_ext",
+	.flags	= CON_ENABLED | CON_EXTENDED,
+	.write	= write_ext_msg,
+};
+
 static struct console netconsole = {
 	.name	= "netcon",
 	.flags	= CON_ENABLED,
@@ -851,6 +1000,11 @@ static int __init init_netconsole(void)
 	if (err)
 		goto undonotifier;
 
+	if (netconsole_ext_used_during_init) {
+		register_console(&netconsole_ext);
+		netconsole_ext_registered = true;
+	}
+
 	register_console(&netconsole);
 	pr_info("network logging started\n");
 
@@ -871,6 +1025,9 @@ static void __exit cleanup_netconsole(void)
 {
 	mutex_lock(&netconsole_mutex);
 
+	if (netconsole_ext_registered)
+		unregister_console(&netconsole_ext);
+
 	unregister_console(&netconsole);
 	dynamic_netconsole_exit();
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 13/16] netconsole: implement retransmission support for extended consoles
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (11 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 12/16] netconsole: implement extended console support Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 14/16] netconsole: implement ack handling and emergency transmission Tejun Heo
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

With extended netconsole, the logger can reliably determine which
messages are missing.  This patch implements receiver for extended
netconsole which accepts retransmission request packets coming into
the the netconsole source port and sends back the requested messages.

The receiving socket is automatically created when an extended console
is enabled.  A poll_table with a custom callback is queued on the
socket.  When a packet is received on the socket, a work item is
scheduled which reads and parses the payload and sends back the
requested ones.  A retransmission packet looks like the following.

 nca <missing-seq> <missing-seq>...

The receiver doesn't interfere with the normal transmission path.
Even if something goes wrong with the network stack or receiver
itself, the normal transmission path should keep working.

This can be used to implement reliable netconsole logger.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 238 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 238 insertions(+)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 626d9f0..b2763e0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -45,7 +45,10 @@
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include <linux/netpoll.h>
+#include <linux/net.h>
 #include <linux/inet.h>
+#include <linux/file.h>
+#include <linux/poll.h>
 #include <linux/configfs.h>
 #include <linux/etherdevice.h>
 
@@ -89,6 +92,14 @@ static struct console netconsole_ext;
 
 static struct console netconsole;
 
+union sockaddr_in46 {
+	struct sockaddr		addr;
+	struct sockaddr_in6	in6;
+	struct sockaddr_in	in4;
+};
+
+static char *netconsole_ext_tx_buf;	/* CONSOLE_EXT_LOG_MAX */
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:	Links this target into the target_list.
@@ -118,6 +129,14 @@ struct netconsole_target {
 	bool			disable_scheduled;
 	bool			extended;
 	struct netpoll		np;
+
+	/* response packet handling for extended netconsoles */
+	union sockaddr_in46	raddr;		/* logging target address */
+	struct file		*rx_file;	/* sock to receive responses */
+	poll_table		rx_ptable;	/* for polling the socket */
+	wait_queue_t		rx_wait;	/* ditto */
+	wait_queue_head_t	*rx_waitq;	/* ditto */
+	struct work_struct	rx_work;	/* receive & process packets */
 };
 
 #ifdef	CONFIG_NETCONSOLE_DYNAMIC
@@ -274,6 +293,219 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 	}
 }
 
+static bool sockaddr_in46_equal(union sockaddr_in46 *a, union sockaddr_in46 *b)
+{
+	if (a->addr.sa_family != b->addr.sa_family)
+		return false;
+	if (a->in4.sin_port != b->in4.sin_port)
+		return false;
+	if (a->addr.sa_family == AF_INET &&
+	    memcmp(&a->in4.sin_addr, &b->in4.sin_addr, sizeof(a->in4.sin_addr)))
+		return false;
+	if (a->addr.sa_family == AF_INET6 &&
+	    memcmp(&a->in6.sin6_addr, &b->in6.sin6_addr, sizeof(a->in6.sin6_addr)))
+		return false;
+	return true;
+}
+
+/**
+ * netconsole_rx_work_fn - work function to handle netconsole ack packets
+ * @work: netconsole_target->rx_work
+ *
+ * This function is scheduled when packets are received on the source port
+ * of extended netconsole target and responds to ack messages from the
+ * remote logger.  An ack message has the following format.
+ *
+ * nca <missing-seq> <missing-seq> ...
+ *
+ * There can be any number of missing-seq's as long as the whole payload
+ * fits inside MAX_PRINT_CHUNK.  This function re-transmits each message
+ * matching the missing-seq's if available.  This can be used to implement
+ * reliable logging from the receiver side.
+ *
+ * The rx path doesn't interfere with the normal tx path except for when it
+ * actually sends out the messages, and that path doesn't depend on
+ * anything more working compared to the normal tx path.  As such, this
+ * shouldn't make netconsole any less robust when things start going south.
+ */
+static void netconsole_rx_work_fn(struct work_struct *work)
+{
+	struct netconsole_target *nt =
+		container_of(work, struct netconsole_target, rx_work);
+	union sockaddr_in46 raddr;
+	struct msghdr msgh = { .msg_name = &raddr.addr, };
+	struct kvec iov;
+	char *tx_buf = netconsole_ext_tx_buf;
+	char *rx_buf, *pos, *tok;
+	u64 seq;
+	int len;
+
+	rx_buf = kmalloc(MAX_PRINT_CHUNK + 1, GFP_KERNEL);
+	if (!rx_buf && printk_ratelimit()) {
+		pr_warning("failed to allocate RX buffer\n");
+		return;
+	}
+
+	iov.iov_base = rx_buf;
+	iov.iov_len = MAX_PRINT_CHUNK;
+repeat:
+	msgh.msg_namelen = sizeof(raddr);
+	len = kernel_recvmsg(sock_from_file(nt->rx_file, &len), &msgh, &iov, 1,
+			     MAX_PRINT_CHUNK, MSG_DONTWAIT);
+	if (len < 0) {
+		if (len != -EAGAIN && printk_ratelimit())
+			pr_warning("RX failed err=%d\n", len);
+		kfree(rx_buf);
+		return;
+	}
+
+	if (!sockaddr_in46_equal(msgh.msg_name, &nt->raddr)) {
+		if (printk_ratelimit())
+			pr_warning("stray packet from %pIS:%u\n",
+				   &raddr.addr, ntohs(raddr.in4.sin_port));
+		goto repeat;
+	}
+
+	rx_buf[len] = '\0';
+	pos = rx_buf;
+	tok = strsep(&pos, " ");
+
+	/* nca header */
+	if (strncmp(tok, "nca", 3)) {
+		if (printk_ratelimit())
+			pr_warning("malformed packet from %pIS:%u\n",
+				   &nt->raddr, ntohs(nt->raddr.in4.sin_port));
+		goto repeat;
+	}
+	tok += 3;
+
+	console_lock();
+
+	/* <missing-seq>... */
+	while ((tok = strsep(&pos, " "))) {
+		if (sscanf(tok, "%llu", &seq)) {
+			len = ext_log_from_seq(tx_buf, CONSOLE_EXT_LOG_MAX, seq);
+			if (len >= 0)
+				send_ext_msg_udp(nt, tx_buf, len);
+		}
+	}
+
+	console_unlock();
+	goto repeat;
+}
+
+static int netconsole_rx_wait_fn(wait_queue_t *wait, unsigned mode, int flags,
+				 void *key)
+{
+	struct netconsole_target *nt =
+		container_of(wait, struct netconsole_target, rx_wait);
+
+	schedule_work(&nt->rx_work);
+	return 0;
+}
+
+static void netconsole_rx_ptable_queue_fn(struct file *file,
+					  wait_queue_head_t *waitq,
+					  poll_table *ptable)
+{
+	struct netconsole_target *nt =
+		container_of(ptable, struct netconsole_target, rx_ptable);
+
+	nt->rx_waitq = waitq;
+	add_wait_queue(waitq, &nt->rx_wait);
+}
+
+static void netconsole_enable_rx(struct netconsole_target *nt)
+{
+	union sockaddr_in46 laddr = { }, raddr = { };
+	struct socket *sock;
+	struct file *file;
+	int addr_len, ret;
+
+	if (!netconsole_ext_tx_buf) {
+		char *buf;
+
+		buf = kmalloc(CONSOLE_EXT_LOG_MAX, GFP_KERNEL);
+		if (!buf) {
+			kfree(buf);
+			pr_warning("failed to allocate TX buffer\n");
+			return;
+		}
+
+		if (cmpxchg(&netconsole_ext_tx_buf, NULL, buf))
+			kfree(buf);
+	}
+
+	if (!nt->np.ipv6) {
+		laddr.in4.sin_family = AF_INET;
+		laddr.in4.sin_port = htons(nt->np.local_port);
+		laddr.in4.sin_addr = nt->np.local_ip.in;
+		raddr.in4.sin_family = AF_INET;
+		raddr.in4.sin_port = htons(nt->np.remote_port);
+		raddr.in4.sin_addr = nt->np.remote_ip.in;
+		addr_len = sizeof(laddr.in4);
+	} else {
+		laddr.in6.sin6_family = AF_INET6;
+		laddr.in6.sin6_port = htons(nt->np.local_port);
+		laddr.in6.sin6_addr = nt->np.local_ip.in6;
+		raddr.in6.sin6_family = AF_INET6;
+		raddr.in6.sin6_port = htons(nt->np.remote_port);
+		raddr.in6.sin6_addr = nt->np.remote_ip.in6;
+		addr_len = sizeof(laddr.in6);
+	}
+
+	ret = sock_create_kern(laddr.addr.sa_family, SOCK_DGRAM, IPPROTO_UDP,
+			       &sock);
+	if (ret) {
+		pr_warning("failed to create rx socket, err=%d\n", ret);
+		return;
+	}
+
+	file = sock_alloc_file(sock, 0, NULL);
+	if (IS_ERR(file)) {
+		pr_warning("failed to rx socket file, err=%ld\n", PTR_ERR(file));
+		sock_release(sock);
+		return;
+	}
+
+	ret = sock->ops->bind(sock, &laddr.addr, addr_len);
+	if (ret) {
+		pr_warning("failed to bind rx socket at %pIS:%u, err=%d\n",
+			   &laddr.addr, ntohs(laddr.in4.sin_port), ret);
+		fput(file);
+		return;
+	}
+
+	nt->raddr = raddr;
+	init_poll_funcptr(&nt->rx_ptable, netconsole_rx_ptable_queue_fn);
+	init_waitqueue_func_entry(&nt->rx_wait, netconsole_rx_wait_fn);
+	INIT_WORK(&nt->rx_work, netconsole_rx_work_fn);
+
+	ret = file->f_op->poll(file, &nt->rx_ptable);
+	if (ret < 0) {
+		pr_warning("failed to poll rx socket, err=%d\n", ret);
+		fput(file);
+		return;
+	}
+
+	if (ret & POLLIN)
+		schedule_work(&nt->rx_work);
+
+	nt->rx_file = file;
+}
+
+static void netconsole_disable_rx(struct netconsole_target *nt)
+{
+	if (!nt->rx_file)
+		return;
+
+	remove_wait_queue(nt->rx_waitq, &nt->rx_wait);
+	cancel_work_sync(&nt->rx_work);
+
+	fput(nt->rx_file);
+	nt->rx_file = NULL;
+}
+
 static int netconsole_enable(struct netconsole_target *nt)
 {
 	int err;
@@ -285,6 +517,9 @@ static int netconsole_enable(struct netconsole_target *nt)
 	if (err)
 		return err;
 
+	if (nt->extended)
+		netconsole_enable_rx(nt);
+
 	console_lock();
 	nt->enabled = true;
 	console_unlock();
@@ -305,6 +540,7 @@ static void netconsole_disable(struct netconsole_target *nt)
 		nt->enabled = false;
 		console_unlock();
 
+		netconsole_disable_rx(nt);
 		netpoll_cleanup(&nt->np);
 	}
 }
@@ -974,6 +1210,8 @@ static void __init_or_module netconsole_destroy_all(void)
 		netconsole_disable(nt);
 		kfree(nt);
 	}
+
+	kfree(netconsole_ext_tx_buf);
 }
 
 static int __init init_netconsole(void)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 14/16] netconsole: implement ack handling and emergency transmission
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (12 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 13/16] netconsole: implement retransmission support for extended consoles Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 15/16] netconsole: implement netconsole receiver library Tejun Heo
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

While the retransmission support added by the previous patch goes a
long way towards enabling implementation of reliable remote logger, it
depends on a lot larger part of the kernel working and there's no
non-polling way of discovering whether the latest messages have been
lost.

This patch implements an optional ack handling.  An extended remote
logger can respond with a message formatted like the following.

 nca[<ack-seq>] <missing-seq> <missing-seq>...

The optional <ack-seq> enables ack handling.  Whenever ack lags
transmission by more than ACK_TIMEOUT (10s), the netconsole target
will retransmit all unacked messages with increasing interval (100ms
between message at the beginning, exponentially backing out to 10s).
This ensures that the remote logger has a very high chance of getting
all the messages even after severe failures as long as the timer and
netpoll are working.

This doesn't interfere with the normal transmission path.  Even if
something goes wrong with timer, the normal transmission path should
keep working.

Emergency transmissions are marked with "emg=1" header.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 drivers/net/netconsole.c | 136 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 127 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index b2763e0..82c8be0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -49,6 +49,7 @@
 #include <linux/inet.h>
 #include <linux/file.h>
 #include <linux/poll.h>
+#include <linux/timer.h>
 #include <linux/configfs.h>
 #include <linux/etherdevice.h>
 
@@ -59,6 +60,10 @@ MODULE_LICENSE("GPL");
 #define MAX_PARAM_LENGTH	256
 #define MAX_PRINT_CHUNK		1000
 
+#define ACK_TIMEOUT		(10 * HZ)
+#define EMG_TX_MIN_INTV		(HZ / 10)
+#define EMG_TX_MAX_INTV		HZ
+
 static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]");
@@ -137,6 +142,13 @@ struct netconsole_target {
 	wait_queue_t		rx_wait;	/* ditto */
 	wait_queue_head_t	*rx_waitq;	/* ditto */
 	struct work_struct	rx_work;	/* receive & process packets */
+
+	/* ack handling and emergency transmission for extended netconsoles */
+	unsigned long		ack_tstmp;	/* pending ack timestamp */
+	u64			ack_seq;	/* last acked sequence */
+	struct timer_list	ack_timer;	/* ack timeout */
+	unsigned long		emg_tx_intv;	/* emergency tx interval */
+	u64			emg_tx_seq;	/* current emg tx sequence */
 };
 
 #ifdef	CONFIG_NETCONSOLE_DYNAMIC
@@ -223,16 +235,17 @@ static struct netconsole_target *alloc_netconsole_target(void)
  * @nt: target to send message to
  * @msg: extended log message to send
  * @msg_len: length of message
+ * @emg_tx: is it for emergency transmission?
  *
  * Transfer extended log @msg to @nt.  If @msg is too long, it'll be split
  * and transmitted in multiple chunks with ncfrag header field added to
- * enable correct reassembly.
+ * enable correct reassembly.  If @emg_tx, emg header field is added.
  */
 static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
-			     int msg_len)
+			     int msg_len, bool emg_tx)
 {
 	static char buf[MAX_PRINT_CHUNK];
-	const int max_extra_len = sizeof(",ncfrag=0000@00/00");
+	const int max_extra_len = sizeof(",emg=1,ncfrag=0000@00/00");
 	const char *header, *body;
 	int header_len = msg_len, body_len = 0;
 	int chunk_len, nr_chunks, i;
@@ -240,7 +253,7 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 	if (!nt->enabled || !netif_running(nt->np.dev))
 		return;
 
-	if (msg_len <= MAX_PRINT_CHUNK) {
+	if (!emg_tx && msg_len <= MAX_PRINT_CHUNK) {
 		netpoll_send_udp(&nt->np, msg, msg_len);
 		return;
 	}
@@ -261,6 +274,8 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 	/*
 	 * Transfer possibly multiple chunks with extra header fields.
 	 *
+	 * For emergency transfers due to missing acks, add "emg=1".
+	 *
 	 * If @msg needs to be split to fit MAX_PRINT_CHUNK, add
 	 * "ncfrag=<byte-offset>@<0-based-chunk-index>/<total-chunks>" to
 	 * enable proper reassembly on receiver side.
@@ -272,6 +287,10 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 		int this_header = header_len;
 		int this_chunk;
 
+		if (emg_tx)
+			this_header += scnprintf(buf + this_header,
+						 sizeof(buf) - this_header,
+						 ",emg=1");
 		if (nr_chunks > 1)
 			this_header += scnprintf(buf + this_header,
 						 sizeof(buf) - this_header,
@@ -309,6 +328,78 @@ static bool sockaddr_in46_equal(union sockaddr_in46 *a, union sockaddr_in46 *b)
 }
 
 /**
+ * netconsole_ack_timer_fn - timer function to handle netconsole ack timeouts
+ * @data: netconsole_target
+ *
+ * For an extended netconsole, if the remote logger acked messages
+ * previously but is failing to ack new messages for over ACK_TIMEOUT, this
+ * timer kicks in for emergency transmission.
+ *
+ * The lack of ack may be caused by a number of things including the ack
+ * packet being dropped or the network stack oopsing.  Whatever the cause
+ * may be, we repeatedly retransmit the unacked message at increasing
+ * interval which should allow the remote logger to eventually obtain all
+ * messages as long as the packets are going out.
+ *
+ * This mechanism depends only on the timer and netpoll working.
+ */
+static void netconsole_ack_timer_fn(unsigned long data)
+{
+	struct netconsole_target *nt = (void *)data;
+	unsigned long now = jiffies;
+	unsigned long next_at;
+	char *buf = netconsole_ext_tx_buf;
+	u64 begin_seq, end_seq, seq;
+	int len;
+
+	if (!console_trylock()) {
+		mod_timer(&nt->ack_timer, now + 1);
+		return;
+	}
+
+	log_seq_range(&begin_seq, &end_seq);
+	if (nt->ack_seq == end_seq - 1) {
+		nt->ack_tstmp = 0;
+		next_at = now + ACK_TIMEOUT;
+		goto out_unlock;
+	}
+	if (time_before(now, nt->ack_tstmp + ACK_TIMEOUT)) {
+		next_at = nt->ack_tstmp + ACK_TIMEOUT;
+		goto out_unlock;
+	}
+
+	/* zero emg_tx_intv indicates that we're starting emergency tx */
+	if (!nt->emg_tx_intv) {
+		nt->emg_tx_intv = EMG_TX_MIN_INTV;
+		nt->emg_tx_seq = nt->ack_seq + 1;
+	}
+
+	seq = nt->emg_tx_seq;
+	do {
+		len = ext_log_from_seq(buf, CONSOLE_EXT_LOG_MAX, seq);
+	} while (len < 0 && ++seq < end_seq);
+
+	if (len >= 0) {
+		send_ext_msg_udp(nt, buf, len, true);
+		seq++;
+	}
+
+	nt->emg_tx_seq = seq;
+
+	if (nt->emg_tx_seq >= end_seq) {
+		/* all messages transferred, bump up intv and repeat */
+		nt->emg_tx_intv = min_t(unsigned long, 2 * nt->emg_tx_intv,
+					EMG_TX_MAX_INTV);
+		nt->emg_tx_seq = nt->ack_seq + 1;
+	}
+
+	next_at = now + nt->emg_tx_intv;
+out_unlock:
+	console_unlock();
+	mod_timer(&nt->ack_timer, next_at);
+}
+
+/**
  * netconsole_rx_work_fn - work function to handle netconsole ack packets
  * @work: netconsole_target->rx_work
  *
@@ -316,7 +407,7 @@ static bool sockaddr_in46_equal(union sockaddr_in46 *a, union sockaddr_in46 *b)
  * of extended netconsole target and responds to ack messages from the
  * remote logger.  An ack message has the following format.
  *
- * nca <missing-seq> <missing-seq> ...
+ * nca[<ack-seq>] <missing-seq> <missing-seq> ...
  *
  * There can be any number of missing-seq's as long as the whole payload
  * fits inside MAX_PRINT_CHUNK.  This function re-transmits each message
@@ -327,6 +418,10 @@ static bool sockaddr_in46_equal(union sockaddr_in46 *a, union sockaddr_in46 *b)
  * actually sends out the messages, and that path doesn't depend on
  * anything more working compared to the normal tx path.  As such, this
  * shouldn't make netconsole any less robust when things start going south.
+ *
+ * If <ack-seq> exists, emergency tx is enabled.  Whenever ack is lagging
+ * by more than ACK_TIMEOUT, netconsole will repeatedly send out unacked
+ * messages with increasing interval.  See netconsole_ack_timer_fn().
  */
 static void netconsole_rx_work_fn(struct work_struct *work)
 {
@@ -335,6 +430,7 @@ static void netconsole_rx_work_fn(struct work_struct *work)
 	union sockaddr_in46 raddr;
 	struct msghdr msgh = { .msg_name = &raddr.addr, };
 	struct kvec iov;
+	bool ack_was_enabled = nt->ack_seq;
 	char *tx_buf = netconsole_ext_tx_buf;
 	char *rx_buf, *pos, *tok;
 	u64 seq;
@@ -355,6 +451,9 @@ repeat:
 	if (len < 0) {
 		if (len != -EAGAIN && printk_ratelimit())
 			pr_warning("RX failed err=%d\n", len);
+		if (!ack_was_enabled && nt->ack_seq)
+			pr_info("ACK timeout enabled for %pIS:%d\n",
+				&nt->raddr.addr, ntohs(nt->raddr.in4.sin_port));
 		kfree(rx_buf);
 		return;
 	}
@@ -381,12 +480,25 @@ repeat:
 
 	console_lock();
 
+	/* <ack-seq> */
+	if (sscanf(tok, "%llu", &seq)) {
+		u64 begin_seq, end_seq;
+
+		log_seq_range(&begin_seq, &end_seq);
+		if (seq && seq < end_seq) {
+			nt->ack_seq = max(seq, nt->ack_seq);
+			nt->emg_tx_intv = 0;
+			if (!ack_was_enabled)
+				mod_timer(&nt->ack_timer, jiffies);
+		}
+	}
+
 	/* <missing-seq>... */
 	while ((tok = strsep(&pos, " "))) {
 		if (sscanf(tok, "%llu", &seq)) {
 			len = ext_log_from_seq(tx_buf, CONSOLE_EXT_LOG_MAX, seq);
 			if (len >= 0)
-				send_ext_msg_udp(nt, tx_buf, len);
+				send_ext_msg_udp(nt, tx_buf, len, false);
 		}
 	}
 
@@ -480,6 +592,7 @@ static void netconsole_enable_rx(struct netconsole_target *nt)
 	init_poll_funcptr(&nt->rx_ptable, netconsole_rx_ptable_queue_fn);
 	init_waitqueue_func_entry(&nt->rx_wait, netconsole_rx_wait_fn);
 	INIT_WORK(&nt->rx_work, netconsole_rx_work_fn);
+	setup_timer(&nt->ack_timer, netconsole_ack_timer_fn, (unsigned long)nt);
 
 	ret = file->f_op->poll(file, &nt->rx_ptable);
 	if (ret < 0) {
@@ -501,6 +614,7 @@ static void netconsole_disable_rx(struct netconsole_target *nt)
 
 	remove_wait_queue(nt->rx_waitq, &nt->rx_wait);
 	cancel_work_sync(&nt->rx_work);
+	del_timer_sync(&nt->ack_timer);
 
 	fput(nt->rx_file);
 	nt->rx_file = NULL;
@@ -1148,9 +1262,13 @@ static void write_ext_msg(struct console *con, const char *msg,
 	if ((oops_only && !oops_in_progress) || list_empty(&target_list))
 		return;
 
-	list_for_each_entry(nt, &target_list, list)
-		if (nt->extended)
-			send_ext_msg_udp(nt, msg, len);
+	list_for_each_entry(nt, &target_list, list) {
+		if (nt->extended) {
+			send_ext_msg_udp(nt, msg, len, false);
+			if (!nt->ack_tstmp)
+				nt->ack_tstmp = jiffies ?: 1;
+		}
+	}
 }
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 15/16] netconsole: implement netconsole receiver library
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (13 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 14/16] netconsole: implement ack handling and emergency transmission Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-16 23:03 ` [PATCH 16/16] netconsole: update documentation for extended netconsole Tejun Heo
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

This patch implements libncrx.a and a simple receiver program ncrx
using the library.  The library makes use of the extended header,
retransmission and ack support to implement reliable netconsole
receiver.  The library is structured as a pure state machine leaving
all IO and timing handling to the library user and should be easy to
fit into any environment.

The reordering and retransmission mechanisms are fairly simple.  Each
receiver instance has a ring buffer where messages are slotted
according to their sequence number.  A hole in sequence numbers
trigger retransmission after the sequence progresses certain number of
slots or certain amount of duration has passed.

See tools/lib/netconsole/ncrx.h for information on usage.

Tested with simulated heavy packet loss (50%).  Messages are
transferred without loss and all the common scenarios including
reboots are handled correctly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 tools/Makefile                |  16 +-
 tools/lib/netconsole/Makefile |  36 ++
 tools/lib/netconsole/ncrx.c   | 906 ++++++++++++++++++++++++++++++++++++++++++
 tools/lib/netconsole/ncrx.h   | 204 ++++++++++
 tools/ncrx/Makefile           |  14 +
 tools/ncrx/ncrx.c             | 143 +++++++
 6 files changed, 1318 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/netconsole/Makefile
 create mode 100644 tools/lib/netconsole/ncrx.c
 create mode 100644 tools/lib/netconsole/ncrx.h
 create mode 100644 tools/ncrx/Makefile
 create mode 100644 tools/ncrx/ncrx.c

diff --git a/tools/Makefile b/tools/Makefile
index 9a617ad..9f588f7 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -9,6 +9,7 @@ help:
 	@echo '  firewire   - the userspace part of nosy, an IEEE-1394 traffic sniffer'
 	@echo '  hv         - tools used when in Hyper-V clients'
 	@echo '  lguest     - a minimal 32-bit x86 hypervisor'
+	@echo '  ncrx       - simple reliable netconsole logger'
 	@echo '  perf       - Linux performance measurement and analysis tool'
 	@echo '  selftests  - various kernel selftests'
 	@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
@@ -50,6 +51,12 @@ liblockdep: FORCE
 libapikfs: FORCE
 	$(call descend,lib/api)
 
+libncrx: FORCE
+	$(call descend,lib/netconsole)
+
+ncrx: libncrx FORCE
+	$(call descend,$@)
+
 perf: libapikfs FORCE
 	$(call descend,$@)
 
@@ -100,6 +107,12 @@ liblockdep_clean:
 libapikfs_clean:
 	$(call descend,lib/api,clean)
 
+libncrx_clean:
+	$(call descend,lib/netconsole,clean)
+
+ncrx_clean: libncrx_clean
+	$(call descend,$(@:_clean=),clean)
+
 perf_clean: libapikfs_clean
 	$(call descend,$(@:_clean=),clean)
 
@@ -114,6 +127,7 @@ tmon_clean:
 
 clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean lguest_clean \
 		perf_clean selftests_clean turbostat_clean usb_clean virtio_clean \
-		vm_clean net_clean x86_energy_perf_policy_clean tmon_clean
+		vm_clean net_clean x86_energy_perf_policy_clean tmon_clean \
+		ncrx_clean
 
 .PHONY: FORCE
diff --git a/tools/lib/netconsole/Makefile b/tools/lib/netconsole/Makefile
new file mode 100644
index 0000000..6bc7997
--- /dev/null
+++ b/tools/lib/netconsole/Makefile
@@ -0,0 +1,36 @@
+include ../../scripts/Makefile.include
+
+CC = $(CROSS_COMPILE)gcc
+AR = $(CROSS_COMPILE)ar
+
+# guard against environment variables
+LIB_H=
+LIB_OBJS=
+
+LIB_H += ncrx.h
+LIB_OBJS += $(OUTPUT)ncrx.o
+
+LIBFILE = libncrx.a
+
+CFLAGS = -g -Wall -O2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -fPIC
+ALL_CFLAGS = $(CFLAGS) $(BASIC_CFLAGS)
+ALL_LDFLAGS = $(LDFLAGS)
+
+RM = rm -f
+
+$(LIBFILE): $(LIB_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) rcs $(OUTPUT)$@ $(LIB_OBJS)
+
+$(LIB_OBJS): $(LIB_H)
+
+$(OUTPUT)%.o: %.c
+	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
+$(OUTPUT)%.s: %.c
+	$(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $<
+$(OUTPUT)%.o: %.S
+	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
+
+clean:
+	$(call QUIET_CLEAN, libncrx) $(RM) $(LIB_OBJS) $(LIBFILE)
+
+.PHONY: clean
diff --git a/tools/lib/netconsole/ncrx.c b/tools/lib/netconsole/ncrx.c
new file mode 100644
index 0000000..8a87e6b
--- /dev/null
+++ b/tools/lib/netconsole/ncrx.c
@@ -0,0 +1,906 @@
+/*
+ * ncrx - extended netconsole receiver library
+ *
+ * Copyright (C) 2015		Facebook, Inc
+ * Copyright (C) 2015		Tejun Heo <tj@kernel.org>
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <time.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <netinet/udp.h>
+
+#include "ncrx.h"
+
+/* max payload len for responses, this is what netconsole uses on tx side */
+#define NCRX_RESP_MAX		1000
+/* oos history is tracked with a uint32_t */
+#define NCRX_OOS_MAX		32
+
+struct ncrx_msg_list {
+	struct ncrx_list	head;
+	int			nr;		/* number of msgs on the list */
+};
+
+struct ncrx_slot {
+	struct ncrx_msg		*msg;
+	uint64_t		timestamp;	/* last activity on this slot */
+	struct ncrx_list	hole_node;	/* anchored @ ncrx->hole_list */
+};
+
+struct ncrx {
+	struct ncrx_param	p;
+
+	uint64_t		now;		/* latest time in msecs */
+
+	int			head;		/* next slot to use */
+	int			tail;		/* last slot in use */
+	uint64_t		head_seq;	/* next expected seq, unset=0 */
+	struct ncrx_slot	*slots;		/* msg slots */
+	struct ncrx_list	hole_list;	/* missing or !complete slots */
+
+	uint32_t		oos_history;	/* bit history of oos msgs */
+	struct ncrx_msg_list	oos_list;	/* buffered oos msgs */
+
+	struct ncrx_msg_list	frag_list;	/* buffered printk frag msgs */
+
+	struct ncrx_msg_list	retired_list;	/* msgs to be fetched by user */
+
+	uint64_t		acked_seq;	/* last seq acked, unset=max */
+	uint64_t		acked_at;	/* and when */
+
+	/* response buffer for ncrx_response() */
+	char			resp_buf[NCRX_RESP_MAX + 1];
+	int			resp_len;
+};
+
+static const struct ncrx_param ncrx_dfl_param = {
+	.nr_slots		= NCRX_DFL_NR_SLOTS,
+
+	.ack_intv		= NCRX_DFL_ACK_INTV,
+	.retx_intv		= NCRX_DFL_RETX_INTV,
+	.retx_stride		= NCRX_DFL_RETX_STRIDE,
+	.msg_timeout		= NCRX_DFL_MSG_TIMEOUT,
+
+	.oos_thr		= NCRX_DFL_OOS_THR,
+	.oos_intv		= NCRX_DFL_OOS_INTV,
+	.oos_timeout		= NCRX_DFL_OOS_TIMEOUT,
+
+	.frag_max		= NCRX_DFL_FRAG_MAX,
+	.frag_timeout		= NCRX_DFL_FRAG_TIMEOUT,
+};
+
+/* utilities mostly stolen from kernel */
+#define min(x, y) ({							\
+	typeof(x) _min1 = (x);						\
+	typeof(y) _min2 = (y);						\
+	(void) (&_min1 == &_min2);					\
+	_min1 < _min2 ? _min1 : _min2; })
+
+#define max(x, y) ({							\
+	typeof(x) _max1 = (x);						\
+	typeof(y) _max2 = (y);						\
+	(void) (&_max1 == &_max2);					\
+	_max1 > _max2 ? _max1 : _max2; })
+
+#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+
+#define container_of(ptr, type, member) ({				\
+	const typeof( ((type *)0)->member ) *__mptr = (ptr);		\
+	(type *)( (char *)__mptr - offsetof(type,member) );})
+
+/* ncrx_msg from its ->node */
+#define node_to_msg(ptr)	container_of(ptr, struct ncrx_msg, node)
+
+/* iterate msg_list */
+#define msg_list_for_each(pos, n, list)					\
+	for (pos = node_to_msg((list)->head.next),			\
+		     n = node_to_msg(pos->node.next);			\
+	     &pos->node != &(list)->head;				\
+	     pos = n, n = node_to_msg(pos->node.next))
+
+/* ncrx_slot from its ->hole_node */
+#define hole_to_slot(ptr)						\
+	container_of(ptr, struct ncrx_slot, hole_node)
+
+/* iterate hole_list */
+#define hole_list_for_each(pos, n, list)				\
+	for (pos = hole_to_slot((list)->next),				\
+		     n = hole_to_slot(pos->hole_node.next);		\
+	     &pos->hole_node != (list);					\
+	     pos = n, n = hole_to_slot(pos->hole_node.next))
+
+static unsigned int hweight32(uint32_t w)
+{
+	w -= (w >> 1) & 0x55555555;
+	w =  (w & 0x33333333) + ((w >> 2) & 0x33333333);
+	w =  (w + (w >> 4)) & 0x0f0f0f0f;
+	return (w * 0x01010101) >> 24;
+}
+
+static void init_list(struct ncrx_list *head)
+{
+	head->next = head;
+	head->prev = head;
+}
+
+static int list_empty(struct ncrx_list *head)
+{
+	return head->next == head;
+}
+
+static void list_del(struct ncrx_list *head)
+{
+	struct ncrx_list *prev = head->prev;
+	struct ncrx_list *next = head->next;
+
+	prev->next = next;
+	next->prev = prev;
+	init_list(head);
+}
+
+static void list_append(struct ncrx_list *node, struct ncrx_list *list)
+{
+	struct ncrx_list *prev = list->prev;
+
+	assert(node->next == node && node->prev == node);
+
+	node->next = list;
+	node->prev = prev;
+	prev->next = node;
+	list->prev = node;
+}
+
+static void msg_list_del(struct ncrx_msg *msg, struct ncrx_msg_list *list)
+{
+	list_del(&msg->node);
+	list->nr--;
+
+	if (!list->nr)
+		assert(list->head.next == &list->head &&
+		       list->head.prev == &list->head);
+}
+
+static void msg_list_append(struct ncrx_msg *msg, struct ncrx_msg_list *list)
+{
+	list_append(&msg->node, &list->head);
+	list->nr++;
+}
+
+static struct ncrx_msg *msg_list_peek(struct ncrx_msg_list *list)
+{
+	if (list_empty(&list->head))
+		return NULL;
+	return node_to_msg(list->head.next);
+}
+
+static struct ncrx_msg *msg_list_pop(struct ncrx_msg_list *list)
+{
+	struct ncrx_msg *msg;
+
+	msg = msg_list_peek(list);
+	if (msg)
+		msg_list_del(msg, list);
+	return msg;
+}
+
+/*
+ * Parse @payload into @msg.  The data is not copied into @msg's buffer.
+ * @msg->text and ->dict are updated to point into @payload instead.
+ */
+static int parse_packet(const char *payload, struct ncrx_msg *msg)
+{
+	char buf[1024];
+	char *p, *tok;
+	int idx;
+
+	memset(msg, 0, sizeof(*msg));
+
+	p = strchr(payload, ';');
+	if (!p || p - payload >= sizeof(buf))
+		goto einval;
+	memcpy(buf, payload, p - payload);
+	buf[p - payload] = '\0';
+
+	msg->text = p + 1;
+	msg->text_len = strlen(msg->text);
+	if (msg->text_len > NCRX_LINE_MAX)
+		msg->text_len = NCRX_LINE_MAX;
+
+	/* <level>,<sequnum>,<timestamp>,<contflag>[,KEY=VAL]* */
+	idx = 0;
+	p = buf;
+	while ((tok = strsep(&p, ","))) {
+		char *endp, *key, *val;
+		unsigned long long v;
+
+		switch (idx++) {
+		case 0:
+			v = strtoul(tok, &endp, 0);
+			if (*endp != '\0' || v > UINT8_MAX)
+				goto einval;
+			msg->facility = v >> 3;
+			msg->level = v & ((1 << 3) - 1);
+			continue;
+		case 1:
+			v = strtoull(tok, &endp, 0);
+			if (*endp != '\0') {
+				if (tok[0] == '-' && tok[1] == '\0')
+					msg->frag = 1;
+				else
+					goto einval;
+			} else {
+				msg->seq = v;
+			}
+			continue;
+		case 2:
+			v = strtoull(tok, &endp, 0);
+			if (*endp != '\0')
+				goto einval;
+			msg->ts_usec = v;
+			continue;
+		case 3:
+			if (tok[0] == 'c')
+				msg->cont_start = 1;
+			else if (tok[0] == '+')
+				msg->cont = 1;
+			continue;
+		}
+
+		val = tok;
+		key = strsep(&val, "=");
+		if (!val)
+			continue;
+		if (!strcmp(key, "ncfrag")) {
+			unsigned nf_off, nf_idx, nf_nr;
+
+			if (sscanf(val, "%u@%u/%u",
+				   &nf_off, &nf_idx, &nf_nr) != 3)
+				goto einval;
+			if (nf_off + msg->text_len >= NCRX_LINE_MAX ||
+			    nf_idx >= nf_nr || nf_nr > 32)
+				goto einval;
+
+			msg->ncfrag_bitmap = UINT32_MAX << nf_nr;
+			msg->ncfrag_bitmap |= 1 << nf_idx;
+
+			msg->ncfrag_off = nf_off;
+		} else if (!strcmp(key, "fragid")) {
+			v = strtoull(val, &endp, 0);
+			if (*endp != '\0')
+				goto einval;
+
+			msg->has_fragid = 1;
+			msg->fragid = v;
+		} else if (!strcmp(key, "emg")) {
+			v = strtoul(val, &endp, 0);
+			if (*endp != '\0')
+				goto einval;
+			msg->emg = v;
+		}
+	}
+	return 0;
+einval:
+	errno = EINVAL;
+	return -1;
+}
+
+/* how far @idx is behind @ncrx->head */
+static int slot_dist(int idx, struct ncrx *ncrx)
+{
+	int dist = ncrx->head - idx;
+	return dist >= 0 ? dist : dist + ncrx->p.nr_slots;
+}
+
+/* number of occupied slots */
+static int nr_queued(struct ncrx *ncrx)
+{
+	return slot_dist(ncrx->tail, ncrx);
+}
+
+/* seq of the last queued message */
+static uint64_t tail_seq(struct ncrx *ncrx)
+{
+	return ncrx->head_seq - nr_queued(ncrx);
+}
+
+/* slot index of a message with sequence number @ncrx->head_seq + @delta */
+static int seq_delta_idx(struct ncrx *ncrx, int delta)
+{
+	int idx = ncrx->head + delta;
+
+	if (idx < 0)
+		return idx + ncrx->p.nr_slots;
+	else if (idx >= ncrx->p.nr_slots)
+		return idx - ncrx->p.nr_slots;
+	else
+		return idx;
+}
+
+/* is @slot completely empty? */
+static int slot_is_free(struct ncrx_slot *slot)
+{
+	return !slot->msg && list_empty(&slot->hole_node);
+}
+
+/* @slot may have just been completed, if so, remove it from hole_list */
+static void slot_maybe_complete(struct ncrx_slot *slot)
+{
+	struct ncrx_msg *msg = slot->msg;
+
+	if (!msg || msg->ncfrag_bitmap || list_empty(&slot->hole_node))
+		return;
+
+	list_del(&slot->hole_node);
+
+	msg->dict = strchr(msg->text, '\n');
+	if (msg->dict) {
+		int len = msg->text_len;
+		msg->text_len = msg->dict - msg->text;
+		msg->text[msg->text_len] = '\0';
+		msg->dict_len = len - msg->text_len - 1;
+		msg->dict++;
+	}
+}
+
+/* retire the last queued slot whether complete or not */
+static void retire_tail(struct ncrx *ncrx)
+{
+	int ntail = (ncrx->tail + 1) % ncrx->p.nr_slots;
+	struct ncrx_slot *slot = &ncrx->slots[ncrx->tail];
+	struct ncrx_slot *nslot = &ncrx->slots[ntail];
+
+	if (slot->msg) {
+		msg_list_append(slot->msg, &ncrx->retired_list);
+		slot->msg = NULL;
+	}
+
+	list_del(&slot->hole_node);	/* free slot is never a hole */
+	ncrx->tail = ntail;
+	/*
+	 * Activities of past msgs are considered activities for newer ones
+	 * too.  This prevents oos interval verdicts from flipping as
+	 * sequence progresses.
+	 */
+	nslot->timestamp = max(slot->timestamp, nslot->timestamp);
+}
+
+/* make room for message with seq ncrx->head_seq + @delta */
+static void make_room(struct ncrx *ncrx, int delta)
+{
+	int i;
+
+	/* head_seq is for the next msg, need to advance for 0 @delta too */
+	for (i = 0; i <= delta; i++) {
+		struct ncrx_slot *slot;
+		int max_busy = ncrx->p.nr_slots - ncrx->p.retx_intv;
+
+		/* a new slot is considered hole until it gets completed */
+		slot = &ncrx->slots[ncrx->head];
+		assert(slot_is_free(slot));
+		list_append(&slot->hole_node, &ncrx->hole_list);
+		slot->timestamp = ncrx->now;
+
+		/*
+		 * Wind the ring buffer and push out if overflowed.  Always
+		 * keep at least one stride empty so that retransmissions
+		 * of expired slots don't count as oos.
+		 */
+		ncrx->head_seq++;
+		ncrx->head = (ncrx->head + 1) % ncrx->p.nr_slots;
+		slot = &ncrx->slots[ncrx->head];
+		if (slot_dist(ncrx->tail, ncrx) > max_busy)
+			retire_tail(ncrx);
+	}
+}
+
+/*
+ * Get slot for @tmsg.  On success, returns pointer to the slot which may
+ * be free or occupied with partial or complete message.  Returns NULL with
+ * errno set to ERANGE if oos, NULL / ENOENT if already retired.
+ */
+static struct ncrx_slot *get_seq_slot(struct ncrx_msg *tmsg, struct ncrx *ncrx)
+{
+	struct ncrx_slot *slot;
+	int64_t delta;
+	int idx;
+
+	/* new seq stream */
+	if (!ncrx->head_seq) {
+		ncrx->head_seq = tmsg->seq;
+		ncrx->acked_seq = UINT64_MAX;
+		tmsg->seq_reset = 1;
+	}
+
+	delta = tmsg->seq - ncrx->head_seq;
+
+	/*
+	 * Consider oos if outside reorder window or if the slot is
+	 * complete and the last activity on it was more than oos_intv ago.
+	 * Emergency messages are never considered oos as they don't follow
+	 * the usual transmission pattern and may repeat indefinitely.
+	 */
+	if (-delta > ncrx->p.nr_slots || delta > ncrx->p.nr_slots) {
+		errno = ERANGE;
+		return NULL;
+	}
+
+	idx = seq_delta_idx(ncrx, delta);
+	slot = &ncrx->slots[idx];
+
+	if (-delta > nr_queued(ncrx)) {
+		int is_free = slot_is_free(slot);
+
+		if (!tmsg->emg &&
+		    (!is_free ||
+		     slot->timestamp + ncrx->p.oos_intv < ncrx->now)) {
+			errno = ERANGE;
+			return NULL;
+		}
+
+		if (is_free)
+			slot->timestamp = ncrx->now;
+		errno = ENOENT;
+		return NULL;
+	}
+
+	make_room(ncrx, delta);
+	slot->timestamp = ncrx->now;
+
+	return slot;
+}
+
+/* make @src's copy, if @src is a fragment, allocate full size as it may grow */
+static struct ncrx_msg *copy_msg(struct ncrx_msg *src)
+{
+	struct ncrx_msg *dst;
+	int data_len;
+
+	assert(!src->dict && !src->dict_len);
+
+	if (src->frag || src->ncfrag_bitmap)
+		data_len = NCRX_RESP_MAX;
+	else
+		data_len = src->text_len;
+
+	dst = malloc(sizeof(*dst) + data_len + 1);
+	if (!dst)
+		return NULL;
+
+	*dst = *src;
+	init_list(&dst->node);
+
+	dst->text = dst->buf;
+	memcpy(dst->text, src->text, src->text_len);
+	dst->text[dst->text_len] = '\0';
+
+	return dst;
+}
+
+/*
+ * @tmsg is a newly parsed msg which has printk fragment information.  If
+ * it's a fragment, queue it on @ncrx->frag_list until it either gets shot
+ * down by the matching complete msg, times out or is pushed out by other
+ * frag msgs.  If a complete message, shoot down the matching frags.
+ */
+static int queue_frag_msg(struct ncrx_msg *tmsg, struct ncrx *ncrx)
+{
+	struct ncrx_msg *match = NULL;
+	struct ncrx_msg *msg, *nmsg;
+
+	if (!tmsg->has_fragid) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	/* look for a match */
+	msg_list_for_each(msg, nmsg, &ncrx->frag_list) {
+		if (msg->fragid == tmsg->fragid) {
+			match = msg;
+			break;
+		}
+	}
+
+	/* @tmsg is a complete one, shoot down the match */
+	if (!tmsg->frag) {
+		if (match) {
+			msg_list_del(msg, &ncrx->frag_list);
+			free(msg);
+		}
+		return 0;
+	}
+
+	/* @tmsg is a frag and there's a match, append new to the old */
+	if (match) {
+		int len = msg->text_len;
+		int left = NCRX_LINE_MAX - len;
+		int new_len = tmsg->text_len;
+
+		if (new_len > left)
+			new_len = left;
+
+		memcpy(msg->text + len, tmsg->text, new_len);
+		msg->text_len += new_len;
+		msg->text[msg->text_len] = '\0';
+		return 0;
+	}
+
+	/* @tmsg is a new frag, allocate, queue and overflow */
+	msg = copy_msg(tmsg);
+	if (!msg)
+		return -1;
+
+	msg_list_append(msg, &ncrx->frag_list);
+
+	if (ncrx->frag_list.nr > ncrx->p.frag_max) {
+		msg = msg_list_pop(&ncrx->frag_list);
+		msg_list_append(msg, &ncrx->retired_list);
+	}
+
+	return 0;
+}
+
+/*
+ * @tmsg is a newly parsed msg which is out-of-sequence.  Queue it on
+ * @ncrx->oos_list until the message times out, gets pushed out by other
+ * oos messages or the sequence stream gets reset.
+ */
+static int queue_oos_msg(struct ncrx_msg *tmsg, struct ncrx *ncrx)
+{
+	struct ncrx_slot *slot;
+	struct ncrx_msg *msg, *nmsg, *first;
+
+	msg = copy_msg(tmsg);
+	if (!msg)
+		return -1;
+
+	msg_list_append(msg, &ncrx->oos_list);
+
+	/*
+	 * Shifted left automatically on each new msg.  Set oos and see if
+	 * there have been too many oos among the last 32 messages.
+	 */
+	ncrx->oos_history |= 1;
+	if (hweight32(ncrx->oos_history) < ncrx->p.oos_thr) {
+		/* nope, handle oos overflow and handle */
+		if (ncrx->oos_list.nr > NCRX_OOS_MAX) {
+			msg = msg_list_pop(&ncrx->oos_list);
+			msg->oos = 1;
+			msg_list_append(msg, &ncrx->retired_list);
+		}
+		return 0;
+	}
+
+	/*
+	 * The current sequence stream seems no good.  Let's reset by
+	 * retiring all pending, picking the oos msg with the lowest seq,
+	 * queueing it to reset the seq and then queueing all other oos
+	 * msgs.  If a msg is still oos after reset, just retire it.
+	 */
+	while (ncrx->tail != ncrx->head) {
+		slot = &ncrx->slots[ncrx->tail];
+		retire_tail(ncrx);
+	}
+
+	ncrx->head_seq = 0;
+	ncrx->acked_seq = UINT64_MAX;
+
+	first = node_to_msg(ncrx->oos_list.head.next);
+	msg_list_for_each(msg, nmsg, &ncrx->oos_list)
+		first = msg->seq < first->seq ? msg : first;
+
+	msg_list_del(first, &ncrx->oos_list);
+	slot = get_seq_slot(first, ncrx);
+	slot->msg = first;
+	slot_maybe_complete(slot);
+
+	while ((msg = msg_list_pop(&ncrx->oos_list))) {
+		slot = get_seq_slot(msg, ncrx);
+		if (slot) {
+			slot->msg = msg;
+			slot_maybe_complete(slot);
+		} else {
+			msg->oos = 1;
+			msg_list_append(msg, &ncrx->retired_list);
+		}
+	}
+
+	return 0;
+}
+
+/* @payload has just been received, parse and queue it */
+static int ncrx_queue_payload(const char *payload, struct ncrx *ncrx)
+{
+	struct ncrx_msg tmsg;
+	struct ncrx_slot *slot;
+	int new_msg = 0;
+
+	if (parse_packet(payload, &tmsg))
+		return -1;
+
+	tmsg.rx_at = ncrx->now;
+	ncrx->oos_history <<= 1;
+
+	/* ack immediately if logging source is doing emergency transmissions */
+	if (tmsg.emg) {
+		ncrx->acked_seq = UINT64_MAX;
+		ncrx->acked_at = 0;
+	}
+
+	/* handle printk frags, note that this is different from ncfrags */
+	if (tmsg.has_fragid || tmsg.frag) {
+		if (queue_frag_msg(&tmsg, ncrx))
+			return -1;
+		/* if @tmsg is frag, there's nothing more to do */
+		if (tmsg.frag)
+			return 0;
+	}
+
+	/* get the matching slot and allocate a new message if empty */
+	slot = get_seq_slot(&tmsg, ncrx);
+	if (slot && !slot->msg) {
+		slot->msg = copy_msg(&tmsg);
+		new_msg = 1;
+	}
+	if (!slot || !slot->msg) {
+		if (errno == ENOENT)
+			return 0;
+		if (errno == ERANGE)
+			return queue_oos_msg(&tmsg, ncrx);
+		return -1;
+	}
+
+	if (!new_msg && slot->msg->ncfrag_bitmap) {
+		struct ncrx_msg *msg = slot->msg;
+		int off = tmsg.ncfrag_off;
+
+		/* ncfrags, assemble in the slot */
+		msg->ncfrag_bitmap |= tmsg.ncfrag_bitmap;
+		memcpy(msg->text + off, tmsg.text, tmsg.text_len);
+		if (msg->text_len < off + tmsg.text_len)
+			msg->text_len = off + tmsg.text_len;
+		msg->text[msg->text_len] = '\0';
+
+		if (msg->ncfrag_bitmap == UINT32_MAX)
+			msg->ncfrag_bitmap = 0;
+	}
+
+	slot_maybe_complete(slot);
+
+	return 0;
+}
+
+/*
+ * Build ncrx_response() output.  Ack for the last retired msg is always
+ * added.  If @slot is non-NULL, re-transmission for it is also added.
+ */
+static void ncrx_build_resp(struct ncrx_slot *slot, struct ncrx *ncrx)
+{
+	/* no msg received? */
+	if (!ncrx->head_seq)
+		return;
+
+	/* "nca<ack-seq>" */
+	if (!ncrx->resp_len) {
+		ncrx->acked_seq = tail_seq(ncrx) - 1;
+		ncrx->acked_at = ncrx->now;
+
+		ncrx->resp_len = snprintf(ncrx->resp_buf, NCRX_RESP_MAX,
+					  "nca%"PRIu64, ncrx->acked_seq);
+	}
+
+	/* " <missing-seq>..." truncated to NCRX_RESP_MAX */
+	if (slot) {
+		int idx = slot - ncrx->slots;
+		int len;
+
+		len = snprintf(ncrx->resp_buf + ncrx->resp_len,
+			       NCRX_RESP_MAX - ncrx->resp_len, " %"PRIu64,
+			       ncrx->head_seq - slot_dist(idx, ncrx));
+		if (ncrx->resp_len + len <= NCRX_RESP_MAX) {
+			ncrx->resp_len += len;
+			ncrx->resp_buf[ncrx->resp_len] = '\0';
+		}
+	}
+}
+
+int ncrx_process(const char *payload, uint64_t now, struct ncrx *ncrx)
+{
+	struct ncrx_slot *slot, *tmp_slot;
+	struct ncrx_msg *msg;
+	uint64_t old_head_seq = ncrx->head_seq;
+	int dist_retx, ret = 0;
+
+	if (now < ncrx->now)
+		fprintf(stderr, "ncrx: time regressed %"PRIu64"->%"PRIu64"\n",
+			ncrx->now, now);
+
+	ncrx->now = now;
+	ncrx->resp_len = 0;
+
+	/*
+	 * If fully acked, keep last ack timestamp current so that new
+	 * messages arriving doesn't trigger ack timeout immediately.
+	 */
+	if (ncrx->acked_seq == tail_seq(ncrx) - 1)
+		ncrx->acked_at = now;
+
+	/* parse and queue @payload */
+	if (payload)
+		ret = ncrx_queue_payload(payload, ncrx);
+
+	/* retire complete & timed-out msgs from tail */
+	while (ncrx->tail != ncrx->head) {
+		struct ncrx_slot *slot = &ncrx->slots[ncrx->tail];
+
+		if ((!slot->msg || !list_empty(&slot->hole_node)) &&
+		    slot->timestamp + ncrx->p.msg_timeout >= now)
+			break;
+		retire_tail(ncrx);
+	}
+
+	/* retire timed-out oos msgs */
+	while ((msg = msg_list_peek(&ncrx->oos_list))) {
+		if (msg->rx_at + ncrx->p.oos_timeout >= now)
+			break;
+		msg->oos = 1;
+		msg_list_del(msg, &ncrx->oos_list);
+		msg_list_append(msg, &ncrx->retired_list);
+	}
+
+	/* retire timed-out frag msgs */
+	while ((msg = msg_list_peek(&ncrx->frag_list))) {
+		if (msg->rx_at + ncrx->p.frag_timeout >= now)
+			break;
+		msg_list_del(msg, &ncrx->frag_list);
+		msg_list_append(msg, &ncrx->retired_list);
+	}
+
+	/* ack pending and timeout expired? */
+	if (ncrx->acked_seq != tail_seq(ncrx) - 1 &&
+	    ncrx->acked_at + ncrx->p.ack_intv < now)
+		ncrx_build_resp(NULL, ncrx);
+
+	/* head passed one or more re-transmission boundaries? */
+	dist_retx = old_head_seq / ncrx->p.retx_stride !=
+		ncrx->head_seq / ncrx->p.retx_stride;
+
+	hole_list_for_each(slot, tmp_slot, &ncrx->hole_list) {
+		int retx = 0;
+
+		/*
+		 * If so, request re-tx of holes further away than stride.
+		 * This ensures that a missing seq is requested at least
+		 * certain number of times regardless of incoming rate.
+		 */
+		if (dist_retx &&
+		    slot_dist(slot - ncrx->slots, ncrx) > ncrx->p.retx_stride)
+			retx = 1;
+
+		/* request re-tx every retx_intv */
+		if (now - slot->timestamp >= ncrx->p.retx_intv) {
+			slot->timestamp = now;
+			retx = 1;
+		}
+
+		if (retx)
+			ncrx_build_resp(slot, ncrx);
+	}
+
+	return ret;
+}
+
+const char *ncrx_response(struct ncrx *ncrx, int *lenp)
+{
+	if (lenp)
+		*lenp = ncrx->resp_len;
+	if (ncrx->resp_len)
+		return ncrx->resp_buf;
+	return NULL;
+}
+
+struct ncrx_msg *ncrx_next_msg(struct ncrx *ncrx)
+{
+	return msg_list_pop(&ncrx->retired_list);
+}
+
+uint64_t ncrx_invoke_process_at(struct ncrx *ncrx)
+{
+	uint64_t when = UINT64_MAX;
+	struct ncrx_msg *msg;
+
+	/* ack pending? */
+	if (ncrx->head_seq && ncrx->acked_seq != tail_seq(ncrx) - 1)
+		when = min(when, ncrx->acked_at + ncrx->p.ack_intv);
+
+	/*
+	 * Holes to request for retransmission?  msg_timeout is the same
+	 * condition but way longer.  Checking on retx_intv is enough.
+	 */
+	if (!list_empty(&ncrx->hole_list))
+		when = min(when, ncrx->now + ncrx->p.retx_intv);
+
+	/* oos timeout */
+	if ((msg = msg_list_peek(&ncrx->oos_list)))
+		when = min(when, msg->rx_at + ncrx->p.oos_timeout);
+
+	/* frag timeout */
+	if ((msg = msg_list_peek(&ncrx->frag_list)))
+		when = min(when, msg->rx_at + ncrx->p.frag_timeout);
+
+	/* min 10ms intv to avoid busy loop in case something goes bonkers */
+	return max(when, ncrx->now + 10);
+}
+
+struct ncrx *ncrx_create(const struct ncrx_param *param)
+{
+	const struct ncrx_param *dfl = &ncrx_dfl_param;
+	struct ncrx_param *p;
+	struct ncrx *ncrx;
+	int i;
+
+	ncrx = calloc(1, sizeof(*ncrx));
+	if (!ncrx)
+		return NULL;
+
+	p = &ncrx->p;
+	if (param) {
+		p->nr_slots	= param->nr_slots	?: dfl->nr_slots;
+
+		p->ack_intv	= param->ack_intv	?: dfl->ack_intv;
+		p->retx_intv	= param->retx_intv	?: dfl->retx_intv;
+		p->retx_stride	= param->retx_stride	?: dfl->retx_stride;
+		p->msg_timeout	= param->msg_timeout	?: dfl->msg_timeout;
+
+		p->oos_thr	= param->oos_thr	?: dfl->oos_thr;
+		p->oos_intv	= param->oos_intv	?: dfl->oos_intv;
+		p->oos_timeout	= param->oos_timeout	?: dfl->oos_timeout;
+
+		p->frag_max	= param->frag_max	?: dfl->frag_max;
+		p->frag_timeout	= param->frag_timeout	?: dfl->frag_timeout;
+	} else {
+		*p = *dfl;
+	}
+
+	ncrx->acked_seq = UINT64_MAX;
+	init_list(&ncrx->hole_list);
+	init_list(&ncrx->oos_list.head);
+	init_list(&ncrx->frag_list.head);
+	init_list(&ncrx->retired_list.head);
+
+	ncrx->slots = calloc(ncrx->p.nr_slots, sizeof(ncrx->slots[0]));
+	if (!ncrx->slots)
+		return NULL;
+
+	for (i = 0; i < ncrx->p.nr_slots; i++)
+		init_list(&ncrx->slots[i].hole_node);
+
+	return ncrx;
+}
+
+void ncrx_destroy(struct ncrx *ncrx)
+{
+	struct ncrx_msg *msg;
+	int i;
+
+	for (i = 0; i < ncrx->p.nr_slots; i++)
+		free(ncrx->slots[i].msg);
+
+	while ((msg = msg_list_pop(&ncrx->oos_list)))
+		free(msg);
+
+	while ((msg = msg_list_pop(&ncrx->frag_list)))
+		free(msg);
+
+	while ((msg = msg_list_pop(&ncrx->retired_list)))
+		free(msg);
+
+	free(ncrx->slots);
+	free(ncrx);
+}
diff --git a/tools/lib/netconsole/ncrx.h b/tools/lib/netconsole/ncrx.h
new file mode 100644
index 0000000..a0e1652
--- /dev/null
+++ b/tools/lib/netconsole/ncrx.h
@@ -0,0 +1,204 @@
+/*
+ * ncrx - extended netconsole receiver library
+ *
+ * Copyright (C) 2015		Facebook, Inc
+ * Copyright (C) 2015		Tejun Heo <tj@kernel.org>
+ */
+#ifndef __NETCONSOLE_NCRX__
+#define __NETCONSOLE_NCRX__
+
+#include <inttypes.h>
+
+#define NCRX_LINE_MAX		8192
+
+struct ncrx_list {
+	struct ncrx_list	*next;
+	struct ncrx_list	*prev;
+};
+
+/*
+ * ncrx_msg represents a single log message and what gets returned from
+ * ncrx_next_msg().  Most of the public fields are self-explanatory except
+ * for the followings.
+ *
+ * frag
+ *	The message is partial fragment of multi-part messages printed
+ *	using KERN_CONT.  Usually, ncrx keeps these fragments until the
+ *	full copy is available and then discards them; however, if the log
+ *	source fails to send the completed copy for some reason, the
+ *	fragments are pushed out as are.
+ *
+ * oos
+ *	The message's sequence number doesn't match up with the current
+ *	message stream.  Could be from a foreign source or corrupt.  Ignore
+ *	when counting missing messages.
+ *
+ * seq_reset
+ *	The sequence number stream has jumped.  This usually happens when
+ *	the log source reboots.  The first message returned after ncrx
+ *	initialization always has this flag set.
+ */
+struct ncrx_msg {
+	/* public fields */
+	uint64_t		seq;		/* printk sequence number */
+	uint64_t		ts_usec;	/* printk timestamp in usec */
+	char			*text;		/* message body */
+	char			*dict;		/* optional dictionary */
+	int			text_len;	/* message body length */
+	int			dict_len;	/* dictionary length */
+
+	uint8_t			facility;	/* log facility */
+	uint8_t			level;		/* printk level */
+	unsigned		cont_start:1;	/* first of continued msgs */
+	unsigned		cont:1;		/* continuation of prev msg */
+	unsigned		frag:1;		/* fragment, no seq assigned */
+	unsigned		oos:1;		/* sequence out-of-order */
+	unsigned		seq_reset:1;	/* sequence reset */
+
+	/* private fields */
+	struct ncrx_list	node;
+	uint64_t		rx_at;		/* rx timestamp in msec */
+	uint32_t		ncfrag_bitmap;	/* netconsole frag map */
+	int			ncfrag_off;	/* netconsole frag offset */
+	uint64_t		fragid;		/* printk frag ID */
+
+	unsigned		has_fragid:1;	/* fragid valid */
+	unsigned		emg:1;		/* emergency transmission */
+
+	char			buf[];
+};
+
+/*
+ * ncrx paramters.  Specify NULL to use defaults for all.  Specify 0 to use
+ * deafult for individual parameters.  All time periods are in millisecs.
+ *
+ * nr_slots
+ *	The number of reorder slots.  This bounds the maximum memory which
+ *	may be consumed by the ncrx instance.  Lowering this number
+ *	increases the chance of the ordering window passing by a missing
+ *	message before it can be obtained leading to missed messages.
+ *
+ * ack_intv
+ *	A received message is acked after this period.  Transmission side
+ *	ack timeout is 10s and this should be shorter than that.
+ *
+ * retx_intv
+ *	Retransmission request is sent and repeated every this period.
+ *
+ * retx_stride
+ *	A missing message generates retransmission request whenever it gets
+ *	pushed back this number of slots by newly arriving message.
+ *
+ * msg_timeout
+ *	A missing message expires after this period and the sequence number
+ *	will be skipped in the output.
+ *
+ * oos_thr
+ *	Among last 32 message, if more than this number of messages are
+ *	out-of-order, the message stream is reset.
+ *
+ * oos_intv
+ *	A message is considered out-of-sequence only if the last message
+ *	received with the sequence number is older than this.
+ *
+ * oos_timeout
+ *	If sequence is not reset in this period after reception of an
+ *	out-of-order message, the message is output.
+ *
+ * frag_max
+ *	The maximum number of fragments to buffer.
+ *
+ * frag_timeout
+ *	If a fragment doesn't get shot down by reception of its matching
+ *	full message in this period, the fragment is output.
+ */
+struct ncrx_param {
+	int			nr_slots;
+
+	int			ack_intv;
+	int			retx_intv;
+	int			retx_stride;
+	int			msg_timeout;
+
+	int			oos_thr;
+	int			oos_intv;
+	int			oos_timeout;
+
+	int			frag_max;
+	int			frag_timeout;
+};
+
+/* default params */
+#define NCRX_DFL_NR_SLOTS	8192
+
+#define NCRX_DFL_ACK_INTV	5000
+#define NCRX_DFL_RETX_INTV	1000
+#define NCRX_DFL_RETX_STRIDE	256
+#define NCRX_DFL_MSG_TIMEOUT	30000
+
+#define NCRX_DFL_OOS_THR	(32 * 3 / 5)			/* 19 */
+#define NCRX_DFL_OOS_INTV	5000
+#define NCRX_DFL_OOS_TIMEOUT	NCRX_DFL_MSG_TIMEOUT
+
+#define NCRX_DFL_FRAG_MAX	32
+#define NCRX_DFL_FRAG_TIMEOUT	NCRX_DFL_MSG_TIMEOUT
+
+/*
+ * A ncrx instance is created by ncrx_create() and destroyed by
+ * ncrx_destroy().  All accesses to a given instance must be serialized;
+ * however, a process may create any number of instances and use them
+ * concurrently.
+ */
+struct ncrx;
+
+struct ncrx *ncrx_create(const struct ncrx_param *param);
+void ncrx_destroy(struct ncrx *ncrx);
+
+/*
+ * A ncrx instance doesn't do any IO or blocking.  It's just a state
+ * machine that the user can feed data into and get the results out of.
+ *
+ * ncrx_process()
+ *	Process @payload of a packet.  @now is the current time in msecs.
+ *	The origin doesn't matter as long as it's monotonously increasing.
+ *	@payload may be NULL.  See ncrx_invoke_process_at().
+ *
+ *	Returns 0 on success.  1 on failure with errno set.  EINVAL
+ *	indicates that @payload is not a valid extended netconsole message.
+ *
+ * ncrx_response()
+ *	The response to send to log source.  If the user calls this
+ *	function after each ncrx_process() invocation and sends back the
+ *	output, re- and emergency transmissions are activated increasing
+ *	the reliability especially if the network is flaky.  If not, ncrx
+ *	will passively reorder and assemble messages.
+ *
+ *	Returns pointer to '\0' terminated response string or NULL if
+ *	there's nothing to send back.  If @lenp is not NULL, *@lenp is set
+ *	to the length of the response string.
+ *
+ * ncrx_next_msg()
+ *	Fetches the next completed message.  Call repeatedly until NULL is
+ *	returned after each ncrx_process() invocation.  Each message should
+ *	be free()'d by the user after consumption.
+ *
+ * ncrx_invoke_process_at()
+ *	Message processing is timing dependent and ncrx often needs to take
+ *	actions after a certain time period even when there hasn't been any
+ *	new packets.  This function indicates when the caller should invoke
+ *	ncrx_process() at the latest.
+ *
+ *	The returned time is relative to @now previously provided to
+ *	ncrx_process().  e.g. if ncrx_process() needs to be invoked after 4
+ *	seconds since the last invocation where @now was 60000, this
+ *	function will return 64000.  Returns UINT64_MAX if there's no
+ *	pending timing dependent operation.
+ *
+ * See tools/ncrx/ncrx.c for a simple example.
+ */
+int ncrx_process(const char *payload, uint64_t now, struct ncrx *ncrx);
+const char *ncrx_response(struct ncrx *ncrx, int *lenp);
+struct ncrx_msg *ncrx_next_msg(struct ncrx *ncrx);
+uint64_t ncrx_invoke_process_at(struct ncrx *ncrx);
+
+#endif	/* __NETCONSOLE_NCRX__ */
diff --git a/tools/ncrx/Makefile b/tools/ncrx/Makefile
new file mode 100644
index 0000000..f3e9859
--- /dev/null
+++ b/tools/ncrx/Makefile
@@ -0,0 +1,14 @@
+# Makefile for ncrx
+
+CC = $(CROSS_COMPILE)gcc
+CFLAGS += -Wall -O2 -I../lib
+
+LIBS = ../lib/netconsole/libncrx.a
+LDFLAGS += $(LIBS)
+
+all: ncrx
+%: %.c $(LIBS)
+	$(CC) $(CFLAGS) -o $@ $^
+
+clean:
+	$(RM) ncrx
diff --git a/tools/ncrx/ncrx.c b/tools/ncrx/ncrx.c
new file mode 100644
index 0000000..f462377
--- /dev/null
+++ b/tools/ncrx/ncrx.c
@@ -0,0 +1,143 @@
+/*
+ * ncrx - simple extended netconsole receiver
+ *
+ * Copyright (C) 2015		Facebook, Inc
+ * Copyright (C) 2015		Tejun Heo <tj@kernel.org>
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+#include <poll.h>
+#include <ctype.h>
+#include <errno.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <netinet/udp.h>
+#include <netconsole/ncrx.h>
+
+int main(int argc, char **argv)
+{
+	char buf[NCRX_LINE_MAX + 1];
+	struct ncrx *ncrx;
+	struct sockaddr_in laddr = { };
+	uint64_t next_seq = 0, next_at = UINT64_MAX;
+	int fd;
+
+	if (argc != 2) {
+		fprintf(stderr, "Usage: ncrx PORT\n");
+		return 1;
+	}
+
+	fd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		return 1;
+	}
+
+	laddr.sin_family = AF_INET;
+	laddr.sin_addr.s_addr = htonl(INADDR_ANY);
+	laddr.sin_port = htons(atoi(argv[1]));
+
+	if (bind(fd, (struct sockaddr *)&laddr, sizeof(laddr)) < 0) {
+		perror("bind");
+		return 1;
+	}
+
+	ncrx = ncrx_create(NULL);
+	if (!ncrx) {
+		perror("ncrx_create");
+		return 1;
+	}
+
+	while (1) {
+		struct pollfd pfd = { .fd = fd, .events = POLLIN };
+		struct sockaddr_in raddr;
+		struct ncrx_msg *msg;
+		struct timespec ts;
+		socklen_t raddr_len = sizeof(raddr);
+		char *payload = NULL;
+		const char *resp;
+		uint64_t now;
+		int timeout;
+		int len;
+
+		/* determine sleep interval and poll */
+		timeout = -1;
+		if (next_at != UINT64_MAX) {
+			timeout = 0;
+			if (next_at > now)
+				timeout = next_at - now;
+		}
+
+		if (poll(&pfd, 1, timeout) < 0) {
+			perror("poll");
+			return 1;
+		}
+
+		/* receive message */
+		len = recvfrom(fd, buf, sizeof(buf) - 1, MSG_DONTWAIT,
+			       (struct sockaddr *)&raddr, &raddr_len);
+
+		payload = NULL;
+		if (len >= 0) {
+			buf[len] = '\0';
+			payload = buf;
+		} else if (errno != EAGAIN) {
+			perror("recv");
+			return 1;
+		}
+
+		/* determine the current time */
+		if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
+			perror("clock_gettime");
+			return 1;
+		}
+		now = ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
+
+		/* process the payload and perform rx operations */
+		if (ncrx_process(payload, now, ncrx) && errno != ENOENT) {
+			if (errno == EINVAL) {
+				while (len && isspace(payload[len - 1]))
+					payload[--len] = '\0';
+				printf("[%12s] %s\n", "INVAL", payload);
+			} else {
+				perror("ncrx_process");
+			}
+		}
+
+		resp = ncrx_response(ncrx, &len);
+		if (resp && sendto(fd, resp, len, 0,
+				   (struct sockaddr *)&raddr, raddr_len) < 0)
+			perror("sendto");
+
+		while ((msg = ncrx_next_msg(ncrx))) {
+			if (msg->frag) {
+				printf("[%12s] %s\n", "FRAG", msg->text);
+				continue;
+			}
+			if (msg->oos) {
+				printf("[%12s] %s\n", "OOS", msg->text);
+				continue;
+			}
+			if (msg->seq_reset) {
+				printf("[%12s] seq=%"PRIu64"\n", "SEQ RESET",
+				       msg->seq);
+				next_seq = msg->seq;
+			}
+			if (msg->seq != next_seq) {
+				printf("[%12s] %"PRIu64" messages skipped\n",
+				       "SEQ SKIPPED", msg->seq - next_seq);
+			}
+
+			next_seq = msg->seq + 1;
+
+			printf("[%5"PRIu64".%06"PRIu64"] %s\n",
+			       msg->ts_usec / 1000000, msg->ts_usec % 1000000,
+			       msg->text);
+		}
+
+		next_at = ncrx_invoke_process_at(ncrx);
+	}
+
+	return 0;
+}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 16/16] netconsole: update documentation for extended netconsole
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (14 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 15/16] netconsole: implement netconsole receiver library Tejun Heo
@ 2015-04-16 23:03 ` Tejun Heo
  2015-04-17 15:35 ` [PATCHSET] printk, netconsole: implement reliable netconsole Tetsuo Handa
  2015-04-19  7:25 ` Rob Landley
  17 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-16 23:03 UTC (permalink / raw)
  To: akpm, davem; +Cc: linux-kernel, netdev, Tejun Heo

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
---
 Documentation/networking/netconsole.txt | 95 ++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt
index a5d574a..e7a57c2 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -2,6 +2,7 @@
 started by Ingo Molnar <mingo@redhat.com>, 2001.09.17
 2.6 port and netpoll api by Matt Mackall <mpm@selenic.com>, Sep 9 2003
 IPv6 support by Cong Wang <xiyou.wangcong@gmail.com>, Jan 1 2013
+Reliable extended console support by Tejun Heo <tj@kernel.org>, Apr 16 2015
 
 Please send bug reports to Matt Mackall <mpm@selenic.com>
 Satyam Sharma <satyam.sharma@gmail.com>, and Cong Wang <xiyou.wangcong@gmail.com>
@@ -24,9 +25,10 @@ Sender and receiver configuration:
 It takes a string configuration parameter "netconsole" in the
 following format:
 
- netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
+ netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
 
    where
+        +             if present, enable reliable extended console support
         src-port      source for UDP packets (defaults to 6665)
         src-ip        source IP to use (interface address)
         dev           network interface (eth0)
@@ -107,6 +109,7 @@ To remove a target:
 The interface exposes these parameters of a netconsole target to userspace:
 
 	enabled		Is this target currently enabled?	(read-write)
+	extended	Extended mode enabled			(read-write)
 	dev_name	Local network interface name		(read-write)
 	local_port	Source UDP port to use			(read-write)
 	remote_port	Remote agent's UDP port			(read-write)
@@ -132,6 +135,96 @@ You can also update the local interface dynamically. This is especially
 useful if you want to use interfaces that have newly come up (and may not
 have existed when netconsole was loaded / initialized).
 
+Reliable extended console:
+==========================
+
+If '+' is prefixed to the configuration line or "extended" config file
+is set to 1, reliable extended console support is enabled. An example
+boot param follows.
+
+ linux netconsole=+4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
+
+Reliable extended console support is consisted of three parts.
+
+1. Extended format messages
+---------------------------
+
+Log messages are transmitted with extended metadata header in the
+following format which is the same as /dev/kmsg.
+
+ <level>,<sequnum>,<timestamp>,<contflag>;<message text>
+
+Non printable characters in <message text> are escaped using "\xff"
+notation. If the message contains optional dictionary, verbatim
+newline is used as the delimeter. printk fragments from KERN_CONT
+messages are transmitted in the following form.
+
+ <level>,-,<timestamp>,-,fragid=<fragid>;<message fragment>
+
+Where <fragid> uniquely identifies the message being assembled. Later
+when the assembly is complete, the completed message with sequence
+number is transmitted. The receiver is responsible for handling the
+overlaps between fragements and the final assembled message.
+
+If a message doesn't fit in 1000 bytes, the message is split into
+multiple fragments by netconsole. These fragments are transmitted with
+"ncfrag" header field added.
+
+ ncfrag=<byte-offset>@<0-based-chunk-index>/<total-chunks>
+
+For example,
+
+ 6,416,1758426,-,ncfrag=0@0/2;the first chunk,
+ 6,416,1758426,-,ncfrag=16@1/2;the second chunk
+
+2. Retransmission requests
+--------------------------
+
+If extended console support is enabled, netconsole listens for packets
+arriving on the source port. The receiver can send the packets of the
+following form to request messages with specific sequence numbers.
+
+ nca <missing-seq> <missing-seq>...
+
+There can be any number of <missing-seq> as long as they fit inside
+1000 bytes. If the messages with the matching sequence numbers exist,
+netconsole will retransmit them immediately.
+
+3. ACK and emergency transmission
+---------------------------------
+
+If the receiver sends back a packet with the following format,
+netconsole enables ACK support.
+
+ nca<ack-seq> <missing-seq> <missing-seq>...
+
+If the ack sequence lags the current sequence by more than 10s,
+emergency transmission is started and all the unacked messages are
+retransmitted periodically with increasing interval. In the first
+round, each unacked message is transmitted every 100ms. In the next,
+every 200ms. The interval doubles until it reaches 1s and stays there.
+
+Each emergency transmitted message has "emg=1" header field added to
+it. For example,
+
+ 6,416,1758426,-,emg=1;netpoll: netconsole: local port 6666
+
+This exists to ensure that messages reach the destination even when
+things went pear-shaped. As long as netpoll and timer are working, the
+logging target has pretty good chance of receiving all messages.
+
+Receiver
+--------
+
+libncrx is a library which implements reliable remote logger using the
+above features. The library is available under tools/lib/netconsole.
+The library is a pure state machine with packet and timer plumbing
+left to the library user to ease integration into larger framework.
+See tools/lib/netconsole/ncrx.h for details.
+
+tools/ncrx/ncrx is a simple receiver program using libncrx which can
+be used as an example.
+
 Miscellaneous notes:
 ====================
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (15 preceding siblings ...)
  2015-04-16 23:03 ` [PATCH 16/16] netconsole: update documentation for extended netconsole Tejun Heo
@ 2015-04-17 15:35 ` Tetsuo Handa
  2015-04-17 16:28   ` Tejun Heo
  2015-04-19  7:25 ` Rob Landley
  17 siblings, 1 reply; 46+ messages in thread
From: Tetsuo Handa @ 2015-04-17 15:35 UTC (permalink / raw)
  To: tj; +Cc: akpm, davem, linux-kernel, netdev

Tejun Heo wrote:
> * Implement netconsole retransmission support.  Matching rx socket on
>   the source port is automatically created for extended targets and
>   the log receiver can request retransmission by sending reponse
>   packets.  This is completely decoupled from the main write path and
>   doesn't make netconsole less robust when things start go south.

If the sender side can wait for retransmission, why can't we use
userspace programs (e.g. rsyslogd)?

For me, netconsole is mainly for saving kernel messages which cannot be
waited for a few deciseconds (e.g. system reset by softdog's timeout) and
for saving kernel messages (e.g. SysRq-t) during disk I/O hang up.
I have a logger for receiving netconsole messages at
http://sourceforge.jp/projects/akari/scm/svn/tree/head/branches/udplogger/
and things I expect for netconsole are shown below.

  (a) spool the message to send up to "1 line" or "adimn configured size"
      so that total number of small UDP packets can be reduced

  (b) don't hesitate to send the spooled message immediately if either
      kernel panic or system reset is in progress

  (c) allow different console log level for different console drivers
      so that I can send kernel messages via netconsole without making
      local console noisy

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 15:35 ` [PATCHSET] printk, netconsole: implement reliable netconsole Tetsuo Handa
@ 2015-04-17 16:28   ` Tejun Heo
  2015-04-17 17:17     ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 16:28 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, davem, linux-kernel, netdev

Hello,

On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
> If the sender side can wait for retransmission, why can't we use
> userspace programs (e.g. rsyslogd)?

Because the system may be oopsing, ooming or threshing excessively
rendering the userland inoperable and that's exactly when we want
those log messages to be transmitted out of the system.  This will get
log out as long as timer and netpoll are running which is the case
under a lot of circumstances.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 16:28   ` Tejun Heo
@ 2015-04-17 17:17     ` David Miller
  2015-04-17 17:37       ` Tejun Heo
  2015-04-21 21:51       ` Stephen Hemminger
  0 siblings, 2 replies; 46+ messages in thread
From: David Miller @ 2015-04-17 17:17 UTC (permalink / raw)
  To: tj; +Cc: penguin-kernel, akpm, linux-kernel, netdev

From: Tejun Heo <tj@kernel.org>
Date: Fri, 17 Apr 2015 12:28:26 -0400

> On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
>> If the sender side can wait for retransmission, why can't we use
>> userspace programs (e.g. rsyslogd)?
> 
> Because the system may be oopsing, ooming or threshing excessively
> rendering the userland inoperable and that's exactly when we want
> those log messages to be transmitted out of the system.

If userland cannot run properly, it is almost certain that neither will
your complex reliability layer logic.

I tend to agree with Tetsuo, that in-kernel netconsole should remain
as simple as possible and once it starts to have any smarts and less
trivial logic the job belongs in userspace.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:17     ` David Miller
@ 2015-04-17 17:37       ` Tejun Heo
  2015-04-17 17:43         ` Tetsuo Handa
                           ` (2 more replies)
  2015-04-21 21:51       ` Stephen Hemminger
  1 sibling, 3 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 17:37 UTC (permalink / raw)
  To: David Miller; +Cc: penguin-kernel, akpm, linux-kernel, netdev

Hello, David.

On Fri, Apr 17, 2015 at 01:17:12PM -0400, David Miller wrote:
> If userland cannot run properly, it is almost certain that neither will
> your complex reliability layer logic.

* The bulk of patches are to pipe extended log messages to console
  drivers and let netconsole relay them to the receiver (and quite a
  bit of refactoring in the process), which, regardless of the
  reliability logic, is beneficial as we're currently losing
  structured logging (dictionary) and other metadata over consoles and
  regardless of where the reliability logic is implemented, it's a lot
  easier to have messages IDs.

* The only thing necessary for reliable transmission are timer and
  netpoll.  There sure are cases where they go down too but there's a
  pretty big gap between those two going down and userland getting
  hosed, but where to put the retransmission and reliability logic
  definitely is debatable.

* That said, the "reliability" part of the patch series are just two
  patches - 13 and 14, both of which are actually pretty simple.

> I tend to agree with Tetsuo, that in-kernel netconsole should remain
> as simple as possible and once it starts to have any smarts and less
> trivial logic the job belongs in userspace.

Upto patch 12, it's just the same mechanism transferring extended
messages.  It doesn't add any smartness to netconsole per-se except
that it can now emit messages with metadata headers.  What do you
think about them?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:37       ` Tejun Heo
@ 2015-04-17 17:43         ` Tetsuo Handa
  2015-04-17 17:45           ` Tejun Heo
  2015-04-17 18:04         ` Tejun Heo
  2015-04-17 18:55         ` David Miller
  2 siblings, 1 reply; 46+ messages in thread
From: Tetsuo Handa @ 2015-04-17 17:43 UTC (permalink / raw)
  To: tj, davem; +Cc: akpm, linux-kernel, netdev

Tejun Heo wrote:
> Hello, David.
> 
> On Fri, Apr 17, 2015 at 01:17:12PM -0400, David Miller wrote:
> > If userland cannot run properly, it is almost certain that neither will
> > your complex reliability layer logic.
> 
> * The bulk of patches are to pipe extended log messages to console
>   drivers and let netconsole relay them to the receiver (and quite a
>   bit of refactoring in the process), which, regardless of the
>   reliability logic, is beneficial as we're currently losing
>   structured logging (dictionary) and other metadata over consoles and
>   regardless of where the reliability logic is implemented, it's a lot
>   easier to have messages IDs.
> 
> * The only thing necessary for reliable transmission are timer and
>   netpoll.  There sure are cases where they go down too but there's a
>   pretty big gap between those two going down and userland getting
>   hosed, but where to put the retransmission and reliability logic
>   definitely is debatable.
> 
> * That said, the "reliability" part of the patch series are just two
>   patches - 13 and 14, both of which are actually pretty simple.
> 
> > I tend to agree with Tetsuo, that in-kernel netconsole should remain
> > as simple as possible and once it starts to have any smarts and less
> > trivial logic the job belongs in userspace.
> 
> Upto patch 12, it's just the same mechanism transferring extended
> messages.  It doesn't add any smartness to netconsole per-se except
> that it can now emit messages with metadata headers.  What do you
> think about them?

So, this patchset aims for obtaining kernel messages under problematic
condition. You have to hold messages until ack is delivered. This means
that printk buffer can become full before burst messages (e.g. SysRq-t)
are acked due to packet loss in the network.

printk() cannot wait for ack. Trying to wait for ack would break something.
How can you transmit subsequent kernel messages which failed to enqueue
due to waiting for ack for previous kernel messages?

> 
> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:43         ` Tetsuo Handa
@ 2015-04-17 17:45           ` Tejun Heo
  2015-04-17 18:03             ` Tetsuo Handa
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 17:45 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: davem, akpm, linux-kernel, netdev

On Sat, Apr 18, 2015 at 02:43:30AM +0900, Tetsuo Handa wrote:
> > Upto patch 12, it's just the same mechanism transferring extended
> > messages.  It doesn't add any smartness to netconsole per-se except
> > that it can now emit messages with metadata headers.  What do you
> > think about them?
> 
> So, this patchset aims for obtaining kernel messages under problematic
> condition. You have to hold messages until ack is delivered. This means
> that printk buffer can become full before burst messages (e.g. SysRq-t)
> are acked due to packet loss in the network.
> 
> printk() cannot wait for ack. Trying to wait for ack would break something.
> How can you transmit subsequent kernel messages which failed to enqueue
> due to waiting for ack for previous kernel messages?

Well, if log buffer overflows and the messages aren't at the logging
target yet, they're lost.  It's the same as doing dmesg on localhost,
isn't it?  This doesn't have much to do with where the reliability
logic is implemented and is exactly the same with local logging too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:45           ` Tejun Heo
@ 2015-04-17 18:03             ` Tetsuo Handa
  2015-04-17 18:07               ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Tetsuo Handa @ 2015-04-17 18:03 UTC (permalink / raw)
  To: tj; +Cc: davem, akpm, linux-kernel, netdev

Tejun Heo wrote:
> > printk() cannot wait for ack. Trying to wait for ack would break something.
> > How can you transmit subsequent kernel messages which failed to enqueue
> > due to waiting for ack for previous kernel messages?
> 
> Well, if log buffer overflows and the messages aren't at the logging
> target yet, they're lost.  It's the same as doing dmesg on localhost,
> isn't it?  This doesn't have much to do with where the reliability
> logic is implemented and is exactly the same with local logging too.

If you tolerate loss of kernel messages, adding sequence number to each UDP
packet will be sufficient for finding out whether the packets were lost and/or
reordered in flight.

  printk("Hello");
   => netconsole sends "00000000 Hello" using UDP
  printk("netconsole");
   => netconsole sends "00000001 netconsole" using UDP
  printk("world\n");
   => netconsole sends "00000002 world\n" using UDP

It might be nice to allow administrator to prefix a sequence number
to netconsole messages for those who are using special receiver
program (e.g. ncrx) which checks that sequence number.

> 
> Thanks.
> 
> -- 
> tejun
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:37       ` Tejun Heo
  2015-04-17 17:43         ` Tetsuo Handa
@ 2015-04-17 18:04         ` Tejun Heo
  2015-04-17 18:55         ` David Miller
  2 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 18:04 UTC (permalink / raw)
  To: David Miller; +Cc: penguin-kernel, akpm, linux-kernel, netdev

Just a bit of addition.

On Fri, Apr 17, 2015 at 01:37:54PM -0400, Tejun Heo wrote:
> Upto patch 12, it's just the same mechanism transferring extended
> messages.  It doesn't add any smartness to netconsole per-se except
> that it can now emit messages with metadata headers.  What do you
> think about them?

So, as long as netconsole can send messages with metadata header,
moving the reliability part to userland is trivial.  All that's
necessary is a program which follows /dev/kmsg, keeps the unacked
sequences and implement the same retransmission mechanism.  It'd be
less reobust in certain failure scenarios and a bit more cumbersome to
set up but nothing major and if we do that there'd be no reason to
keep the userland part in the kernel tree.

If the retransmission and timer parts are bothering, moving those to
userland sounds like an acceptable compromise.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 18:03             ` Tetsuo Handa
@ 2015-04-17 18:07               ` Tejun Heo
  2015-04-17 18:20                 ` Tetsuo Handa
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 18:07 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: davem, akpm, linux-kernel, netdev

On Sat, Apr 18, 2015 at 03:03:46AM +0900, Tetsuo Handa wrote:
> If you tolerate loss of kernel messages, adding sequence number to each UDP

Well, there's a difference between accepting loss when log buffer
overflows and when any packets get lost.

> packet will be sufficient for finding out whether the packets were lost and/or
> reordered in flight.
> 
>   printk("Hello");
>    => netconsole sends "00000000 Hello" using UDP
>   printk("netconsole");
>    => netconsole sends "00000001 netconsole" using UDP
>   printk("world\n");
>    => netconsole sends "00000002 world\n" using UDP
> 
> It might be nice to allow administrator to prefix a sequence number
> to netconsole messages for those who are using special receiver
> program (e.g. ncrx) which checks that sequence number.

That said, this is pretty much what the first 12 patches do (except
for the last printk patch, which can be taken out).  We already have
sequencing and established format to expose them to userland - try cat
/dev/kmsg, which btw is what local loggers on modern systems use
anyway.  Why introduce netconsole's own version of metadata?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 18:07               ` Tejun Heo
@ 2015-04-17 18:20                 ` Tetsuo Handa
  2015-04-17 18:26                   ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Tetsuo Handa @ 2015-04-17 18:20 UTC (permalink / raw)
  To: tj; +Cc: davem, akpm, linux-kernel, netdev

Tejun Heo wrote:
> On Sat, Apr 18, 2015 at 03:03:46AM +0900, Tetsuo Handa wrote:
> > packet will be sufficient for finding out whether the packets were lost and/or
> > reordered in flight.
> > 
> >   printk("Hello");
> >    => netconsole sends "00000000 Hello" using UDP
> >   printk("netconsole");
> >    => netconsole sends "00000001 netconsole" using UDP
> >   printk("world\n");
> >    => netconsole sends "00000002 world\n" using UDP
> > 
> > It might be nice to allow administrator to prefix a sequence number
> > to netconsole messages for those who are using special receiver
> > program (e.g. ncrx) which checks that sequence number.
> 
> That said, this is pretty much what the first 12 patches do (except
> for the last printk patch, which can be taken out).  We already have
> sequencing and established format to expose them to userland - try cat
> /dev/kmsg, which btw is what local loggers on modern systems use
> anyway.  Why introduce netconsole's own version of metadata?

I didn't mean to introduce netconsole's own version of metadata.
I meant we don't need to implement in-kernel retry logic.

If we can assume that scheduler is working, adding a kernel thread that
does

  while (1) {
      read messages with metadata from /dev/kmsg
      send them using UDP network
  }

might be easier than modifying netconsole module.

> 
> Thanks.
> 
> -- 
> tejun
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 18:20                 ` Tetsuo Handa
@ 2015-04-17 18:26                   ` Tejun Heo
  2015-04-18 13:09                     ` Tetsuo Handa
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 18:26 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: davem, akpm, linux-kernel, netdev

Hello,

On Sat, Apr 18, 2015 at 03:20:41AM +0900, Tetsuo Handa wrote:
> I didn't mean to introduce netconsole's own version of metadata.
> I meant we don't need to implement in-kernel retry logic.

Hmmm?  I'm not really following where this discussion is headed.  No,
we don't have to put it in the kernel.  We can punt the retry part to
userland as I wrote in another message at some cost to robustness.

> If we can assume that scheduler is working, adding a kernel thread that
> does
> 
>   while (1) {
>       read messages with metadata from /dev/kmsg
>       send them using UDP network
>   }
> 
> might be easier than modifying netconsole module.

But, I mean, if we are gonna do that in kernel, we better do it
properly where it belongs.  What's up with "easier than modifying
netconsole module"?  Why is netconsole special?  And how would the
above be any less complex than a single timer function?  What am I
missing?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:37       ` Tejun Heo
  2015-04-17 17:43         ` Tetsuo Handa
  2015-04-17 18:04         ` Tejun Heo
@ 2015-04-17 18:55         ` David Miller
  2015-04-17 19:52           ` Tejun Heo
  2 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2015-04-17 18:55 UTC (permalink / raw)
  To: tj; +Cc: penguin-kernel, akpm, linux-kernel, netdev

From: Tejun Heo <tj@kernel.org>
Date: Fri, 17 Apr 2015 13:37:54 -0400

> Hello, David.
> 
> On Fri, Apr 17, 2015 at 01:17:12PM -0400, David Miller wrote:
>> If userland cannot run properly, it is almost certain that neither will
>> your complex reliability layer logic.
> 
> * The bulk of patches are to pipe extended log messages to console
>   drivers and let netconsole relay them to the receiver (and quite a
>   bit of refactoring in the process), which, regardless of the
>   reliability logic, is beneficial as we're currently losing
>   structured logging (dictionary) and other metadata over consoles and
>   regardless of where the reliability logic is implemented, it's a lot
>   easier to have messages IDs.

I do not argue against cleanups and good restructuring of the existing
code.  But you have decided to mix that up with something that is not
exactly non-controversial.

You'd do well to seperate the cleanups from the fundamental changes,
so they can be handled separately.

> * The only thing necessary for reliable transmission are timer and
>   netpoll.  There sure are cases where they go down too but there's a
>   pretty big gap between those two going down and userland getting
>   hosed, but where to put the retransmission and reliability logic
>   definitely is debatable.

I fundamentally disagree, exactly on this point.

If you take an OOPS in a software interrupt handler (basically, all of
the networking receive paths and part of the transmit paths, for
example) you're not going to be taking timer interrupts.

And that's the value of netconsole, the chance (albeit not %100) of
getting messages in those scenerios.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 18:55         ` David Miller
@ 2015-04-17 19:52           ` Tejun Heo
  2015-04-17 20:06             ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-17 19:52 UTC (permalink / raw)
  To: David Miller; +Cc: penguin-kernel, akpm, linux-kernel, netdev

Hello,

On Fri, Apr 17, 2015 at 02:55:37PM -0400, David Miller wrote:
> > * The bulk of patches are to pipe extended log messages to console
> >   drivers and let netconsole relay them to the receiver (and quite a
> >   bit of refactoring in the process), which, regardless of the
> >   reliability logic, is beneficial as we're currently losing
> >   structured logging (dictionary) and other metadata over consoles and
> >   regardless of where the reliability logic is implemented, it's a lot
> >   easier to have messages IDs.
> 
> I do not argue against cleanups and good restructuring of the existing
> code.  But you have decided to mix that up with something that is not
> exactly non-controversial.

Is the controlversial part referring to sending extended messages or
the reliability part or both?

> You'd do well to seperate the cleanups from the fundamental changes,
> so they can be handled separately.

Hmmm... yeah, probably would have been a better idea.  FWIW, the
patches are stacked roughly in the order of escalating
controversiness.  Will split the series up.

> > * The only thing necessary for reliable transmission are timer and
> >   netpoll.  There sure are cases where they go down too but there's a
> >   pretty big gap between those two going down and userland getting
> >   hosed, but where to put the retransmission and reliability logic
> >   definitely is debatable.
> 
> I fundamentally disagree, exactly on this point.
> 
> If you take an OOPS in a software interrupt handler (basically, all of
> the networking receive paths and part of the transmit paths, for
> example) you're not going to be taking timer interrupts.

Sure, if irq handling is hosed, this won't work but I think there are
enough other failure modes like oopsing while holding a mutex or
falling into infinite loop while holding task_list lock (IIRC we had
something simliar a while ago due to iterator bug).  Whether being
more robust in those cases is worthwhile is definitely debatable.  I
thought the added complexity was small enough but the judgement can
easily fall on the other side.

> And that's the value of netconsole, the chance (albeit not %100) of
> getting messages in those scenerios.

None of the changes harm that in any way.  Anyways, I'll split up the
extended message and the rest.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 19:52           ` Tejun Heo
@ 2015-04-17 20:06             ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2015-04-17 20:06 UTC (permalink / raw)
  To: tj; +Cc: penguin-kernel, akpm, linux-kernel, netdev

From: Tejun Heo <tj@kernel.org>
Date: Fri, 17 Apr 2015 15:52:38 -0400

> Hello,
> 
> On Fri, Apr 17, 2015 at 02:55:37PM -0400, David Miller wrote:
>> > * The bulk of patches are to pipe extended log messages to console
>> >   drivers and let netconsole relay them to the receiver (and quite a
>> >   bit of refactoring in the process), which, regardless of the
>> >   reliability logic, is beneficial as we're currently losing
>> >   structured logging (dictionary) and other metadata over consoles and
>> >   regardless of where the reliability logic is implemented, it's a lot
>> >   easier to have messages IDs.
>> 
>> I do not argue against cleanups and good restructuring of the existing
>> code.  But you have decided to mix that up with something that is not
>> exactly non-controversial.
> 
> Is the controlversial part referring to sending extended messages or
> the reliability part or both?

Anything outside of the non-side-effect cleanups.

> Hmmm... yeah, probably would have been a better idea.  FWIW, the
> patches are stacked roughly in the order of escalating
> controversiness.  Will split the series up.

Thanks.

> Sure, if irq handling is hosed, this won't work but I think there are
> enough other failure modes like oopsing while holding a mutex or
> falling into infinite loop while holding task_list lock (IIRC we had
> something simliar a while ago due to iterator bug).

If you oops while holding a mutex, unless it's the console mutex the
logging process can schedule and likely get the message transmitted.

What we're going to keep discussing is the fact that in return for all
of your unnecessary added complexity, we get something that only applies
in an extremely narrow scope of situations.

That is a very poor value proposition.

It took nearly two decades to get rid of all of the races and locking
problems with current netpoll/netconsole, and it's as simple as can
possibly be.  I do not want to even think about having to worry about
a reliability layer on top of it, that's just too much.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 18:26                   ` Tejun Heo
@ 2015-04-18 13:09                     ` Tetsuo Handa
  0 siblings, 0 replies; 46+ messages in thread
From: Tetsuo Handa @ 2015-04-18 13:09 UTC (permalink / raw)
  To: tj; +Cc: davem, akpm, linux-kernel, netdev

Tejun Heo wrote:
> > If we can assume that scheduler is working, adding a kernel thread that
> > does
> > 
> >   while (1) {
> >       read messages with metadata from /dev/kmsg
> >       send them using UDP network
> >   }
> > 
> > might be easier than modifying netconsole module.
> 
> But, I mean, if we are gonna do that in kernel, we better do it
> properly where it belongs.  What's up with "easier than modifying
> netconsole module"?  Why is netconsole special?  And how would the
> above be any less complex than a single timer function?  What am I
> missing?

User space daemon is sometimes disturbed unexpectedly due to

  (a) SIGKILL by OOM-killer
  (b) spurious ptrace() by somebody
  (c) spurious signals such as SIGSTOP / SIGINT
  (d) stalls triggered by page faults under OOM condition
  (e) other problems such as scheduler being not working

We have built-in protection for (a) named /proc/$pid/oom_score_adj , but
we need to configure access control modules for protecting (b) and (c),
and we don't have protection for (d). Thinking from OOM stall discussion,
(d) is fatal when trying to obtain kernel messages under problematic
condition. I thought that a kernel thread that does

  while (1) {
      read messages with metadata from /dev/kmsg
      send them using UDP network
  }

is automatically protected from (a), (b), (c) and (d), and it could be
implemented outside of netconsole module.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
                   ` (16 preceding siblings ...)
  2015-04-17 15:35 ` [PATCHSET] printk, netconsole: implement reliable netconsole Tetsuo Handa
@ 2015-04-19  7:25 ` Rob Landley
  2015-04-20 12:00   ` David Laight
  2015-04-20 14:33   ` Tejun Heo
  17 siblings, 2 replies; 46+ messages in thread
From: Rob Landley @ 2015-04-19  7:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, David S. Miller, Kernel Mailing List, netdev

On Thu, Apr 16, 2015 at 6:03 PM, Tejun Heo <tj@kernel.org> wrote:
> In a lot of configurations, netconsole is a useful way to collect
> system logs; however, all netconsole does is simply emitting UDP
> packets for the raw messages and there's no way for the receiver to
> find out whether the packets were lost and/or reordered in flight.

Except a modern nonsaturated LAN shouldn't do that.

If you have two machines plugged into a hub, and that's _all_ that's
plugged in, packets should never get dropped. This was the original
use case of netconsole was that the sender and the receiver were
plugged into the same router.

However, even on a quite active LAN the days of ethernet doing CDMA
requiring retransmits are long gone, even 100baseT routers have been
cacheing and retransmitting data internally so each connection can go
at the full 11 megabytes/second with the backplane running fast enough
to keep them all active at the same time. (That's why it's so hard to
find a _hub_ anymore, it's all routers. The point of routers is they
cache internally and send the packet only out the connection they
should go, based on an internal routing table because they listened to
incoming packets to figure out who lives where and do arp-like
things.)

And of course gigabit is a point to point protocol that has nothing to
do with conventional ethernet at _all_ other than the name, as far as
I know it can't _not_ do this.

So are you trying to program around a problem you've actually _seen_,
or are you attempting to reinvent TCP/IP yet again based on top of UDP
(Drink!) because of a purely theoretical issue?

Or are you trying to route netconsole, unencapsulated, across the
internet? Of course the internet itself refusing to drop packets but
instead buffering and retransmitting them even when doing so turns out
to be a really bad idea is sort of where this whole "bufferbloat"
thing came from. So again, even in that context, is this a problem
you've actually _seen_?

> printk already keeps log metadata which contains enough information to
> make netconsole reliable.  This patchset does the followings.

Adds a giant amount of complexity without quite explaining why.

Rob

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-19  7:25 ` Rob Landley
@ 2015-04-20 12:00   ` David Laight
  2015-04-20 14:33   ` Tejun Heo
  1 sibling, 0 replies; 46+ messages in thread
From: David Laight @ 2015-04-20 12:00 UTC (permalink / raw)
  To: 'Rob Landley', Tejun Heo
  Cc: Andrew Morton, David S. Miller, Kernel Mailing List, netdev

From: Of Rob Landley
> Sent: 19 April 2015 08:25
> On Thu, Apr 16, 2015 at 6:03 PM, Tejun Heo <tj@kernel.org> wrote:
> > In a lot of configurations, netconsole is a useful way to collect
> > system logs; however, all netconsole does is simply emitting UDP
> > packets for the raw messages and there's no way for the receiver to
> > find out whether the packets were lost and/or reordered in flight.
> 
> Except a modern nonsaturated LAN shouldn't do that.
> 
> If you have two machines plugged into a hub, and that's _all_ that's
> plugged in, packets should never get dropped. This was the original
> use case of netconsole was that the sender and the receiver were
> plugged into the same router.
> 
> However, even on a quite active LAN the days of ethernet doing CDMA
> requiring retransmits are long gone, even 100baseT routers have been
> cacheing and retransmitting data internally so each connection can go
> at the full 11 megabytes/second with the backplane running fast enough
> to keep them all active at the same time. (That's why it's so hard to
> find a _hub_ anymore, it's all routers
...

Most machines are plugged into switches (not routers), many of them
will send 'pause' frames which the host mac may act on.
In which case packet loss is not expected (unless you have broadcast storms
when all bets are off).

Additionally, within a local network you shouldn't really get any packet
loss since no segments should be 100% loaded.
So for testing it is not unreasonable to expect no lost packets in netconsole
traffic.

	David



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/16] printk: guard the amount written per line by devkmsg_read()
  2015-04-16 23:03 ` [PATCH 01/16] printk: guard the amount written per line by devkmsg_read() Tejun Heo
@ 2015-04-20 12:11   ` Petr Mladek
  2015-04-20 12:33     ` Petr Mladek
  0 siblings, 1 reply; 46+ messages in thread
From: Petr Mladek @ 2015-04-20 12:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Thu 2015-04-16 19:03:38, Tejun Heo wrote:
> devkmsg_read() uses 8k buffer and assumes that the formatted output
> message won't overrun which seems safe given LOG_LINE_MAX, the current
> use of dict and the escaping method being used; however, we're
> planning to use devkmsg formatting wider and accounting for the buffer
> size properly isn't that complicated.
> 
> This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates
> devkmsg_read() so that it limits output accordingly.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Petr Mladek <pmladek@suse.cz>

It is just a refactoring and does not modify the current behavior.


> Cc: Kay Sievers <kay@vrfy.org>
> Cc: Petr Mladek <pmladek@suse.cz>
> ---
>  include/linux/printk.h |  2 ++
>  kernel/printk/printk.c | 35 +++++++++++++++++++++++------------
>  2 files changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 9b30871..58b1fec 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -30,6 +30,8 @@ static inline const char *printk_skip_level(const char *buffer)
>  	return buffer;
>  }
>  
> +#define CONSOLE_EXT_LOG_MAX	8192

If you do a respin from some reason. I would suggest to remove
"CONSOLE_" because it is used also for devkmsg.

Best Regards,
Petr

> +
>  /* printk's without a loglevel use this.. */
>  #define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT
>  
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 879edfc..b6e24af 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -512,7 +512,7 @@ struct devkmsg_user {
>  	u32 idx;
>  	enum log_flags prev;
>  	struct mutex lock;
> -	char buf[8192];
> +	char buf[CONSOLE_EXT_LOG_MAX];
>  };
>  
>  static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
> @@ -565,11 +565,18 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
>  	return ret;
>  }
>  
> +static void append_char(char **pp, char *e, char c)
> +{
> +	if (*pp < e)
> +		*(*pp)++ = c;
> +}
> +
>  static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  			    size_t count, loff_t *ppos)
>  {
>  	struct devkmsg_user *user = file->private_data;
>  	struct printk_log *msg;
> +	char *p, *e;
>  	u64 ts_usec;
>  	size_t i;
>  	char cont = '-';
> @@ -579,6 +586,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  	if (!user)
>  		return -EBADF;
>  
> +	p = user->buf;
> +	e = user->buf + sizeof(user->buf);
> +
>  	ret = mutex_lock_interruptible(&user->lock);
>  	if (ret)
>  		return ret;
> @@ -625,9 +635,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  		 ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
>  		cont = '+';
>  
> -	len = sprintf(user->buf, "%u,%llu,%llu,%c;",
> -		      (msg->facility << 3) | msg->level,
> -		      user->seq, ts_usec, cont);
> +	p += scnprintf(p, e - p, "%u,%llu,%llu,%c;",
> +		       (msg->facility << 3) | msg->level,
> +		       user->seq, ts_usec, cont);
>  	user->prev = msg->flags;
>  
>  	/* escape non-printable characters */
> @@ -635,11 +645,11 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  		unsigned char c = log_text(msg)[i];
>  
>  		if (c < ' ' || c >= 127 || c == '\\')
> -			len += sprintf(user->buf + len, "\\x%02x", c);
> +			p += scnprintf(p, e - p, "\\x%02x", c);
>  		else
> -			user->buf[len++] = c;
> +			append_char(&p, e, c);
>  	}
> -	user->buf[len++] = '\n';
> +	append_char(&p, e, '\n');
>  
>  	if (msg->dict_len) {
>  		bool line = true;
> @@ -648,30 +658,31 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  			unsigned char c = log_dict(msg)[i];
>  
>  			if (line) {
> -				user->buf[len++] = ' ';
> +				append_char(&p, e, ' ');
>  				line = false;
>  			}
>  
>  			if (c == '\0') {
> -				user->buf[len++] = '\n';
> +				append_char(&p, e, '\n');
>  				line = true;
>  				continue;
>  			}
>  
>  			if (c < ' ' || c >= 127 || c == '\\') {
> -				len += sprintf(user->buf + len, "\\x%02x", c);
> +				p += scnprintf(p, e - p, "\\x%02x", c);
>  				continue;
>  			}
>  
> -			user->buf[len++] = c;
> +			append_char(&p, e, c);
>  		}
> -		user->buf[len++] = '\n';
> +		append_char(&p, e, '\n');
>  	}
>  
>  	user->idx = log_next(user->idx);
>  	user->seq++;
>  	raw_spin_unlock_irq(&logbuf_lock);
>  
> +	len = p - user->buf;
>  	if (len > count) {
>  		ret = -EINVAL;
>  		goto out;
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/16] printk: factor out message formatting from devkmsg_read()
  2015-04-16 23:03 ` [PATCH 02/16] printk: factor out message formatting from devkmsg_read() Tejun Heo
@ 2015-04-20 12:30   ` Petr Mladek
  0 siblings, 0 replies; 46+ messages in thread
From: Petr Mladek @ 2015-04-20 12:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Thu 2015-04-16 19:03:39, Tejun Heo wrote:
> The extended message formatting used for /dev/kmsg will be used
> implement extended consoles.  Factor out msg_print_ext_header() and
> msg_print_ext_body() from devkmsg_read().
> 
> This is pure restructuring.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Petr Mladek <pmladek@suse.cz>

I like the split of the long function.

Best Regards,
Petr

> Cc: Kay Sievers <kay@vrfy.org>
> Cc: Petr Mladek <pmladek@suse.cz>
> ---
>  kernel/printk/printk.c | 157 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 85 insertions(+), 72 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b6e24af..5ea6709 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -505,6 +505,86 @@ int check_syslog_permissions(int type, bool from_file)
>  	return security_syslog(type);
>  }
>  
> +static void append_char(char **pp, char *e, char c)
> +{
> +	if (*pp < e)
> +		*(*pp)++ = c;
> +}
> +
> +static ssize_t msg_print_ext_header(char *buf, size_t size,
> +				    struct printk_log *msg, u64 seq,
> +				    enum log_flags prev_flags)
> +{
> +	u64 ts_usec = msg->ts_nsec;
> +	char cont = '-';
> +
> +	do_div(ts_usec, 1000);
> +
> +	/*
> +	 * If we couldn't merge continuation line fragments during the print,
> +	 * export the stored flags to allow an optional external merge of the
> +	 * records. Merging the records isn't always neccessarily correct, like
> +	 * when we hit a race during printing. In most cases though, it produces
> +	 * better readable output. 'c' in the record flags mark the first
> +	 * fragment of a line, '+' the following.
> +	 */
> +	if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT))
> +		cont = 'c';
> +	else if ((msg->flags & LOG_CONT) ||
> +		 ((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
> +		cont = '+';
> +
> +	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
> +		       (msg->facility << 3) | msg->level, seq, ts_usec, cont);
> +}
> +
> +static ssize_t msg_print_ext_body(char *buf, size_t size,
> +				  char *dict, size_t dict_len,
> +				  char *text, size_t text_len)
> +{
> +	char *p = buf, *e = buf + size;
> +	size_t i;
> +
> +	/* escape non-printable characters */
> +	for (i = 0; i < text_len; i++) {
> +		unsigned char c = text[i];
> +
> +		if (c < ' ' || c >= 127 || c == '\\')
> +			p += scnprintf(p, e - p, "\\x%02x", c);
> +		else
> +			append_char(&p, e, c);
> +	}
> +	append_char(&p, e, '\n');
> +
> +	if (dict_len) {
> +		bool line = true;
> +
> +		for (i = 0; i < dict_len; i++) {
> +			unsigned char c = dict[i];
> +
> +			if (line) {
> +				append_char(&p, e, ' ');
> +				line = false;
> +			}
> +
> +			if (c == '\0') {
> +				append_char(&p, e, '\n');
> +				line = true;
> +				continue;
> +			}
> +
> +			if (c < ' ' || c >= 127 || c == '\\') {
> +				p += scnprintf(p, e - p, "\\x%02x", c);
> +				continue;
> +			}
> +
> +			append_char(&p, e, c);
> +		}
> +		append_char(&p, e, '\n');
> +	}
> +
> +	return p - buf;
> +}
>  
>  /* /dev/kmsg - userspace message inject/listen interface */
>  struct devkmsg_user {
> @@ -565,30 +645,17 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
>  	return ret;
>  }
>  
> -static void append_char(char **pp, char *e, char c)
> -{
> -	if (*pp < e)
> -		*(*pp)++ = c;
> -}
> -
>  static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  			    size_t count, loff_t *ppos)
>  {
>  	struct devkmsg_user *user = file->private_data;
>  	struct printk_log *msg;
> -	char *p, *e;
> -	u64 ts_usec;
> -	size_t i;
> -	char cont = '-';
>  	size_t len;
>  	ssize_t ret;
>  
>  	if (!user)
>  		return -EBADF;
>  
> -	p = user->buf;
> -	e = user->buf + sizeof(user->buf);
> -
>  	ret = mutex_lock_interruptible(&user->lock);
>  	if (ret)
>  		return ret;
> @@ -618,71 +685,17 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  	}
>  
>  	msg = log_from_idx(user->idx);
> -	ts_usec = msg->ts_nsec;
> -	do_div(ts_usec, 1000);
> -
> -	/*
> -	 * If we couldn't merge continuation line fragments during the print,
> -	 * export the stored flags to allow an optional external merge of the
> -	 * records. Merging the records isn't always neccessarily correct, like
> -	 * when we hit a race during printing. In most cases though, it produces
> -	 * better readable output. 'c' in the record flags mark the first
> -	 * fragment of a line, '+' the following.
> -	 */
> -	if (msg->flags & LOG_CONT && !(user->prev & LOG_CONT))
> -		cont = 'c';
> -	else if ((msg->flags & LOG_CONT) ||
> -		 ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
> -		cont = '+';
> +	len = msg_print_ext_header(user->buf, sizeof(user->buf),
> +				   msg, user->seq, user->prev);
> +	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
> +				  log_dict(msg), msg->dict_len,
> +				  log_text(msg), msg->text_len);
>  
> -	p += scnprintf(p, e - p, "%u,%llu,%llu,%c;",
> -		       (msg->facility << 3) | msg->level,
> -		       user->seq, ts_usec, cont);
>  	user->prev = msg->flags;
> -
> -	/* escape non-printable characters */
> -	for (i = 0; i < msg->text_len; i++) {
> -		unsigned char c = log_text(msg)[i];
> -
> -		if (c < ' ' || c >= 127 || c == '\\')
> -			p += scnprintf(p, e - p, "\\x%02x", c);
> -		else
> -			append_char(&p, e, c);
> -	}
> -	append_char(&p, e, '\n');
> -
> -	if (msg->dict_len) {
> -		bool line = true;
> -
> -		for (i = 0; i < msg->dict_len; i++) {
> -			unsigned char c = log_dict(msg)[i];
> -
> -			if (line) {
> -				append_char(&p, e, ' ');
> -				line = false;
> -			}
> -
> -			if (c == '\0') {
> -				append_char(&p, e, '\n');
> -				line = true;
> -				continue;
> -			}
> -
> -			if (c < ' ' || c >= 127 || c == '\\') {
> -				p += scnprintf(p, e - p, "\\x%02x", c);
> -				continue;
> -			}
> -
> -			append_char(&p, e, c);
> -		}
> -		append_char(&p, e, '\n');
> -	}
> -
>  	user->idx = log_next(user->idx);
>  	user->seq++;
>  	raw_spin_unlock_irq(&logbuf_lock);
>  
> -	len = p - user->buf;
>  	if (len > count) {
>  		ret = -EINVAL;
>  		goto out;
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/16] printk: guard the amount written per line by devkmsg_read()
  2015-04-20 12:11   ` Petr Mladek
@ 2015-04-20 12:33     ` Petr Mladek
  0 siblings, 0 replies; 46+ messages in thread
From: Petr Mladek @ 2015-04-20 12:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Mon 2015-04-20 14:11:36, Petr Mladek wrote:
> On Thu 2015-04-16 19:03:38, Tejun Heo wrote:
> > devkmsg_read() uses 8k buffer and assumes that the formatted output
> > message won't overrun which seems safe given LOG_LINE_MAX, the current
> > use of dict and the escaping method being used; however, we're
> > planning to use devkmsg formatting wider and accounting for the buffer
> > size properly isn't that complicated.
> > 
> > This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates
> > devkmsg_read() so that it limits output accordingly.
> > 
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> 
> Reviewed-by: Petr Mladek <pmladek@suse.cz>
> 
> It is just a refactoring and does not modify the current behavior.

Ah, to make it clear. It did not modify the behavior except for
adding the check for potential buffer overflow.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers()
  2015-04-16 23:03 ` [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers() Tejun Heo
@ 2015-04-20 12:50   ` Petr Mladek
  0 siblings, 0 replies; 46+ messages in thread
From: Petr Mladek @ 2015-04-20 12:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Thu 2015-04-16 19:03:40, Tejun Heo wrote:
> When a line is printed by multiple printk invocations, each chunk is
> directly sent out to console drivers so that they don't get lost.
> When the line is completed and stored in the log buffer, the line is
> suppressed from going out to consoles as that'd lead to duplicate
> outputs.  This is tracked with LOG_NOCONS flag.
> 
> The suppression is currently implemented in console_unlock() which
> skips invoking call_console_drivers() for LOG_NOCONS messages.  This
> patch moves the filtering into call_console_drivers() in preparation
> of the planned extended console drivers which will deal with the
> duplicate messages themselves.
> 
> While this makes call_console_drivers() iterate over LOG_NOCONS
> messages, this is extremely unlikely to matter especially given that
> continuation lines aren't that common and also simplifies
> console_unlock() a bit.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Kay Sievers <kay@vrfy.org>
> Cc: Petr Mladek <pmladek@suse.cz>
> ---
>  kernel/printk/printk.c | 46 ++++++++++++++++++++++++----------------------
>  1 file changed, 24 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5ea6709..0175c46 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1417,7 +1417,8 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
>   * log_buf[start] to log_buf[end - 1].
>   * The console_lock must be held.
>   */
> -static void call_console_drivers(int level, const char *text, size_t len)
> +static void call_console_drivers(int level, bool nocons,
> +				 const char *text, size_t len)
>  {
>  	struct console *con;
>  
> @@ -1438,6 +1439,13 @@ static void call_console_drivers(int level, const char *text, size_t len)
>  		if (!cpu_online(smp_processor_id()) &&
>  		    !(con->flags & CON_ANYTIME))
>  			continue;
> +		/*
> +		 * Skip record we have buffered and already printed
> +		 * directly to the console when we received it.
> +		 */
> +		if (nocons)
> +			continue;
> +
>  		con->write(con, text, len);
>  	}
>  }
> @@ -1919,7 +1927,8 @@ static struct cont {
>  } cont;
>  static struct printk_log *log_from_idx(u32 idx) { return NULL; }
>  static u32 log_next(u32 idx) { return 0; }
> -static void call_console_drivers(int level, const char *text, size_t len) {}
> +static void call_console_drivers(int level, bool nocons,
> +				 const char *text, size_t len) {}
>  static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
>  			     bool syslog, char *buf, size_t size) { return 0; }
>  static size_t cont_print_text(char *text, size_t size) { return 0; }
> @@ -2190,7 +2199,7 @@ static void console_cont_flush(char *text, size_t size)
>  	len = cont_print_text(text, size);
>  	raw_spin_unlock(&logbuf_lock);
>  	stop_critical_timings();
> -	call_console_drivers(cont.level, text, len);
> +	call_console_drivers(cont.level, false, text, len);
>  	start_critical_timings();
>  	local_irq_restore(flags);
>  	return;
> @@ -2234,6 +2243,7 @@ again:
>  		struct printk_log *msg;
>  		size_t len;
>  		int level;
> +		bool nocons;
>  
>  		raw_spin_lock_irqsave(&logbuf_lock, flags);
>  		if (seen_seq != log_next_seq) {
> @@ -2252,38 +2262,30 @@ again:
>  		} else {
>  			len = 0;
>  		}
> -skip:
> +
>  		if (console_seq == log_next_seq)
>  			break;
>  
>  		msg = log_from_idx(console_idx);
> -		if (msg->flags & LOG_NOCONS) {
> -			/*
> -			 * Skip record we have buffered and already printed
> -			 * directly to the console when we received it.
> -			 */
> -			console_idx = log_next(console_idx);
> -			console_seq++;
> -			/*
> -			 * We will get here again when we register a new
> -			 * CON_PRINTBUFFER console. Clear the flag so we
> -			 * will properly dump everything later.
> -			 */
> -			msg->flags &= ~LOG_NOCONS;
> -			console_prev = msg->flags;
> -			goto skip;
> -		}
> -
>  		level = msg->level;
> +		nocons = msg->flags & LOG_NOCONS;
>  		len += msg_print_text(msg, console_prev, false,
>  				      text + len, sizeof(text) - len);
>  		console_idx = log_next(console_idx);
>  		console_seq++;
>  		console_prev = msg->flags;
> +
> +		/*
> +		 * The log will be processed again when we register a new
> +		 * CON_PRINTBUFFER console. Clear the flag so we will
> +		 * properly dump everything later.
> +		 */
> +		msg->flags &= ~LOG_NOCONS;

The previous code cleared LOG_NOCONS before saving flags into
console_prev. Well, the change does not have any real effect.
And the new order makes more sense.

This patch looks fine to me.

Reviewed-by: Petr Mladek <pmladek@suse.cz>

Best Regards,
Petr

>  		raw_spin_unlock(&logbuf_lock);
>  
>  		stop_critical_timings();	/* don't trace print latency */
> -		call_console_drivers(level, text, len);
> +		call_console_drivers(level, nocons, text, len);
>  		start_critical_timings();
>  		local_irq_restore(flags);
>  	}
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-19  7:25 ` Rob Landley
  2015-04-20 12:00   ` David Laight
@ 2015-04-20 14:33   ` Tejun Heo
  1 sibling, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2015-04-20 14:33 UTC (permalink / raw)
  To: Rob Landley; +Cc: Andrew Morton, David S. Miller, Kernel Mailing List, netdev

Hello, Rob.

On Sun, Apr 19, 2015 at 02:25:09AM -0500, Rob Landley wrote:
> If you have two machines plugged into a hub, and that's _all_ that's
> plugged in, packets should never get dropped. This was the original
> use case of netconsole was that the sender and the receiver were
> plugged into the same router.

Development aid on local network hasn't been the only use case for a
very long time now.  I haven't seen too many large scale setups and
two of them were using netconsole as a way to collect kernel messages
cluster-wide and having issues with lost messages.  One was running it
over a separate lower speed network from the main one which they used
for most managerial tasks including deployment and packet losses
weren't that unusual.

The other is running on the same network but the log collector isn't
per-rack so the packets end up getting routed through congested parts
of the network again experiencing messages losses.

> So are you trying to program around a problem you've actually _seen_,
> or are you attempting to reinvent TCP/IP yet again based on top of UDP
> (Drink!) because of a purely theoretical issue?

At larger scale, the problem is very real.  Let's forget about the
reliability part.  The main thing is being able to identify message
sequences so that the receiver can put the message streams back
together.

That said, once that's there, whether the "reliability" part is done
with TCP doesn't make that much of difference as it'd still need to
put back the two message streams together, but again this doesn't
matter.  Let's just ignore this part.

> > printk already keeps log metadata which contains enough information to
> > make netconsole reliable.  This patchset does the followings.
> 
> Adds a giant amount of complexity without quite explaining why.

The only signficant complexity is on the receiver side and it doesn't
even have to be in the kernel.  CON_EXTENDED and emitting extended
messages are pretty straight-forward changes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-16 23:03 ` [PATCH 04/16] printk: implement support for extended console drivers Tejun Heo
@ 2015-04-20 15:43   ` Petr Mladek
  2015-04-21 10:03     ` Petr Mladek
  2015-04-27 21:09     ` Tejun Heo
  0 siblings, 2 replies; 46+ messages in thread
From: Petr Mladek @ 2015-04-20 15:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Thu 2015-04-16 19:03:41, Tejun Heo wrote:
> printk log_buf keeps various metadata for each message including its
> sequence number and timestamp.  The metadata is currently available
> only through /dev/kmsg and stripped out before passed onto console
> drivers.  We want this metadata to be available to console drivers
> too.  Immediately, it's to implement reliable netconsole but may be
> useful for other console devices too.
> 
> This patch implements support for extended console drivers.  Consoles
> can indicate that they process extended messages by setting the new
> CON_EXTENDED flag and they'll fed messages formatted the same way as
> /dev/kmsg output as follows.
> 
>  <level>,<sequnum>,<timestamp>,<contflag>;<message text>
> 
> One special case is fragments.  Message fragments are output
> immediately to consoles to avoid losing them in case of crashes.  For
> normal consoles, this is handled by later suppressing the assembled
> result and /dev/kmsg only shows fully assembled message; however,
> extended consoles would need both the fragments, to avoid losing
> message in case of a crash, and the assembled result, to tell how the
> fragments are assembled and which sequence number got assigned to it.
> 
> To help matching up the fragments with the resulting message,
> fragments are emitted in the following format.
> 
>  <level>,-,<timestamp>,-,fragid=<fragid>;<message fragment>
> 
> And later when the assembly is complete, the following is transmitted,
> 
>  <level>,<sequnum>,<timestamp>,<contflag>,fragid=<fragid>;<message text>
> 
> * Extended message formatting for console drivers is enabled iff there
                                                               ^^^

s/iff/if/

>   are registered extended consoles.
> 
> * Comment describing extended message formats updated to help
>   distinguishing variable with verbatim terms.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Kay Sievers <kay@vrfy.org>
> Cc: Petr Mladek <pmladek@suse.cz>
> ---
>  include/linux/console.h |   1 +
>  kernel/printk/printk.c  | 141 +++++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 123 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 7571a16..04bbd09 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -115,6 +115,7 @@ static inline int con_debug_leave(void)
>  #define CON_BOOT	(8)
>  #define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
>  #define CON_BRL		(32) /* Used for a braille device */
> +#define CON_EXTENDED	(64) /* Use the extended output format a la /dev/kmsg */
>  
>  struct console {
>  	char	name[16];
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 0175c46..349a37b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -84,6 +84,8 @@ static struct lockdep_map console_lock_dep_map = {
>  };
>  #endif
>  
> +static int nr_ext_console_drivers;
> +
>  /*
>   * Helper macros to handle lockdep when locking/unlocking console_sem. We use
>   * macros instead of functions so that _RET_IP_ contains useful information.
> @@ -195,14 +197,28 @@ static int console_may_schedule;
>   * need to be changed in the future, when the requirements change.
>   *
>   * /dev/kmsg exports the structured data in the following line format:
> - *   "level,sequnum,timestamp;<message text>\n"
> + *   "<level>,<sequnum>,<timestamp>,<contflag>;<message text>\n"
>   *
>   * The optional key/value pairs are attached as continuation lines starting
>   * with a space character and terminated by a newline. All possible
>   * non-prinatable characters are escaped in the "\xff" notation.
>   *
>   * Users of the export format should ignore possible additional values
> - * separated by ',', and find the message after the ';' character.
> + * separated by ',', and find the message after the ';' character. All
> + * optional header fields should have the form "<key>=<value>".
> + *
> + * For consoles with CON_EXTENDED set, a message formatted like the
> + * following may also be printed. This is a continuation fragment which are
> + * being assembled and will be re-transmitted with a normal header once
> + * assembly finishes. The fragments are sent out immediately to avoid
> + * losing them over a crash.
> + *   "<level>,-,<timestamp>,-,fragid=<fragid>;<message fragment>\n"
> + *
> + * On completion of assembly, the following is transmitted.
> + *   "<level>,<sequnum>,<timestamp>,<contflag>,fragid=<fragid>;<message text>\n"
> + *
> + * Extended consoles should identify and handle duplicates by matching the
> + * fragids of the fragments and assembled messages.
>   */
>  
>  enum log_flags {
> @@ -210,6 +226,7 @@ enum log_flags {
>  	LOG_NEWLINE	= 2,	/* text ended with a newline */
>  	LOG_PREFIX	= 4,	/* text started with a prefix */
>  	LOG_CONT	= 8,	/* text is a fragment of a continuation line */
> +	LOG_DICT_META	= 16,	/* dict contains console meta information */
>  };
>  
>  struct printk_log {
> @@ -292,6 +309,12 @@ static char *log_dict(const struct printk_log *msg)
>  	return (char *)msg + sizeof(struct printk_log) + msg->text_len;
>  }
>  
> +/* if LOG_DICT_META is set, the dict buffer carries printk internal info */
> +static size_t log_dict_len(const struct printk_log *msg)
> +{
> +	return (msg->flags & LOG_DICT_META) ? 0 : msg->dict_len;

We should document somewhere that the internal info and user-provided
dictionary newer could by stored in the same message.

It is because the internal info is used only with cont buffers.
And cont buffer is always stored into the log buffer separately,
see cont_flush() in vprintk_emit(). It took me a lot of time
to confirm this by reading the code.

> +}
> +
>  /* get record by index; idx must point to valid msg */
>  static struct printk_log *log_from_idx(u32 idx)
>  {
> @@ -516,7 +539,9 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
>  				    enum log_flags prev_flags)
>  {
>  	u64 ts_usec = msg->ts_nsec;
> +	u64 fragid;
>  	char cont = '-';
> +	char fragid_buf[32] = "";
>  
>  	do_div(ts_usec, 1000);
>  
> @@ -534,8 +559,14 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
>  		 ((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
>  		cont = '+';
>  
> -	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
> -		       (msg->facility << 3) | msg->level, seq, ts_usec, cont);
> +	/* see cont_flush() */
> +	if ((msg->flags & LOG_DICT_META) && msg->dict_len &&
> +	    sscanf(log_dict(msg), "FRAGID=%llu", &fragid))

I was afraid that there might be a potential buffer overflow because
the user-provided dict need not be limited by '\0'. But LOG_DICT_META
is used only with the internal data that are safe. We might document
this as well.

BTW: Do you want to print the internal dict when calling this function
in devkmsg_read()?


> +		scnprintf(fragid_buf, sizeof(fragid_buf),
> +			  ",fragid=%llu", fragid);
> +	return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
> +			 (msg->facility << 3) | msg->level, seq, ts_usec, cont,
> +			 fragid_buf);

The above comment suggests that  "cont" and "fragid_buf" are delimited
by a comma but it is not used here. Is it by intention.

>  }
>  
>  static ssize_t msg_print_ext_body(char *buf, size_t size,
> @@ -688,7 +719,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>  	len = msg_print_ext_header(user->buf, sizeof(user->buf),
>  				   msg, user->seq, user->prev);
>  	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
> -				  log_dict(msg), msg->dict_len,
> +				  log_dict(msg), log_dict_len(msg),
>  				  log_text(msg), msg->text_len);
>  
>  	user->prev = msg->flags;
> @@ -1418,6 +1449,7 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
>   * The console_lock must be held.
>   */
>  static void call_console_drivers(int level, bool nocons,
> +				 const char *ext_text, size_t ext_len,
>  				 const char *text, size_t len)
>  {
>  	struct console *con;
> @@ -1440,13 +1472,16 @@ static void call_console_drivers(int level, bool nocons,
>  		    !(con->flags & CON_ANYTIME))
>  			continue;
>  		/*
> -		 * Skip record we have buffered and already printed
> -		 * directly to the console when we received it.
> +		 * For !ext consoles, skip record we have buffered and
> +		 * already printed directly when we received it.  Ext
> +		 * consoles are responsible for filtering on their ends.
>  		 */
> -		if (nocons)
> -			continue;
> -
> -		con->write(con, text, len);
> +		if (con->flags & CON_EXTENDED) {
> +			if (ext_len)
> +				con->write(con, ext_text, ext_len);
> +		} else if (!nocons) {
> +			con->write(con, text, len);
> +		}
>  	}
>  }
>  
> @@ -1546,6 +1581,7 @@ static inline void printk_delay(void)
>   */
>  static struct cont {
>  	char buf[LOG_LINE_MAX];
> +	u64 fragid;			/* unique fragment id */
>  	size_t len;			/* length == 0 means unused buffer */
>  	size_t cons;			/* bytes written to console */
>  	struct task_struct *owner;	/* task of first print*/
> @@ -1564,13 +1600,23 @@ static void cont_flush(enum log_flags flags)
>  		return;
>  
>  	if (cont.cons) {
> +		char dict_buf[32];
> +		size_t dict_len;
> +
>  		/*
>  		 * If a fragment of this line was directly flushed to the
>  		 * console; wait for the console to pick up the rest of the
> -		 * line. LOG_NOCONS suppresses a duplicated output.
> +		 * line. LOG_NOCONS suppresses a duplicated output for !ext
> +		 * consoles. ext consoles handle the duplicates themselves
> +		 * and expect fragid in the header of the duplicate
> +		 * messages for that. Record the fragid in the dict buffer
> +		 * which fragmented messages never use.
>  		 */
> -		log_store(cont.facility, cont.level, flags | LOG_NOCONS,
> -			  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
> +		dict_len = scnprintf(dict_buf, sizeof(dict_buf), "FRAGID=%llu",
> +				     cont.fragid);
> +		log_store(cont.facility, cont.level,
> +			  flags | LOG_NOCONS | LOG_DICT_META,
> +			  cont.ts_nsec, dict_buf, dict_len, cont.buf, cont.len);

I wonder if it would make sense to restart fragid here. Another line
will get distinguished by "seqnum".

By other words, the first fragment for each seqnum would be "0".
The whole lines (without cont buffer) could always print "0".

Sigh, the whole mess is caused by the fact that we could not extend
struct printk_log easily. It would be much better if we could put
fragid there. I finally understand why you need to reuse the dict.


Sigh, it adds another twist into the already complex printk code.

Another solution would be to allow to disable the continuous buffer
via some boot option or /sys/kernel/debug entry if you want to debug
such a problem and have problem with the partial flushing.

Hmm, I wonder what are the typical values passed via "dict" and if we
need to handle this entry such a special way. It would make sense to
always print dict values to the ext consoles and handle this as a
yet another "normal" dict value. It would allow us to remove
some hacks at least.

Best Regards,
Petr



>  		cont.flags = flags;
>  		cont.flushed = true;
>  	} else {
> @@ -1596,6 +1642,9 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
>  	}
>  
>  	if (!cont.len) {
> +		static u64 fragid;
> +
> +		cont.fragid = fragid++;
>  		cont.facility = facility;
>  		cont.level = level;
>  		cont.owner = current;
> @@ -1920,14 +1969,28 @@ static u32 log_first_idx;
>  static u64 log_next_seq;
>  static enum log_flags console_prev;
>  static struct cont {
> +	char buf[1];
> +	u64 fragid;
>  	size_t len;
>  	size_t cons;
> +	u64 ts_nsec;
>  	u8 level;
> +	u8 facility;
>  	bool flushed:1;
>  } cont;
> +static char *log_text(const struct printk_log *msg) { return NULL; }
> +static char *log_dict(const struct printk_log *msg) { return NULL; }
> +static size_t log_dict_len(const struct printk_log *msg) { return 0; }
>  static struct printk_log *log_from_idx(u32 idx) { return NULL; }
>  static u32 log_next(u32 idx) { return 0; }
> +static ssize_t msg_print_ext_header(char *buf, size_t size,
> +				    struct printk_log *msg, u64 seq,
> +				    enum log_flags prev_flags) { return 0; }
> +static ssize_t msg_print_ext_body(char *buf, size_t size,
> +				  char *dict, size_t dict_len,
> +				  char *text, size_t text_len) { return 0; }
>  static void call_console_drivers(int level, bool nocons,
> +				 const char *ext_text, size_t ext_len,
>  				 const char *text, size_t len) {}
>  static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
>  			     bool syslog, char *buf, size_t size) { return 0; }
> @@ -2178,10 +2241,11 @@ int is_console_locked(void)
>  	return console_locked;
>  }
>  
> -static void console_cont_flush(char *text, size_t size)
> +static void console_cont_flush(char *ext_text, size_t ext_size,
> +			       char *text, size_t size)
>  {
>  	unsigned long flags;
> -	size_t len;
> +	size_t len, ext_len = 0;
>  
>  	raw_spin_lock_irqsave(&logbuf_lock, flags);
>  
> @@ -2196,10 +2260,30 @@ static void console_cont_flush(char *text, size_t size)
>  	if (console_seq < log_next_seq && !cont.cons)
>  		goto out;
>  
> +	/*
> +	 * Fragment dump for consoles. Will be printed again when the
> +	 * assembly is complete.
> +	 */
> +	if (nr_ext_console_drivers && cont.len > cont.cons) {
> +		u64 ts_usec = cont.ts_nsec;
> +
> +		do_div(ts_usec, 1000);
> +
> +		ext_len = scnprintf(ext_text, ext_size,
> +				    "%u,-,%llu,-,fragid=%llu;",
> +				    (cont.facility << 3) | cont.level, ts_usec,
> +				    cont.fragid);
> +		ext_len += msg_print_ext_body(ext_text + ext_len,
> +					      ext_size - ext_len, NULL, 0,
> +					      cont.buf + cont.cons,
> +					      cont.len - cont.cons);
> +	}
> +
>  	len = cont_print_text(text, size);
> +
>  	raw_spin_unlock(&logbuf_lock);
>  	stop_critical_timings();
> -	call_console_drivers(cont.level, false, text, len);
> +	call_console_drivers(cont.level, false, ext_text, ext_len, text, len);
>  	start_critical_timings();
>  	local_irq_restore(flags);
>  	return;
> @@ -2223,6 +2307,7 @@ out:
>   */
>  void console_unlock(void)
>  {
> +	static char ext_text[CONSOLE_EXT_LOG_MAX];
>  	static char text[LOG_LINE_MAX + PREFIX_MAX];
>  	static u64 seen_seq;
>  	unsigned long flags;
> @@ -2237,11 +2322,12 @@ void console_unlock(void)
>  	console_may_schedule = 0;
>  
>  	/* flush buffered message fragment immediately to console */
> -	console_cont_flush(text, sizeof(text));
> +	console_cont_flush(ext_text, sizeof(ext_text), text, sizeof(text));
>  again:
>  	for (;;) {
>  		struct printk_log *msg;
>  		size_t len;
> +		size_t ext_len = 0;
>  		int level;
>  		bool nocons;
>  
> @@ -2271,6 +2357,15 @@ again:
>  		nocons = msg->flags & LOG_NOCONS;
>  		len += msg_print_text(msg, console_prev, false,
>  				      text + len, sizeof(text) - len);
> +		if (nr_ext_console_drivers) {
> +			ext_len = msg_print_ext_header(ext_text,
> +						sizeof(ext_text),
> +						msg, console_seq, console_prev);
> +			ext_len += msg_print_ext_body(ext_text + ext_len,
> +						sizeof(ext_text) - ext_len,
> +						log_dict(msg), log_dict_len(msg),
> +						log_text(msg), msg->text_len);
> +		}
>  		console_idx = log_next(console_idx);
>  		console_seq++;
>  		console_prev = msg->flags;
> @@ -2285,7 +2380,8 @@ again:
>  		raw_spin_unlock(&logbuf_lock);
>  
>  		stop_critical_timings();	/* don't trace print latency */
> -		call_console_drivers(level, nocons, text, len);
> +		call_console_drivers(level, nocons,
> +				     ext_text, ext_len, text, len);
>  		start_critical_timings();
>  		local_irq_restore(flags);
>  	}
> @@ -2540,6 +2636,10 @@ void register_console(struct console *newcon)
>  		newcon->next = console_drivers->next;
>  		console_drivers->next = newcon;
>  	}
> +
> +	if (newcon->flags & CON_EXTENDED)
> +		nr_ext_console_drivers++;
> +
>  	if (newcon->flags & CON_PRINTBUFFER) {
>  		/*
>  		 * console_unlock(); will print out the buffered messages
> @@ -2612,6 +2712,9 @@ int unregister_console(struct console *console)
>  		}
>  	}
>  
> +	if (!res && (console->flags & CON_EXTENDED))
> +		nr_ext_console_drivers--;
> +
>  	/*
>  	 * If this isn't the last console and it has CON_CONSDEV set, we
>  	 * need to set it on the next preferred console.
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-20 15:43   ` Petr Mladek
@ 2015-04-21 10:03     ` Petr Mladek
  2015-04-27 21:09     ` Tejun Heo
  1 sibling, 0 replies; 46+ messages in thread
From: Petr Mladek @ 2015-04-21 10:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Mon 2015-04-20 17:43:07, Petr Mladek wrote:
> On Thu 2015-04-16 19:03:41, Tejun Heo wrote:
> > printk log_buf keeps various metadata for each message including its
> > sequence number and timestamp.  The metadata is currently available
> > only through /dev/kmsg and stripped out before passed onto console
> > drivers.  We want this metadata to be available to console drivers
> > too.  Immediately, it's to implement reliable netconsole but may be
> > useful for other console devices too.
> > 
> > This patch implements support for extended console drivers.  Consoles
> > can indicate that they process extended messages by setting the new
> > CON_EXTENDED flag and they'll fed messages formatted the same way as
> > /dev/kmsg output as follows.
> > 
> >  <level>,<sequnum>,<timestamp>,<contflag>;<message text>
> > 
> > One special case is fragments.  Message fragments are output
> > immediately to consoles to avoid losing them in case of crashes.  For
> > normal consoles, this is handled by later suppressing the assembled
> > result and /dev/kmsg only shows fully assembled message; however,
> > extended consoles would need both the fragments, to avoid losing
> > message in case of a crash, and the assembled result, to tell how the
> > fragments are assembled and which sequence number got assigned to it.
> > 
> > To help matching up the fragments with the resulting message,
> > fragments are emitted in the following format.
> > 
> >  <level>,-,<timestamp>,-,fragid=<fragid>;<message fragment>
> > 
> > And later when the assembly is complete, the following is transmitted,
> > 
> >  <level>,<sequnum>,<timestamp>,<contflag>,fragid=<fragid>;<message text>
> > 
> > * Extended message formatting for console drivers is enabled iff there

[...]

> Sigh, it adds another twist into the already complex printk code.
> 
> Another solution would be to allow to disable the continuous buffer
> via some boot option or /sys/kernel/debug entry if you want to debug
> such a problem and have problem with the partial flushing.
> 
> Hmm, I wonder what are the typical values passed via "dict" and if we
> need to handle this entry such a special way. It would make sense to
> always print dict values to the ext consoles and handle this as a
> yet another "normal" dict value. It would allow us to remove
> some hacks at least.

Another solution would be to always flush cont buffer when it is being
printed to the console. Then the messages might by distinguished by
the seqnum and fragid won't be needed. The question is if the cont
buffer will still have any affect after this change.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET] printk, netconsole: implement reliable netconsole
  2015-04-17 17:17     ` David Miller
  2015-04-17 17:37       ` Tejun Heo
@ 2015-04-21 21:51       ` Stephen Hemminger
  1 sibling, 0 replies; 46+ messages in thread
From: Stephen Hemminger @ 2015-04-21 21:51 UTC (permalink / raw)
  To: David Miller; +Cc: tj, penguin-kernel, akpm, linux-kernel, netdev

On Fri, 17 Apr 2015 13:17:12 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Tejun Heo <tj@kernel.org>
> Date: Fri, 17 Apr 2015 12:28:26 -0400
> 
> > On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
> >> If the sender side can wait for retransmission, why can't we use
> >> userspace programs (e.g. rsyslogd)?
> > 
> > Because the system may be oopsing, ooming or threshing excessively
> > rendering the userland inoperable and that's exactly when we want
> > those log messages to be transmitted out of the system.
> 
> If userland cannot run properly, it is almost certain that neither will
> your complex reliability layer logic.
> 
> I tend to agree with Tetsuo, that in-kernel netconsole should remain
> as simple as possible and once it starts to have any smarts and less
> trivial logic the job belongs in userspace.

Keep existing netconsole as simple as possible. It is not meant as
reliable, secure logging.

"Those who do not understand TCP are doomed to reinvent it"

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-20 15:43   ` Petr Mladek
  2015-04-21 10:03     ` Petr Mladek
@ 2015-04-27 21:09     ` Tejun Heo
  2015-04-28  9:42       ` Petr Mladek
  1 sibling, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-27 21:09 UTC (permalink / raw)
  To: Petr Mladek; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

Hello, Petr.

Sorry about the delay.

On Mon, Apr 20, 2015 at 05:43:07PM +0200, Petr Mladek wrote:
...
> > * Extended message formatting for console drivers is enabled iff there
> 
> s/iff/if/

if and only if

> I was afraid that there might be a potential buffer overflow because
> the user-provided dict need not be limited by '\0'. But LOG_DICT_META
> is used only with the internal data that are safe. We might document
> this as well.
> 
> BTW: Do you want to print the internal dict when calling this function
> in devkmsg_read()?

No, plesae see below.

> > +		scnprintf(fragid_buf, sizeof(fragid_buf),
> > +			  ",fragid=%llu", fragid);
> > +	return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
> > +			 (msg->facility << 3) | msg->level, seq, ts_usec, cont,
> > +			 fragid_buf);
> 
> The above comment suggests that  "cont" and "fragid_buf" are delimited
> by a comma but it is not used here. Is it by intention.

Hmm... how is it not?  The fragid printf has preceding comma.

> > +		dict_len = scnprintf(dict_buf, sizeof(dict_buf), "FRAGID=%llu",
> > +				     cont.fragid);
> > +		log_store(cont.facility, cont.level,
> > +			  flags | LOG_NOCONS | LOG_DICT_META,
> > +			  cont.ts_nsec, dict_buf, dict_len, cont.buf, cont.len);
> 
> I wonder if it would make sense to restart fragid here. Another line
> will get distinguished by "seqnum".

That'd assume that there can only ever be a single continuation
buffer, which is true now but it's possible that we may want to make
it per-cpu in the future.

> Sigh, the whole mess is caused by the fact that we could not extend
> struct printk_log easily. It would be much better if we could put
> fragid there. I finally understand why you need to reuse the dict.

I've been thinking about it and using dict area for internal metadata
is indeed quite messy.  I think a better way to do it is declaring
dict_len as a union w/ fragid.  This'd limit the fragid to 16bit but
that should be more than enough and we can do away with the string
formatting and reading back.

> Another solution would be to allow to disable the continuous buffer
> via some boot option or /sys/kernel/debug entry if you want to debug
> such a problem and have problem with the partial flushing.

It isn't really about debugging partial flushing itself but rather
always being able to push out the messages before the printk
invocation finishes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-27 21:09     ` Tejun Heo
@ 2015-04-28  9:42       ` Petr Mladek
  2015-04-28 14:10         ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Petr Mladek @ 2015-04-28  9:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Mon 2015-04-27 17:09:22, Tejun Heo wrote:
> Hello, Petr.
> 
> Sorry about the delay.

np

> On Mon, Apr 20, 2015 at 05:43:07PM +0200, Petr Mladek wrote:
> > I was afraid that there might be a potential buffer overflow because
> > the user-provided dict need not be limited by '\0'. But LOG_DICT_META
> > is used only with the internal data that are safe. We might document
> > this as well.
> > 
> > BTW: Do you want to print the internal dict when calling this function
> > in devkmsg_read()?
> 
> No, plesae see below.

I see, it is not printed there because the dict_len is zero for the
internal info.

> > > +		scnprintf(fragid_buf, sizeof(fragid_buf),
> > > +			  ",fragid=%llu", fragid);
> > > +	return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
> > > +			 (msg->facility << 3) | msg->level, seq, ts_usec, cont,
> > > +			 fragid_buf);
> > 
> > The above comment suggests that  "cont" and "fragid_buf" are delimited
> > by a comma but it is not used here. Is it by intention.
> 
> Hmm... how is it not?  The fragid printf has preceding comma.

I see it now.

> > > +		dict_len = scnprintf(dict_buf, sizeof(dict_buf), "FRAGID=%llu",
> > > +				     cont.fragid);
> > > +		log_store(cont.facility, cont.level,
> > > +			  flags | LOG_NOCONS | LOG_DICT_META,
> > > +			  cont.ts_nsec, dict_buf, dict_len, cont.buf, cont.len);
> > 
> > I wonder if it would make sense to restart fragid here. Another line
> > will get distinguished by "seqnum".
> 
> That'd assume that there can only ever be a single continuation
> buffer, which is true now but it's possible that we may want to make
> it per-cpu in the future.

I am not sure if any more complications will get accepted. Anyway,
each CPU should print an independent message. Therefore each pre-CPU
cont buffer would be printed on separate line and would get another
seqnum. We could have per-CPU fragid counter.


> > Sigh, the whole mess is caused by the fact that we could not extend
> > struct printk_log easily. It would be much better if we could put
> > fragid there. I finally understand why you need to reuse the dict.
> 
> I've been thinking about it and using dict area for internal metadata
> is indeed quite messy.  I think a better way to do it is declaring
> dict_len as a union w/ fragid.  This'd limit the fragid to 16bit but
> that should be more than enough and we can do away with the string
> formatting and reading back.

Nice idea. I like it much more.


> > Another solution would be to allow to disable the continuous buffer
> > via some boot option or /sys/kernel/debug entry if you want to debug
> > such a problem and have problem with the partial flushing.
>
> It isn't really about debugging partial flushing itself but rather
> always being able to push out the messages before the printk
> invocation finishes.

I did not mean cont-buffer debugging. I meant debugging with
netconsole.

Note that cont buffer is only a memory optimization. You could put
every text snippet into the ring buffer and console immediately.
You need to set LOG_CONT instead of LOG_NEWLINE flag. This is
already used when the line is too long for the cont buffer or
when there is printed a message from another cpu in the middle
of a continuing message.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-28  9:42       ` Petr Mladek
@ 2015-04-28 14:10         ` Tejun Heo
  2015-04-28 14:24           ` Petr Mladek
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2015-04-28 14:10 UTC (permalink / raw)
  To: Petr Mladek; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

Hello, Petr.

On Tue, Apr 28, 2015 at 11:42:44AM +0200, Petr Mladek wrote:
> Note that cont buffer is only a memory optimization. You could put
> every text snippet into the ring buffer and console immediately.
> You need to set LOG_CONT instead of LOG_NEWLINE flag. This is
> already used when the line is too long for the cont buffer or
> when there is printed a message from another cpu in the middle
> of a continuing message.

Yeah, hmmm, I wonder whether the right solution here is to bypass cont
buffer if ext_console is present.  This would use a bit more memory
and hurt users catting /proc/kmsg but everyone else should be able to
get the same result and things get a lot simpler on both sending and
receiving sides.  What do you think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/16] printk: implement support for extended console drivers
  2015-04-28 14:10         ` Tejun Heo
@ 2015-04-28 14:24           ` Petr Mladek
  0 siblings, 0 replies; 46+ messages in thread
From: Petr Mladek @ 2015-04-28 14:24 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, davem, linux-kernel, netdev, Kay Sievers

On Tue 2015-04-28 10:10:14, Tejun Heo wrote:
> Hello, Petr.
> 
> On Tue, Apr 28, 2015 at 11:42:44AM +0200, Petr Mladek wrote:
> > Note that cont buffer is only a memory optimization. You could put
> > every text snippet into the ring buffer and console immediately.
> > You need to set LOG_CONT instead of LOG_NEWLINE flag. This is
> > already used when the line is too long for the cont buffer or
> > when there is printed a message from another cpu in the middle
> > of a continuing message.
> 
> Yeah, hmmm, I wonder whether the right solution here is to bypass cont
> buffer if ext_console is present.  This would use a bit more memory
> and hurt users catting /proc/kmsg but everyone else should be able to
> get the same result and things get a lot simpler on both sending and
> receiving sides.  What do you think?

I would personally give it a try. If it works well, we might
eventually get rid of the cont buffer completely and make
the printk code a bit easier again.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2015-04-28 14:21 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
2015-04-16 23:03 ` [PATCH 01/16] printk: guard the amount written per line by devkmsg_read() Tejun Heo
2015-04-20 12:11   ` Petr Mladek
2015-04-20 12:33     ` Petr Mladek
2015-04-16 23:03 ` [PATCH 02/16] printk: factor out message formatting from devkmsg_read() Tejun Heo
2015-04-20 12:30   ` Petr Mladek
2015-04-16 23:03 ` [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers() Tejun Heo
2015-04-20 12:50   ` Petr Mladek
2015-04-16 23:03 ` [PATCH 04/16] printk: implement support for extended console drivers Tejun Heo
2015-04-20 15:43   ` Petr Mladek
2015-04-21 10:03     ` Petr Mladek
2015-04-27 21:09     ` Tejun Heo
2015-04-28  9:42       ` Petr Mladek
2015-04-28 14:10         ` Tejun Heo
2015-04-28 14:24           ` Petr Mladek
2015-04-16 23:03 ` [PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq() Tejun Heo
2015-04-16 23:03 ` [PATCH 06/16] netconsole: make netconsole_target->enabled a bool Tejun Heo
2015-04-16 23:03 ` [PATCH 07/16] netconsole: factor out alloc_netconsole_target() Tejun Heo
2015-04-16 23:03 ` [PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier Tejun Heo
2015-04-16 23:03 ` [PATCH 09/16] netconsole: replace target_list_lock with console_lock Tejun Heo
2015-04-16 23:03 ` [PATCH 10/16] netconsole: introduce netconsole_mutex Tejun Heo
2015-04-16 23:03 ` [PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths Tejun Heo
2015-04-16 23:03 ` [PATCH 12/16] netconsole: implement extended console support Tejun Heo
2015-04-16 23:03 ` [PATCH 13/16] netconsole: implement retransmission support for extended consoles Tejun Heo
2015-04-16 23:03 ` [PATCH 14/16] netconsole: implement ack handling and emergency transmission Tejun Heo
2015-04-16 23:03 ` [PATCH 15/16] netconsole: implement netconsole receiver library Tejun Heo
2015-04-16 23:03 ` [PATCH 16/16] netconsole: update documentation for extended netconsole Tejun Heo
2015-04-17 15:35 ` [PATCHSET] printk, netconsole: implement reliable netconsole Tetsuo Handa
2015-04-17 16:28   ` Tejun Heo
2015-04-17 17:17     ` David Miller
2015-04-17 17:37       ` Tejun Heo
2015-04-17 17:43         ` Tetsuo Handa
2015-04-17 17:45           ` Tejun Heo
2015-04-17 18:03             ` Tetsuo Handa
2015-04-17 18:07               ` Tejun Heo
2015-04-17 18:20                 ` Tetsuo Handa
2015-04-17 18:26                   ` Tejun Heo
2015-04-18 13:09                     ` Tetsuo Handa
2015-04-17 18:04         ` Tejun Heo
2015-04-17 18:55         ` David Miller
2015-04-17 19:52           ` Tejun Heo
2015-04-17 20:06             ` David Miller
2015-04-21 21:51       ` Stephen Hemminger
2015-04-19  7:25 ` Rob Landley
2015-04-20 12:00   ` David Laight
2015-04-20 14:33   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).