netdev.vger.kernel.org archive mirror
* [PATCHSET] printk, netconsole: implement reliable netconsole
@ 2015-04-16 23:03 Tejun Heo
From: Tejun Heo @ 2015-04-16 23:03 UTC
  To: akpm, davem; +Cc: linux-kernel, netdev

In a lot of configurations, netconsole is a useful way to collect
system logs; however, all netconsole does is emit UDP packets for
the raw messages and there's no way for the receiver to find out
whether the packets were lost and/or reordered in flight.

printk already keeps log metadata which contains enough information to
make netconsole reliable.  This patchset does the following.

* Make printk metadata available to console drivers.  A console driver
  can request this mode by setting CON_EXTENDED.  The metadata is
  emitted in the same format as /dev/kmsg.  This also makes all
  logging metadata including facility, loglevel and dictionary
  available to console receivers.

* Implement extended mode support in netconsole.  When enabled,
  netconsole transmits messages with an extended header which is
  enough for the receiver to detect missing messages.

* Implement netconsole retransmission support.  A matching rx socket
  on the source port is automatically created for extended targets
  and the log receiver can request retransmission by sending response
  packets.  This is completely decoupled from the main write path and
  doesn't make netconsole less robust when things start to go south.

* Implement netconsole ack support.  The response packet can
  optionally contain an ack, which enables the emergency transmission
  timer.  If the acked sequence lags the current sequence for over
  10s, netconsole repeatedly re-sends unacked messages at increasing
  intervals.  This ensures that the receiver has the latest messages
  and also that all messages are transferred even while the kernel is
  failing as long as timer and netpoll are operational.  This too is
  completely decoupled from the main write path and doesn't make
  netconsole less robust.

* Implement the receiver library and a simple receiver using it, in
  tools/lib/netconsole/libncrx.a and tools/ncrx/ncrx respectively.
  In a simulated test with heavy packet loss (50%), ncrx logs all
  messages reliably and handles exceptional conditions including
  reboots as expected.
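
As a rough illustration of how a receiver can use the extended header
to notice lost or reordered packets, here is a minimal Python sketch.
It is not the libncrx implementation.  It assumes the payload follows
the basic /dev/kmsg record layout ("<prio>,<seq>,<ts_usec>,<flags>;text")
and ignores dictionary continuation lines; ExtRecord, parse_ext_record
and GapTracker are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class ExtRecord:
    facility: int
    level: int
    seq: int
    ts_usec: int
    flags: str
    text: str


def parse_ext_record(payload: str) -> ExtRecord:
    """Parse one record laid out like a /dev/kmsg line.

    Assumed layout: "<prio>,<seq>,<ts_usec>,<flags>[,...];<text>".
    Dictionary continuation lines (leading space) are ignored here.
    """
    header, _, body = payload.partition(";")
    fields = header.split(",")
    if len(fields) < 4:
        raise ValueError("malformed extended header: %r" % header)
    prio, seq, ts_usec = int(fields[0]), int(fields[1]), int(fields[2])
    text = body.split("\n", 1)[0]
    # prio is the syslog priority: facility in the high bits, level in the low 3.
    return ExtRecord(prio >> 3, prio & 7, seq, ts_usec, fields[3], text)


class GapTracker:
    """Hypothetical receiver-side bookkeeping for missing sequence numbers."""

    def __init__(self):
        self.next_seq = None   # next sequence number we expect to see
        self.missing = set()   # sequence numbers skipped so far

    def observe(self, seq: int) -> None:
        if self.next_seq is None:      # first packet sets the baseline
            self.next_seq = seq + 1
            return
        if seq >= self.next_seq:
            # Everything between the expected and the received seq was skipped.
            self.missing.update(range(self.next_seq, seq))
            self.next_seq = seq + 1
        else:
            # A late, reordered, or retransmitted packet fills a hole;
            # duplicates are harmless no-ops.
            self.missing.discard(seq)

    def to_request(self):
        """Sequence numbers a receiver would ask the sender to retransmit."""
        return sorted(self.missing)
```

Feeding each parsed record's seq into GapTracker.observe() and
periodically sending to_request() back to the sender is the general
shape of a retransmission request loop.  Reboot detection (the
sequence restarting from a low value) is deliberately left out of this
sketch.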

An obvious alternative for reliable logging would be using a separate
TCP connection in addition to the UDP packets; however, I decided on a
UDP based retransmission and ack mechanism for the following reasons.

* The kernel side doesn't get simpler by using TCP.  It'd still need
  to transmit extended format messages, which BTW are useful
  regardless of reliable transmission, to match up UDP and TCP
  messages and detect missing ones from the TCP send buffer filling
  up.  Also, the timeout and emergency transmission support would
  still be necessary to ensure that messages are transmitted in case
  of, e.g., network stack failure.  It'd at least be about the same
  amount of code as the UDP based implementation.

* Receiver side might be a bit simpler but not by much.  It'd still
  need to keep track of the UDP based messages and then match them up
  with TCP messages and put messages from both sources in order (each
  stream may miss different ones) and would have to deal with
  reestablishing connections after reboots.  The only part which can
  completely go away would be the actual ack and retransmission part
  and that isn't a lot of logic.

* When the network condition is good, the only thing the UDP based
  implementation adds is occasional ack messages.  A TCP based
  implementation would end up transmitting all messages twice, which
  still isn't much but is kinda silly given that using TCP doesn't
  lower the complexity in meaningful ways.
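
To make the receiver-side point concrete: even with a TCP side
channel, the receiver would still have to merge two streams that may
each miss different messages.  The Python sketch below shows that
merge step under simplifying assumptions.  Both inputs are plain
{seq: text} maps, and merge_streams is a hypothetical helper, not part
of this patchset.

```python
def merge_streams(udp, tcp):
    """Merge two lossy {seq: text} maps into sequence order.

    Returns (ordered, lost), where 'ordered' is a list of (seq, text)
    pairs and 'lost' lists sequence numbers that neither stream delivered.
    """
    merged = dict(tcp)
    merged.update(udp)   # the same seq carries the same message in both
    if not merged:
        return [], []
    lo, hi = min(merged), max(merged)
    ordered = [(seq, merged[seq]) for seq in sorted(merged)]
    lost = [seq for seq in range(lo, hi + 1) if seq not in merged]
    return ordered, lost
```

Any sequence number reported in 'lost' still needs some recovery
mechanism, so the sequence tracking does not go away with TCP.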

This patchset contains the following 16 patches.

 0001-printk-guard-the-amount-written-per-line-by-devkmsg_.patch
 0002-printk-factor-out-message-formatting-from-devkmsg_re.patch
 0003-printk-move-LOG_NOCONS-skipping-into-call_console_dr.patch
 0004-printk-implement-support-for-extended-console-driver.patch
 0005-printk-implement-log_seq_range-and-ext_log_from_seq.patch
 0006-netconsole-make-netconsole_target-enabled-a-bool.patch
 0007-netconsole-factor-out-alloc_netconsole_target.patch
 0008-netconsole-punt-disabling-to-workqueue-from-netdevic.patch
 0009-netconsole-replace-target_list_lock-with-console_loc.patch
 0010-netconsole-introduce-netconsole_mutex.patch
 0011-netconsole-consolidate-enable-disable-and-create-des.patch
 0012-netconsole-implement-extended-console-support.patch
 0013-netconsole-implement-retransmission-support-for-exte.patch
 0014-netconsole-implement-ack-handling-and-emergency-tran.patch
 0015-netconsole-implement-netconsole-receiver-library.patch
 0016-netconsole-update-documentation-for-extended-netcons.patch

0001-0005 implement extended console support in printk.

0006-0011 are prep patches for netconsole.

0012-0014 implement extended mode, retransmission and ack support.

0015 implements receiver library, libncrx, and a simple receiver using
the library, ncrx.

0016 updates documentation.

As the patchset touches both printk and netconsole, I'm not sure how
these patches should be routed once acked.  Either -mm or net should
work, I think.

This patchset is on top of linus#master[1] and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-netconsole-ext

diffstat follows.  Thanks.

 Documentation/networking/netconsole.txt |   95 +++
 drivers/net/netconsole.c                |  800 +++++++++++++++++++++++-----
 include/linux/console.h                 |    1 
 include/linux/printk.h                  |   16 
 kernel/printk/printk.c                  |  411 +++++++++++---
 tools/Makefile                          |   16 
 tools/lib/netconsole/Makefile           |   36 +
 tools/lib/netconsole/ncrx.c             |  906 ++++++++++++++++++++++++++++++++
 tools/lib/netconsole/ncrx.h             |  204 +++++++
 tools/ncrx/Makefile                     |   14 
 tools/ncrx/ncrx.c                       |  143 +++++
 11 files changed, 2419 insertions(+), 223 deletions(-)

--
tejun

[1] 497a5df7bf6f ("Merge tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip")


end of thread, other threads:[~2015-04-28 14:21 UTC | newest]

Thread overview: 46+ messages
2015-04-16 23:03 [PATCHSET] printk, netconsole: implement reliable netconsole Tejun Heo
2015-04-16 23:03 ` [PATCH 01/16] printk: guard the amount written per line by devkmsg_read() Tejun Heo
2015-04-20 12:11   ` Petr Mladek
2015-04-20 12:33     ` Petr Mladek
2015-04-16 23:03 ` [PATCH 02/16] printk: factor out message formatting from devkmsg_read() Tejun Heo
2015-04-20 12:30   ` Petr Mladek
2015-04-16 23:03 ` [PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers() Tejun Heo
2015-04-20 12:50   ` Petr Mladek
2015-04-16 23:03 ` [PATCH 04/16] printk: implement support for extended console drivers Tejun Heo
2015-04-20 15:43   ` Petr Mladek
2015-04-21 10:03     ` Petr Mladek
2015-04-27 21:09     ` Tejun Heo
2015-04-28  9:42       ` Petr Mladek
2015-04-28 14:10         ` Tejun Heo
2015-04-28 14:24           ` Petr Mladek
2015-04-16 23:03 ` [PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq() Tejun Heo
2015-04-16 23:03 ` [PATCH 06/16] netconsole: make netconsole_target->enabled a bool Tejun Heo
2015-04-16 23:03 ` [PATCH 07/16] netconsole: factor out alloc_netconsole_target() Tejun Heo
2015-04-16 23:03 ` [PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier Tejun Heo
2015-04-16 23:03 ` [PATCH 09/16] netconsole: replace target_list_lock with console_lock Tejun Heo
2015-04-16 23:03 ` [PATCH 10/16] netconsole: introduce netconsole_mutex Tejun Heo
2015-04-16 23:03 ` [PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths Tejun Heo
2015-04-16 23:03 ` [PATCH 12/16] netconsole: implement extended console support Tejun Heo
2015-04-16 23:03 ` [PATCH 13/16] netconsole: implement retransmission support for extended consoles Tejun Heo
2015-04-16 23:03 ` [PATCH 14/16] netconsole: implement ack handling and emergency transmission Tejun Heo
2015-04-16 23:03 ` [PATCH 15/16] netconsole: implement netconsole receiver library Tejun Heo
2015-04-16 23:03 ` [PATCH 16/16] netconsole: update documentation for extended netconsole Tejun Heo
2015-04-17 15:35 ` [PATCHSET] printk, netconsole: implement reliable netconsole Tetsuo Handa
2015-04-17 16:28   ` Tejun Heo
2015-04-17 17:17     ` David Miller
2015-04-17 17:37       ` Tejun Heo
2015-04-17 17:43         ` Tetsuo Handa
2015-04-17 17:45           ` Tejun Heo
2015-04-17 18:03             ` Tetsuo Handa
2015-04-17 18:07               ` Tejun Heo
2015-04-17 18:20                 ` Tetsuo Handa
2015-04-17 18:26                   ` Tejun Heo
2015-04-18 13:09                     ` Tetsuo Handa
2015-04-17 18:04         ` Tejun Heo
2015-04-17 18:55         ` David Miller
2015-04-17 19:52           ` Tejun Heo
2015-04-17 20:06             ` David Miller
2015-04-21 21:51       ` Stephen Hemminger
2015-04-19  7:25 ` Rob Landley
2015-04-20 12:00   ` David Laight
2015-04-20 14:33   ` Tejun Heo
