* [PATCHSET v3] netconsole: implement extended console support
@ 2015-05-11 16:41 Tejun Heo
From: Tejun Heo @ 2015-05-11 16:41 UTC
  To: davem; +Cc: akpm, linux-kernel, netdev, penguin-kernel, sd

This patchset is v3 of netconsole extended console support.  v1 was
part of "printk, netconsole: implement reliable netconsole"
patchset[1].  The printk part is broken off to a separate patchset[2]
"printk: implement extended console support" which this patchset is
dependent upon.

Changes from the last version[3] are

* 0001-netconsole-remove-unnecessary-netconsole_target_get-.patch
  added so that we don't end up adding unnecessary get/put to
  write_ext_msg() for consistency.

* In 0004-netconsole-implement-extended-console-support.patch,
  send_ext_msg_udp() restructured to address Tetsuo and Sabrina's
  review points.

netconsole emits one or more UDP messages for each log message and
only transmits the body, which works fine when it's used as a
debugging tool on a local network; however, netconsole, due to its
advantages for troubleshooting kernel issues, is also used as a
mechanism to collect kernel messages at larger scale where the packets
may have to travel across congested networks or networks with multiple
paths.

Of the handful of large cluster setups that I've seen, two were using
netconsole for fleet-wide kernel logging and having problems with lost
messages.  One was an HPC cluster with a dedicated, slower management
network which carried all management traffic.  Packet losses there
were fairly common for several different reasons - the network itself
could get fairly overloaded at times and IPMI sharing the interface
didn't seem to help either.  The other is a large web service cluster
where the aggregator is some hops away and packet losses do happen
from time to time.

Because netconsole packets don't carry any metadata, it's impossible
to tell what happened to the messages during transit.  Even combining
them with messages transmitted via a separate reliable mechanism is
challenging as it boils down to matching message content textually.

The "printk, netconsole: implement reliable netconsole" patchset[1]
implements extended console support.  If a console driver sets
CON_EXTENDED, printk formats each message the same way /dev/kmsg
messages are formatted, which includes all metadata and, for
structured log messages, the KEY=VALUE dictionary.
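
To illustrate what a log consumer gets from the extended format, here
is a minimal userland sketch that splits one record into its header
fields, text and dictionary.  It assumes the /dev/kmsg layout of
"prio,seq,timestamp,flags[,...];text" followed by dictionary lines
that start with a space.  ExtRecord and parse_ext_record() are
hypothetical names and not part of this patchset, and unescaping of
\xNN sequences is left out.

```python
from typing import NamedTuple


class ExtRecord(NamedTuple):
    facility: int
    level: int
    seq: int
    ts_usec: int
    flags: str
    extra: list   # any additional comma-separated header fields
    text: str
    fields: dict  # KEY=VALUE dictionary; empty for unstructured messages


def parse_ext_record(raw: str) -> ExtRecord:
    """Parse one record laid out as 'prio,seq,ts,flags[,...];text' plus dictionary lines."""
    # Everything before the first ';' is the header.
    header, _, body = raw.partition(";")
    prio, seq, ts, flags, *extra = header.split(",")

    # The first body line is the message text; dictionary lines follow,
    # each prefixed with a single space.
    lines = body.split("\n")
    text = lines[0]
    fields = {}
    for line in lines[1:]:
        if line.startswith(" ") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key] = value

    # prio packs the syslog facility and level: facility * 8 + level.
    prio_n = int(prio)
    return ExtRecord(prio_n >> 3, prio_n & 7, int(seq), int(ts),
                     flags, extra, text, fields)
```

A receiver would call parse_ext_record() on the decoded payload of
each packet and key further processing on the seq field.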

This patchset implements extended console support for netconsole,
which allows log consumers access to complete log information and to
tell which messages are missing and/or reordered, which can be used to
implement reliable kernel message logging when combined with userland
helpers.
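
As a rough sketch of how a userland helper might use the sequence
numbers, the function below takes them in arrival order and reports
which ones are still missing and which arrived late.
classify_sequence() is a hypothetical name; a real helper would also
need to handle sequence resets across reboots, which this ignores.

```python
def classify_sequence(seqs):
    """Return (missing, reordered) for sequence numbers in arrival order."""
    missing = set()
    reordered = []
    highest = None
    for seq in seqs:
        if highest is None:
            highest = seq
        elif seq > highest:
            # Everything skipped between the previous high mark and this
            # message is missing until proven otherwise.
            missing.update(range(highest + 1, seq))
            highest = seq
        elif seq in missing:
            # A message we had given up on showed up late.
            missing.discard(seq)
            reordered.append(seq)
        # Anything else is a duplicate (e.g. another fragment) and is ignored.
    return sorted(missing), reordered
```

Messages that remain in the missing list could then be fetched
through a separate reliable channel.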

Changes to netconsole are straightforward.  It optionally registers a
separate extended console driver.  printk passes in extended format
messages which are transmitted the same way.  The only complication is
when the message is longer than the maximum payload size (1k).  As
each message should have a proper header and the log receiver should
be able to tell which part of the message a fragment is, netconsole
duplicates the full header on each fragment and also adds an extra
ncfrag=OFF/LEN header.
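
The fragmentation scheme can be sketched in userland as follows.  This
is an illustration and not the code in the patch.  It assumes that
ncfrag=OFF/LEN is appended as an extra comma-separated header field,
that OFF is the byte offset of the chunk within the body and that LEN
is the total body length.  fragment(), Reassembler and max_chunk are
hypothetical names, and for simplicity the size limit is applied to
the body only.

```python
import re

NCFRAG_RE = re.compile(rb",ncfrag=(\d+)/(\d+)")


def fragment(header: bytes, body: bytes, max_chunk: int):
    """Split body into chunks, repeating the header on every fragment."""
    if len(body) <= max_chunk:
        return [header + b";" + body]
    return [
        header + b",ncfrag=%d/%d;" % (off, len(body)) + body[off:off + max_chunk]
        for off in range(0, len(body), max_chunk)
    ]


class Reassembler:
    """Collect ncfrag fragments and return the full message once complete."""

    def __init__(self):
        # header with ncfrag stripped -> (total body length, {offset: chunk})
        self._pending = {}

    def feed(self, payload: bytes):
        header, _, chunk = payload.partition(b";")
        match = NCFRAG_RE.search(header)
        if match is None:
            return payload  # not fragmented, pass through unchanged
        offset, total = int(match.group(1)), int(match.group(2))
        # The duplicated header, minus the ncfrag field, identifies the message.
        key = header[:match.start()] + header[match.end():]
        _, chunks = self._pending.setdefault(key, (total, {}))
        chunks[offset] = chunk
        if sum(len(c) for c in chunks.values()) < total:
            return None  # still waiting for more fragments
        del self._pending[key]
        body = b"".join(chunks[off] for off in sorted(chunks))
        return key + b";" + body
```

Because every fragment carries the full header and its own offset,
the receiver can reassemble fragments that arrive out of order and
can tell exactly which byte range is missing when one is lost.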

 0001-netconsole-remove-unnecessary-netconsole_target_get-.patch
 0002-netconsole-make-netconsole_target-enabled-a-bool.patch
 0003-netconsole-make-all-dynamic-netconsoles-share-a-mute.patch
 0004-netconsole-implement-extended-console-support.patch

diffstat follows.  Thanks.

 Documentation/networking/netconsole.txt |   35 ++++++
 drivers/net/netconsole.c                |  169 ++++++++++++++++++++++++++++----
 2 files changed, 185 insertions(+), 19 deletions(-)

--
tejun

[1] http://lkml.kernel.org/g/1429225433-11946-1-git-send-email-tj@kernel.org
[2] http://lkml.kernel.org/g/1430318704-32374-1-git-send-email-tj@kernel.org
[3] http://lkml.kernel.org/g/1430505220-25160-1-git-send-email-tj@kernel.org

Thread overview: 14+ messages
2015-05-11 16:41 [PATCHSET v3] netconsole: implement extended console support Tejun Heo
2015-05-11 16:41 ` [PATCH 1/4] netconsole: remove unnecessary netconsole_target_get/out() from write_msg() Tejun Heo
2015-05-11 16:41 ` [PATCH 2/4] netconsole: make netconsole_target->enabled a bool Tejun Heo
2015-05-11 16:41 ` [PATCH 3/4] netconsole: make all dynamic netconsoles share a mutex Tejun Heo
2015-05-11 16:41 ` [PATCH 4/4] netconsole: implement extended console support Tejun Heo
2015-05-11 17:23   ` David Miller
2015-05-11 20:37     ` Tejun Heo
2015-05-12 23:23       ` David Miller
2015-05-13 15:46         ` Tejun Heo
2015-05-14  4:39           ` David Miller
2015-05-14 15:12             ` Tejun Heo
2015-05-12 23:36   ` Andrew Morton
2015-05-13  2:41     ` David Miller
2015-05-13 15:32     ` Tejun Heo
