* [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting
@ 2003-07-08 17:31 Jim Keniston
2003-07-08 17:59 ` Andrew Morton
2003-07-10 4:42 ` James Morris
0 siblings, 2 replies; 21+ messages in thread
From: Jim Keniston @ 2003-07-08 17:31 UTC (permalink / raw)
To: LKML
Cc: netdev, Andrew Morton, David S. Miller, Jeff Garzik, Alan Cox,
Randy Dunlap
[-- Attachment #1: Type: text/plain, Size: 2475 bytes --]
Andrew Morton's 2.6 must-fix list includes the following item:
> o We need a kernel side API for reporting error events to userspace (could
> be async to 2.6 itself)
>
> (Prototype core based on netlink exists)
The enclosed patches provide a mechanism for reporting error events
to user-mode applications via netlink. This mechanism supplements
the text-oriented printk mechanism, providing a way to log binary
data or a mixture of text+binary.
Patch #1, closely based on a prototype by Dave Miller, implements the
NETLINK_KERROR protocol for AF_NETLINK sockets. It provides two
functions for broadcasting data packets to user-mode applications:
in one, the caller provides a single data buffer, and in the other,
the caller provides an iovec[].
Patch #2 (see accompanying post) provides an API built on patch #1's
infrastructure. Patch #2's functions capture context about the error
(e.g., driver/module, severity level, in interrupt or not, pid/uid/gid,
CPU ID), pack this information into a header, add the error-specific
data, and send the resulting packet via netlink. The two principal
functions are:
- evl_write(), which accepts an arbitrarily defined buffer of
error-specific data; and
- evl_printf(), which accepts a format string plus args, printk-style.
Rather than combining the format and args, evl_printf() keeps them
separate, as various developers have suggested. Thus the receiving
application can easily determine both the type of error (as indicated
by the raw format string) and the args' values, without parsing the
message string.
Applications that respond to kernel errors can establish
AF_NETLINK/NETLINK_KERROR sockets and receive the error packets
directly; or they can register with an event subsystem (e.g., see
evlog.sourceforge.net), which will deliver events that match specific
criteria.
These patches are posted on evlog.sourceforge.net. (Click on "Latest
Release"; then scroll down to "evlog-2.5_kernel/evlog + netlink". Or
just follow the links posted below.) Also posted there is a tar file,
kerrord.tar.gz, which contains:
- a sample module that logs errors using evl_write() and evl_printf();
and
- a sample daemon that reads such errors from netlink and logs them.
Jim Keniston
IBM Linux Technology Center
http://prdownloads.sourceforge.net/evlog/kerror-2.5.74.patch?download
http://prdownloads.sourceforge.net/evlog/evlog-2.5.74.patch?download
http://prdownloads.sourceforge.net/evlog/kerrord.tar.gz?download
-----
[-- Attachment #2: kerror-2.5.74.patch --]
[-- Type: text/plain, Size: 5625 bytes --]
diff -Naur linux.org/include/linux/kerror.h linux.kerror.patched/include/linux/kerror.h
--- linux.org/include/linux/kerror.h Wed Dec 31 16:00:00 1969
+++ linux.kerror.patched/include/linux/kerror.h Thu Jul 3 14:18:46 2003
@@ -0,0 +1,27 @@
+#ifndef _KERROR_H
+#define _KERROR_H
+
+#ifdef __KERNEL__
+#include <linux/config.h>
+#include <linux/uio.h>
+#include <asm/types.h>
+
+#ifdef CONFIG_NET
+extern int kernel_error_event(void *data, size_t len, __u32 groups);
+extern int kernel_error_event_iov(const struct iovec *iov,
+ unsigned int nseg, __u32 groups);
+#else
+static inline int kernel_error_event(void *data, size_t len, __u32 groups)
+ { return -ENOSYS; }
+static inline int kernel_error_event_iov(const struct iovec *iov,
+ unsigned int nseg, __u32 groups)
+ { return -ENOSYS; }
+#endif /* CONFIG_NET */
+#endif /* __KERNEL__ */
+
+#define KERROR_GROUP_RAW 0x00000001
+#define KERROR_GROUP_EVLOG 0x00000002
+
+#define KERROR_GROUP_ALL (~(u32)0)
+
+#endif /* _KERROR_H */
diff -Naur linux.org/include/linux/netlink.h linux.kerror.patched/include/linux/netlink.h
--- linux.org/include/linux/netlink.h Thu Jul 3 14:18:46 2003
+++ linux.kerror.patched/include/linux/netlink.h Thu Jul 3 14:18:46 2003
@@ -10,6 +10,7 @@
#define NETLINK_TCPDIAG 4 /* TCP socket monitoring */
#define NETLINK_NFLOG 5 /* netfilter/iptables ULOG */
#define NETLINK_XFRM 6 /* ipsec */
+#define NETLINK_KERROR 7 /* kernel error event facility */
#define NETLINK_ARPD 8
#define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */
#define NETLINK_IP6_FW 13
diff -Naur linux.org/net/netlink/Makefile linux.kerror.patched/net/netlink/Makefile
--- linux.org/net/netlink/Makefile Thu Jul 3 14:18:46 2003
+++ linux.kerror.patched/net/netlink/Makefile Thu Jul 3 14:18:46 2003
@@ -2,5 +2,5 @@
# Makefile for the netlink driver.
#
-obj-y := af_netlink.o
+obj-y := af_netlink.o kerror.o
obj-$(CONFIG_NETLINK_DEV) += netlink_dev.o
diff -Naur linux.org/net/netlink/kerror.c linux.kerror.patched/net/netlink/kerror.c
--- linux.org/net/netlink/kerror.c Wed Dec 31 16:00:00 1969
+++ linux.kerror.patched/net/netlink/kerror.c Thu Jul 3 14:18:46 2003
@@ -0,0 +1,110 @@
+/* kerror.c: Kernel error event logging facility.
+ *
+ * Copyright (C) 2003 David S. Miller (davem@redhat.com)
+ * June 2003 - Jim Keniston and Dan Stekloff (kenistoj and dsteklof@us.ibm.com)
+ * Fixed a couple of bugs and added iovec interface.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/netlink.h>
+#include <linux/kerror.h>
+#include <linux/init.h>
+#include <linux/uio.h>
+#include <net/sock.h>
+
+static struct sock *kerror_nl;
+
+static void kerror_netlink_rcv(struct sock *sk, int len)
+{
+ do {
+ struct sk_buff *skb;
+
+ while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+ /* Just ignore all the messages, they cannot be
+ * destined for the kernel.
+ */
+ kfree_skb(skb);
+ }
+ } while (kerror_nl && kerror_nl->sk_receive_queue.qlen);
+}
+
+/**
+ * kernel_error_event_iov() - Broadcast packet to NETLINK_KERROR sockets.
+ * @iov: the packet's data
+ * @nseg: number of segments in iov[]
+ * @groups: as with kernel_error_event()
+ */
+int kernel_error_event_iov(const struct iovec *iov, unsigned int nseg,
+ u32 groups)
+{
+ struct sk_buff *skb;
+ struct nlmsghdr *nlh;
+ unsigned char *b, *p;
+ size_t len;
+ unsigned int seg;
+
+ if (!groups)
+ return -EINVAL;
+
+ len = iov_length(iov, nseg);
+ skb = alloc_skb(NLMSG_SPACE(len), GFP_ATOMIC);
+ if (skb == NULL)
+ return -ENOMEM;
+
+ b = skb->tail;
+
+ nlh = NLMSG_PUT(skb, current->pid, 0, 0, len);
+ nlh->nlmsg_flags = 0;
+
+ p = NLMSG_DATA(nlh);
+ for (seg = 0; seg < nseg; seg++) {
+ memcpy(p, (const void*)iov[seg].iov_base, iov[seg].iov_len);
+ p += iov[seg].iov_len;
+ }
+ nlh->nlmsg_len = skb->tail - b;
+
+ NETLINK_CB(skb).dst_groups = groups;
+
+ return netlink_broadcast(kerror_nl, skb, 0, ~0, GFP_ATOMIC);
+
+nlmsg_failure:
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
+/**
+ * kernel_error_event() - Broadcast packet to NETLINK_KERROR sockets.
+ * @data, @len: the packet's data
+ * @groups: the group(s) to which the packet pertains -- e.g.,
+ * KERROR_GROUP_EVLOG. On a recvmsg(), this shows up in
+ * ((struct sockaddr_nl*)(msg->msg_name))->nl_groups.
+ */
+int kernel_error_event(void *data, size_t len, u32 groups)
+{
+ struct iovec iov;
+ iov.iov_base = data;
+ iov.iov_len = len;
+ return kernel_error_event_iov(&iov, 1, groups);
+}
+
+static int __init kerror_init(void)
+{
+ printk(KERN_INFO "Initializing KERROR netlink socket\n");
+
+ kerror_nl = netlink_kernel_create(NETLINK_KERROR, kerror_netlink_rcv);
+ if (kerror_nl == NULL)
+ panic("kerror_init: cannot initialize kerror_nl\n");
+
+ return 0;
+}
+
+static void __exit kerror_exit(void)
+{
+ sock_release(kerror_nl->sk_socket);
+}
+
+module_init(kerror_init);
+module_exit(kerror_exit);
diff -Naur linux.org/net/netsyms.c linux.kerror.patched/net/netsyms.c
--- linux.org/net/netsyms.c Thu Jul 3 14:18:46 2003
+++ linux.kerror.patched/net/netsyms.c Thu Jul 3 14:18:46 2003
@@ -83,6 +83,7 @@
#endif
#include <linux/rtnetlink.h>
+#include <linux/kerror.h>
#ifdef CONFIG_IPX_MODULE
extern struct datalink_proto *make_EII_client(void);
@@ -505,6 +506,8 @@
EXPORT_SYMBOL(netlink_set_nonroot);
EXPORT_SYMBOL(netlink_register_notifier);
EXPORT_SYMBOL(netlink_unregister_notifier);
+EXPORT_SYMBOL(kernel_error_event);
+EXPORT_SYMBOL(kernel_error_event_iov);
#if defined(CONFIG_NETLINK_DEV) || defined(CONFIG_NETLINK_DEV_MODULE)
EXPORT_SYMBOL(netlink_attach);
EXPORT_SYMBOL(netlink_detach);
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-08 17:31 [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting Jim Keniston @ 2003-07-08 17:59 ` Andrew Morton 2003-07-08 19:22 ` Jim Keniston 2003-07-08 19:45 ` Jim Keniston 2003-07-10 4:42 ` James Morris 1 sibling, 2 replies; 21+ messages in thread From: Andrew Morton @ 2003-07-08 17:59 UTC (permalink / raw) To: Jim Keniston; +Cc: linux-kernel, netdev, davem, jgarzik, alan, rddunlap Jim Keniston <jkenisto@us.ibm.com> wrote: > > The enclosed patches provide a mechanism for reporting error events > to user-mode applications via netlink. Seems sane to me. It needs to handle %z as well as %Z. The layout of `struct kern_log_entry' may be problematic. Think of the situation where a 64-bit kernel constructs one of these and sends it up to 32-bit userspace. Will the structure layout be the same under the 32-bit compiler as under the 64-bit one? How do you actually intend to use this? Will it be by adding new evt_printf() calls to particular drivers, or replacing existing printk's or both? Would it make sense for evt_printf() to fall back to printk() if nobody is listening to the log stream? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-08 17:59 ` Andrew Morton @ 2003-07-08 19:22 ` Jim Keniston 2003-07-08 19:45 ` Jim Keniston 1 sibling, 0 replies; 21+ messages in thread From: Jim Keniston @ 2003-07-08 19:22 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, netdev, davem, jgarzik, alan, rddunlap Andrew Morton wrote: > > Jim Keniston <jkenisto@us.ibm.com> wrote: > > > > The enclosed patches provide a mechanism for reporting error events > > to user-mode applications via netlink. > > Seems sane to me. > > It needs to handle %z as well as %Z. Yes, thanks. I missed that change to vsnprintf(). > > The layout of `struct kern_log_entry' may be problematic. Think of the > situation where a 64-bit kernel constructs one of these and sends it up to > 32-bit userspace. Will the structure layout be the same under the 32-bit > compiler as under the 64-bit one? I think so. Nothing is bigger than 4 bytes except log_facility[] (16-byte array of char, which doesn't induce padding at all on i386). But I will find a 64K/32U ppc machine and check that. > > How do you actually intend to use this? I envision it being used by a configuration/status-monitoring system that monitors hotplug events, sysfs, etc. for configuration changes, and listens to the proposed interface for error events. Binary-only events (logged with evl_write()) would have to be interpreted based on knowledge existing entirely in user space (either coded into the monitor program, or provided as supplementary information via a formatting template or some such). PRINTF-format events can carry and/or be supplemented with similar info, but have the error message built in. > Will it be by adding new > evt_printf() calls to particular drivers, or replacing existing printk's or > both? There have been a variety of suggestions for how error reporting could be improved. Two common ones are: 1. Leave printks alone, and log additional info in whatever format you want via netlink. (E.g., Dave Miller recommended something like this.) This proposal supports that. 2. Migrate from raw printks to "smarter" versions -- e.g., dev_printk and friends, or the proposed netdev_printk. Such macros now call just printk (after adding relevant info), but could be modified to call evl_printk as well with the same args. There are a zillion variations on this, of course... > > Would it make sense for evt_printf() to fall back to printk() if nobody is > listening to the log stream? That certainly makes sense for evl_printf. For evl_write, just do a hex dump or something. So evlog.c would query kerror.c to see if anybody's listening. Would kerror.c consult nl_table[] directly, or is there an anybody_listening() function that does this? Jim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-08 17:59 ` Andrew Morton 2003-07-08 19:22 ` Jim Keniston @ 2003-07-08 19:45 ` Jim Keniston 1 sibling, 0 replies; 21+ messages in thread From: Jim Keniston @ 2003-07-08 19:45 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, netdev, davem, jgarzik, alan, rddunlap If you're going to reply to this, please change "netdev@oss.sgi.net" to "netdev@oss.sgi.com" in your cc list. My apologies for the error. Jim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-08 17:31 [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting Jim Keniston 2003-07-08 17:59 ` Andrew Morton @ 2003-07-10 4:42 ` James Morris 2003-07-10 19:08 ` Jim Keniston 1 sibling, 1 reply; 21+ messages in thread From: James Morris @ 2003-07-10 4:42 UTC (permalink / raw) To: Jim Keniston Cc: LKML, netdev, Andrew Morton, David S. Miller, Jeff Garzik, Alan Cox, Randy Dunlap On Tue, 8 Jul 2003, Jim Keniston wrote: + kerror_nl = netlink_kernel_create(NETLINK_KERROR, kerror_netlink_rcv); + if (kerror_nl == NULL) + panic("kerror_init: cannot initialize kerror_nl\n"); You can simply use NULL instead of passing the dummy kerror_netlink_rcv function. +struct kern_log_entry { + __u16 log_kmagic; /* always LOGREC_KMAGIC */ + __u16 log_kversion; /* which version of this struct? */ + char log_facility[FACILITY_MAXLEN]; /* e.g., driver name */ These fields should generally be specified in ascending order to help with alignment. It may also be worth looking at how the ULOG code batches messages to improve peformance. - James -- James Morris <jmorris@intercode.com.au> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-10 4:42 ` James Morris @ 2003-07-10 19:08 ` Jim Keniston 2003-07-11 15:37 ` James Morris 0 siblings, 1 reply; 21+ messages in thread From: Jim Keniston @ 2003-07-10 19:08 UTC (permalink / raw) To: James Morris Cc: LKML, netdev, Andrew Morton, David S. Miller, Jeff Garzik, Alan Cox, Randy Dunlap James Morris wrote: > > On Tue, 8 Jul 2003, Jim Keniston wrote: > > + kerror_nl = netlink_kernel_create(NETLINK_KERROR, kerror_netlink_rcv); > + if (kerror_nl == NULL) > + panic("kerror_init: cannot initialize kerror_nl\n"); > > You can simply use NULL instead of passing the dummy kerror_netlink_rcv > function. That begs the question: do we trust that nobody but the kernel will send packets to a NETLINK_KERROR socket? Ordinary users can't, but any root application can. Without kerror_netlink_rcv(), such packets don't get dequeued. > > +struct kern_log_entry { > + __u16 log_kmagic; /* always LOGREC_KMAGIC */ > + __u16 log_kversion; /* which version of this struct? */ > + char log_facility[FACILITY_MAXLEN]; /* e.g., driver name */ > > These fields should generally be specified in ascending order to help with > alignment. We include log_kmagic and log_kversion so the receiving app (e.g. the evlog daemon) can figure out which version of the kernel header it's getting. Note that we're up to #3 (going on #4, with the changes you and others have suggested). As long as we have to include log_kmagic and log_kversion, they need to be first. That said, I get the impression that people would be more comfortable if log_facility[] were moved to the end. Other than that, can anybody point out a specific area where there's likely to be an alignment problem? The various members' offsets are the same on i386, ppc32, and ppc64. This should also be true for s390 and s390x. I'd think it'd really matter only when kernel and user on the same system are different architectures. ppc64K/ppc32U, at least, works. > > It may also be worth looking at how the ULOG code batches messages to > improve peformance. Thanks for the pointer. Looks nice. If performance turns out to be an issue (e.g., if for some reason people want to cc all printk messages through this interface), I'll keep that in mind. > > - James > -- > James Morris > <jmorris@intercode.com.au> Jim K. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-10 19:08 ` Jim Keniston @ 2003-07-11 15:37 ` James Morris 2003-07-12 5:09 ` David S. Miller 2003-07-12 5:41 ` David S. Miller 0 siblings, 2 replies; 21+ messages in thread From: James Morris @ 2003-07-11 15:37 UTC (permalink / raw) To: Jim Keniston Cc: LKML, netdev, Andrew Morton, David S. Miller, Jeff Garzik, Alan Cox, Randy Dunlap, kuznet On Thu, 10 Jul 2003, Jim Keniston wrote: > James Morris wrote: > > > > On Tue, 8 Jul 2003, Jim Keniston wrote: > > > > + kerror_nl = netlink_kernel_create(NETLINK_KERROR, kerror_netlink_rcv); > > + if (kerror_nl == NULL) > > + panic("kerror_init: cannot initialize kerror_nl\n"); > > > > You can simply use NULL instead of passing the dummy kerror_netlink_rcv > > function. > > That begs the question: do we trust that nobody but the kernel will send > packets to a NETLINK_KERROR socket? Ordinary users can't, but any root > application can. Without kerror_netlink_rcv(), such packets don't get > dequeued. Indeed, the kernel socket buffer fills up. I think this needs to be addressed in the netlink code, per the patch below. Comments? - James -- James Morris <jmorris@intercode.com.au> diff -NurX dontdiff linux-2.5.75.orig/net/netlink/af_netlink.c linux-2.5.75.w1/net/netlink/af_netlink.c --- linux-2.5.75.orig/net/netlink/af_netlink.c 2003-06-26 12:43:45.000000000 +1000 +++ linux-2.5.75.w1/net/netlink/af_netlink.c 2003-07-12 01:23:49.708254261 +1000 @@ -430,6 +430,10 @@ goto no_dst; nlk = nlk_sk(sk); + /* Don't bother queuing skb if kernel socket has no input function */ + if (nlk->pid == 0 && !nlk->data_ready) + goto no_dst; + #ifdef NL_EMULATE_DEV if (nlk->handler) { skb_orphan(skb); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-11 15:37 ` James Morris @ 2003-07-12 5:09 ` David S. Miller 2003-07-12 5:41 ` David S. Miller 1 sibling, 0 replies; 21+ messages in thread From: David S. Miller @ 2003-07-12 5:09 UTC (permalink / raw) To: James Morris Cc: jkenisto, linux-kernel, netdev, akpm, jgarzik, alan, rddunlap, kuznet On Sat, 12 Jul 2003 01:37:44 +1000 (EST) James Morris <jmorris@intercode.com.au> wrote: > On Thu, 10 Jul 2003, Jim Keniston wrote: > > > That begs the question: do we trust that nobody but the kernel will send > > packets to a NETLINK_KERROR socket? Ordinary users can't, but any root > > application can. Without kerror_netlink_rcv(), such packets don't get > > dequeued. > > Indeed, the kernel socket buffer fills up. > > I think this needs to be addressed in the netlink code, per the patch > below. Looks good, I'll apply this. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-11 15:37 ` James Morris 2003-07-12 5:09 ` David S. Miller @ 2003-07-12 5:41 ` David S. Miller 2003-07-13 1:17 ` James Morris 1 sibling, 1 reply; 21+ messages in thread From: David S. Miller @ 2003-07-12 5:41 UTC (permalink / raw) To: James Morris Cc: jkenisto, linux-kernel, netdev, akpm, jgarzik, alan, rddunlap, kuznet On Sat, 12 Jul 2003 01:37:44 +1000 (EST) James Morris <jmorris@intercode.com.au> wrote: > Indeed, the kernel socket buffer fills up. > > I think this needs to be addressed in the netlink code, per the patch > below. ... > + /* Don't bother queuing skb if kernel socket has no input function */ > + if (nlk->pid == 0 && !nlk->data_ready) > + goto no_dst; > + Oops, turns out this doesn't work. data_ready is never NULL, look at how netlink_kernel_create() works. Also, the broadcast case probably needs to be handled too? As an aside, to be honest what's so wrong with the socket receive buffer filling up? The damage is limited to the receive buffer size of the kernel netlink socket, but that's it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-12 5:41 ` David S. Miller @ 2003-07-13 1:17 ` James Morris 2003-07-13 5:34 ` David S. Miller 2003-07-15 17:42 ` [PATCH] [1/2] kernel error reporting (revised) Jim Keniston 0 siblings, 2 replies; 21+ messages in thread From: James Morris @ 2003-07-13 1:17 UTC (permalink / raw) To: David S. Miller Cc: jkenisto, linux-kernel, netdev, akpm, jgarzik, alan, rddunlap, kuznet On Fri, 11 Jul 2003, David S. Miller wrote: > > + /* Don't bother queuing skb if kernel socket has no input function */ > > + if (nlk->pid == 0 && !nlk->data_ready) > > + goto no_dst; > > + > > Oops, turns out this doesn't work. data_ready is never NULL, look at > how netlink_kernel_create() works. It's ok: sk->data_ready is never null, but nlk_sk(sk)->data_ready will be null unless an input function is provided there. > Also, the broadcast case probably needs to be handled > too? Netlink sockets created by netlink_kernel_create() do not subscribe to any groups and are not broadcast to. > As an aside, to be honest what's so wrong with the socket receive > buffer filling up? The damage is limited to the receive buffer size > of the kernel netlink socket, but that's it. Agreed, it's not really harmful, but it's sloppy. Better to let the application know that it can't send to the socket rather than let it keep sending (with successful return codes) until the kernel socket buffer fills up. - James -- James Morris <jmorris@intercode.com.au> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting 2003-07-13 1:17 ` James Morris @ 2003-07-13 5:34 ` David S. Miller 2003-07-15 17:42 ` [PATCH] [1/2] kernel error reporting (revised) Jim Keniston 1 sibling, 0 replies; 21+ messages in thread From: David S. Miller @ 2003-07-13 5:34 UTC (permalink / raw) To: James Morris Cc: jkenisto, linux-kernel, netdev, akpm, jgarzik, alan, rddunlap, kuznet On Sun, 13 Jul 2003 11:17:35 +1000 (EST) James Morris <jmorris@intercode.com.au> wrote: > On Fri, 11 Jul 2003, David S. Miller wrote: > > > Oops, turns out this doesn't work. data_ready is never NULL, look at > > how netlink_kernel_create() works. > > It's ok: sk->data_ready is never null, but nlk_sk(sk)->data_ready will be > null unless an input function is provided there. > > > Also, the broadcast case probably needs to be handled > > too? > > Netlink sockets created by netlink_kernel_create() do not subscribe to any > groups and are not broadcast to. Oops, you're right on both counts, I brainfarted here. I'll apply your original patch, thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] [1/2] kernel error reporting (revised) 2003-07-13 1:17 ` James Morris 2003-07-13 5:34 ` David S. Miller @ 2003-07-15 17:42 ` Jim Keniston 2003-07-15 17:45 ` [PATCH] [2/2] " Jim Keniston 2003-07-15 19:51 ` [PATCH] [1/2] " Andrew Morton 1 sibling, 2 replies; 21+ messages in thread From: Jim Keniston @ 2003-07-15 17:42 UTC (permalink / raw) To: James Morris Cc: David S. Miller, linux-kernel, netdev, akpm, jgarzik, alan, rddunlap, kuznet, jkenisto [-- Attachment #1: Type: text/plain, Size: 1567 bytes --] Jim Keniston <jkenisto@us.ibm.com> wrote: > Subject: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting > > Andrew Morton's 2.6 must-fix list includes the following item: > > o We need a kernel side API for reporting error events to userspace (could > > be async to 2.6 itself) > > > > (Prototype core based on netlink exists) > > The enclosed patches provide a mechanism for reporting error events > to user-mode applications via netlink. This mechanism supplements > the text-oriented printk mechanism, providing a way to log binary > data or a mixture of text+binary. > ... > Here are updated patches, reflecting the following changes: Patch #1 (kerror.c et al): - Given James Morris's patch to af_netlink.c (Rev 1.30 in BitKeeper), I was able to remove kerror_netlink_rcv(). (My patches work fine without this, except that any packets sent to the NETLINK_KERROR socket by an ill-behaved, root-owned application would accumulate in the kernel's socket buffer.) Patch #2 (evlog.c et al -- see accompanying post): - Paraphrase dropped packets via printk() when nobody's listening to netlink socket. - Added support for 'z' qualifier, to resync with vsnprintf(). - In evlog.h, reordered members of struct kern_log_entry to address alignment worries. These patches work for both 2.5.74 and 2.5.75. Jim Keniston IBM Linux Technology Center http://prdownloads.sourceforge.net/evlog/kerror-2.5.75.patch?download http://prdownloads.sourceforge.net/evlog/evlog-2.5.75.patch?download http://prdownloads.sourceforge.net/evlog/kerrord.tar.gz?download [-- Attachment #2: kerror-2.5.75.patch --] [-- Type: text/plain, Size: 5337 bytes --] diff -Naur linux.org/include/linux/kerror.h linux.kerror.patched/include/linux/kerror.h --- linux.org/include/linux/kerror.h Wed Dec 31 16:00:00 1969 +++ linux.kerror.patched/include/linux/kerror.h Mon Jul 14 09:53:00 2003 @@ -0,0 +1,27 @@ +#ifndef _KERROR_H +#define _KERROR_H + +#ifdef __KERNEL__ +#include <linux/config.h> +#include <linux/uio.h> +#include <asm/types.h> + +#ifdef CONFIG_NET +extern int kernel_error_event(void *data, size_t len, __u32 groups); +extern int kernel_error_event_iov(const struct iovec *iov, + unsigned int nseg, __u32 groups); +#else +static inline int kernel_error_event(void *data, size_t len, __u32 groups) + { return -ENOSYS; } +static inline int kernel_error_event_iov(const struct iovec *iov, + unsigned int nseg, __u32 groups) + { return -ENOSYS; } +#endif /* CONFIG_NET */ +#endif /* __KERNEL__ */ + +#define KERROR_GROUP_RAW 0x00000001 +#define KERROR_GROUP_EVLOG 0x00000002 + +#define KERROR_GROUP_ALL (~(u32)0) + +#endif /* _KERROR_H */ diff -Naur linux.org/include/linux/netlink.h linux.kerror.patched/include/linux/netlink.h --- linux.org/include/linux/netlink.h Mon Jul 14 09:53:00 2003 +++ linux.kerror.patched/include/linux/netlink.h Mon Jul 14 09:53:00 2003 @@ -10,6 +10,7 @@ #define NETLINK_TCPDIAG 4 /* TCP socket monitoring */ #define NETLINK_NFLOG 5 /* netfilter/iptables ULOG */ #define NETLINK_XFRM 6 /* ipsec */ +#define NETLINK_KERROR 7 /* kernel error event facility */ #define NETLINK_ARPD 8 #define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */ #define NETLINK_IP6_FW 13 diff -Naur linux.org/net/netlink/Makefile linux.kerror.patched/net/netlink/Makefile --- linux.org/net/netlink/Makefile Mon Jul 14 09:53:00 2003 +++ linux.kerror.patched/net/netlink/Makefile Mon Jul 14 09:53:00 2003 @@ -2,5 +2,5 @@ # Makefile for the netlink driver. # -obj-y := af_netlink.o +obj-y := af_netlink.o kerror.o obj-$(CONFIG_NETLINK_DEV) += netlink_dev.o diff -Naur linux.org/net/netlink/kerror.c linux.kerror.patched/net/netlink/kerror.c --- linux.org/net/netlink/kerror.c Wed Dec 31 16:00:00 1969 +++ linux.kerror.patched/net/netlink/kerror.c Mon Jul 14 09:53:00 2003 @@ -0,0 +1,97 @@ +/* kerror.c: Kernel error event logging facility. + * + * Copyright (C) 2003 David S. Miller (davem@redhat.com) + * June 2003 - Jim Keniston and Dan Stekloff (kenistoj and dsteklof@us.ibm.com) + * Fixed a couple of bugs and added iovec interface. + */ + +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/skbuff.h> +#include <linux/socket.h> +#include <linux/netlink.h> +#include <linux/kerror.h> +#include <linux/init.h> +#include <linux/uio.h> +#include <net/sock.h> + +static struct sock *kerror_nl; + +/** + * kernel_error_event_iov() - Broadcast packet to NETLINK_KERROR sockets. + * @iov: the packet's data + * @nseg: number of segments in iov[] + * @groups: as with kernel_error_event() + */ +int kernel_error_event_iov(const struct iovec *iov, unsigned int nseg, + u32 groups) +{ + struct sk_buff *skb; + struct nlmsghdr *nlh; + unsigned char *b, *p; + size_t len; + unsigned int seg; + + if (!groups) + return -EINVAL; + + len = iov_length(iov, nseg); + skb = alloc_skb(NLMSG_SPACE(len), GFP_ATOMIC); + if (skb == NULL) + return -ENOMEM; + + b = skb->tail; + + nlh = NLMSG_PUT(skb, current->pid, 0, 0, len); + nlh->nlmsg_flags = 0; + + p = NLMSG_DATA(nlh); + for (seg = 0; seg < nseg; seg++) { + memcpy(p, (const void*)iov[seg].iov_base, iov[seg].iov_len); + p += iov[seg].iov_len; + } + nlh->nlmsg_len = skb->tail - b; + + NETLINK_CB(skb).dst_groups = groups; + + return netlink_broadcast(kerror_nl, skb, 0, ~0, GFP_ATOMIC); + +nlmsg_failure: + kfree_skb(skb); + return -EINVAL; +} + +/** + * kernel_error_event() - Broadcast packet to NETLINK_KERROR sockets. + * @data, @len: the packet's data + * @groups: the group(s) to which the packet pertains -- e.g., + * KERROR_GROUP_EVLOG. On a recvmsg(), this shows up in + * ((struct sockaddr_nl*)(msg->msg_name))->nl_groups. + */ +int kernel_error_event(void *data, size_t len, u32 groups) +{ + struct iovec iov; + iov.iov_base = data; + iov.iov_len = len; + return kernel_error_event_iov(&iov, 1, groups); +} + +static int __init kerror_init(void) +{ + printk(KERN_INFO "Initializing KERROR netlink socket\n"); + + /* Note that we ignore all incoming messages on this socket. */ + kerror_nl = netlink_kernel_create(NETLINK_KERROR, NULL); + if (kerror_nl == NULL) + panic("kerror_init: cannot initialize kerror_nl\n"); + + return 0; +} + +static void __exit kerror_exit(void) +{ + sock_release(kerror_nl->sk_socket); +} + +module_init(kerror_init); +module_exit(kerror_exit); diff -Naur linux.org/net/netsyms.c linux.kerror.patched/net/netsyms.c --- linux.org/net/netsyms.c Mon Jul 14 09:53:00 2003 +++ linux.kerror.patched/net/netsyms.c Mon Jul 14 09:53:00 2003 @@ -83,6 +83,7 @@ #endif #include <linux/rtnetlink.h> +#include <linux/kerror.h> #ifdef CONFIG_IPX_MODULE extern struct datalink_proto *make_EII_client(void); @@ -505,6 +506,8 @@ EXPORT_SYMBOL(netlink_set_nonroot); EXPORT_SYMBOL(netlink_register_notifier); EXPORT_SYMBOL(netlink_unregister_notifier); +EXPORT_SYMBOL(kernel_error_event); +EXPORT_SYMBOL(kernel_error_event_iov); #if defined(CONFIG_NETLINK_DEV) || defined(CONFIG_NETLINK_DEV_MODULE) EXPORT_SYMBOL(netlink_attach); EXPORT_SYMBOL(netlink_detach); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [2/2] kernel error reporting (revised) 2003-07-15 17:42 ` [PATCH] [1/2] kernel error reporting (revised) Jim Keniston @ 2003-07-15 17:45 ` Jim Keniston 2003-07-15 19:51 ` [PATCH] [1/2] " Andrew Morton 1 sibling, 0 replies; 21+ messages in thread From: Jim Keniston @ 2003-07-15 17:45 UTC (permalink / raw) To: James Morris, David S. Miller, linux-kernel, netdev, akpm, jgarzik, alan, rddunlap, kuznet, jkenisto [-- Attachment #1: Type: text/plain, Size: 87 bytes --] This patch is described in the previous post. Jim Keniston IBM Linux Technology Center [-- Attachment #2: evlog-2.5.75.patch --] [-- Type: text/plain, Size: 19237 bytes --] diff -Naur linux.org/include/linux/evlog.h linux.evlog.patched/include/linux/evlog.h --- linux.org/include/linux/evlog.h Wed Dec 31 16:00:00 1969 +++ linux.evlog.patched/include/linux/evlog.h Mon Jul 14 09:52:59 2003 @@ -0,0 +1,109 @@ +/* + * Linux Event Logging + * Copyright (c) International Business Machines Corp., 2001 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Please send e-mail to kenistoj@users.sourceforge.net if you have + * questions or comments. + * + * Project Website: http://evlog.sourceforge.net/ + */ + +#ifndef _LINUX_EVLOG_H +#define _LINUX_EVLOG_H + +#include <stdarg.h> +#include <linux/types.h> +#include <asm/types.h> + +/* Values for log_flags member */ +#define EVL_TRUNCATE 0x1 +#define EVL_KERNEL_EVENT 0x2 +#define EVL_INTERRUPT 0x10 /* Logged from interrupt context */ +#define EVL_PRINTK 0x20 /* Strip leading <n> when formatting */ +#define EVL_EVTYCRC 0x40 /* Daemon will set event type = CRC */ + /* of format string. */ + +/* Formats for optional portion of record. */ +#define EVL_NODATA 0 +#define EVL_BINARY 1 +#define EVL_STRING 2 +#define EVL_PRINTF 3 + +/* Maximum length of variable portion of record */ +#define EVL_ENTRY_MAXLEN (8 * 1024) + +/* Facility (e.g., driver) names are truncated to 15+null. */ +#define FACILITY_MAXLEN 16 + +/* + * struct kern_log_entry - kernel record header + * Each record sent to group KERROR_GROUP_EVLOG begins with this header. + */ +struct kern_log_entry { + __u16 log_kmagic; /* always LOGREC_KMAGIC */ + __u16 log_kversion; /* which version of this struct? */ + __u16 log_size; /* # bytes in variable part of record */ + __s8 log_format; /* BINARY, STRING, PRINTF, NODATA */ + __s8 log_severity; /* DEBUG, INFO, NOTICE, WARN, etc. */ + __s32 log_event_type; /* facility-specific event ID */ + __u32 log_flags; /* EVL_TRUNCATE, etc. */ + __s32 log_processor; /* CPU ID */ + uid_t log_uid; /* event context... */ + gid_t log_gid; + pid_t log_pid; + pid_t log_pgrp; + char log_facility[FACILITY_MAXLEN]; /* e.g., driver name */ +}; + +#define LOGREC_KMAGIC 0x7af8 +#define LOGREC_KVERSION 3 + +#ifdef __KERNEL__ +/* + * severities, AKA priorities + */ +#define LOG_EMERG 0 /* system is unusable */ +#define LOG_ALERT 1 /* action must be taken immediately */ +#define LOG_CRIT 2 /* critical conditions */ +#define LOG_ERR 3 /* error conditions */ +#define LOG_WARNING 4 /* warning conditions */ +#define LOG_NOTICE 5 /* normal but significant condition */ +#define LOG_INFO 6 /* informational */ +#define LOG_DEBUG 7 /* debug-level messages */ + +#ifdef CONFIG_NET +extern int evl_write(const char *facility, int event_type, + int severity, const void *buf, size_t len, int format); +extern int evl_printf(const char *facility, int event_type, int sev, + const char *fmt, ...); +extern int evl_vprintf(const char *facility, int event_type, int sev, + const char *fmt, va_list args); +#else /* ! CONFIG_NET */ +static inline int evl_write(const char *facility, int event_type, + int severity, const void *buf, size_t len, int format) + { return -ENOSYS; } +static inline int evl_printf(const char *facility, int event_type, int sev, + const char *fmt, ...); + { return -ENOSYS; } +static inline int evl_vprintf(const char *facility, int event_type, int sev, + const char *fmt, va_list args) + { return -ENOSYS; } +#endif /* CONFIG_NET */ + +#endif /* __KERNEL__ */ + +#endif /* _LINUX_EVLOG_H */ diff -Naur linux.org/kernel/Makefile linux.evlog.patched/kernel/Makefile --- linux.org/kernel/Makefile Mon Jul 14 09:52:59 2003 +++ linux.evlog.patched/kernel/Makefile Mon Jul 14 09:52:59 2003 @@ -19,6 +19,7 @@ obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o obj-$(CONFIG_COMPAT) += compat.o +obj-$(CONFIG_NET) += evlog.o ifneq ($(CONFIG_IA64),y) # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is diff -Naur linux.org/kernel/evlog.c linux.evlog.patched/kernel/evlog.c --- linux.org/kernel/evlog.c Wed Dec 31 16:00:00 1969 +++ linux.evlog.patched/kernel/evlog.c Mon Jul 14 09:52:59 2003 @@ -0,0 +1,542 @@ +/* + * Linux Event Logging + * Copyright (c) International Business Machines Corp., 2003 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Please send e-mail to kenistoj@users.sourceforge.net if you have + * questions or comments. + * + * Project Website: http://evlog.sourceforge.net/ + */ + +#include <linux/config.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/kerror.h> +#include <linux/stddef.h> +#include <linux/uio.h> +#include <linux/spinlock.h> +#include <linux/smp.h> +#include <linux/string.h> +#include <linux/interrupt.h> +#include <linux/ctype.h> +#include <linux/evlog.h> + +static void report_dropped_event(const struct kern_log_entry *hdr, + const void *vardata); +static void report_dropped_printf_event(const struct kern_log_entry *hdr, + const char *fmt, va_list args); + +/** + * mk_rec_header() - Populate evlog record header. + * @fac: facility name (e.g., "kern", driver name) + * @event_type: event type (event ID assigned by programmer; may also be + * computed by recipient -- e.g., CRC of format string) + * @severity: severity level (e.g., LOG_INFO) + * @size: length, in bytes, of variable data + * @flags: event flags (e.g., EVL_TRUNCATE, EVL_EVTYCRC) + * @format: format of variable data (e.g., EVL_STRING) + */ +static void +mk_rec_header(struct kern_log_entry *rec_hdr, + const char *facility, + int event_type, + int severity, + size_t size, + uint flags, + int format) +{ + rec_hdr->log_kmagic = LOGREC_KMAGIC; + rec_hdr->log_kversion = LOGREC_KVERSION; + rec_hdr->log_size = (__u16) size; + rec_hdr->log_format = (__s8) format; + rec_hdr->log_event_type = (__s32) event_type; + rec_hdr->log_severity = (__s8) severity; + rec_hdr->log_uid = current->uid; + rec_hdr->log_gid = current->gid; + rec_hdr->log_pid = current->pid; + rec_hdr->log_pgrp = current->pgrp; + rec_hdr->log_flags = (__u32) flags; + rec_hdr->log_processor = (__s32) smp_processor_id(); + + strncpy(rec_hdr->log_facility, facility, FACILITY_MAXLEN); + rec_hdr->log_facility[FACILITY_MAXLEN-1] = '\0'; +} + +/** + * evl_sendh() - Log event, given a pre-constructed header. + * In case of sloppiness, clean it up rather than failing, since the caller + * is unlikely to handle failure. + * Returns 0 on success, or a negative error code otherwise. + */ +static int +evl_sendh(struct kern_log_entry *hdr, const void *vardata) +{ + struct iovec iov[2] = { + { hdr, sizeof(struct kern_log_entry) }, + { (void*) vardata, hdr->log_size } + }; + int nsegs = 2; + + if (hdr->log_severity < 0 || hdr->log_severity > LOG_DEBUG) { + hdr->log_severity = LOG_WARNING; + } + if (vardata == NULL || hdr->log_size == 0) { + vardata = NULL; + hdr->log_size = 0; + hdr->log_format = EVL_NODATA; + nsegs = 1; + } + hdr->log_flags |= EVL_KERNEL_EVENT; + if (in_interrupt()) { + hdr->log_flags |= EVL_INTERRUPT; + } + if (hdr->log_size > EVL_ENTRY_MAXLEN) { + iov[1].iov_len = hdr->log_size = EVL_ENTRY_MAXLEN; + hdr->log_flags |= EVL_TRUNCATE; + } + + return kernel_error_event_iov(iov, nsegs, KERROR_GROUP_EVLOG); +} + +/** + * evl_write() - write header + optional buffer to event handler + * + * @buf: optional variable-length data + * other args as per mk_rec_header() + */ +int +evl_write(const char *fac, int event_type, int severity, const void *buf, + size_t size, int format) +{ + int ret; + struct kern_log_entry hdr; + + mk_rec_header(&hdr, fac, event_type, severity, size, 0, format); + ret = evl_sendh(&hdr, buf); + if (ret == -ESRCH) { + report_dropped_event(&hdr, buf); + } + return ret; +} + +/* + * A buffer to pack with data, one value at a time. By convention, b_tail + * reflects the total amount you've attempted to add, and so may be past b_end. + */ +struct evl_recbuf { + char *b_buf; /* start of buffer */ + char *b_tail; /* add next data here */ + char *b_end; /* b_buf + buffer size */ +}; + +void +evl_init_recbuf(struct evl_recbuf *b, char *buf, size_t size) +{ + b->b_buf = buf; + b->b_tail = buf; + b->b_end = buf + size; +} + +/** + * evl_put() - Append data to buffer; handle overflow. + * @b - describes buffer; updated to reflect data appended + * @data - data to append + * @datasz - data length in bytes + */ +void +evl_put(struct evl_recbuf *b, const void *data, size_t datasz) +{ + ptrdiff_t room = b->b_end - b->b_tail; + if (room > 0) { + (void) memcpy(b->b_tail, data, min(datasz, (size_t)room)); + } + b->b_tail += datasz; +} + +/** + * evl_puts() - Append string to buffer; handle overflow. + * Append a string to the buffer. If null == 1, we include the terminating + * null. If the string extends over the end of the buffer, terminate the + * buffer with a null. + * + * @b - describes buffer; updated to reflect data appended + * @s - null-terminated string + * @null - 1 if we append the terminating null, 0 otherwise + */ +void +evl_puts(struct evl_recbuf *b, const char *s, int null) +{ + char *old_tail = b->b_tail; + evl_put(b, s, strlen(s) + null); + if (b->b_tail > b->b_end && old_tail < b->b_end) { + *(b->b_end - 1) = '\0'; + } +} + +static inline void +skip_atoi(const char **s) +{ + while (isdigit(**s)) { + (*s)++; + } +} + +/** + * parse_printf_fmt() - Parse printf/printk conversion spec. + * fmt points to the '%' in a printk conversion specification. Advance + * fmt past any flags, width and/or precision specifiers, and qualifiers + * such as 'l' and 'L'. Return a pointer to the conversion character. + * Stores the qualifier character (or -1, if there is none) at *pqualifier. + * *wp is set to flags indicating whether the width and/or precision are '*'. + * For example, given + * %*.2lx + * *pqualifier is set to 'l', *wp is set to 0x1, and a pointer to the 'x' + * is returned. + * + * Note: This function is derived from vsnprintf() (see lib/vsprintf.c), + * and should be kept in sync with that function. + * + * @fmt - points to '%' in conversion spec + * @pqualifier - *pqualifier is set to conversion spec's qualifier, or -1. + * @wp - Bits in *wp are set if the width or/and precision are '*'. + */ +const char * +parse_printf_fmt(const char *fmt, int *pqualifier, int *wp) +{ + int qualifier = -1; + *wp = 0; + + /* process flags */ + repeat: + ++fmt; /* this also skips first '%' */ + switch (*fmt) { + case '-': + case '+': + case ' ': + case '#': + case '0': + goto repeat; + } + + /* get field width */ + if (isdigit(*fmt)) + skip_atoi(&fmt); + else if (*fmt == '*') { + ++fmt; + /* it's the next argument */ + *wp |= 0x1; + } + + /* get the precision */ + if (*fmt == '.') { + ++fmt; + if (isdigit(*fmt)) + skip_atoi(&fmt); + else if (*fmt == '*') { + ++fmt; + /* it's the next argument */ + *wp |= 0x2; + } + } + + /* get the conversion qualifier */ + if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L' || + *fmt == 'Z' || *fmt == 'z') { + qualifier = *fmt; + ++fmt; + if (qualifier == 'l' && *fmt == 'l') { + qualifier = 'L'; + ++fmt; + } + } + + *pqualifier = qualifier; + return fmt; +} + +/** + * evl_pack_args() - Pack args into buffer, guided by format string. + * b describes a buffer. fmt and args are as passed to vsnprintf(). Using + * fmt as a guide, copy the args into b's buffer. + * + * @b - describes buffer; updated to reflect data added + * @fmt - printf/printk-style format string + * @args - values to be packed into buffer + */ +void +evl_pack_args(struct evl_recbuf *b, const char *fmt, va_list args) +{ +#define COPYARG(type) \ + do { type v=va_arg(args,type); evl_put(b,&v,sizeof(v)); } while(0) + + const char *s; + int qualifier; + + for (; *fmt ; ++fmt) { + int wp = 0x0; + if (*fmt != '%') { + continue; + } + + fmt = parse_printf_fmt(fmt, &qualifier, &wp); + if (wp & 0x1) { + /* width is '*' (next arg) */ + COPYARG(int); + } + if (wp & 0x2) { + /* ditto precision */ + COPYARG(int); + } + + switch (*fmt) { + case 'c': + COPYARG(int); + continue; + + case 's': + s = va_arg(args, char *); + evl_puts(b, s, 1); + continue; + + case 'p': + COPYARG(void*); + continue; + + case 'n': + /* Skip over the %n arg. */ + if (qualifier == 'l') { + (void) va_arg(args, long *); + } else if (qualifier == 'Z' || qualifier == 'z') { + (void) va_arg(args, size_t *); + } else { + (void) va_arg(args, int *); + } + continue; + + case '%': + continue; + + /* integer number formats - handle outside switch */ + case 'o': + case 'X': + case 'x': + case 'd': + case 'i': + case 'u': + break; + + default: + /* Bogus conversion. Pass thru unchanged. */ + if (*fmt == '\0') + --fmt; + continue; + } + if (qualifier == 'L') { + COPYARG(long long); + } else if (qualifier == 'l') { + COPYARG(long); + } else if (qualifier == 'Z' || qualifier == 'z') { + COPYARG(size_t); + } else if (qualifier == 'h') { + COPYARG(int); + } else { + COPYARG(int); + } + } +} + +/* + * Scratch buffer for constructing event records. This is static because + * (1) we want events to be logged even in low-memory situations; and + * (2) the buffer is too big to be an auto variable. + */ +static spinlock_t msgbuf_lock = SPIN_LOCK_UNLOCKED; +static char msgbuf[EVL_ENTRY_MAXLEN]; + +/** + * evl_send_printf() - Format and log a PRINTF-format message. + * Create and log a PRINTF-format event record whose contents are: + * format string + * int containing args size + * args + * @hdr - pre-constructed record header + * @fmt - format string + * @args - arg list + */ +static int +evl_send_printf(struct kern_log_entry *hdr, const char *fmt, va_list args) +{ + int ret; + struct evl_recbuf b; + int argsz = 0; + char *nl, *pargsz, *pargs; + unsigned long iflags; + + spin_lock_irqsave(&msgbuf_lock, iflags); + evl_init_recbuf(&b, msgbuf, EVL_ENTRY_MAXLEN); + evl_puts(&b, fmt, 1); + + /* + * If the format ends in a newline, remove it. We remove the + * terminating newline to increase flexibility when formatting + * the record for viewing. + */ + nl = b.b_tail - 2; + if (b.b_buf <= nl && nl < b.b_end && *nl == '\n') { + *nl = '\0'; + b.b_tail--; + } + + /* Remember where to store argsz; store 0 for now. */ + pargsz = b.b_tail; + evl_put(&b, &argsz, sizeof(argsz)); + pargs = b.b_tail; + + evl_pack_args(&b, fmt, args); + if (pargs <= b.b_end) { + argsz = (int) (b.b_tail - pargs); + memcpy(pargsz, &argsz, sizeof(argsz)); + } + + hdr->log_size = b.b_tail - b.b_buf; + if (hdr->log_size > EVL_ENTRY_MAXLEN) { + hdr->log_size = EVL_ENTRY_MAXLEN; + hdr->log_flags |= EVL_TRUNCATE; + } + + ret = evl_sendh(hdr, b.b_buf); + spin_unlock_irqrestore(&msgbuf_lock, iflags); + + if (ret == -ESRCH) { + report_dropped_printf_event(hdr, fmt, args); + } + return ret; +} + +/** + * evl_vprintf() - Format and log a PRINTF-format record. + * @fmt - format string + * @args - arg list + * other args as per mk_rec_header(). If event_type == 0, set flag to + * request that recipient set event type. + */ +int +evl_vprintf(const char *facility, int event_type, int severity, + const char *fmt, va_list args) +{ + struct kern_log_entry hdr; + unsigned int flags = 0; + if (event_type == 0) { + flags |= EVL_EVTYCRC; + } + mk_rec_header(&hdr, facility, event_type, severity, 0, flags, + EVL_PRINTF); + + return evl_send_printf(&hdr, fmt, args); +} + +/** + * evl_printf() - Format and log a PRINTF-format record. + * @fmt - format string + * other args as per mk_rec_header() + */ +int +evl_printf(const char *facility, int event_type, int severity, + const char *fmt, ...) +{ + va_list args; + int ret; + va_start(args, fmt); + ret = evl_vprintf(facility, event_type, severity, fmt, args); + va_end(args); + return ret; +} + +/*** Functions for handling of events logged when nobody was listening ***/ +static void +report_dropped_hdr(const struct kern_log_entry *hdr) +{ + printk("<%d>evlog packet dropped: size=%u fmt=%d evty=%#x" + " fac=%s sev=%d uid=%u gid=%u pid=%d pgrp=%d" + " flags=%#x cpu=%d\n", + hdr->log_severity, hdr->log_size, hdr->log_format, + hdr->log_event_type, hdr->log_facility, hdr->log_severity, + hdr->log_uid, hdr->log_gid, hdr->log_pid, hdr->log_pgrp, + hdr->log_flags, hdr->log_processor); +} + +static void +report_dropped_printf_event(const struct kern_log_entry *hdr, const char *fmt, + va_list args) +{ + char msg[500]; + if (hdr->log_flags & EVL_PRINTK) { + /* printk's already reporting this event. */ + return; + } + report_dropped_hdr(hdr); + vsnprintf(msg, 500, fmt, args); + printk("<%d>%s\n", hdr->log_severity, msg); +} + +static void +hexdump(const void *data, size_t nbytes, int severity) +{ +#define MAX_HEXDUMP_LEN 512 /* Keep this small: don't flood printk buffer */ + size_t nb = min(nbytes, (size_t)MAX_HEXDUMP_LEN); + const unsigned char *dbase = (const unsigned char*) data; + const unsigned char *dp = dbase, *dend = dbase + nb; + char *lp, line[100]; + int i; + + while (dp < dend) { + lp = line; + lp += sprintf(lp, "%04X ", (unsigned) (dp - dbase)); + for (i = 0; i < 16 && dp < dend; i++, dp++) { + lp += sprintf(lp, "%02X ", *dp); + } + printk("<%d>%s\n", severity, line); + } +} + +static void +report_dropped_event(const struct kern_log_entry *hdr, const void *vardata) +{ + if (hdr->log_flags & EVL_PRINTK) { + /* printk's already reporting this event. */ + return; + } + report_dropped_hdr(hdr); + switch (hdr->log_format) { + case EVL_STRING: + printk("<%d>%s\n", hdr->log_severity, (const char*) vardata); + break; + case EVL_PRINTF: + /* Should be handled by report_dropped_printf_event() */ + /*FALLTHRU*/ + case EVL_BINARY: + hexdump(vardata, hdr->log_size, hdr->log_severity); + break; + case EVL_NODATA: + default: + break; + } +} + +EXPORT_SYMBOL(evl_write); +EXPORT_SYMBOL(evl_printf); +EXPORT_SYMBOL(evl_vprintf); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-15 17:42 ` [PATCH] [1/2] kernel error reporting (revised) Jim Keniston 2003-07-15 17:45 ` [PATCH] [2/2] " Jim Keniston @ 2003-07-15 19:51 ` Andrew Morton 2003-07-15 23:10 ` kuznet 2003-07-17 19:13 ` Jim Keniston 1 sibling, 2 replies; 21+ messages in thread From: Andrew Morton @ 2003-07-15 19:51 UTC (permalink / raw) To: Jim Keniston Cc: jmorris, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet, jkenisto Jim Keniston <jkenisto@us.ibm.com> wrote: > > +int kernel_error_event_iov(const struct iovec *iov, unsigned int nseg, > + u32 groups) > +{ > ... > + > + return netlink_broadcast(kerror_nl, skb, 0, ~0, GFP_ATOMIC); This appears to be deadlocky when called from interrupt handlers. netlink_broadcast() does read_lock(&nl_table_lock). But nl_table_lock is not an irq-safe lock. Possibly netlink_broadcast() can be made callable from hardirq context, but it looks to be non trivial. The various error and delivery handlers need to be reviewed, the kfree_skb() calls should be thought about, etc. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-15 19:51 ` [PATCH] [1/2] " Andrew Morton @ 2003-07-15 23:10 ` kuznet 2003-07-17 19:13 ` Jim Keniston 1 sibling, 0 replies; 21+ messages in thread From: kuznet @ 2003-07-15 23:10 UTC (permalink / raw) To: Andrew Morton Cc: jkenisto, jmorris, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet Hello! > netlink_broadcast() does read_lock(&nl_table_lock). But nl_table_lock is > not an irq-safe lock. Just as reminder, there are _no_ irq safe locks in net/*. A few of local_irq_disable()s are segregated in interface to device drivers. > Possibly netlink_broadcast() can be made callable from hardirq context, but > it looks to be non trivial. Trivial or non-trivial, before all this is highly not desired. net/* is better to remain in the form free of knowledge of hardirqs. Alexey ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-15 19:51 ` [PATCH] [1/2] " Andrew Morton 2003-07-15 23:10 ` kuznet @ 2003-07-17 19:13 ` Jim Keniston 2003-07-18 1:53 ` James Morris 1 sibling, 1 reply; 21+ messages in thread From: Jim Keniston @ 2003-07-17 19:13 UTC (permalink / raw) To: Andrew Morton Cc: jmorris, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet, jkenisto [-- Attachment #1: Type: text/plain, Size: 2323 bytes --] Andrew Morton wrote: > > Jim Keniston <jkenisto@us.ibm.com> wrote: > > > > +int kernel_error_event_iov(const struct iovec *iov, unsigned int nseg, > > + u32 groups) > > +{ > > ... > > + > > + return netlink_broadcast(kerror_nl, skb, 0, ~0, GFP_ATOMIC); > > This appears to be deadlocky when called from interrupt handlers. > > netlink_broadcast() does read_lock(&nl_table_lock). But nl_table_lock is > not an irq-safe lock. > > Possibly netlink_broadcast() can be made callable from hardirq context, but > it looks to be non trivial. The various error and delivery handlers need > to be reviewed, Yes indeed. I believe this issue is resolved by detecting that we're in an interrupt handler and delaying the call to netlink_broadcast() via a tasklet. See the enclosed patch to the previously posted rev. I'll update the full patch at http://prdownloads.sourceforge.net/evlog/kerror-2.5.75.patch?download An issue remains: what, if anything, to tell the caller if the delayed netlink_broadcast() fails. See below for further thoughts. > the kfree_skb() calls should be thought about, etc. I've thought about them. :-) Given the aforementioned solution, I don't think kfree_skb() is an issue, because it's called as needed by netlink_broadcast(). If I'm missing something here, feel free to clarify. Thanks. Jim WHAT IF THE DELAYED netlink_broadcast() CALL FAILS? 1. Can we detect from IRQ context that nobody is listening, and thereby return -ESRCH to the caller? No, to do that would require perusing the nl_table[NETLINK_KERROR] list. We can't do that for the same reason we can't call netlink_broadcast(). So kernel_error_event_iov() now returns -EINPROGRESS if it had to delay the netlink_broadcast() call. 2. Could the tasklet report netlink_broadcast() failures back to the higher-level code? Yes, we could implement a per-group callback to handle that. Current thinking is that it's overkill. But it would resolve the next issue... 3. Given the above, what should the evlog.c caller do when kernel_error_event_iov() returns -EINPROGRESS? a. Nothing. Figure the packet will probably get logged. b. Just to be safe, report it via printk, the same way we report dropped packets. We currently do (a). (b) would mean that every event logged from IRQ context would be cc-ed to printk. ----- [-- Attachment #2: kerror.patch.txt --] [-- Type: text/plain, Size: 1984 bytes --] --- kerror.c.old Thu Jul 17 10:53:19 2003 +++ kerror.c Thu Jul 17 10:54:55 2003 @@ -3,10 +3,12 @@ * Copyright (C) 2003 David S. Miller (davem@redhat.com) * June 2003 - Jim Keniston and Dan Stekloff (kenistoj and dsteklof@us.ibm.com) * Fixed a couple of bugs and added iovec interface. + * July 2003 - Jim Keniston - Added handling of packets logged from IRQ context. */ #include <linux/kernel.h> #include <linux/types.h> +#include <linux/interrupt.h> #include <linux/skbuff.h> #include <linux/socket.h> #include <linux/netlink.h> @@ -17,6 +19,33 @@ static struct sock *kerror_nl; +/* Packets logged from IRQ context are queued for broadcast by a tasklet. */ +static struct sk_buff_head delayed_pkts; +static void broadcast_delayed_pkts(unsigned long); +static DECLARE_TASKLET(delayed_pkts_tasklet, broadcast_delayed_pkts, 0); + +/** + * delayed_broadcast() - Schedule a tasklet to broadcast a packet. + * We want to broadcast the indicated packet, but can't because we're + * in a hardware interrupt and so can't call netlink_broadcast(). + * Schedule a tasklet to do the job. + * + * @skb: the socket buffer to broadcast + */ +static void delayed_broadcast(struct sk_buff *skb) +{ + skb_queue_tail(&delayed_pkts, skb); + tasklet_schedule(&delayed_pkts_tasklet); +} + +static void broadcast_delayed_pkts(unsigned long ignored) +{ + struct sk_buff *skb; + while ((skb = skb_dequeue(&delayed_pkts)) != NULL) { + (void) netlink_broadcast(kerror_nl, skb, 0, ~0, GFP_ATOMIC); + } +} + /** * kernel_error_event_iov() - Broadcast packet to NETLINK_KERROR sockets. * @iov: the packet's data @@ -54,6 +83,11 @@ NETLINK_CB(skb).dst_groups = groups; + if (in_irq()) { + delayed_broadcast(skb); + return -EINPROGRESS; + } + return netlink_broadcast(kerror_nl, skb, 0, ~0, GFP_ATOMIC); nlmsg_failure: @@ -85,6 +119,7 @@ if (kerror_nl == NULL) panic("kerror_init: cannot initialize kerror_nl\n"); + skb_queue_head_init(&delayed_pkts); return 0; } ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-17 19:13 ` Jim Keniston @ 2003-07-18 1:53 ` James Morris 2003-07-18 17:06 ` Jim Keniston 2003-07-18 23:29 ` Jim Keniston 0 siblings, 2 replies; 21+ messages in thread From: James Morris @ 2003-07-18 1:53 UTC (permalink / raw) To: Jim Keniston Cc: Andrew Morton, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet On Thu, 17 Jul 2003, Jim Keniston wrote: > 3. Given the above, what should the evlog.c caller do when > kernel_error_event_iov() returns -EINPROGRESS? > a. Nothing. Figure the packet will probably get logged. > b. Just to be safe, report it via printk, the same way we report dropped > packets. > We currently do (a). (b) would mean that every event logged from IRQ > context would be cc-ed to printk. I don't think this irq detection logic should be added at all here, let the caller reschedule its logging if running in irq context. - James -- James Morris <jmorris@intercode.com.au> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-18 1:53 ` James Morris @ 2003-07-18 17:06 ` Jim Keniston 2003-07-18 23:29 ` Jim Keniston 1 sibling, 0 replies; 21+ messages in thread From: Jim Keniston @ 2003-07-18 17:06 UTC (permalink / raw) To: James Morris Cc: Andrew Morton, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet James Morris wrote: > > On Thu, 17 Jul 2003, Jim Keniston wrote: > > > 3. Given the above, what should the evlog.c caller do when > > kernel_error_event_iov() returns -EINPROGRESS? > > a. Nothing. Figure the packet will probably get logged. > > b. Just to be safe, report it via printk, the same way we report dropped > > packets. > > We currently do (a). (b) would mean that every event logged from IRQ > > context would be cc-ed to printk. > > I don't think this irq detection logic should be added at all here, let > the caller reschedule its logging if running in irq context. > > - James > -- > James Morris > <jmorris@intercode.com.au> Yes, this makes sense. At the kerror.c level, just return -EDEADLK if in_irq(). Delay packet delivery (via a tasklet, as before) at the evlog.c level instead. That way, we know at the evlog.c level (in the tasklet) whether the event packet was delivered to anybody, and can paraphrase it to printk if it wasn't. Is this the sort of thing you had in mind? Jim K ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-18 1:53 ` James Morris 2003-07-18 17:06 ` Jim Keniston @ 2003-07-18 23:29 ` Jim Keniston 2003-07-19 23:52 ` James Morris 1 sibling, 1 reply; 21+ messages in thread From: Jim Keniston @ 2003-07-18 23:29 UTC (permalink / raw) To: James Morris Cc: Andrew Morton, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet, jkenisto Jim Keniston wrote: > James Morris wrote: > > > > On Thu, 17 Jul 2003, Jim Keniston wrote: > > > > > 3. Given the above, what should the evlog.c caller do when > > > kernel_error_event_iov() returns -EINPROGRESS? > > > a. Nothing. Figure the packet will probably get logged. > > > b. Just to be safe, report it via printk, the same way we report dropped > > > packets. > > > We currently do (a). (b) would mean that every event logged from IRQ > > > context would be cc-ed to printk. > > > > I don't think this irq detection logic should be added at all here, let > > the caller reschedule its logging if running in irq context. > > > > - James > > -- > > James Morris > > <jmorris@intercode.com.au> > > Yes, this makes sense. At the kerror.c level, just return -EDEADLK if in_irq(). > Delay packet delivery (via a tasklet, as before) at the evlog.c level instead. > That way, we know at the evlog.c level (in the tasklet) whether the event packet > was delivered to anybody, and can paraphrase it to printk if it wasn't. > > Is this the sort of thing you had in mind? > Jim K I implemented the above change. Now, an event logged from an interrupt handler when nobody's listening to our socket (e.g., during boot) is paraphrased to printk. Here are the updated patches: http://prdownloads.sourceforge.net/evlog/kerror-2.5.75.patch?download http://prdownloads.sourceforge.net/evlog/evlog-2.5.75.patch?download http://prdownloads.sourceforge.net/evlog/kerrord.tar.gz?download Jim K ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] [1/2] kernel error reporting (revised) 2003-07-18 23:29 ` Jim Keniston @ 2003-07-19 23:52 ` James Morris 0 siblings, 0 replies; 21+ messages in thread From: James Morris @ 2003-07-19 23:52 UTC (permalink / raw) To: Jim Keniston Cc: Andrew Morton, davem, linux-kernel, netdev, jgarzik, alan, rddunlap, kuznet On Fri, 18 Jul 2003, Jim Keniston wrote: > > Yes, this makes sense. At the kerror.c level, just return -EDEADLK if in_irq(). > > Delay packet delivery (via a tasklet, as before) at the evlog.c level instead. > > That way, we know at the evlog.c level (in the tasklet) whether the event packet > > was delivered to anybody, and can paraphrase it to printk if it wasn't. > > > > Is this the sort of thing you had in mind? Not exactly -- I don't think the logging framework should do any irq detection. The caller should either know if its in an interrupt, or do the detection itself. - James -- James Morris <jmorris@intercode.com.au> ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <3F0AFFE6.E85FF283@us.ibm.com.suse.lists.linux.kernel>]
[parent not found: <20030708105912.57015026.akpm@osdl.org.suse.lists.linux.kernel>]
* Re: [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting [not found] ` <20030708105912.57015026.akpm@osdl.org.suse.lists.linux.kernel> @ 2003-07-08 19:14 ` Andi Kleen 0 siblings, 0 replies; 21+ messages in thread From: Andi Kleen @ 2003-07-08 19:14 UTC (permalink / raw) To: Andrew Morton; +Cc: jkenisto, linux-kernel Andrew Morton <akpm@osdl.org> writes: > The layout of `struct kern_log_entry' may be problematic. Think of the > situation where a 64-bit kernel constructs one of these and sends it up to > 32-bit userspace. Will the structure layout be the same under the 32-bit > compiler as under the 64-bit one? No it won't. Best is to order the fields by size (arrays ordered by their subtype). This should always give compatible alignment. -Andi ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2003-07-19 23:37 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-07-08 17:31 [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting Jim Keniston 2003-07-08 17:59 ` Andrew Morton 2003-07-08 19:22 ` Jim Keniston 2003-07-08 19:45 ` Jim Keniston 2003-07-10 4:42 ` James Morris 2003-07-10 19:08 ` Jim Keniston 2003-07-11 15:37 ` James Morris 2003-07-12 5:09 ` David S. Miller 2003-07-12 5:41 ` David S. Miller 2003-07-13 1:17 ` James Morris 2003-07-13 5:34 ` David S. Miller 2003-07-15 17:42 ` [PATCH] [1/2] kernel error reporting (revised) Jim Keniston 2003-07-15 17:45 ` [PATCH] [2/2] " Jim Keniston 2003-07-15 19:51 ` [PATCH] [1/2] " Andrew Morton 2003-07-15 23:10 ` kuznet 2003-07-17 19:13 ` Jim Keniston 2003-07-18 1:53 ` James Morris 2003-07-18 17:06 ` Jim Keniston 2003-07-18 23:29 ` Jim Keniston 2003-07-19 23:52 ` James Morris [not found] <3F0AFFE6.E85FF283@us.ibm.com.suse.lists.linux.kernel> [not found] ` <20030708105912.57015026.akpm@osdl.org.suse.lists.linux.kernel> 2003-07-08 19:14 ` [PATCH - RFC] [1/2] 2.6 must-fix list - kernel error reporting Andi Kleen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).