linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: [patch] log fatal signals like SIGSEGV
@ 2008-09-12 13:02 Thomas Jarosch
  2008-09-12 17:11 ` Marcin Slusarz
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Jarosch @ 2008-09-12 13:02 UTC (permalink / raw)
  To: linux-kernel

Hello together,

we've created a small patch that helped us troubleshoot obscure hardware
faults several times. Imagine someone calls and complains his box is
freezing from time to time. We usually check the logs and find various tasks
segfaulting between the freezes. Most of the time the hardware is broken.

Attached patch logs fatal signal like SIGSEGV or SIGBUS. It also features
a log flood protection, though I'm not sure if it works with dynamic ticks.
The code is a stripped down version of grsecurity's signal logger.

The patch doesn't have Kconfig support or other fancy stuff yet as I first
wanted to ask if it would make sense to integrate something like this
upstream. For us it made remote diagnosis a lot easier.

Please CC: comments.

Best regards,
Thomas Jarosch

--------------------------------------------------------
Log fatal signals like SIGSEGV or SIGBUS
to aid debugging of obscure problems.

The code is a stripped down version
of grsecurity's signal logger.

Signed-Off-By: Gerd v. Egidy <gve@intra2net.com>
Signed-Off-By: Thomas Jarosch <thomas.jarosch@intra2net.com>

diff -u -r -p linux-2.6.22/kernel/signal.c linux.siglog/kernel/signal.c
--- linux-2.6.22/kernel/signal.c	Mon Jul  9 01:32:17 2007
+++ linux.siglog/kernel/signal.c	Wed Aug 22 11:08:58 2007
@@ -514,6 +514,64 @@ static int rm_from_queue(unsigned long m
 }
 
 /*
+ * Stuff needed for signal logger
+ */
+
+spinlock_t siglog_lock = SPIN_LOCK_UNLOCKED;
+unsigned long volatile siglog_wtime = 0;
+unsigned long volatile siglog_fyet = 0;
+
+/* time span in which flooding is measured */
+#define SIGLOG_FLOOD_SECONDS 5
+
+/* how many log entries are allowed in this time span */
+#define SIGLOG_FLOOD_BURST_LINES 20
+
+/*
+ * Log fatal signals
+ */
+void
+log_fatal_signal(const int sig, const struct task_struct *t)
+{
+	if ((sig == SIGSEGV) || (sig == SIGILL) || (sig == SIGABRT)
+        || (sig == SIGBUS) || (sig == SIGKILL) || (sig == SIGFPE)) {
+
+		/* flood protection */
+		spin_lock(&siglog_lock);
+		if (!siglog_wtime || jiffies - siglog_wtime > SIGLOG_FLOOD_SECONDS * HZ) {
+			/* no logging activity yet */
+			siglog_wtime = jiffies;
+			siglog_fyet = 0;
+		} else if ((jiffies - siglog_wtime <= SIGLOG_FLOOD_SECONDS * HZ)
+                    && (siglog_fyet < SIGLOG_FLOOD_BURST_LINES)) {
+			/* logging within SIGLOG_FLOOD_SECONDS, but below threshold */
+			siglog_fyet++;
+		} else {
+			/* flooding detected, warn once and return */
+			if (siglog_fyet == SIGLOG_FLOOD_BURST_LINES) {
+				siglog_wtime = jiffies;
+				siglog_fyet++;
+				printk(KERN_ALERT "siglog: more alerts, logging disabled for"
+                       " %d seconds\n", SIGLOG_FLOOD_SECONDS);
+			}
+			spin_unlock(&siglog_lock);
+			return;
+		}
+		spin_unlock(&siglog_lock);
+
+		read_lock(&tasklist_lock);
+		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
+               "parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
+               "parent %.30s[%d] uid:%u\n", sig, t->comm, t->pid, t->uid,
+               t->parent->comm, t->parent->pid, t->parent->uid,
+               current->comm, current->pid, current->uid,
+               current->parent->comm, current->parent->pid, current->parent->uid);
+		read_unlock(&tasklist_lock);
+	}
+	return;
+}
+
+/*
  * Bad permissions for sending the signal
  */
 static int check_kill_permission(int sig, struct siginfo *info,
@@ -536,6 +594,8 @@ static int check_kill_permission(int sig
 	    && !capable(CAP_KILL))
 		return error;
 
+	log_fatal_signal(sig, t);
+
 	return security_task_kill(t, info, sig, 0);
 }
 
@@ -773,6 +833,9 @@ force_sig_info(int sig, struct siginfo *
 		}
 	}
 	ret = specific_send_sig_info(sig, info, t);
+
+	log_fatal_signal(sig, t);
+
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: [patch] log fatal signals like SIGSEGV
  2008-09-12 13:02 RFC: [patch] log fatal signals like SIGSEGV Thomas Jarosch
@ 2008-09-12 17:11 ` Marcin Slusarz
  2008-09-16 12:59   ` Thomas Jarosch
  0 siblings, 1 reply; 13+ messages in thread
From: Marcin Slusarz @ 2008-09-12 17:11 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: linux-kernel

On Fri, Sep 12, 2008 at 03:02:14PM +0200, Thomas Jarosch wrote:
> --------------------------------------------------------
> Log fatal signals like SIGSEGV or SIGBUS

Note that on current kernel when process segfaults it printks:
a.out[5974]: segfault at 0 ip 00000000004004c1 sp 00007fffdd1a3ce0 error 6 in a.out[400000+1000]

> to aid debugging of obscure problems.
> 
> The code is a stripped down version
> of grsecurity's signal logger.
> 
> Signed-Off-By: Gerd v. Egidy <gve@intra2net.com>
> Signed-Off-By: Thomas Jarosch <thomas.jarosch@intra2net.com>

Please look at Documentation/SubmittingPatches.
(Signed-off-by capitalization and lack of (?) From field)

> 
> diff -u -r -p linux-2.6.22/kernel/signal.c linux.siglog/kernel/signal.c
> --- linux-2.6.22/kernel/signal.c	Mon Jul  9 01:32:17 2007
> +++ linux.siglog/kernel/signal.c	Wed Aug 22 11:08:58 2007
> @@ -514,6 +514,64 @@ static int rm_from_queue(unsigned long m
>  }
>  
>  /*
> + * Stuff needed for signal logger
> + */
> +
> +spinlock_t siglog_lock = SPIN_LOCK_UNLOCKED;

Please use DEFINE_SPINLOCK if spinlock is really needed (look below).

> +unsigned long volatile siglog_wtime = 0;
> +unsigned long volatile siglog_fyet = 0;
> +
> +/* time span in which flooding is measured */
> +#define SIGLOG_FLOOD_SECONDS 5
> +
> +/* how many log entries are allowed in this time span */
> +#define SIGLOG_FLOOD_BURST_LINES 20
> +
> +/*
> + * Log fatal signals
> + */
> +void
> +log_fatal_signal(const int sig, const struct task_struct *t)
> +{
> +	if ((sig == SIGSEGV) || (sig == SIGILL) || (sig == SIGABRT)
> +        || (sig == SIGBUS) || (sig == SIGKILL) || (sig == SIGFPE)) {

It would be clearer if you could write this code as:

if (!sth)
	return;
do_other_stuff;

> +
> +		/* flood protection */
> +		spin_lock(&siglog_lock);
> +		if (!siglog_wtime || jiffies - siglog_wtime > SIGLOG_FLOOD_SECONDS * HZ) {
> +			/* no logging activity yet */
> +			siglog_wtime = jiffies;
> +			siglog_fyet = 0;
> +		} else if ((jiffies - siglog_wtime <= SIGLOG_FLOOD_SECONDS * HZ)
> +                    && (siglog_fyet < SIGLOG_FLOOD_BURST_LINES)) {
> +			/* logging within SIGLOG_FLOOD_SECONDS, but below threshold */
> +			siglog_fyet++;
> +		} else {
> +			/* flooding detected, warn once and return */
> +			if (siglog_fyet == SIGLOG_FLOOD_BURST_LINES) {
> +				siglog_wtime = jiffies;
> +				siglog_fyet++;
> +				printk(KERN_ALERT "siglog: more alerts, logging disabled for"
> +                       " %d seconds\n", SIGLOG_FLOOD_SECONDS);
> +			}
> +			spin_unlock(&siglog_lock);
> +			return;
> +		}
> +		spin_unlock(&siglog_lock);

You probably want to use printk_ratelimit or DEFINE_RATELIMIT_STATE/__ratelimit.

> +
> +		read_lock(&tasklist_lock);
> +		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
> +               "parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
> +               "parent %.30s[%d] uid:%u\n", sig, t->comm, t->pid, t->uid,
> +               t->parent->comm, t->parent->pid, t->parent->uid,
> +               current->comm, current->pid, current->uid,
> +               current->parent->comm, current->parent->pid, current->parent->uid);
> +		read_unlock(&tasklist_lock);
> +	}
> +	return;
> +}
> +
> +/*
>   * Bad permissions for sending the signal
>   */
>  static int check_kill_permission(int sig, struct siginfo *info,
> @@ -536,6 +594,8 @@ static int check_kill_permission(int sig
>  	    && !capable(CAP_KILL))
>  		return error;
>  
> +	log_fatal_signal(sig, t);
> +
>  	return security_task_kill(t, info, sig, 0);
>  }
>  
> @@ -773,6 +833,9 @@ force_sig_info(int sig, struct siginfo *
>  		}
>  	}
>  	ret = specific_send_sig_info(sig, info, t);
> +
> +	log_fatal_signal(sig, t);
> +
>  	spin_unlock_irqrestore(&t->sighand->siglock, flags);
>  
>  	return ret;
> 
> --

Marcin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: [patch] log fatal signals like SIGSEGV
  2008-09-12 17:11 ` Marcin Slusarz
@ 2008-09-16 12:59   ` Thomas Jarosch
  2008-09-16 17:42     ` Marcin Slusarz
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Jarosch @ 2008-09-16 12:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcin Slusarz

On Friday, 12. September 2008 19:11:09 Marcin Slusarz wrote:
> Note that on current kernel when process segfaults it printks:
> a.out[5974]: segfault at 0 ip 00000000004004c1 sp 00007fffdd1a3ce0 error 6
> in a.out[400000+1000]

Thank you for your feedback Marcin, it was really helpful.

The log message above is a good start. There's also the
"print-fatal-signals" kernel parameter which prints out
much more than just the basic info:

 =======================
sleep/3277: potentially unexpected fatal signal 11.
code at 001b7571: 89 d3 3d 01 f0 ff ff 73 01 c3 53 e8 00 00 00 00

Pid: 3277, comm: sleep Not tainted (2.6.26-2.i2nsmp #7)
EIP: 0073:[<001b7571>] EFLAGS: 00000246 CPU: 0
EIP is at 0x1b7571
EAX: fffffdfc EBX: bffae0d4 ECX: bffae0d4 EDX: 00215c80
ESI: bffae0d4 EDI: bffae1e8 EBP: bffae268 ESP: bffae0a0
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b
CR0: 8005003b CR2: b7fbf55c CR3: 369db000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 =======================

This is way too much information to leave it on all the time.

The new code just adds one line of information to the log
and is rate limited. The idea is to aid troubleshooting
later on by looking through the logs. I guess you'll
hardly notice it during normal system operation.

> Please look at Documentation/SubmittingPatches.
> (Signed-off-by capitalization and lack of (?) From field)

scripts/checkpatch.pl is happy now :-)

I've reworked the locking using CONFIG_LOCKDEP and ran  

while /bin/true; do
  sleep 10 &
  kill -9 $!
done

for over twenty minutes with no noticable issue.
The printk_ratelimit() works perfect.

Here's the new version:
-----------------------------------------------------------------
From: Thomas Jarosch <thomas.jarosch@intra2net.com>

Log the signals SIGSEGV, SIGILL, SIGABRT, SIGBUS, SIGKILL and SIGFPE
to aid debugging of obscure problems. Also logs the sender of the signal.

The log message looks like this:
"kernel: signal 9 sent to freezed[2634] uid:100,
 parent init[1] uid:0 by bash[3168] uid:0, parent sshd[3164] uid:0"

The printing code is based on grsecurity's signal logger.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Gerd v. Egidy <gve@intra2net.com>

diff -u -r -p linux-2.6.26.vanilla/kernel/signal.c linux-2.6.26/kernel/signal.c
--- linux-2.6.26.vanilla/kernel/signal.c	Tue Sep 16 13:45:34 2008
+++ linux-2.6.26/kernel/signal.c	Tue Sep 16 14:02:54 2008
@@ -801,6 +801,24 @@ static inline int legacy_queue(struct si
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
+static void log_signal_and_sender(const int sig, const struct task_struct *t)
+{
+	if (!((sig == SIGSEGV) || (sig == SIGILL) || (sig == SIGABRT)
+		|| (sig == SIGBUS) || (sig == SIGKILL) || (sig == SIGFPE)))
+			return;
+
+	if (printk_ratelimit()) {
+		/* Note: tasklist_lock is already locked by siglock */
+		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
+				"parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
+				"parent %.30s[%d] uid:%u\n", sig, t->comm,
+				t->pid, t->uid, t->parent->comm, t->parent->pid,
+				t->parent->uid, current->comm, current->pid,
+				current->uid, current->parent->comm,
+				current->parent->pid, current->parent->uid);
+	}
+}
+
 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			int group)
 {
@@ -810,6 +828,8 @@ static int send_signal(int sig, struct s
 	assert_spin_locked(&t->sighand->siglock);
 	if (!prepare_signal(sig, t))
 		return 0;
+
+	log_signal_and_sender(sig, t);
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
 	/*


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: [patch] log fatal signals like SIGSEGV
  2008-09-16 12:59   ` Thomas Jarosch
@ 2008-09-16 17:42     ` Marcin Slusarz
  2008-09-17  8:12       ` Thomas Jarosch
  2008-09-18 10:10       ` Thomas Jarosch
  0 siblings, 2 replies; 13+ messages in thread
From: Marcin Slusarz @ 2008-09-16 17:42 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: linux-kernel

On Tue, Sep 16, 2008 at 02:59:16PM +0200, Thomas Jarosch wrote:
> Here's the new version:
> -----------------------------------------------------------------
> From: Thomas Jarosch <thomas.jarosch@intra2net.com>
> 
> Log the signals SIGSEGV, SIGILL, SIGABRT, SIGBUS, SIGKILL and SIGFPE
> to aid debugging of obscure problems. Also logs the sender of the signal.
> 
> The log message looks like this:
> "kernel: signal 9 sent to freezed[2634] uid:100,
>  parent init[1] uid:0 by bash[3168] uid:0, parent sshd[3164] uid:0"
> 
> The printing code is based on grsecurity's signal logger.
> 
> Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
> Signed-off-by: Gerd v. Egidy <gve@intra2net.com>
> 
> diff -u -r -p linux-2.6.26.vanilla/kernel/signal.c linux-2.6.26/kernel/signal.c
> --- linux-2.6.26.vanilla/kernel/signal.c	Tue Sep 16 13:45:34 2008
> +++ linux-2.6.26/kernel/signal.c	Tue Sep 16 14:02:54 2008
> @@ -801,6 +801,24 @@ static inline int legacy_queue(struct si
>  	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
>  }
>  
> +static void log_signal_and_sender(const int sig, const struct task_struct *t)
> +{
> +	if (!((sig == SIGSEGV) || (sig == SIGILL) || (sig == SIGABRT)
> +		|| (sig == SIGBUS) || (sig == SIGKILL) || (sig == SIGFPE)))
> +			return;
> +
> +	if (printk_ratelimit()) {
> +		/* Note: tasklist_lock is already locked by siglock */
> +		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
> +				"parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
> +				"parent %.30s[%d] uid:%u\n", sig, t->comm,
> +				t->pid, t->uid, t->parent->comm, t->parent->pid,
> +				t->parent->uid, current->comm, current->pid,
> +				current->uid, current->parent->comm,
> +				current->parent->pid, current->parent->uid);
> +	}
> +}
> +
>  static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
>  			int group)
>  {
> @@ -810,6 +828,8 @@ static int send_signal(int sig, struct s
>  	assert_spin_locked(&t->sighand->siglock);
>  	if (!prepare_signal(sig, t))
>  		return 0;
> +
> +	log_signal_and_sender(sig, t);
>  
>  	pending = group ? &t->signal->shared_pending : &t->pending;
>  	/*
> 

It looks much better now. But I don't think it will go in as is.
Maybe you can disable it by default and create a sysctl switch?

Marcin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: [patch] log fatal signals like SIGSEGV
  2008-09-16 17:42     ` Marcin Slusarz
@ 2008-09-17  8:12       ` Thomas Jarosch
  2008-09-18 10:10       ` Thomas Jarosch
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Jarosch @ 2008-09-17  8:12 UTC (permalink / raw)
  To: Marcin Slusarz; +Cc: linux-kernel

On Tuesday, 16. September 2008 19:42:17 you wrote:
> It looks much better now. But I don't think it will go in as is.
> Maybe you can disable it by default and create a sysctl switch?

The idea is to enable it by default :-) Normally you shouldn't see -any- 
output except for broken hardware/software. That's something you want to get 
notified about and be able to trace back later on.

I guess it might be a good idea to make the SIGKILL output optional via sysctl 
and disabled by default as this could trigger during normal system operation
("Sending all processes the KILL signal" during shutdown). Would that be ok?

Thomas


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: [patch] log fatal signals like SIGSEGV
  2008-09-16 17:42     ` Marcin Slusarz
  2008-09-17  8:12       ` Thomas Jarosch
@ 2008-09-18 10:10       ` Thomas Jarosch
  2008-09-18 20:20         ` Marcin Slusarz
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Jarosch @ 2008-09-18 10:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcin Slusarz

> It looks much better now. But I don't think it will go in as is.
> Maybe you can disable it by default and create a sysctl switch?

Thinking about it some more, I've added a "signal-log-level" sysctl var
including documentation. The patch applies nicely to 2.6.26 and 2.6 HEAD.

The idea is to default to log level 1 and log fatal signals only.
Log output should be close to zero during normal system operation.

There is a bit of a naming clash with "print-fatal-signals", though that
should be called "debug-fatal-signals" because of all the register dumps etc.
I don't want to rename it as it would unnecessarily cause issues
and it's debug-only (Documentation/kernel-parameters.txt) anyway.

Enjoy.

------------------------------------------------------
From: Thomas Jarosch <thomas.jarosch@intra2net.com>

Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
of obscure problems. Also logs the sender of the signal.

The log message looks like this:
"kernel: signal 9 sent to freezed[2634] uid:100,
 parent init[1] uid:0 by bash[3168] uid:0, parent sshd[3164] uid:0"

You can control the degree of logging via sysctl: "signal-log-level"
    0 - Signal logging disabled
    1 - Log SIGSEGV, SIGILL, SIGBUS and SIGPFE (default)
    2 - Log SIGKILL and SIGABRT and all signals from log level 1
    3 or higher: Log all signals

The printing code is based on grsecurity's signal logger.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Gerd v. Egidy <gve@intra2net.com>

diff -u -r -p linux-2.6.26.vanilla/kernel/signal.c linux-2.6.26/kernel/signal.c
--- linux-2.6.26.vanilla/kernel/signal.c	Tue Sep 16 13:45:34 2008
+++ linux-2.6.26/kernel/signal.c	Thu Sep 18 10:43:27 2008
@@ -796,6 +796,35 @@ static void complete_signal(int sig, str
 	return;
 }
 
+int signal_log_level __read_mostly = 1;
+
+static void log_signal(const int sig, const struct task_struct *t)
+{
+	bool log_signal = false;
+
+	if(signal_log_level >= 1 && (sig == SIGSEGV || sig == SIGILL
+			|| sig == SIGBUS || sig == SIGFPE))
+		log_signal = true;
+	else if (signal_log_level >= 2 && (sig == SIGKILL || sig == SIGABRT))
+		log_signal = true;
+	else if (signal_log_level >= 3)
+		log_signal = true;
+
+	if (!log_signal)
+		return;
+
+	if (printk_ratelimit()) {
+		/* Don't lock "tasklist_lock" here as it's already locked by "siglock" */
+		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
+				"parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
+				"parent %.30s[%d] uid:%u\n", sig, t->comm,
+				t->pid, t->uid, t->parent->comm, t->parent->pid,
+				t->parent->uid, current->comm, current->pid,
+				current->uid, current->parent->comm,
+				current->parent->pid, current->parent->uid);
+	}
+}
+
 static inline int legacy_queue(struct sigpending *signals, int sig)
 {
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
@@ -810,6 +839,8 @@ static int send_signal(int sig, struct s
 	assert_spin_locked(&t->sighand->siglock);
 	if (!prepare_signal(sig, t))
 		return 0;
+
+	log_signal(sig, t);
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
 	/*
diff -u -r -p linux-2.6.26.vanilla/kernel/sysctl.c linux-2.6.26/kernel/sysctl.c
--- linux-2.6.26.vanilla/kernel/sysctl.c	Sun Jul 13 23:51:29 2008
+++ linux-2.6.26/kernel/sysctl.c	Thu Sep 18 10:08:47 2008
@@ -63,6 +63,7 @@ static int deprecated_sysctl_warning(str
 /* External variables not in a header file. */
 extern int C_A_D;
 extern int print_fatal_signals;
+extern int signal_log_level;
 extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
 extern int sysctl_panic_on_oom;
@@ -398,6 +428,14 @@ static struct ctl_table kern_table[] = {
 		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "print-fatal-signals",
 		.data		= &print_fatal_signals,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "signal-log-level",
+		.data		= &signal_log_level,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
diff -u -r linux-2.6.26.vanilla/Documentation/sysctl/kernel.txt linux-2.6.26/Documentation/sysctl/kernel.txt
--- linux-2.6.26.vanilla/Documentation/sysctl/kernel.txt	Sun Jul 13 23:51:29 2008
+++ linux-2.6.26/Documentation/sysctl/kernel.txt	Thu Sep 18 10:50:13 2008
@@ -47,6 +47,7 @@
 - rtsig-max
 - rtsig-nr
 - sem
+- signal-log-level
 - sg-big-buff                 [ generic SCSI device (sg) ]
 - shmall
 - shmmax                      [ sysv ipc ]
@@ -349,6 +350,21 @@
 
 ==============================================================
 
+signal-log-level: 
+
+Brief logging of signal and sender to aid
+tracing of obsucure problems later on.
+
+  0 - Signal logging disabled
+
+  1 - Log SIGSEGV, SIGILL, SIGBUS and SIGPFE (default)
+
+  2 - Log SIGKILL and SIGABRT and all signals from log level 1
+  
+  3 or higher: Log all signals
+
+==============================================================
+
 softlockup_thresh:
 
 This value can be used to lower the softlockup tolerance



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: [patch] log fatal signals like SIGSEGV
  2008-09-18 10:10       ` Thomas Jarosch
@ 2008-09-18 20:20         ` Marcin Slusarz
  2008-09-20 17:12           ` Thomas Jarosch
  0 siblings, 1 reply; 13+ messages in thread
From: Marcin Slusarz @ 2008-09-18 20:20 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: linux-kernel

On Thu, Sep 18, 2008 at 12:10:39PM +0200, Thomas Jarosch wrote:
> > It looks much better now. But I don't think it will go in as is.
> > Maybe you can disable it by default and create a sysctl switch?
> 
> Thinking about it some more, I've added a "signal-log-level" sysctl var
> including documentation. The patch applies nicely to 2.6.26 and 2.6 HEAD.
> 
> The idea is to default to log level 1 and log fatal signals only.
> Log output should be close to zero during normal system operation.
> 
> There is a bit of a naming clash with "print-fatal-signals", though that
> should be called "debug-fatal-signals" because of all the register dumps etc.
> I don't want to rename it as it would unnecessarily cause issues
> and it's debug-only (Documentation/kernel-parameters.txt) anyway.
> 
> Enjoy.
> 
> ------------------------------------------------------
> From: Thomas Jarosch <thomas.jarosch@intra2net.com>
> 
> Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
> of obscure problems. Also logs the sender of the signal.
> 
> The log message looks like this:
> "kernel: signal 9 sent to freezed[2634] uid:100,
>  parent init[1] uid:0 by bash[3168] uid:0, parent sshd[3164] uid:0"
> 
> You can control the degree of logging via sysctl: "signal-log-level"
>     0 - Signal logging disabled
>     1 - Log SIGSEGV, SIGILL, SIGBUS and SIGPFE (default)
                                          ^^^^^^

>     2 - Log SIGKILL and SIGABRT and all signals from log level 1
>     3 or higher: Log all signals
> 
> The printing code is based on grsecurity's signal logger.
> 
> Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
> Signed-off-by: Gerd v. Egidy <gve@intra2net.com>
> 
> diff -u -r -p linux-2.6.26.vanilla/kernel/signal.c linux-2.6.26/kernel/signal.c
> --- linux-2.6.26.vanilla/kernel/signal.c	Tue Sep 16 13:45:34 2008
> +++ linux-2.6.26/kernel/signal.c	Thu Sep 18 10:43:27 2008
> @@ -796,6 +796,35 @@ static void complete_signal(int sig, str
>  	return;
>  }
>  
> +int signal_log_level __read_mostly = 1;
> +
> +static void log_signal(const int sig, const struct task_struct *t)
> +{
> +	bool log_signal = false;
> +
> +	if(signal_log_level >= 1 && (sig == SIGSEGV || sig == SIGILL
> +			|| sig == SIGBUS || sig == SIGFPE))
> +		log_signal = true;
> +	else if (signal_log_level >= 2 && (sig == SIGKILL || sig == SIGABRT))
> +		log_signal = true;
> +	else if (signal_log_level >= 3)
> +		log_signal = true;
> +
> +	if (!log_signal)
> +		return;
> +
> +	if (printk_ratelimit()) {
> +		/* Don't lock "tasklist_lock" here as it's already locked by "siglock" */
> +		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
> +				"parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
> +				"parent %.30s[%d] uid:%u\n", sig, t->comm,
> +				t->pid, t->uid, t->parent->comm, t->parent->pid,
> +				t->parent->uid, current->comm, current->pid,
> +				current->uid, current->parent->comm,
> +				current->parent->pid, current->parent->uid);
> +	}
> +}
> +
>  static inline int legacy_queue(struct sigpending *signals, int sig)
>  {
>  	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
> @@ -810,6 +839,8 @@ static int send_signal(int sig, struct s
>  	assert_spin_locked(&t->sighand->siglock);
>  	if (!prepare_signal(sig, t))
>  		return 0;
> +
> +	log_signal(sig, t);
>  
>  	pending = group ? &t->signal->shared_pending : &t->pending;
>  	/*
> diff -u -r -p linux-2.6.26.vanilla/kernel/sysctl.c linux-2.6.26/kernel/sysctl.c
> --- linux-2.6.26.vanilla/kernel/sysctl.c	Sun Jul 13 23:51:29 2008
> +++ linux-2.6.26/kernel/sysctl.c	Thu Sep 18 10:08:47 2008
> @@ -63,6 +63,7 @@ static int deprecated_sysctl_warning(str
>  /* External variables not in a header file. */
>  extern int C_A_D;
>  extern int print_fatal_signals;
> +extern int signal_log_level;
>  extern int sysctl_overcommit_memory;
>  extern int sysctl_overcommit_ratio;
>  extern int sysctl_panic_on_oom;
> @@ -398,6 +428,14 @@ static struct ctl_table kern_table[] = {
>  		.ctl_name	= CTL_UNNUMBERED,
>  		.procname	= "print-fatal-signals",
>  		.data		= &print_fatal_signals,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= &proc_dointvec,
> +	},
> +	{
> +		.ctl_name	= CTL_UNNUMBERED,
> +		.procname	= "signal-log-level",
> +		.data		= &signal_log_level,
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
>  		.proc_handler	= &proc_dointvec,
> diff -u -r linux-2.6.26.vanilla/Documentation/sysctl/kernel.txt linux-2.6.26/Documentation/sysctl/kernel.txt
> --- linux-2.6.26.vanilla/Documentation/sysctl/kernel.txt	Sun Jul 13 23:51:29 2008
> +++ linux-2.6.26/Documentation/sysctl/kernel.txt	Thu Sep 18 10:50:13 2008
> @@ -47,6 +47,7 @@
>  - rtsig-max
>  - rtsig-nr
>  - sem
> +- signal-log-level
>  - sg-big-buff                 [ generic SCSI device (sg) ]
>  - shmall
>  - shmmax                      [ sysv ipc ]
> @@ -349,6 +350,21 @@
>  
>  ==============================================================
>  
> +signal-log-level: 
> +
> +Brief logging of signal and sender to aid
> +tracing of obsucure problems later on.
              ^^^^^^^^

> +
> +  0 - Signal logging disabled
> +
> +  1 - Log SIGSEGV, SIGILL, SIGBUS and SIGPFE (default)
                                         ^^^^^^

> +
> +  2 - Log SIGKILL and SIGABRT and all signals from log level 1
> +  
> +  3 or higher: Log all signals
> +
> +==============================================================
> +
>  softlockup_thresh:
>  
>  This value can be used to lower the softlockup tolerance
> 
> 

When fix typos you can add:
Reviewed-by: Marcin Slusarz <marcin.slusarz@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [patch] log fatal signals like SIGSEGV
  2008-09-18 20:20         ` Marcin Slusarz
@ 2008-09-20 17:12           ` Thomas Jarosch
  2008-09-21 19:05             ` Mikael Pettersson
  2008-09-22 23:52             ` Jiri Kosina
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Jarosch @ 2008-09-20 17:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcin Slusarz

Hello together,

here's the reworked version of the signal logging patch.
Many thanks to Marcin Slusarz for reviewing it. 

The idea is to briefly log "fatal" signals by default, the output
should be close to zero during normal system operation.
Faulty hardware gets detected much quicker
if you see crashing apps in the logs.

The patch applies to 2.6.26 and git HEAD.
scripts/checkpatch.pl is happy, too.

I'm off for a vacation, so expect a delay of some days
if there should be additional questions or comments.

Hope the new feature is as useful to others as it has been to me.

Enjoy,
Thomas

------------------------------------------------------
From: Thomas Jarosch <thomas.jarosch@intra2net.com>

Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
of obscure problems. Also logs the sender of the signal.

The log message looks like this:
"kernel: signal 9 sent to freezed[2634] uid:100,
 parent init[1] uid:0 by bash[3168] uid:0, parent sshd[3164] uid:0"

You can control the degree of logging via sysctl: "signal-log-level"
    0 - Signal logging disabled
    1 - Log SIGSEGV, SIGILL, SIGBUS and SIGFPE (default)
    2 - Log SIGKILL and SIGABRT and all signals from log level 1
    3 or higher: Log all signals

The printing code is based on grsecurity's signal logger.

Reviewed-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Gerd v. Egidy <gve@intra2net.com>
------------------------------------------------------
diff -u -r -p linux-2.6.26.vanilla/kernel/signal.c linux-2.6.26/kernel/signal.c
--- linux-2.6.26.vanilla/kernel/signal.c	Tue Sep 16 13:45:34 2008
+++ linux-2.6.26/kernel/signal.c	Thu Sep 18 10:43:27 2008
@@ -796,6 +796,35 @@ static void complete_signal(int sig, str
 	return;
 }
 
+int signal_log_level __read_mostly = 1;
+
+static void log_signal(const int sig, const struct task_struct *t)
+{
+	bool log_signal = false;
+
+	if (signal_log_level >= 1 && (sig == SIGSEGV || sig == SIGILL
+			|| sig == SIGBUS || sig == SIGFPE))
+		log_signal = true;
+	else if (signal_log_level >= 2 && (sig == SIGKILL || sig == SIGABRT))
+		log_signal = true;
+	else if (signal_log_level >= 3)
+		log_signal = true;
+
+	if (!log_signal)
+		return;
+
+	if (printk_ratelimit()) {
+		/* Note: tasklist_lock is already locked by siglock */
+		printk(KERN_WARNING "signal %d sent to %.30s[%d] uid:%u, "
+				"parent %.30s[%d] uid:%u by %.30s[%d] uid:%u, "
+				"parent %.30s[%d] uid:%u\n", sig, t->comm,
+				t->pid, t->uid, t->parent->comm, t->parent->pid,
+				t->parent->uid, current->comm, current->pid,
+				current->uid, current->parent->comm,
+				current->parent->pid, current->parent->uid);
+	}
+}
+
 static inline int legacy_queue(struct sigpending *signals, int sig)
 {
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
@@ -810,6 +839,8 @@ static int send_signal(int sig, struct s
 	assert_spin_locked(&t->sighand->siglock);
 	if (!prepare_signal(sig, t))
 		return 0;
+
+	log_signal(sig, t);
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
 	/*
diff -u -r -p linux-2.6.26.vanilla/kernel/sysctl.c linux-2.6.26/kernel/sysctl.c
--- linux-2.6.26.vanilla/kernel/sysctl.c	Sun Jul 13 23:51:29 2008
+++ linux-2.6.26/kernel/sysctl.c	Thu Sep 18 10:08:47 2008
@@ -63,6 +63,7 @@ static int deprecated_sysctl_warning(str
 /* External variables not in a header file. */
 extern int C_A_D;
 extern int print_fatal_signals;
+extern int signal_log_level;
 extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
 extern int sysctl_panic_on_oom;
@@ -398,6 +428,14 @@ static struct ctl_table kern_table[] = {
 		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "print-fatal-signals",
 		.data		= &print_fatal_signals,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "signal-log-level",
+		.data		= &signal_log_level,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
diff -u -r linux-2.6.26.vanilla/Documentation/sysctl/kernel.txt linux-2.6.26/Documentation/sysctl/kernel.txt
--- linux-2.6.26.vanilla/Documentation/sysctl/kernel.txt	Sun Jul 13 23:51:29 2008
+++ linux-2.6.26/Documentation/sysctl/kernel.txt	Thu Sep 18 10:50:13 2008
@@ -47,6 +47,7 @@
 - rtsig-max
 - rtsig-nr
 - sem
+- signal-log-level
 - sg-big-buff                 [ generic SCSI device (sg) ]
 - shmall
 - shmmax                      [ sysv ipc ]
@@ -349,6 +350,21 @@
 
 ==============================================================
 
+signal-log-level:
+
+Brief logging of signal and sender to aid
+tracing of obscure problems later on.
+
+  0 - Signal logging disabled
+
+  1 - Log SIGSEGV, SIGILL, SIGBUS and SIGFPE (default)
+
+  2 - Log SIGKILL and SIGABRT and all signals from log level 1
+
+  3 or higher: Log all signals
+
+==============================================================
+
 softlockup_thresh:
 
 This value can be used to lower the softlockup tolerance


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] log fatal signals like SIGSEGV
  2008-09-20 17:12           ` Thomas Jarosch
@ 2008-09-21 19:05             ` Mikael Pettersson
  2008-09-21 19:15               ` Bernd Eckenfels
  2008-10-06  8:53               ` Thomas Jarosch
  2008-09-22 23:52             ` Jiri Kosina
  1 sibling, 2 replies; 13+ messages in thread
From: Mikael Pettersson @ 2008-09-21 19:05 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: linux-kernel, Marcin Slusarz

Thomas Jarosch writes:
 > Hello together,
 > 
 > here's the reworked version of the signal logging patch.
 > Many thanks to Marcin Slusarz for reviewing it. 
 > 
 > The idea is to briefly log "fatal" signals by default, the output
 > should be close to zero during normal system operation.
 > Faulty hardware gets detected much quicker
 > if you see crashing apps in the logs.
 > 
 > The patch applies to 2.6.26 and git HEAD.
 > scripts/checkpatch.pl is happy, too.
 > 
 > I'm off for a vacation, so expect a delay of some days
 > if there should be additional questions or comments.
 > 
 > Hope the new feature is as useful to others as it has been to me.
 > 
 > Enjoy,
 > Thomas
 > 
 > ------------------------------------------------------
 > From: Thomas Jarosch <thomas.jarosch@intra2net.com>
 > 
 > Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
 > of obscure problems. Also logs the sender of the signal.

I believe the approach taken in this patch is broken:

1. The signal logging decision is taken before signal delivery,
   which causes *handled* signals in the above list to be logged.
   So your printk_ratelimit() can be swamped by handled signals
   causing it to not log unhandled fatal signals.

   Applications that handle SEGV/BUS/ILL/FPE aren't that uncommon.

2. Fatal signals are only interesting if they are self-generated.
   Signals sent from other processes or threads are uninteresting,
   if the purpose is to detect program errors or faulty hardware.

3. Similar functionality already exists in the kernel, except
   it correctly runs much later in the signal delivery path.
   Grep for print_fatal_signals and show_unhandled_signals.

   There's also some trace hooks in the signal delivery path
   that look like they could log actual fatal signals.

/Mikael

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] log fatal signals like SIGSEGV
  2008-09-21 19:05             ` Mikael Pettersson
@ 2008-09-21 19:15               ` Bernd Eckenfels
  2008-09-21 19:40                 ` Mikael Pettersson
  2008-10-06  8:53               ` Thomas Jarosch
  1 sibling, 1 reply; 13+ messages in thread
From: Bernd Eckenfels @ 2008-09-21 19:15 UTC (permalink / raw)
  To: linux-kernel

In article <18646.39701.819497.337394@harpo.it.uu.se> you wrote:
>   Applications that handle SEGV/BUS/ILL/FPE aren't that uncommon.

JRE uses SIGSEGV for (expected) NullPointers. So you might see a lot of
those on the logs. However: not in all cases you dont want to log handled
(fatal) signals. Not sure how to detect those. Maybe it is better to fix
applications and remove their home grown crash logging.

Gruss
Bernd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] log fatal signals like SIGSEGV
  2008-09-21 19:15               ` Bernd Eckenfels
@ 2008-09-21 19:40                 ` Mikael Pettersson
  0 siblings, 0 replies; 13+ messages in thread
From: Mikael Pettersson @ 2008-09-21 19:40 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

Bernd Eckenfels writes:
 > In article <18646.39701.819497.337394@harpo.it.uu.se> you wrote:
 > >   Applications that handle SEGV/BUS/ILL/FPE aren't that uncommon.
 > 
 > JRE uses SIGSEGV for (expected) NullPointers. So you might see a lot of
 > those on the logs. However: not in all cases you dont want to log handled
 > (fatal) signals. Not sure how to detect those. Maybe it is better to fix
 > applications and remove their home grown crash logging.

I was referring to applications that properly handle these signals,
e.g. by emulation, fixing up fault-causing conditions, or turning
faults into application-level exceptions.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] log fatal signals like SIGSEGV
  2008-09-20 17:12           ` Thomas Jarosch
  2008-09-21 19:05             ` Mikael Pettersson
@ 2008-09-22 23:52             ` Jiri Kosina
  1 sibling, 0 replies; 13+ messages in thread
From: Jiri Kosina @ 2008-09-22 23:52 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: linux-kernel, Marcin Slusarz

On Sat, 20 Sep 2008, Thomas Jarosch wrote:

> ------------------------------------------------------
> From: Thomas Jarosch <thomas.jarosch@intra2net.com>
> 
> Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
> of obscure problems. Also logs the sender of the signal.
> 
> The log message looks like this:
> "kernel: signal 9 sent to freezed[2634] uid:100,
>  parent init[1] uid:0 by bash[3168] uid:0, parent sshd[3164] uid:0"

BTW be aware that for example x86 arch-specific code does this on its own, 
and therefore with your patch, the information will be duplicated. See 
page fault handler for x86.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] log fatal signals like SIGSEGV
  2008-09-21 19:05             ` Mikael Pettersson
  2008-09-21 19:15               ` Bernd Eckenfels
@ 2008-10-06  8:53               ` Thomas Jarosch
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Jarosch @ 2008-10-06  8:53 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-kernel, Marcin Slusarz, Jiri Kosina

Hello Mikael,

>  > Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
>  > of obscure problems. Also logs the sender of the signal.
>
> I believe the approach taken in this patch is broken:
>
> 1. The signal logging decision is taken before signal delivery,
>    which causes *handled* signals in the above list to be logged.
>    So your printk_ratelimit() can be swamped by handled signals
>    causing it to not log unhandled fatal signals.
>
>    Applications that handle SEGV/BUS/ILL/FPE aren't that uncommon.
>
> 2. Fatal signals are only interesting if they are self-generated.
>    Signals sent from other processes or threads are uninteresting,
>    if the purpose is to detect program errors or faulty hardware.

Thanks for your review. We already run the new patch on 500+ boxes and
didn't get any complaints about noisy messages in the logs (yet?).
Some of those boxes run "logcheck" and generate a daily report,
so there should be -something-.

I'm not sure if separating between kernel and process generated signals
makes much of a difference as there should be no log output anyway.
If you can provide me input that this will in fact generate noisy output,
I'll happily change the code.

> 3. Similar functionality already exists in the kernel, except
>    it correctly runs much later in the signal delivery path.
>    Grep for print_fatal_signals and show_unhandled_signals.

print_fatal_signals is debug-only, see the mails
from the first review phase about that.

show_unhandled_signals seems to be implemented on x86 and PPC only.

Concerning x86: Both pieces of code are integrated in 
arch/x86/kernel/traps_32.c: do_general_proection().

Does this code path also get called for SIGABRT or SIGFPE?

>    There's also some trace hooks in the signal delivery path
>    that look like they could log actual fatal signals.

Do you have a particular one in mind?

[Jiri Kosina wrote]
> BTW be aware that for example x86 arch-specific code does this on its own, 
> and therefore with your patch, the information will be duplicated. See 
> page fault handler for x86.

Yes, I like that. The new code is architecture-independant,
perhaps the architecture-dependant code could even be obsoleted
and all platforms would benefit from the new logging.

Thomas


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2008-10-06  8:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-12 13:02 RFC: [patch] log fatal signals like SIGSEGV Thomas Jarosch
2008-09-12 17:11 ` Marcin Slusarz
2008-09-16 12:59   ` Thomas Jarosch
2008-09-16 17:42     ` Marcin Slusarz
2008-09-17  8:12       ` Thomas Jarosch
2008-09-18 10:10       ` Thomas Jarosch
2008-09-18 20:20         ` Marcin Slusarz
2008-09-20 17:12           ` Thomas Jarosch
2008-09-21 19:05             ` Mikael Pettersson
2008-09-21 19:15               ` Bernd Eckenfels
2008-09-21 19:40                 ` Mikael Pettersson
2008-10-06  8:53               ` Thomas Jarosch
2008-09-22 23:52             ` Jiri Kosina

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).