linux-kernel.vger.kernel.org archive mirror
* Instability in kernel version 2.6.12.5
From: Justin R. Smith @ 2005-10-06 17:30 UTC
  To: linux-kernel

I recently converted my web server from FreeBSD to Gentoo Linux and am
running the 'vanilla' kernel, version 2.6.12.5 (the latest vanilla
kernel one can emerge from Gentoo without hacking the package.keywords
file).

Info on my system:
--------------------------------------------------------------------------------
Linux vorpal.math.drexel.edu 2.6.12.5 #1 SMP Wed Oct 5 16:04:20 EDT 2005 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux
 
Gnu C                  3.3.6
Gnu make               3.80
binutils               2.15.92.0.2
util-linux             2.12r
mount                  2.12r
module-init-tools      3.0
e2fsprogs              1.38
reiserfsprogs          3.6.19
reiser4progs           line
Linux C Library        2.3.5
Dynamic linker (ldd)   2.3.5
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   068
Modules Loaded         ipt_LOG ipt_state iptable_nat lp snd_pcm_oss
                       snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq
                       snd_seq_device e1000 snd_intel8x0 snd_ac97_codec
                       snd_pcm snd_timer snd soundcore snd_page_alloc
                       intel_agp iptable_mangle iptable_filter ipt_ttl
                       ipt_tos ipt_tcpmss ipt_sctp ipt_recent ipt_realm
                       ipt_pkttype ipt_owner ipt_multiport ipt_mark ipt_mac
                       ipt_limit ipt_length ipt_iprange ipt_hashlimit
                       ipt_esp ipt_ecn ipt_dscp ipt_comment ipt_ah
                       ipt_addrtype ipt_TOS ipt_MARK ipt_ECN ipt_DSCP
                       ipt_CLASSIFY arptable_filter arpt_mangle arp_tables
                       ip_conntrack ip_tables
------------------------------------------------------------

Additional info: I'm running a firewall that closes all ports except 22, 
80, 443 and high ports (so I can ftp).

After running for 24 hours, I discovered that the system was 'funky'.

funky=

1. the clock is frozen at about 2331 the previous night. Setting it is 
possible, but it remains frozen at whatever time one set it to.

2. Any X app one starts hangs.

3. Many operations take an extraordinarily long time. Rebooting the
system took > 30 minutes (all spent shutting down; the restart was at the
normal speed).


Examining the system logs disclosed that someone attempted to hack my 
system at 2331 (the time the clock was frozen at) by trying to initiate 
about 200 ssh connections with randomly generated user ids over a very 
short time (a few seconds).


I can easily modify the firewall to block the incoming connections, but 
this strikes me as showing an instability in the Linux kernel: 
initiating a large number of failed ssh connections should not be able 
to corrupt the kernel.

Any suggestions?


Thank you!


* Re: Instability in kernel version 2.6.12.5
From: Linus Torvalds @ 2005-10-06 18:24 UTC
  To: Justin R. Smith, David S. Miller; +Cc: Linux Kernel Mailing List



On Thu, 6 Oct 2005, Justin R. Smith wrote:
> 
> funky=
> 
> 1. the clock is frozen at about 2331 the previous night. Setting it is
> possible, but it remains frozen at whatever time one set it to.
> 
> 2. Any X app one starts hangs.
> 
> 3. Many operations take an extraordinarily long time. Rebooting the system
> took > 30 minutes (all spent shutting down; the restart was at the normal
> speed).

This all sounds like there are no timer interrupts happening (or they are 
extremely slowed down). The "X app" thing is likely because a lot of X 
apps end up doing things that are itimer-related and do gettimeofday() for 
X events. I assume things like "sleep 1" also hung forever..

It could also be that some networking stuff corrupted the timers somehow - 
maybe the interrupt happens, but longer timers end up being infinitely 
delayed by the timer just never triggering. There was some report of 
double-added neighbor entry timers recently.

> I can easily modify the firewall to block the incoming connections, but this
> strikes me as showing an instability in the Linux kernel: initiating a large
> number of failed ssh connections should not be able to corrupt the kernel.
> 
> Any suggestions?

I don't recall any similar bug-reports. Did you have any kernel events 
printed at all during this time? 

If you can trigger it again, it would be very interesting to see if 
/proc/interrupts indicates that the timer interrupt happens, just to check 
that. Also, to see if there's some timer list corruption, test whether 
"sleep 1" is broken (the timers are also sorted according to how long they 
are, so testing other timeouts can be interesting too).
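
One way to run the "sleep 1" check with several timeouts is a small user-space
program that times each sleep against the monotonic clock. The following is
only a sketch: the helper names elapsed_ms() and timed_sleep_ms() are made up
for illustration, and it assumes a POSIX system with clock_gettime() and
nanosleep(). On a machine in the broken state the call may simply never
return, which is itself the answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Milliseconds between two clock samples (hypothetical helper). */
static long long elapsed_ms(const struct timespec *start,
			    const struct timespec *end)
{
	long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
		     + (end->tv_nsec - start->tv_nsec);

	return ns / 1000000LL;
}

/*
 * Sleep for request_ms and return how long the sleep really took,
 * in milliseconds, or -1 if a call failed or was interrupted.
 */
static long long timed_sleep_ms(long request_ms)
{
	struct timespec req, start, end;

	assert(request_ms >= 0);
	req.tv_sec = request_ms / 1000;
	req.tv_nsec = (request_ms % 1000) * 1000000L;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;
	if (nanosleep(&req, NULL) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1;
	return elapsed_ms(&start, &end);
}
```

Calling timed_sleep_ms() with, say, 10, 100, 1000 and 10000 and comparing the
result with the request shows whether short and long timeouts misbehave in the
same way.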

For example, maybe some networking timer just changes its "expires" field 
_while_ a timer is active directly rather than using "mod_timer()", which 
can screw up the sorting - and that can affect other timers.
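
The failure mode described above can be shown with a toy model. This is a
user-space sketch, not the kernel's timer code: struct toy_timer and the toy_*
functions are invented for illustration, and a plain sorted linked list stands
in for whatever ordering structure the kernel really uses. The point is only
that writing "expires" on a queued entry leaves it in the wrong position, while
a dequeue-modify-requeue update (the job mod_timer() does) keeps the order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy timer entry; NOT the kernel's struct timer_list. */
struct toy_timer {
	unsigned long expires;
	struct toy_timer *next;
};

/* Insert t into the list at *head, keeping ascending expires order. */
static void toy_insert(struct toy_timer **head, struct toy_timer *t)
{
	struct toy_timer **pp = head;

	assert(head && t);
	while (*pp && (*pp)->expires <= t->expires)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
}

/* Unlink t from the list; returns 1 if it was queued, 0 otherwise. */
static int toy_remove(struct toy_timer **head, struct toy_timer *t)
{
	struct toy_timer **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			return 1;
		}
	}
	return 0;
}

/* Safe update in the spirit of mod_timer(): dequeue, change, requeue. */
static int toy_mod(struct toy_timer **head, struct toy_timer *t,
		   unsigned long expires)
{
	int was_queued = toy_remove(head, t);

	t->expires = expires;
	toy_insert(head, t);
	return was_queued;
}

/* Returns 1 if the list is in ascending expires order, 0 otherwise. */
static int toy_is_sorted(const struct toy_timer *head)
{
	for (; head && head->next; head = head->next)
		if (head->expires > head->next->expires)
			return 0;
	return 1;
}
```

In this model, anything that walks the list and stops at the first entry that
has not expired yet would also skip every entry queued behind the misplaced
one, which is how one bad timer could delay unrelated timers.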

If you can figure out a way to re-create this, people would be very 
interested, I bet.

			Linus


* Re: Instability in kernel version 2.6.12.5
From: David S. Miller @ 2005-10-06 20:23 UTC
  To: torvalds; +Cc: jsmith, linux-kernel

From: Linus Torvalds <torvalds@osdl.org>
Date: Thu, 6 Oct 2005 11:24:25 -0700 (PDT)

> For example, maybe some networking timer just changes its "expires" field 
> _while_ a timer is active directly rather than using "mod_timer()", which 
> can screw up the sorting - and that can affect other timers.

We actually investigated a possible case of this recently.

It was thought that perhaps it was possible for the ARP
generic neighbour cache to double-add a timer, so we added
a guard by converting it to mod_timer() from add_timer()
and making sure it always returns "0" (timer not active).

static inline void neigh_add_timer(struct neighbour *n, unsigned long when)
{
	if (unlikely(mod_timer(&n->timer, when))) {
		printk("NEIGH: BUG, double timer add, state is %x\n",
		       n->nud_state);
	}
}

But this new debugging hasn't triggered for anyone yet :-)

In general the networking tends to use mod_timer() exclusively, for
the simple reason that this makes refcounting on the object so much
simpler.  For example, all of the socket timer helpers do stuff like
this:

void sk_reset_timer(struct sock *sk, struct timer_list* timer,
		    unsigned long expires)
{
	if (!mod_timer(timer, expires))
		sock_hold(sk);
}

void sk_stop_timer(struct sock *sk, struct timer_list* timer)
{
	if (timer_pending(timer) && del_timer(timer))
		__sock_put(sk);
}

I have no idea what the situation is in the netfilter bits, but
something similar is likely.
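
To make the refcounting argument concrete, here is a user-space model of the
two helpers above. It is a sketch under stated assumptions: struct model_timer,
struct model_sock and the model_* functions are invented stand-ins, and the
only behaviour borrowed from the real API is the return convention (the
"modify" call returns 1 if the timer was already pending, the "delete" call
returns 1 if it deactivated a pending timer).

```c
#include <assert.h>

/* Toy stand-ins; NOT the kernel's struct timer_list or struct sock. */
struct model_timer {
	int pending;
	unsigned long expires;
};

struct model_sock {
	int refcnt;
	struct model_timer timer;
};

/* Returns 1 if the timer was already pending, 0 if it was idle. */
static int model_mod_timer(struct model_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = 1;
	return was_pending;
}

/* Returns 1 if a pending timer was deactivated, 0 if it was idle. */
static int model_del_timer(struct model_timer *t)
{
	int was_pending = t->pending;

	t->pending = 0;
	return was_pending;
}

/* Take a reference only when the timer goes from idle to pending. */
static void model_reset_timer(struct model_sock *sk, unsigned long expires)
{
	if (!model_mod_timer(&sk->timer, expires))
		sk->refcnt++;
}

/* Drop the timer's reference only if a pending timer was removed. */
static void model_stop_timer(struct model_sock *sk)
{
	if (model_del_timer(&sk->timer)) {
		assert(sk->refcnt > 0);
		sk->refcnt--;
	}
}
```

Because the reference is taken and dropped strictly on the idle/pending
transitions, re-arming a pending timer or stopping an idle one leaves the count
unchanged, so the pending timer always owns exactly one reference.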

This brings me to a topic I'd like addressed.  add_timer() no longer
checks whether it is adding a timer twice or not.

This debugging functionality got lost when add_timer() was changed
to be implemented in terms of __mod_timer().  I really think adding
back a "BUG_ON(timer_pending(timer))" would be a very good idea.
I believe Andrew Morton even added this bug check into his -mm tree
last time I brought this issue up.

Finally, it could be argued that add_timer() is not really a necessary
interface and that one can do whatever they need to purely using
mod_timer().  mod_timer() is kind of like a NAND gate I suppose :-)
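
A user-space sketch of that idea, with "add" written purely in terms of
"modify" plus a pending check. All names here (struct demo_timer and the
demo_* functions) are hypothetical, and where kernel code would use BUG_ON()
this model just returns an error so the double add can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy timer; NOT the kernel's struct timer_list. */
struct demo_timer {
	int pending;
	unsigned long expires;
};

static int demo_timer_pending(const struct demo_timer *t)
{
	return t->pending;
}

/* Returns 1 if the timer was already pending, 0 if it was idle. */
static int demo_mod_timer(struct demo_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = 1;
	return was_pending;
}

/*
 * "Add" built on "modify": arm the timer at its preset expires value.
 * Returns 0 on success, or -1 if the timer was already pending, which
 * is the double-add case the proposed BUG_ON() would catch.
 */
static int demo_add_timer_checked(struct demo_timer *t)
{
	assert(t != NULL);
	if (demo_timer_pending(t))
		return -1;
	demo_mod_timer(t, t->expires);
	return 0;
}
```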


* Re: Instability in kernel version 2.6.12.5
From: Romano Giannetti @ 2005-10-07 16:41 UTC
  To: linux-kernel

On Thu, Oct 06, 2005 at 01:30:27PM -0400, Justin R. Smith wrote:
> 
> Examining the system logs disclosed that someone attempted to hack my 
> system at 2331 (the time the clock was frozen at) by trying to initiate 
> about 200 ssh connections with randomly generated user ids over a very 
> short time (a few seconds).
> 

Probably not the cause. It happens quite often here (followed by me adding an
iptables block on the IP and sending a complaint to the provider, with
no answer unfortunately) and I never saw problems like the one you describe. 

Version is Linux 2.6.11-12mdkcustom, but it happened with vanilla kernel(s)
too, same behavior here. 

HTH,
    Romano


-- 
Romano Giannetti             -  Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416  fax +34 915 596 569


* Re: Instability in kernel version 2.6.12.5
From: Nathan Lynch @ 2005-10-08 21:38 UTC
  To: Justin R. Smith; +Cc: linux-kernel

Justin R. Smith wrote:
> funky=
> 
> 1. the clock is frozen at about 2331 the previous night. Setting it is 
> possible, but it remains frozen at whatever time one set it to.
> 
> 2. Any X app one starts hangs.
> 
> 3. Many operations take an extraordinarily long time. Rebooting the
> system took > 30 minutes (all spent shutting down; the restart was at the
> normal speed).

I saw behavior quite similar to this on a P4 workstation a few months
ago -- after about 24 hours the system would get all "funky" and
/proc/interrupts showed that timer interrupts had slowed to a trickle.
Updating the BIOS fixed it.
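
For anyone wanting to check the same thing, the timer count can be pulled out
of a /proc/interrupts line and sampled twice a few seconds apart. The parser
below is a sketch: irq_count_from_line() is a made-up helper, and it assumes
the usual layout of an IRQ label followed by a colon, one count per CPU, and
then the controller and device names. Which label carries the timer ticks
(for example "0" for the legacy timer) depends on the machine, so the label is
a parameter.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counts on one /proc/interrupts line for the given
 * IRQ label. Returns -1 if the line belongs to a different IRQ.
 */
static long long irq_count_from_line(const char *line, const char *label)
{
	size_t len;
	long long total = 0;
	const char *p = line;

	assert(line && label);
	len = strlen(label);
	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, label, len) != 0 || p[len] != ':')
		return -1;
	p += len + 1;

	for (;;) {
		char *end;
		unsigned long long n;

		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break;		/* reached the controller name */
		n = strtoull(p, &end, 10);
		if (*end != '\0' && !isspace((unsigned char)*end))
			break;		/* e.g. "2-edge" is a name, not a count */
		total += (long long)n;
		p = end;
	}
	return total;
}
```

Reading the file with fopen() and fgets(), calling this on each line, and
repeating after a known delay gives a tick rate; a rate far below the
configured HZ would match the "slowed to a trickle" symptom.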


Nathan


* Re: Instability in kernel version 2.6.12.5
From: Herbert Xu @ 2005-10-10  2:53 UTC
  To: Justin R. Smith; +Cc: linux-kernel

Justin R. Smith <jsmith@drexel.edu> wrote:
> 
> Examining the system logs disclosed that someone attempted to hack my 
> system at 2331 (the time the clock was frozen at) by trying to initiate 
> about 200 ssh connections with randomly generated user ids over a very 
> short time (a few seconds).
> 
> Any suggestions?

Try booting with nolapic.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

