linux-kernel.vger.kernel.org archive mirror
* crashme on ARM - unkillable processes
@ 2003-11-09 11:43 Russell King
  2003-11-09 12:12 ` Russell King
  2003-11-09 17:43 ` Linus Torvalds
  0 siblings, 2 replies; 5+ messages in thread
From: Russell King @ 2003-11-09 11:43 UTC (permalink / raw)
  To: Linux Kernel List

Hi,

It seems that running crashme on ARM under 2.6.0-test8 with the
following command:

# crashme +2000.4 666 100 1:0:0

results in crashme not running for very long before it locks up thusly:

Subprocess 2: try 20, offset 80
Subprocess 2: Got signal 4 illegal instruction
Subprocess 2: Barfed
Subprocess 2: try 21, offset 84
Subprocess 2: Got signal 11 segmentation violation
Subprocess 2: Barfed
Subprocess 2: try 22, offset 88
time limit reached on pid 1704 0x6A8. using kill.

At this point, PID 1704 refuses to die.

Looking at the output of sysrq-p and sysrq-t, it would appear that the
subprocess is receiving SIGILL after SIGILL after SIGILL, virtually
continuously.

I suspect that either crashme's signal handler got corrupted somehow,
or else the longjmp out of the handler is not allowing the next signal
to be dequeued.

Looking at next_signal(), the kernel treats signals 1-8 as having higher
priority than signal 9.  Since we only ever dequeue one signal on return
to user space, we always find the SIGILL before SIGKILL, and the kill
signal remains indefinitely queued.  Given the points at which we
deliver signals to processes, this means that if a process neither
makes further system calls nor receives interrupts between one SIGILL
being delivered and the next being generated, the SIGKILL will not be
delivered.
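
As an illustration only, the starvation can be modelled in a few lines of
Python. This is a toy model, not the kernel code: it assumes that the next
signal is always the lowest-numbered pending, unblocked one, and that the
faulting instruction raises a fresh SIGILL before every delivery. The names
`sigmask` and `simulate` are invented for this sketch.

```python
SIGILL, SIGKILL = 4, 9


def sigmask(sig):
    """Bit n-1 of a pending mask represents signal n."""
    return 1 << (sig - 1)


def next_signal(pending, blocked=0):
    """Toy model: return the lowest-numbered deliverable signal, or 0."""
    deliverable = pending & ~blocked
    if deliverable == 0:
        return 0
    # Isolate the lowest set bit; its bit length is the signal number.
    return (deliverable & -deliverable).bit_length()


def simulate(rounds):
    """Dequeue one signal per return to user space while SIGILL keeps recurring."""
    pending = sigmask(SIGKILL)           # someone has already sent SIGKILL
    delivered = []
    for _ in range(rounds):
        pending |= sigmask(SIGILL)       # assumed: the bad instruction faults again
        sig = next_signal(pending)
        pending &= ~sigmask(sig)         # only this one signal is dequeued
        delivered.append(sig)
    return delivered, pending


delivered, pending = simulate(5)
print(delivered, hex(pending))
```

Under these assumptions every round delivers signal 4, and the SIGKILL bit
(0x100) is still set in the pending mask at the end.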

I, therefore, put it to linux-kernel that this is a potential DoS
attack.  A normal user space program can fork() multiple instances
of itself, and then use this technique to place a heavy, CPU-intensive
load on the machine, and the result could be a hundred or so unkillable
processes.

Resource limits on the number of processes a user can create limit the
effect, but you will still end up with a number of unkillable processes
at the end of the day.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: crashme on ARM - unkillable processes
  2003-11-09 11:43 crashme on ARM - unkillable processes Russell King
@ 2003-11-09 12:12 ` Russell King
  2003-11-09 17:43 ` Linus Torvalds
  1 sibling, 0 replies; 5+ messages in thread
From: Russell King @ 2003-11-09 12:12 UTC (permalink / raw)
  To: Linux Kernel List

On Sun, Nov 09, 2003 at 11:43:22AM +0000, Russell King wrote:
> Subprocess 2: Barfed
> Subprocess 2: try 22, offset 88
> time limit reached on pid 1704 0x6A8. using kill.
> 
> At this point, PID 1704 refuses to die.
> 
> Looking at the output of sysrq-p and sysrq-t, it would appear that the
> subprocess is receiving SIGILL after SIGILL after SIGILL, virtually
> continuously.

A little more information:

After trying to send SIGINT and SIGHUP, as well as SIGKILL,
/proc/*/status contains the following:

SigPnd: 0000000000000408
ShdPnd: 0000000000006103
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 00000000000020fa
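
For readers who do not decode these masks by eye, here is a small Python
sketch. It assumes the usual /proc convention that bit n-1 of each
hexadecimal mask stands for signal number n; the helper name
`decode_sigmask` is made up for this note.

```python
def decode_sigmask(hex_mask):
    """Return the signal numbers whose bits are set in a /proc status mask."""
    mask = int(hex_mask, 16)
    # Bit 0 is signal 1, bit 1 is signal 2, and so on.
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Values copied from the status output above.
status = {
    "SigPnd": "0000000000000408",
    "ShdPnd": "0000000000006103",
    "SigBlk": "0000000000000000",
    "SigIgn": "0000000000000000",
    "SigCgt": "00000000000020fa",
}

for field, value in status.items():
    print(field, decode_sigmask(value))
```

With that convention SigPnd decodes to signals 4 and 11, and ShdPnd to
signals 1, 2, 9, 14 and 15. Under the common Linux numbering those would be
SIGILL and SIGSEGV pending on the thread, with SIGHUP, SIGINT, SIGKILL,
SIGALRM and SIGTERM pending process-wide.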

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: crashme on ARM - unkillable processes
  2003-11-09 11:43 crashme on ARM - unkillable processes Russell King
  2003-11-09 12:12 ` Russell King
@ 2003-11-09 17:43 ` Linus Torvalds
  2003-11-09 20:04   ` Russell King
  1 sibling, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2003-11-09 17:43 UTC (permalink / raw)
  To: Russell King; +Cc: Linux Kernel List


On Sun, 9 Nov 2003, Russell King wrote:
> 
> Looking at next_signal(), the kernel treats signals 1-8 as having higher
> priority than signal 9.  Since we only ever dequeue one signal on return
> to user space, we always find the SIGILL before SIGKILL, and the kill
> signal remains indefinitely queued.

Interesting. I wonder why it shows up only now. We've run crashme as a 
sanity-test before, and I don't think this is a new thing..

[ Duh dumm.. ]

Ok, I know... I think we used to queue up _all_ the signals onto the stack
frame before. We don't do that any more, and back when we did it we'd
notice that one of the signals was deadly, and just kill the process.

We can't do that any more, because with thread-shared signals one thread 
should _not_ try to hog all pending signals.

This is definitely a bug. I'd be inclined to just special-case SIGKILL in 
next_signal(). Better ideas?
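
One way to read "special-case SIGKILL" is sketched below in Python. This is
a hypothetical model rather than a patch: it assumes a lowest-numbered-first
selection and simply checks for a pending SIGKILL before doing that scan.
The function names are invented for the illustration, and nothing here
claims to be the change that was eventually applied.

```python
SIGILL, SIGKILL = 4, 9


def sigmask(sig):
    """Bit n-1 of a pending mask represents signal n."""
    return 1 << (sig - 1)


def next_signal_kill_first(pending, blocked=0):
    """Toy model: prefer SIGKILL, otherwise pick the lowest-numbered signal."""
    # SIGKILL cannot be blocked, so it is tested against the raw pending mask.
    if pending & sigmask(SIGKILL):
        return SIGKILL
    deliverable = pending & ~blocked
    if deliverable == 0:
        return 0
    # Isolate the lowest set bit; its bit length is the signal number.
    return (deliverable & -deliverable).bit_length()


both = sigmask(SIGILL) | sigmask(SIGKILL)
print(next_signal_kill_first(both))
```

In this model a process with both SIGILL and SIGKILL pending gets SIGKILL
first, while the ordering of all other signals is unchanged.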

		Linus


* Re: crashme on ARM - unkillable processes
  2003-11-09 17:43 ` Linus Torvalds
@ 2003-11-09 20:04   ` Russell King
  2003-11-09 20:42     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King @ 2003-11-09 20:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel List

On Sun, Nov 09, 2003 at 09:43:27AM -0800, Linus Torvalds wrote:
> 
> On Sun, 9 Nov 2003, Russell King wrote:
> > 
> > Looking at next_signal(), the kernel treats signals 1-8 as having higher
> > priority than signal 9.  Since we only ever dequeue one signal on return
> > to user space, we always find the SIGILL before SIGKILL, and the kill
> > signal remains indefinitely queued.
> 
> Interesting. I wonder why it shows up only now. We've run crashme as a 
> sanity-test before, and I don't think this is a new thing..

Ok, I've been doing a bit more digging to work out what's going on.

The code which crashme generated corrupted the user stack pointer.  We
then tried to deliver a signal, found the user stack pointer invalid,
and tried to deliver a SEGV to the process via force_sig().  Unfortunately,
this signal never made it through for the reasons described previously.
(We dequeued the ILL, found we couldn't set up the stack frame, force_sig,
returned to userspace, generated another undefined instruction exception
on the same instruction, etc.)

So, not only is userspace not able to kill off processes with SIGKILL,
but the system can't kill off a process with a seriously corrupt stack.

Should the signal code be using something more forceful than force_sig()
(ie, something which is guaranteed to work) when the stack is corrupted ?

> This is definitely a bug. I'd be inclined to just special-case SIGKILL in 
> next_signal(). Better ideas?

From the above, I think the problem is a little larger than just SIGKILL.

I think this problem happens on ARM because when we return to user space
after being unable to deliver a signal, we try to re-execute the illegal
instruction.  I believe on x86 this does not occur - it returns to the
next instruction.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: crashme on ARM - unkillable processes
  2003-11-09 20:04   ` Russell King
@ 2003-11-09 20:42     ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2003-11-09 20:42 UTC (permalink / raw)
  To: Russell King; +Cc: Linux Kernel List


On Sun, 9 Nov 2003, Russell King wrote:
> 
> The code which crashme generated corrupted the user stack pointer.  We
> then tried to deliver a signal, found the user stack pointer invalid,
> and tried to deliver a SEGV to the process via force_sig().  Unfortunately,
> this signal never made it through for the reasons described previously.
> (We dequeued the ILL, found we couldn't set up the stack frame, force_sig,
> returned to userspace, generated another undefined instruction exception
> on the same instruction, etc.)

Ahh. I think I found why ARM has this problem, and others don't.

Your SA_NODEFER handling is broken.

The thing is, you only block a signal if its stack frame was successfully 
done _and_ SA_NODEFER is not set.

It should be the other way around. You should block a signal if its stack 
frame was unsuccessful _or_ SA_NODEFER was not set.

(x86 gets this wrong too, in the sense that we don't even check to see if
the stack frame was successful - but since nobody sets SA_NODEFER anyway,
we don't really much care).
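
The two conditions can be compared side by side. The Python sketch below is
only a truth table over the two inputs named above (whether the stack frame
was set up, and whether SA_NODEFER is set); the function names are made up
for the illustration and do not correspond to kernel functions.

```python
from itertools import product


def blocks_signal_as_described(frame_ok, sa_nodefer):
    """Rule attributed to the ARM code: block only if frame OK and no SA_NODEFER."""
    return frame_ok and not sa_nodefer


def blocks_signal_suggested(frame_ok, sa_nodefer):
    """Suggested rule: block if the frame failed or SA_NODEFER is not set."""
    return (not frame_ok) or (not sa_nodefer)


def differing_cases():
    """List the (frame_ok, sa_nodefer) inputs on which the two rules disagree."""
    return [
        (frame_ok, sa_nodefer)
        for frame_ok, sa_nodefer in product([True, False], repeat=2)
        if blocks_signal_as_described(frame_ok, sa_nodefer)
        != blocks_signal_suggested(frame_ok, sa_nodefer)
    ]


for frame_ok, sa_nodefer in product([True, False], repeat=2):
    print(frame_ok, sa_nodefer,
          blocks_signal_as_described(frame_ok, sa_nodefer),
          blocks_signal_suggested(frame_ok, sa_nodefer))
```

The two rules disagree only when the frame setup fails. In that case the
first rule leaves the signal unblocked, so it can be taken again at once,
while the second blocks it.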

		Linus

