linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* POSIX timer deletion race?
@ 2007-07-13 18:19 Jeremy Katz
  2007-07-17  8:53 ` [PATCH] posix-timer: fix deletion race Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-13 18:19 UTC (permalink / raw)
  To: linux-kernel

[also sent to George Anzinger]

Hi,

I'm seeing a problem related to deleting posix timers.

Using a moderately patched 2.6.14 kernel, I am encountering a trio of BUGs 
involving the posix timers.  This is running on an SMP xeon (Intel MPCBL 
0030 and 0001) system with 200 threads managing a large number of timers.

The issue appears to be that itimer_delete() and sys_timer_delete() both do 
something along the lines of this:

sys_timer_delete(timer_t timer_id)
{
         ...
 	unlock_timer(timer, flags);
 	release_posix_timer(timer, IT_ID_SET);
         return 0;

As such, there's a window of opportunity for something else to grab the timer 
and attempt an operation of one sort or another.  Eventually, either 
sigqueue_free() or send_sigqueue() detect that the signal queue has been 
deleted, and call BUG_ON:

kernel/signal.c:
void sigqueue_free(struct sigqueue *q)
{
         unsigned long flags;
         BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
         ...

Reading the kernel.org pristine 2.6.22, the same sequence occurs.

kernel BUG at <bad filename>:52971!
invalid operand: 0000 [#1]
SMP
LTT NESTING LEVEL : 0
Modules linked in: ipmi_watchdog ipmi_si ipmi_devintf ipmi_msghandler 
softdog b
in
fmt_misc video thermal processor fan button battery ac sctp uhci_hcd 
usbcore ip
6_
tables ip_tables ipv6
CPU:    0
EIP:    0060:[<c0138fdb>]    Not tainted VLI
EFLAGS: 00010246   (2.6.14.7-selinux1-WR1.4aq_cgl)
EIP is at sigqueue_free+0x3b/0x7d
eax: 00000000   ebx: de28f9c4   ecx: ddcd9e60   edx: 00000292
esi: de3336f8   edi: 00000000   ebp: df829f90   esp: df829f8c
ds: 007b   es: 007b   ss: 0068
Process hrtm_test (pid: 16282, threadinfo=df829000 task=dfe9d350)
Stack: 00000292 df829fa0 c013ff57 df829000 de3336f8 df829fb4 c0140802 
00000286
        000003b9 00000000 df829000 c039e3ec 000003b9 4a248ff4 5a019aa0 
00000000
        00000000 b42fc3c8 00000107 c039007b 0000007b 00000107 4a2460c0 
00000073
Call Trace:
  [<c0103fad>] show_stack+0x7a/0x90
  [<c010412b>] show_registers+0x14f/0x1c7
  [<c010516b>] die+0x11a/0x195
  [<c039ff79>] do_trap+0x1991/0x205b
  [<c0105417>] do_invalid_op+0xa3/0xad
  [<c0103cdb>] error_code+0x4f/0x54
  [<c013ff57>] release_posix_timer+0x1b/0x7b
  [<c0140802>] sys_timer_delete+0xd9/0x10f
  [<c039e3ec>] no_syscall_entry_trace+0xb/0xf
Code: 2e 83 e0 fe 89 43 0c 8b 83 90 00 00 00 f0 ff 48 0c 8b 83 90 00 00 00 
e8 6
6
df ff ff 89 da a1 e0 ea 4b c0 e8 69 56 02 00 5b 5d c3 <0f> 0b eb ce b8 00 
3e 46
  c
0 e8 61 50 26 00 8b 43 08 e8 69 50 26


The other two occur via calls to itimer_delete(), and posix_timer_event(), in 
similar fashion.  Unfortunately, I don't have test code that I can share.


Version details:
Linux thermopolis-1 
2.6.14.7-selinux1-WR1.4aq_cgl #18 SMP Thu 
Jul 12 20:03:02 PDT 2007 i686 GNU/Linux

Gnu C                  3.4.4
util-linux             2.12
mount                  2.12
module-init-tools      3.2.2
e2fsprogs              1.38
reiserfsprogs          3.6.19
xfsprogs               2.7.11
PPP                    2.4.3
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Linux C++ Library      6.0.3
Procps                 3.2.6
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   079
Modules Loaded         ipmi_watchdog 
sd_mod ipmi_si ipmi_devintf 
ipmi_msghandler softdog binfmt_misc video 
thermal processor fan button battery ac 
usb_storage scsi_mod sctp uhci_hcd usbcore 
ip6_tables ip_tables ipv6


-------------------------------- 
Jeremy Katz, Senior Engineer, Wind River
Telephone: +1 510 749 2901

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] posix-timer: fix deletion race
  2007-07-13 18:19 POSIX timer deletion race? Jeremy Katz
@ 2007-07-17  8:53 ` Thomas Gleixner
  2007-07-17  9:57   ` Ingo Molnar
                     ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Thomas Gleixner @ 2007-07-17  8:53 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is
caused by a subtle race, which is there since the original posix timer
commit and persists until today.

timer_delete does:
lock_timer();
timer->it_process = NULL;
unlock_timer();
release_posix_timer();

timer->it_process is checked in lock_timer() to prevent access to a
timer, which is on the way to be deleted, but the check happens after
idr_lock is dropped. This allows release_posix_timer() to delete the
timer before the lock code can check the timer:

CPU 0				CPU 1
lock_timer();			
timer->it_process = NULL;
unlock_timer();
				lock_timer()
					spin_lock(idr_lock);
					timer = idr_find();
					spin_lock(timer->lock);
					spin_unlock(idr_lock);
release_posix_timer();
	spin_lock(idr_lock);
	idr_remove(timer);
	spin_unlock(idr_lock);
	free_timer(timer);
					if (timer->......)

Change the locking to prevent this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 329ce01..c2ac6fd 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -605,13 +605,14 @@ static struct k_itimer * lock_timer(timer_t timer_id, unsigned long *flags)
 	timr = (struct k_itimer *) idr_find(&posix_timers_id, (int) timer_id);
 	if (timr) {
 		spin_lock(&timr->it_lock);
-		spin_unlock(&idr_lock);
 
 		if ((timr->it_id != timer_id) || !(timr->it_process) ||
 				timr->it_process->tgid != current->tgid) {
-			unlock_timer(timr, *flags);
+			spin_unlock(&timr->it_lock);
+			spin_unlock_irqrestore(&idr_lock, *flags);
 			timr = NULL;
-		}
+		} else
+			spin_unlock(&idr_lock);
 	} else
 		spin_unlock_irqrestore(&idr_lock, *flags);
 



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17  8:53 ` [PATCH] posix-timer: fix deletion race Thomas Gleixner
@ 2007-07-17  9:57   ` Ingo Molnar
  2007-07-17 18:29     ` Jeremy Katz
  2007-07-17 13:07   ` Oleg Nesterov
  2007-07-17 18:39   ` Jeremy Katz
  2 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2007-07-17  9:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeremy Katz, linux-kernel, Andrew Morton, Oleg Nesterov, Stable Team


* Thomas Gleixner <tglx@linutronix.de> wrote:

> Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is
> caused by a subtle race, which is there since the original posix timer
> commit and persists until today.
> 
> timer_delete does:
> lock_timer();
> timer->it_process = NULL;
> unlock_timer();
> release_posix_timer();
> 
> timer->it_process is checked in lock_timer() to prevent access to a
> timer, which is on the way to be deleted, but the check happens after
> idr_lock is dropped. This allows release_posix_timer() to delete the
> timer before the lock code can check the timer:
> 
> CPU 0				CPU 1
> lock_timer();			
> timer->it_process = NULL;
> unlock_timer();
> 				lock_timer()
> 					spin_lock(idr_lock);
> 					timer = idr_find();
> 					spin_lock(timer->lock);
> 					spin_unlock(idr_lock);
> release_posix_timer();
> 	spin_lock(idr_lock);
> 	idr_remove(timer);
> 	spin_unlock(idr_lock);
> 	free_timer(timer);
> 					if (timer->......)
> 
> Change the locking to prevent this.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

nice one! The race looks pretty narrow - Jeremy, does your Xens have 
hyperthreading? (or are there any heavy SMI sources perhaps that could 
open up this race.) If not then there might be some other bug lurking in 
there as well.

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17  8:53 ` [PATCH] posix-timer: fix deletion race Thomas Gleixner
  2007-07-17  9:57   ` Ingo Molnar
@ 2007-07-17 13:07   ` Oleg Nesterov
  2007-07-17 17:02     ` Thomas Gleixner
  2007-07-17 18:39   ` Jeremy Katz
  2 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2007-07-17 13:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeremy Katz, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On 07/17, Thomas Gleixner wrote:
>
> Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is
> caused by a subtle race, which is there since the original posix timer
> commit and persists until today.
> 
> timer_delete does:
> lock_timer();
> timer->it_process = NULL;
> unlock_timer();
> release_posix_timer();
> 
> timer->it_process is checked in lock_timer() to prevent access to a
> timer, which is on the way to be deleted, but the check happens after
> idr_lock is dropped. This allows release_posix_timer() to delete the
> timer before the lock code can check the timer:
> 
> CPU 0				CPU 1
> lock_timer();			
> timer->it_process = NULL;
> unlock_timer();
> 				lock_timer()
> 					spin_lock(idr_lock);
> 					timer = idr_find();
> 					spin_lock(timer->lock);
> 					spin_unlock(idr_lock);
> release_posix_timer();
> 	spin_lock(idr_lock);
> 	idr_remove(timer);
> 	spin_unlock(idr_lock);
> 	free_timer(timer);
> 					if (timer->......)

This is funny. I do remember this bug was discussed a couple of years ago,
and the conclusion was "we can fix it later" :)

> --- a/kernel/posix-timers.c
> +++ b/kernel/posix-timers.c
> @@ -605,13 +605,14 @@ static struct k_itimer * lock_timer(timer_t timer_id, unsigned long *flags)
>  	timr = (struct k_itimer *) idr_find(&posix_timers_id, (int) timer_id);
>  	if (timr) {
>  		spin_lock(&timr->it_lock);
> -		spin_unlock(&idr_lock);
>  
>  		if ((timr->it_id != timer_id) || !(timr->it_process) ||
>  				timr->it_process->tgid != current->tgid) {
> -			unlock_timer(timr, *flags);
> +			spin_unlock(&timr->it_lock);
> +			spin_unlock_irqrestore(&idr_lock, *flags);
>  			timr = NULL;
> -		}
> +		} else
> +			spin_unlock(&idr_lock);
>  	} else
>  		spin_unlock_irqrestore(&idr_lock, *flags);

I think we can make a simpler patch,

--- posix-timers.c~	2007-06-29 14:45:04.000000000 +0400
+++ posix-timers.c	2007-07-17 16:59:45.000000000 +0400
@@ -449,6 +449,9 @@ static void release_posix_timer(struct k
 		idr_remove(&posix_timers_id, tmr->it_id);
 		spin_unlock_irqrestore(&idr_lock, flags);
 	}
+
+	spin_unlock_wait(tmr->it_lock);
+
 	sigqueue_free(tmr->sigq);
 	if (unlikely(tmr->it_process) &&
 	    tmr->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))


What do you think? Instead of complicating the lock_timer(), release_posix_timer()
just makes sure that nobody can "use" tmr.

(Actually, I am not sure this is my idea, perhaps something like above was suggested
 by somebody else, I forgot the discussion completely).

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17 13:07   ` Oleg Nesterov
@ 2007-07-17 17:02     ` Thomas Gleixner
  0 siblings, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2007-07-17 17:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jeremy Katz, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On Tue, 2007-07-17 at 17:07 +0400, Oleg Nesterov wrote:
> I think we can make a simpler patch,
> 
> --- posix-timers.c~	2007-06-29 14:45:04.000000000 +0400
> +++ posix-timers.c	2007-07-17 16:59:45.000000000 +0400
> @@ -449,6 +449,9 @@ static void release_posix_timer(struct k
>  		idr_remove(&posix_timers_id, tmr->it_id);
>  		spin_unlock_irqrestore(&idr_lock, flags);
>  	}
> +
> +	spin_unlock_wait(tmr->it_lock);
> +
>  	sigqueue_free(tmr->sigq);
>  	if (unlikely(tmr->it_process) &&
>  	    tmr->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
> 
> 
> What do you think? Instead of complicating the lock_timer(), release_posix_timer()
> just makes sure that nobody can "use" tmr.

Hmm, I prefer to fix that at the place where the race actually happens. 

	tglx



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17  9:57   ` Ingo Molnar
@ 2007-07-17 18:29     ` Jeremy Katz
  0 siblings, 0 replies; 24+ messages in thread
From: Jeremy Katz @ 2007-07-17 18:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Oleg Nesterov, Stable Team

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> nice one! The race looks pretty narrow - Jeremy, does your Xens have
> hyperthreading? (or are there any heavy SMI sources perhaps that could
> open up this race.) If not then there might be some other bug lurking in
> there as well.

Affirmative.  2 cores, 2 hyperthreading units per.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17  8:53 ` [PATCH] posix-timer: fix deletion race Thomas Gleixner
  2007-07-17  9:57   ` Ingo Molnar
  2007-07-17 13:07   ` Oleg Nesterov
@ 2007-07-17 18:39   ` Jeremy Katz
  2007-07-17 20:17     ` Thomas Gleixner
  2 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-17 18:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team


On Tue, 17 Jul 2007, Thomas Gleixner wrote:

> Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is
> caused by a subtle race, which is there since the original posix timer
> commit and persists until today.
>
> timer_delete does:
> lock_timer();
> timer->it_process = NULL;
> unlock_timer();
> release_posix_timer();
>
> timer->it_process is checked in lock_timer() to prevent access to a
> timer, which is on the way to be deleted, but the check happens after
> idr_lock is dropped. This allows release_posix_timer() to delete the
> timer before the lock code can check the timer:
>
> CPU 0				CPU 1
> lock_timer();
> timer->it_process = NULL;
> unlock_timer();
> 				lock_timer()
> 					spin_lock(idr_lock);
> 					timer = idr_find();
> 					spin_lock(timer->lock);
> 					spin_unlock(idr_lock);
> release_posix_timer();
> 	spin_lock(idr_lock);
> 	idr_remove(timer);
> 	spin_unlock(idr_lock);
> 	free_timer(timer);
> 					if (timer->......)
>
> Change the locking to prevent this.

I tried the patch with my test case, but still see the issue.
Here's my explanation of the double free race:
CPU 0					CPU 1
sys_timer_delete():
 	lock_timer();
 	...
 	unlock_timer();			itimer_delete()
 	release_posix_timer():			spin_lock_irqsave(timer...)
 		...				...
 		sigqueue_free()			unlock_timer()
 		...				sigqueue_free()
 							BUG_ON(!(q->flags...


the timer fires during deletion:
CPU 0					CPU 1
sys_timer_delete():
 	lock_timer();
 	...
 	unlock_timer();			posix_timer_fn()
 	release_posix_timer():			spin_lock_irqsave(timer...)
 		...				...
 		sigqueue_free()			posix_timer_event()
 		...					...
 							send_sigqueue()
 								BUG_ON(!(q->flags

Currently, this appears to fix the problem.  The system survived 19 hours 
before I rebooted to try other things, while the I usually see at least 
one of the BUGs within 15 minutes.


--- linux-2.6.14-cgl.original/kernel/posix-timers.c	2007-07-11 16:06:47.000000000 -0700
+++ linux-2.6.14-cgl/kernel/posix-timers.c	2007-07-16 15:25:43.000000000 -0700
@@ -339,7 +339,9 @@ static int posix_timer_fn(void *data)
  	int si_private = 0;
  	int ret = HRTIMER_NORESTART;

-	spin_lock_irqsave(&timr->it_lock, flags);
+	if(lock_timer(timr->it_id, &flags) == NULL) {
+		return ret;
+	}

  	if (timr->it.real.interval.tv64 != 0)
  		si_private = ++timr->it_requeue_pending;
@@ -411,8 +413,10 @@ static void release_posix_timer(struct k
  {
  	if (it_id_set) {
  		unsigned long flags;
+		/* unmangle timer id */
+		int it_id = tmr->it_id & INT_MAX;
  		spin_lock_irqsave(&idr_lock, flags);
-		idr_remove(&posix_timers_id, tmr->it_id);
+		idr_remove(&posix_timers_id, it_id);
  		spin_unlock_irqrestore(&idr_lock, flags);
  	}
  	sigqueue_free(tmr->sigq);
@@ -545,8 +549,11 @@ sys_timer_create(const clockid_t which_c
  	 */

  out:
-	if (error)
+	if (error) {
+		/* mangle the timer id for the release */
+		new_timer->it_id |= ~INT_MAX;
  		release_posix_timer(new_timer, it_id_set);
+	}

  	return error;
  }
@@ -822,6 +829,8 @@ retry_delete:
  			put_task_struct(timer->it_process);
  		timer->it_process = NULL;
  	}
+	/* mangle the timer ID to prevent anyone else from finding it */
+	timer->it_id |= ~INT_MAX;
  	unlock_timer(timer, flags);
  	release_posix_timer(timer, IT_ID_SET);
  	return 0;
@@ -835,7 +844,9 @@ static inline void itimer_delete(struct
  	unsigned long flags;

  retry_delete:
-	spin_lock_irqsave(&timer->it_lock, flags);
+	/* timer already deleted? */
+	if (lock_timer(timer->it_id, &flags) == NULL)
+		return;

  	if (timer_delete_hook(timer) == TIMER_RETRY) {
  		unlock_timer(timer, flags);
@@ -851,6 +862,8 @@ retry_delete:
  			put_task_struct(timer->it_process);
  		timer->it_process = NULL;
  	}
+	/* mangle the timer ID to prevent anyone else from finding it */
+	timer->it_id |= ~INT_MAX;
  	unlock_timer(timer, flags);
  	release_posix_timer(timer, IT_ID_SET);
  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17 18:39   ` Jeremy Katz
@ 2007-07-17 20:17     ` Thomas Gleixner
  2007-07-17 23:12       ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2007-07-17 20:17 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Tue, 2007-07-17 at 11:39 -0700, Jeremy Katz wrote:
> I tried the patch with my test case, but still see the issue.
> Here's my explanation of the double free race:
> CPU 0					CPU 1
> sys_timer_delete():
>  	lock_timer();
>  	...
>  	unlock_timer();			itimer_delete()
>  	release_posix_timer():			spin_lock_irqsave(timer...)
>  		...				...
>  		sigqueue_free()			unlock_timer()
>  		...				sigqueue_free()
>  							BUG_ON(!(q->flags...

With 2.6.14 or with current mainline ?

I doubt that this can happen, as itimer_delete() is only called, when
there are no more references to the shared sig struct. At this point no
other related thread can run sys_timer_delete(). 

If this happens, then something else is completely broken.

> the timer fires during deletion:
> CPU 0					CPU 1
> sys_timer_delete():
>  	lock_timer();
>  	...
>  	unlock_timer();			posix_timer_fn()
>  	release_posix_timer():			spin_lock_irqsave(timer...)
>  		...				...
>  		sigqueue_free()			posix_timer_event()
>  		...					...
>  							send_sigqueue()
>  								BUG_ON(!(q->flags
> 

I do not buy that either. In sys_timer_delete() the active timer is
canceled. If we can not cancel it then we retry until the timer callback
has been processed. This is even true for 2.6.14 (pre hrtimer).

> --- linux-2.6.14-cgl.original/kernel/posix-timers.c	2007-07-11 16:06:47.000000000 -0700
> +++ linux-2.6.14-cgl/kernel/posix-timers.c	2007-07-16 15:25:43.000000000 -0700
> @@ -339,7 +339,9 @@ static int posix_timer_fn(void *data)
>   	int si_private = 0;
>   	int ret = HRTIMER_NORESTART;
> 
> -	spin_lock_irqsave(&timr->it_lock, flags);
> +	if(lock_timer(timr->it_id, &flags) == NULL) {
> +		return ret;
> +	}

This is wrong. You look at something which is not valid anymore, if your
theory is correct. You are just pampering over the real bug.

> @@ -835,7 +844,9 @@ static inline void itimer_delete(struct
>   	unsigned long flags;
> 
>   retry_delete:
> -	spin_lock_irqsave(&timer->it_lock, flags);
> +	/* timer already deleted? */
> +	if (lock_timer(timer->it_id, &flags) == NULL)
> +		return;

Same here.

Can you reproduce the problem against mainline + my patch applied ?

	tglx



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17 20:17     ` Thomas Gleixner
@ 2007-07-17 23:12       ` Jeremy Katz
  2007-07-17 23:58         ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-17 23:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Tue, 17 Jul 2007, Thomas Gleixner wrote:

> On Tue, 2007-07-17 at 11:39 -0700, Jeremy Katz wrote:
>> I tried the patch with my test case, but still see the issue.
>> Here's my explanation of the double free race:
>> CPU 0					CPU 1
>> sys_timer_delete():
>>  	lock_timer();
>>  	...
>>  	unlock_timer();			itimer_delete()
>>  	release_posix_timer():			spin_lock_irqsave(timer...)
>>  		...				...
>>  		sigqueue_free()			unlock_timer()
>>  		...				sigqueue_free()
>>  							BUG_ON(!(q->flags...
>
> With 2.6.14 or with current mainline ?

I haven't been keeping notes quite as studiously as I should have been, 
but this just occurred with 2.6.22.1 + the hrt6 patch + your proposed fix:

------------[ cut here ]------------
Kernel BUG at c0125987 [verbose debug info unavailable]
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c0125987>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22-hrt1-WR1.4aq_cgl #1)
EIP is at sigqueue_free+0x23/0x72
eax: 00000000   ebx: f713a7d8   ecx: f79e25e0   edx: 00000202
esi: f6f4fa1c   edi: 00000283   ebp: f7207e54   esp: f7207e4c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process hrtm_test (pid: 19369, ti=f7207000 task=f729ea50 task.ti=f7207000)
Stack: 00000202 f6f4fa1c f7207e64 c012c7ca f6f4fa1c f6f4fa24 f7207e78 
c012d110
        f77463c0 f77463fc 00000001 f7207e88 c012d14b f77463c0 f729ea50 
f7207eb4
        c011e57c f729eea4 00000006 f729ea50 f7207ea4 c01244e8 00000006 
f77463c0
Call Trace:
  [<c0103504>] show_trace_log_lvl+0x1a/0x30
  [<c01035bb>] show_stack_log_lvl+0x8d/0xaa
  [<c01037f5>] show_registers+0x1cd/0x2cb
  [<c0103a4a>] die+0x113/0x207
  [<c0395cf5>] do_trap+0x8f/0xc6
  [<c0103d35>] do_invalid_op+0x88/0x92
  [<c0395ac2>] error_code+0x72/0x78
  [<c012c7ca>] release_posix_timer+0x1b/0x7a
  [<c012d110>] itimer_delete+0x9b/0xc0
  [<c012d14b>] exit_itimers+0x16/0x21
  [<c011e57c>] do_exit+0x300/0x3e8
  [<c011e6b2>] do_group_exit+0x29/0x6f
  [<c012643a>] get_signal_to_deliver+0x210/0x298
  [<c010256d>] do_signal+0x5c/0x158
  [<c01026a5>] do_notify_resume+0x3c/0x3f
  [<c0102866>] work_notifysig+0x13/0x19

> I doubt that this can happen, as itimer_delete() is only called, when
> there are no more references to the shared sig struct. At this point no
> other related thread can run sys_timer_delete().

I came to the above explanation after failing to find another path that 
clears the SIGQUEUE_PREALLOC flag.  Some form of memory corruption could 
be to blame.

>> the timer fires during deletion:
>> CPU 0					CPU 1
>> sys_timer_delete():
>>  	lock_timer();
>>  	...
>>  	unlock_timer();			posix_timer_fn()
>>  	release_posix_timer():			spin_lock_irqsave(timer...)
>>  		...				...
>>  		sigqueue_free()			posix_timer_event()
>>  		...					...
>>  							send_sigqueue()
>>  								BUG_ON(!(q->flags
>>
>
> I do not buy that either. In sys_timer_delete() the active timer is
> canceled. If we can not cancel it then we retry until the timer callback
> has been processed. This is even true for 2.6.14 (pre hrtimer).
>
>> --- linux-2.6.14-cgl.original/kernel/posix-timers.c	2007-07-11 16:06:47.000000000 -0700
>> +++ linux-2.6.14-cgl/kernel/posix-timers.c	2007-07-16 15:25:43.000000000 -0700
>> @@ -339,7 +339,9 @@ static int posix_timer_fn(void *data)
>>   	int si_private = 0;
>>   	int ret = HRTIMER_NORESTART;
>>
>> -	spin_lock_irqsave(&timr->it_lock, flags);
>> +	if(lock_timer(timr->it_id, &flags) == NULL) {
>> +		return ret;
>> +	}
>
> This is wrong. You look at something which is not valid anymore, if your
> theory is correct. You are just pampering over the real bug.

Noticed right after I sent the patch out.  My goal was to validate and 
lock the timer through a single interface, but itimer_delete() and 
posix_timer_fn() currently recieve a k_itimer * instead of a timer_t.  I 
should have reworked them.

>> @@ -835,7 +844,9 @@ static inline void itimer_delete(struct
>>   	unsigned long flags;
>>
>>   retry_delete:
>> -	spin_lock_irqsave(&timer->it_lock, flags);
>> +	/* timer already deleted? */
>> +	if (lock_timer(timer->it_id, &flags) == NULL)
>> +		return;
>
> Same here.

Thanks for the reply,
Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17 23:12       ` Jeremy Katz
@ 2007-07-17 23:58         ` Jeremy Katz
  2007-07-18  6:05           ` Thomas Gleixner
  2007-07-18 16:11           ` Oleg Nesterov
  0 siblings, 2 replies; 24+ messages in thread
From: Jeremy Katz @ 2007-07-17 23:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Tue, 17 Jul 2007, Jeremy Katz wrote:

> On Tue, 17 Jul 2007, Thomas Gleixner wrote:
>
>> With 2.6.14 or with current mainline ?
>
> I haven't been keeping notes quite as studiously as I should have been, but 
> this just occurred with 2.6.22.1 + the hrt6 patch + your proposed fix:

Scratch that.  I had infrastructure problems, and ended up using the wrong 
build.

This is with the patch (and 2.6.22.1 and hrt6):

------------[ cut here ]------------
Kernel BUG at c0125adb [verbose debug info unavailable]
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU:    3
EIP:    0060:[<c0125adb>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #2)
EIP is at sigqueue_free+0x23/0x72
eax: 00000000   ebx: f6eaf1a8   ecx: f6d4b888   edx: 00000202
esi: f73e2ba8   edi: 00000000   ebp: f7c7bf8c   esp: f7c7bf84
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process hrtm_test (pid: 15340, ti=f7c7b000 task=f6c0f030 task.ti=f7c7b000)
Stack: 00000202 f73e2ba8 f7c7bf9c c012c941 f73e2ba8 000002fb f7c7bfb0 
c012d1c0
        00000286 000002fb 00000000 f7c7b000 c01027ca 000002fb 466afff4 
08064190
        00000000 00000000 ae8d43c8 00000107 ffff007b c010007b 00000000 
00000107
Call Trace:
  [<c0103504>] show_trace_log_lvl+0x1a/0x30
  [<c01035bb>] show_stack_log_lvl+0x8d/0xaa
  [<c01037f5>] show_registers+0x1cd/0x2cb
  [<c0103a4a>] die+0x113/0x207
  [<c039aab5>] do_trap+0x8f/0xc6
  [<c0103d35>] do_invalid_op+0x88/0x92
  [<c039a882>] error_code+0x72/0x78
  [<c012c941>] release_posix_timer+0x1b/0x7a
  [<c012d1c0>] sys_timer_delete+0xd7/0x10c
  [<c01027ca>] syscall_call+0x7/0xb
  =======================
Code: 74 04 83 48 08 01 5d c3 55 89 e5 56 53 89 c3 8b 40 08 a8 01 74 15 39 
1b 75 15 83 e0 fe 89 43 08 89 d8 e8 1d ec ff ff 5b 5e 5d c3 <0f> 0b eb fe 
64 a1 00 c0 4a c0 8b b0 50 04 00 00 b8 00 1e 46 c0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17 23:58         ` Jeremy Katz
@ 2007-07-18  6:05           ` Thomas Gleixner
  2007-07-18  6:58             ` Thomas Gleixner
  2007-07-18 16:11           ` Oleg Nesterov
  1 sibling, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2007-07-18  6:05 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Tue, 2007-07-17 at 16:58 -0700, Jeremy Katz wrote:

> Scratch that.  I had infrastructure problems, and ended up using the wrong 
> build.

> EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #2)

Hmm. Are there any other patches on that kernel ?

Is there a chance that you can whip up a test program which is not
secret sauce ?

Also can you please enable CONFIG_PROVE_LOCKING, which might catch any
locking problem, which might be related to this.

	tglx



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-18  6:05           ` Thomas Gleixner
@ 2007-07-18  6:58             ` Thomas Gleixner
  2007-07-18 19:02               ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2007-07-18  6:58 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Wed, 2007-07-18 at 08:05 +0200, Thomas Gleixner wrote:
> On Tue, 2007-07-17 at 16:58 -0700, Jeremy Katz wrote:
> 
> > Scratch that.  I had infrastructure problems, and ended up using the wrong 
> > build.
> 
> > EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #2)
> 
> Hmm. Are there any other patches on that kernel ?
> 
> Is there a chance that you can whip up a test program which is not
> secret sauce ?
> 
> Also can you please enable CONFIG_PROVE_LOCKING, which might catch any
> locking problem, which might be related to this.

Another test: Can you please disable CONFIG_SCHED_SMT to narrow it down
further ?

	tglx



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-17 23:58         ` Jeremy Katz
  2007-07-18  6:05           ` Thomas Gleixner
@ 2007-07-18 16:11           ` Oleg Nesterov
  2007-07-18 19:08             ` Jeremy Katz
  1 sibling, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2007-07-18 16:11 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On 07/17, Jeremy Katz wrote:
>
> This is with the patch (and 2.6.22.1 and hrt6):
> 
> ------------[ cut here ]------------
> Kernel BUG at c0125adb [verbose debug info unavailable]
> invalid opcode: 0000 [#1]
> SMP
> Modules linked in:
> CPU:    3
> EIP:    0060:[<c0125adb>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #2)
> EIP is at sigqueue_free+0x23/0x72
> eax: 00000000   ebx: f6eaf1a8   ecx: f6d4b888   edx: 00000202
> esi: f73e2ba8   edi: 00000000   ebp: f7c7bf8c   esp: f7c7bf84
> ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> Process hrtm_test (pid: 15340, ti=f7c7b000 task=f6c0f030 task.ti=f7c7b000)
> Stack: 00000202 f73e2ba8 f7c7bf9c c012c941 f73e2ba8 000002fb f7c7bfb0 
> c012d1c0
>        00000286 000002fb 00000000 f7c7b000 c01027ca 000002fb 466afff4 
> 08064190
>        00000000 00000000 ae8d43c8 00000107 ffff007b c010007b 00000000 
> 00000107
> Call Trace:
>  [<c0103504>] show_trace_log_lvl+0x1a/0x30
>  [<c01035bb>] show_stack_log_lvl+0x8d/0xaa
>  [<c01037f5>] show_registers+0x1cd/0x2cb
>  [<c0103a4a>] die+0x113/0x207
>  [<c039aab5>] do_trap+0x8f/0xc6
>  [<c0103d35>] do_invalid_op+0x88/0x92
>  [<c039a882>] error_code+0x72/0x78
>  [<c012c941>] release_posix_timer+0x1b/0x7a
>  [<c012d1c0>] sys_timer_delete+0xd7/0x10c
>  [<c01027ca>] syscall_call+0x7/0xb

This looks really impossible.

I believe I see another bug in sys_timer_create(), but it is not related
to this problem.

sys_timer_create() inserts a partly initialized new_timer into posix_timers_id
and drops idr_lock. Suppose that another thread does sys_timer_delete(). If it
sees ->it_process != NULL, lock_timer() succeeds.

If the timer was not fully initialized at this time (or another CPU sees the
result of STOREs out of order), we can have multiple probles.

list_del() may oops, we can do put_task_struct() before sys_timer_create()
does get_task_struct(), we may leak the task_struct if sys_timer_delete()
doesn't see SIGEV_THREAD_ID yet.


Jeremy, I agree with Thomas that your patch should not be right, but it
does make a difference. Perhaps this is just the timing, but who knows.
Could you add some printk's to be sure that lock_timer() actually fails
while it never should?

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-18  6:58             ` Thomas Gleixner
@ 2007-07-18 19:02               ` Jeremy Katz
  2007-07-18 23:43                 ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-18 19:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Wed, 18 Jul 2007, Thomas Gleixner wrote:

> On Wed, 2007-07-18 at 08:05 +0200, Thomas Gleixner wrote:
>> On Tue, 2007-07-17 at 16:58 -0700, Jeremy Katz wrote:
>>
>>> EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #2)
>>
>> Hmm. Are there any other patches on that kernel ?

Just hrt6 and your proposed fix.  The build system I'm using appends the 
WR1.4aq_cgl, but isn't patching anything.

>> Is there a chance that you can whip up a test program which is not
>> secret sauce ?

I've taken a brief stab at it, but haven't produced anything successful 
thus far.  Unfortunately, I don't even have complete source to the test 
code, and the people that do have expressed an inability to release it due 
to contractual obligations.

>> Also can you please enable CONFIG_PROVE_LOCKING, which might catch any
>> locking problem, which might be related to this.
>
> Another test: Can you please disable CONFIG_SCHED_SMT to narrow it down
> further ?

I'll try both of these.

Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-18 16:11           ` Oleg Nesterov
@ 2007-07-18 19:08             ` Jeremy Katz
  2007-07-20 14:15               ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-18 19:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On Wed, 18 Jul 2007, Oleg Nesterov wrote:

> Jeremy, I agree with Thomas that your patch should not be right, but it
> does make a difference. Perhaps this is just the timing, but who knows.
> Could you add some printk's to be sure that lock_timer() actually fails
> while it never should?

Agreed.

Unfortunately, adding any significant output appears to alter the 
situation to the point where the issue either does not occur, or takes 
significantly longer to surface.

Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-18 19:02               ` Jeremy Katz
@ 2007-07-18 23:43                 ` Jeremy Katz
  2007-07-19  5:50                   ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-18 23:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Wed, 18 Jul 2007, Jeremy Katz wrote:

> On Wed, 18 Jul 2007, Thomas Gleixner wrote:
>
>>> Also can you please enable CONFIG_PROVE_LOCKING, which might catch any
>>> locking problem, which might be related to this.
>> 
>> Another test: Can you please disable CONFIG_SCHED_SMT to narrow it down
>> further ?
>
> I'll try both of these.

I'm still seeing the sys_timer_delete version with your patch, and 
sys_timer_delete and posix_timer_event without. The itimer_delete version 
is currently missing in action, but getting any particular one to perform 
on demand is currently not in my power.


Jeremy



Stack dumps with 2.6.22.1 + hrt6:
------------[ cut here ]------------
Kernel BUG at c01266fb [verbose debug info unavailable]
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU:    1
EIP:    0060:[<c01266fb>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #4)
EIP is at sigqueue_free+0x23/0x75
eax: 00000000   ebx: f6f5f868   ecx: 00000000   edx: 00000000
esi: f73d1614   edi: 00000000   ebp: f737ff8c   esp: f737ff84
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process hrtm_test (pid: 1442, ti=f737f000 task=f738b590 task.ti=f737f000)
Stack: 00000286 f73d1614 f737ff9c c012d771 f73d1614 0000067c f737ffb0 
c012dfff
        00000282 0000067c 00000000 f737f000 c0102802 0000067c 466afff4 
92607898
        00000000 00000000 936a43c8 00000107 c010007b 0000007b 00000000 
00000107
Call Trace:
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c010367b>] show_stack_log_lvl+0x8d/0xaa
  [<c01038b5>] show_registers+0x1cd/0x2cb
  [<c0103b0a>] die+0x113/0x207
  [<c03a925d>] do_trap+0x8f/0xc6
  [<c0103df5>] do_invalid_op+0x88/0x92
  [<c03a902a>] error_code+0x72/0x78
  [<c012d771>] release_posix_timer+0x1b/0x7a
  [<c012dfff>] sys_timer_delete+0xda/0x10f
  [<c0102802>] syscall_call+0x7/0xb
  =======================
Code: 74 04 83 48 08 01 5d c3 55 89 e5 56 53 89 c3 8b 40 08 a8 01 74 15 39 
1b 75 15 83 e0 fe 89 43 08 89 d8 e8 ed eb ff ff 5b 5e 5d c3 <0f> 0b eb fe 
64 a1 00f0 4f c0 8b b0 50 04 00 00 b8 00 3e 4b c0
EIP: [<c01266fb>] sigqueue_free+0x23/0x75 SS:ESP 0068:f737ff84


------------[ cut here ]------------
Kernel BUG at c012682d [verbose debug info unavailable]
invalid opcode: 0000 [#5]
SMP
Modules linked in:
CPU:    3
EIP:    0060:[<c012682d>]    Not tainted VLI
EFLAGS: 00010046   (2.6.22.1-WR1.4aq_cgl #4)
EIP is at send_sigqueue+0xe0/0xe8
eax: 00000020   ebx: f6d44c58   ecx: f71e8aa0   edx: f6d44c58
esi: f71e8aa0   edi: 00000020   ebp: c050cf84   esp: c050cf70
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 0, ti=c050c000 task=c2e00aa0 task.ti=c2e02000)
Stack: 00000000 f6f6bd9c f6f6bd94 f6f6bde0 f6d44ce4 c050cf94 c012d519 
f6f6bd94
        00000003 c050cfc0 c012d595 c2a41990 c2a41980 c050cfb4 00000046 
00000000
        00000282 c012d54d f6f6bde0 c2a41980 c050cfdc c0131c14 00000000 
c2a41a18
Call Trace:
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c010367b>] show_stack_log_lvl+0x8d/0xaa
  [<c01038b5>] show_registers+0x1cd/0x2cb
  [<c0103b0a>] die+0x113/0x207
  [<c03a925d>] do_trap+0x8f/0xc6
  [<c0103df5>] do_invalid_op+0x88/0x92
  [<c03a902a>] error_code+0x72/0x78
  [<c012d519>] posix_timer_event+0x71/0xa5
  [<c012d595>] posix_timer_fn+0x48/0xdd
  [<c0131c14>] run_hrtimer_softirq+0x5a/0xba
  [<c01211ac>] __do_softirq+0x7e/0xf9
  [<c0104708>] do_softirq+0x8c/0x101
  [<c0121292>] irq_exit+0x4b/0x4d
  [<c010f315>] smp_apic_timer_interrupt+0x2f/0x39
  [<c01032db>] apic_timer_interrupt+0x33/0x38
  [<c0100b5c>] mwait_idle+0x12/0x14
  [<c0100a11>] cpu_idle+0x7b/0x7d
  [<c04d06d5>] start_secondary+0xe8/0x30c
  [<00000000>] 0x0
  =======================
Code: 45 ec 01 00 00 00 eb c8 89 fa 89 f0 e8 a0 99 06 00 eb 93 81 7b 14 fe 
ff 01 00 75 13 83 43 1c 01 eb ae c7 45 ec ff ff ff ff eb b8 <0f> 0b eb fe 
0f 0b eb fe 55 89 e5 57 89 c7 56 89 ce 53 89 d3 83
EIP: [<c012682d>] send_sigqueue+0xe0/0xe8 SS:ESP 0068:c050cf70
Kernel panic - not syncing: Fatal exception in interrupt



With 2.6.22.1 + hrt6 + tglx's proposed fix:
------------[ cut here ]------------
Kernel BUG at c01266fb [verbose debug info unavailable]
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c01266fb>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22.1-WR1.4aq_cgl #5)
EIP is at sigqueue_free+0x23/0x75
eax: 00000000   ebx: f7349358   ecx: 00000000   edx: 00000000
esi: f7354194   edi: 00000000   ebp: f709af8c   esp: f709af84
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process hrtm_test (pid: 1424, ti=f709a000 task=f70d0ae0 task.ti=f709a000)
Stack: 00000286 f7354194 f709af9c c012d771 f7354194 00000190 f709afb0 
c012e00c
        00000282 00000190 00000000 f709a000 c0102802 00000190 466afff4 
8fa00fe0
        00000000 00000000 972753c8 00000107 ffff007b c010007b 00000000 
00000107
Call Trace:
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c010367b>] show_stack_log_lvl+0x8d/0xaa
  [<c01038b5>] show_registers+0x1cd/0x2cb
  [<c0103b0a>] die+0x113/0x207
  [<c03a926d>] do_trap+0x8f/0xc6
  [<c0103df5>] do_invalid_op+0x88/0x92
  [<c03a903a>] error_code+0x72/0x78
  [<c012d771>] release_posix_timer+0x1b/0x7a
  [<c012e00c>] sys_timer_delete+0xda/0x10f
  [<c0102802>] syscall_call+0x7/0xb
  =======================
Code: 74 04 83 48 08 01 5d c3 55 89 e5 56 53 89 c3 8b 40 08 a8 01 74 15 39 
1b 75 15 83 e0 fe 89 43 08 89 d8 e8 ed eb ff ff 5b 5e 5d c3 <0f> 0b eb fe 
64 a1 00 f0 4f c0 8b b0 50 04 00 00 b8 00 3e 4b c0
EIP: [<c01266fb>] sigqueue_free+0x23/0x75 SS:ESP 0068:f709af84

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-18 23:43                 ` Jeremy Katz
@ 2007-07-19  5:50                   ` Thomas Gleixner
  2007-07-19 19:24                     ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2007-07-19  5:50 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Wed, 2007-07-18 at 16:43 -0700, Jeremy Katz wrote:
> On Wed, 18 Jul 2007, Jeremy Katz wrote:
> 
> > On Wed, 18 Jul 2007, Thomas Gleixner wrote:
> >
> >>> Also can you please enable CONFIG_PROVE_LOCKING, which might catch any
> >>> locking problem, which might be related to this.
> >> 
> >> Another test: Can you please disable CONFIG_SCHED_SMT to narrow it down
> >> further ?
> >
> > I'll try both of these.
> 
> I'm still seeing the sys_timer_delete version with your patch, and 
> sys_timer_delete and posix_timer_event without. The itimer_delete version 
> is currently missing in action, but getting any particular one to perform 
> on demand is currently not in my power.

Ok, let me summarize:

2.6.22 + hrt6

Both problems are there whether CONFIG_SCHED_SMT is on or not

2.6.22 + hrt6 + posixtimer patch

Both problems are there whether CONFIG_SCHED_SMT is on
The timer callback problem is gone when CONFIG_SCHED_SMT is off

Correct ?

	tglx



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-19  5:50                   ` Thomas Gleixner
@ 2007-07-19 19:24                     ` Jeremy Katz
  0 siblings, 0 replies; 24+ messages in thread
From: Jeremy Katz @ 2007-07-19 19:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Oleg Nesterov, Stable Team

On Thu, 19 Jul 2007, Thomas Gleixner wrote:

> On Wed, 2007-07-18 at 16:43 -0700, Jeremy Katz wrote:
>> On Wed, 18 Jul 2007, Jeremy Katz wrote:
>>
>>> On Wed, 18 Jul 2007, Thomas Gleixner wrote:
>>>
>>>>> Also can you please enable CONFIG_PROVE_LOCKING, which might catch any
>>>>> locking problem, which might be related to this.
>>>>
>>>> Another test: Can you please disable CONFIG_SCHED_SMT to narrow it down
>>>> further ?
>>>
>>> I'll try both of these.
>>
>> I'm still seeing the sys_timer_delete version with your patch, and
>> sys_timer_delete and posix_timer_event without. The itimer_delete version
>> is currently missing in action, but getting any particular one to perform
>> on demand is currently not in my power.
>
> Ok, let me summarize:
>
> 2.6.22 + hrt6
>
> Both problems are there whether CONFIG_SCHED_SMT is on or not

If both means BUG triggered in sigqueue_free and send_sigqueue, then yes.

>
> 2.6.22 + hrt6 + posixtimer patch
>
> Both problems are there whether CONFIG_SCHED_SMT is on
> The timer callback problem is gone when CONFIG_SCHED_SMT is off

It turns out that the callback problem just took longer to show up with 
CONFIG_SCHED_SMT off.  I saw this when I came in this morning:

------------[ cut here ]------------
Kernel BUG at c012682d [verbose debug info unavailable]
invalid opcode: 0000 [#6]
SMP
Modules linked in:
CPU:    3
EIP:    0060:[<c012682d>]    Not tainted VLI
EFLAGS: 00010046   (2.6.22.1-WR1.4aq_cgl #5)
EIP is at send_sigqueue+0xe0/0xe8
eax: 00000020   ebx: f735e7d8   ecx: f73ea030   edx: f735e7d8
esi: f73ea030   edi: 00000020   ebp: c050cf84   esp: c050cf70
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process hrtm_test (pid: 11102, ti=c050c000 task=f6a2d550 task.ti=f6fb7000)
Stack: 00000000 f722c89c f722c894 f722c8e0 f735e864 c050cf94 c012d519 
f722c894
        00000025 c050cfc0 c012d595 c2a41990 c2a41980 c050cfb4 00000046 
00000000
        00000282 c012d54d f722c8e0 c2a41980 c050cfdc c0131c20 00000000 
c2a41a18
Call Trace:
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c010367b>] show_stack_log_lvl+0x8d/0xaa
  [<c01038b5>] show_registers+0x1cd/0x2cb
  [<c0103b0a>] die+0x113/0x207
  [<c03a926d>] do_trap+0x8f/0xc6
  [<c0103df5>] do_invalid_op+0x88/0x92
  [<c03a903a>] error_code+0x72/0x78
  [<c012d519>] posix_timer_event+0x71/0xa5
  [<c012d595>] posix_timer_fn+0x48/0xdd
  [<c0131c20>] run_hrtimer_softirq+0x5a/0xba
  [<c01211ac>] __do_softirq+0x7e/0xf9
  [<c0104708>] do_softirq+0x8c/0x101
  [<c0121292>] irq_exit+0x4b/0x4d
  [<c010f315>] smp_apic_timer_interrupt+0x2f/0x39
  [<c01032db>] apic_timer_interrupt+0x33/0x38
  [<c0125308>] __sigqueue_free+0x24/0x33
  [<c01266f7>] sigqueue_free+0x1f/0x75
  [<c012d771>] release_posix_timer+0x1b/0x7a
  [<c012e00c>] sys_timer_delete+0xda/0x10f
  [<c0102802>] syscall_call+0x7/0xb
  =======================
Code: 45 ec 01 00 00 00 eb c8 89 fa 89 f0 e8 ac 99 06 00 eb 93 81 7b 14 fe 
ff 01 00 75 13 83 43 1c 01 eb ae c7 45 ec ff ff ff ff eb b8 <0f> 0b eb fe 
0f 0b eb fe 55 89 e5 57 89 c7 56 89 ce 53 89 d3 83
EIP: [<c012682d>] send_sigqueue+0xe0/0xe8 SS:ESP 0068:c050cf70
Kernel panic - not syncing: Fatal exception in interrupt
BUG: spinlock lockup on CPU#2, hrtm_test/1355, f722c89c
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c01035ec>] show_trace+0x12/0x14
  [<c01036e6>] dump_stack+0x16/0x18
  [<c0275dca>] __spin_lock_debug+0xb3/0xc5
  [<c0275e31>] _raw_spin_lock+0x55/0x73
  [<c03a8ba6>] _spin_lock+0x32/0x38
  [<c012daa4>] lock_timer+0x3d/0xab
  [<c012df5d>] sys_timer_delete+0x2b/0x10f
  [<c0102802>] syscall_call+0x7/0xb
  =======================
BUG: spinlock lockup on CPU#0, hrtm_test/1374, c04918d0
BUG: spinlock lockup on CPU#1, hrtm_test/1387, c04918d0
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c01035ec>] show_trace+0x12/0x14
  [<c01036e6>] dump_stack+0x16/0x18
  [<c0275dca>] __spin_lock_debug+0xb3/0xc5
  [<c0275e31>] _raw_spin_lock+0x55/0x73
  [<c03a8961>] _spin_lock_irqsave+0x3f/0x4b
  [<c012da85>] lock_timer+0x1e/0xab
  [<c012df5d>] sys_timer_delete+0x2b/0x10f
  [<c0102802>] syscall_call+0x7/0xb
  =======================
  [<c01035c4>] show_trace_log_lvl+0x1a/0x30
  [<c01035ec>] show_trace+0x12/0x14
  [<c01036e6>] dump_stack+0x16/0x18
  [<c0275dca>] __spin_lock_debug+0xb3/0xc5
  [<c0275e31>] _raw_spin_lock+0x55/0x73
  [<c03a8961>] _spin_lock_irqsave+0x3f/0x4b
  [<c012da85>] lock_timer+0x1e/0xab
  [<c012df5d>] sys_timer_delete+0x2b/0x10f
  [<c0102802>] syscall_call+0x7/0xb
  =======================

Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-18 19:08             ` Jeremy Katz
@ 2007-07-20 14:15               ` Oleg Nesterov
  2007-07-24  2:07                 ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2007-07-20 14:15 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On 07/18, Jeremy Katz wrote:
>
> On Wed, 18 Jul 2007, Oleg Nesterov wrote:
> 
> >Jeremy, I agree with Thomas that your patch should not be right, but it
> >does make a difference. Perhaps this is just the timing, but who knows.
> >Could you add some printk's to be sure that lock_timer() actually fails
> >while it never should?
> 
> Agreed.
> 
> Unfortunately, adding any significant output appears to alter the 
> situation to the point where the issue either does not occur, or takes 
> significantly longer to surface.

No, no, I didn't mean any significant output. You changed itimer_delete()

	>  -       spin_lock_irqsave(&timer->it_lock, flags);
	>  +       /* timer already deleted? */
	>  +       if (lock_timer(timer->it_id, &flags) == NULL)
	>  +               return;

This change should not help, lock_timer() should always succeed here.
But since it makes a difference, we can make something like

	if (lock_timer(timer->it_id, &flags) == NULL) {
		printk("Impossible! but it happened.\n");
		return;
	}

The same for posix_timer_fn().


I still can't believe we have a double-free problem, this looks imposiible.
Do you see the

	"idr_remove called for id=%d which is not allocated.\n"

in syslog?


Could you try the patch below? Perhaps we have some wierd problem with
->sigq corruption.

Oleg.

--- t/kernel/posix-timers.c~	2007-06-29 14:45:04.000000000 +0400
+++ t/kernel/posix-timers.c	2007-07-20 18:07:59.000000000 +0400
@@ -443,13 +443,16 @@ static struct k_itimer * alloc_posix_tim
 #define IT_ID_NOT_SET	0
 static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
 {
+	sigqueue_free(tmr->sigq);
+	tmr->sigq = NULL;
+
 	if (it_id_set) {
 		unsigned long flags;
 		spin_lock_irqsave(&idr_lock, flags);
 		idr_remove(&posix_timers_id, tmr->it_id);
 		spin_unlock_irqrestore(&idr_lock, flags);
 	}
-	sigqueue_free(tmr->sigq);
+
 	if (unlikely(tmr->it_process) &&
 	    tmr->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
 		put_task_struct(tmr->it_process);
@@ -615,6 +618,7 @@ static struct k_itimer * lock_timer(time
 	} else
 		spin_unlock_irqrestore(&idr_lock, *flags);
 
+	BUG_ON(timr && !timr->sigq);
 	return timr;
 }
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-20 14:15               ` Oleg Nesterov
@ 2007-07-24  2:07                 ` Jeremy Katz
  2007-07-24 14:51                   ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-24  2:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On Fri, 20 Jul 2007, Oleg Nesterov wrote:

> On 07/18, Jeremy Katz wrote:
>>
>> On Wed, 18 Jul 2007, Oleg Nesterov wrote:
>>
>>> Jeremy, I agree with Thomas that your patch should not be right, but it
>>> does make a difference. Perhaps this is just the timing, but who knows.
>>> Could you add some printk's to be sure that lock_timer() actually fails
>>> while it never should?
>>
>> Agreed.
>>
>> Unfortunately, adding any significant output appears to alter the
>> situation to the point where the issue either does not occur, or takes
>> significantly longer to surface.
>
> No, no, I didn't mean any significant output. You changed itimer_delete()
>
> 	>  -       spin_lock_irqsave(&timer->it_lock, flags);
> 	>  +       /* timer already deleted? */
> 	>  +       if (lock_timer(timer->it_id, &flags) == NULL)
> 	>  +               return;
>
> This change should not help, lock_timer() should always succeed here.
> But since it makes a difference, we can make something like
>
> 	if (lock_timer(timer->it_id, &flags) == NULL) {
> 		printk("Impossible! but it happened.\n");
> 		return;
> 	}
>
> The same for posix_timer_fn().

Ahh, of course.  I did try that at some point, and remember seeing at 
least the occasional failure.  This time, taking the spinlock and then 
checking for a valid timer ID, I did not see the locking fail.  I did see 
the attempt to use a freed sigqueue, further suggesting my 'fix' merely 
altered the timing sufficiently to hide the issue.

> I still can't believe we have a double-free problem, this looks imposiible.
> Do you see the
>
> 	"idr_remove called for id=%d which is not allocated.\n"
>
> in syslog?

No.  I also added some accounting with atomic counters, and don't see 
evidence of a second call to release_posix_timer.

> Could you try the patch below? Perhaps we have some wierd problem with
> ->sigq corruption.

Tried, with apparent effect.

To add to the body of data: Turning off hyperthreading in the hardware 
does not resolve the issue.  Limiting the system to one CPU does appear to 
work.


Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-24  2:07                 ` Jeremy Katz
@ 2007-07-24 14:51                   ` Oleg Nesterov
  2007-07-24 18:36                     ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2007-07-24 14:51 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On 07/23, Jeremy Katz wrote:
>
> On Fri, 20 Jul 2007, Oleg Nesterov wrote:
> 
> >I still can't believe we have a double-free problem, this looks imposiible.
> >Do you see the
> >
> >	"idr_remove called for id=%d which is not allocated.\n"
> >
> >in syslog?
> 
> No.  I also added some accounting with atomic counters, and don't see 
> evidence of a second call to release_posix_timer.

Interesting. Could you show the patch? Where does sys_timer_create() set
counter == 1?

> >Could you try the patch below? Perhaps we have some wierd problem with
> >->sigq corruption.
> 
> Tried, with apparent effect.

You mean, "null pointer dereference" in release_posix_timer() ? Or it was
BUG_ON(timr && !timr->sigq) in lock_timer() ?

I assume you didn't apply any other patches except Thomas's, yes?

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-24 14:51                   ` Oleg Nesterov
@ 2007-07-24 18:36                     ` Jeremy Katz
  2007-07-24 20:43                       ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Katz @ 2007-07-24 18:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On Tue, 24 Jul 2007, Oleg Nesterov wrote:

> On 07/23, Jeremy Katz wrote:
>>
>> On Fri, 20 Jul 2007, Oleg Nesterov wrote:
>>
>>> I still can't believe we have a double-free problem, this looks imposiible.
>>> Do you see the
>>>
>>> 	"idr_remove called for id=%d which is not allocated.\n"
>>>
>>> in syslog?
>>
>> No.  I also added some accounting with atomic counters, and don't see
>> evidence of a second call to release_posix_timer.
>
> Interesting. Could you show the patch? Where does sys_timer_create() set
> counter == 1?

--- kernel/posix-timers.c.old      2007-07-24 11:21:29.000000000 -0700
+++ kernel/posix-timers.c       2007-07-20 15:49:51.000000000 -0700
@@ -409,13 +409,17 @@ static struct k_itimer * alloc_posix_tim
  #define IT_ID_NOT_SET  0
  static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
  {
+       atomic_inc(&tmr->freecount);
@@ -443,6 +447,7 @@ sys_timer_create(const clockid_t which_c
         new_timer = alloc_posix_timer();
         if (unlikely(!new_timer))
                 return -EAGAIN;
+       atomic_set(&new_timer->freecount, 0);

--- include/linux/posix-timers.h.old  2007-07-24 11:32:36.000000000 -0700
+++ include/linux/posix-timers.h      2007-07-20 15:15:44.000000000 -0700
@@ -62,6 +62,7 @@ struct k_itimer {
                         unsigned long expires;
                 } mmtimer;
         } it;
+       atomic_t freecount;
  };

  struct k_clock {


Examination was done through a kgdb session when it caught the bug. 
freecount always had the expected value, 1 during a delete operation, 
and 0 otherwise.

>>> Could you try the patch below? Perhaps we have some wierd problem with
>>> ->sigq corruption.
>>
>> Tried, with apparent effect.
>
> You mean, "null pointer dereference" in release_posix_timer() ? Or it was
> BUG_ON(timr && !timr->sigq) in lock_timer() ?

Sorry.  That should have been "without apparent effect".

> I assume you didn't apply any other patches except Thomas's, yes?

Alone, as well as with Thomas' patch. Neither way changed the behavior.

Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-24 18:36                     ` Jeremy Katz
@ 2007-07-24 20:43                       ` Oleg Nesterov
  2007-07-24 21:19                         ` Jeremy Katz
  0 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2007-07-24 20:43 UTC (permalink / raw)
  To: Jeremy Katz
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On 07/24, Jeremy Katz wrote:
>
> On Tue, 24 Jul 2007, Oleg Nesterov wrote:
> 
> >Interesting. Could you show the patch? Where does sys_timer_create() set
> >counter == 1?
> 
> --- kernel/posix-timers.c.old      2007-07-24 11:21:29.000000000 -0700
> +++ kernel/posix-timers.c       2007-07-20 15:49:51.000000000 -0700
> @@ -409,13 +409,17 @@ static struct k_itimer * alloc_posix_tim
>  #define IT_ID_NOT_SET  0
>  static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
>  {
> +       atomic_inc(&tmr->freecount);
> @@ -443,6 +447,7 @@ sys_timer_create(const clockid_t which_c
>         new_timer = alloc_posix_timer();
>         if (unlikely(!new_timer))
>                 return -EAGAIN;
> +       atomic_set(&new_timer->freecount, 0);
> 
> --- include/linux/posix-timers.h.old  2007-07-24 11:32:36.000000000 -0700
> +++ include/linux/posix-timers.h      2007-07-20 15:15:44.000000000 -0700
> @@ -62,6 +62,7 @@ struct k_itimer {
>                         unsigned long expires;
>                 } mmtimer;
>         } it;
> +       atomic_t freecount;
>  };
> 
>  struct k_clock {
> 
> 
> Examination was done through a kgdb session when it caught the bug. 
> freecount always had the expected value, 1 during a delete operation, 
> and 0 otherwise.

OK, thanks.

> >>>Could you try the patch below? Perhaps we have some wierd problem with
> >>>->sigq corruption.
> >>
> >>Tried, with apparent effect.
> >
> >You mean, "null pointer dereference" in release_posix_timer() ? Or it was
> >BUG_ON(timr && !timr->sigq) in lock_timer() ?
> 
> Sorry.  That should have been "without apparent effect".

Sorry. I confused completely.

So. You mean that even with that patch you _still_ see the BUG_ON(!SIGQUEUE_PREALLOC)
in sigqueue_free() ?

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] posix-timer: fix deletion race
  2007-07-24 20:43                       ` Oleg Nesterov
@ 2007-07-24 21:19                         ` Jeremy Katz
  0 siblings, 0 replies; 24+ messages in thread
From: Jeremy Katz @ 2007-07-24 21:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, Ingo Molnar, Stable Team

On Wed, 25 Jul 2007, Oleg Nesterov wrote:

> On 07/24, Jeremy Katz wrote:
>>
>> Sorry.  That should have been "without apparent effect".
>
> Sorry. I confused completely.
>
> So. You mean that even with that patch you _still_ see the 
> BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free() ?

Yes.  I did not notice any difference in the system's behavior after 
applying the patch.

Jeremy

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2007-07-24 21:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-13 18:19 POSIX timer deletion race? Jeremy Katz
2007-07-17  8:53 ` [PATCH] posix-timer: fix deletion race Thomas Gleixner
2007-07-17  9:57   ` Ingo Molnar
2007-07-17 18:29     ` Jeremy Katz
2007-07-17 13:07   ` Oleg Nesterov
2007-07-17 17:02     ` Thomas Gleixner
2007-07-17 18:39   ` Jeremy Katz
2007-07-17 20:17     ` Thomas Gleixner
2007-07-17 23:12       ` Jeremy Katz
2007-07-17 23:58         ` Jeremy Katz
2007-07-18  6:05           ` Thomas Gleixner
2007-07-18  6:58             ` Thomas Gleixner
2007-07-18 19:02               ` Jeremy Katz
2007-07-18 23:43                 ` Jeremy Katz
2007-07-19  5:50                   ` Thomas Gleixner
2007-07-19 19:24                     ` Jeremy Katz
2007-07-18 16:11           ` Oleg Nesterov
2007-07-18 19:08             ` Jeremy Katz
2007-07-20 14:15               ` Oleg Nesterov
2007-07-24  2:07                 ` Jeremy Katz
2007-07-24 14:51                   ` Oleg Nesterov
2007-07-24 18:36                     ` Jeremy Katz
2007-07-24 20:43                       ` Oleg Nesterov
2007-07-24 21:19                         ` Jeremy Katz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).