* Recursion bug in -rt
@ 2005-12-14 22:39 Dinakar Guniguntala
  2005-12-15  1:03 ` david singleton
  2005-12-15 19:00 ` David Singleton
  0 siblings, 2 replies; 37+ messages in thread

From: Dinakar Guniguntala @ 2005-12-14 22:39 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, David Singleton

[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]

Hi David,

I hit this bug with -rt22-rf11

==========================================
[ BUG: lock recursion deadlock detected! |
------------------------------------------
already locked:  [f7abbc94] {futex}
.. held by:      testpi-3: 4595 [f7becdd0, 59]
... acquired at: futex_wait_robust+0x142/0x1f3
------------------------------
| showing all locks held by: |
| (testpi-3/4595 [f7becdd0, 59]):
------------------------------

#001: [f7abbc94] {futex}
... acquired at: futex_wait_robust+0x142/0x1f3

-{current task's backtrace}----------------->
 [<c0103e04>] dump_stack+0x1e/0x20 (20)
 [<c0136bc2>] check_deadlock+0x2d7/0x334 (44)
 [<c01379bc>] task_blocks_on_lock+0x2c/0x224 (36)
 [<c03f29c5>] __down_interruptible+0x37c/0x95d (160)
 [<c013aebf>] down_futex+0xa3/0xe7 (40)
 [<c013ebc5>] futex_wait_robust+0x142/0x1f3 (72)
 [<c013f35c>] do_futex+0x9a/0x109 (40)
 [<c013f4dd>] sys_futex+0x112/0x11e (68)
 [<c0102f03>] sysenter_past_esp+0x54/0x75 (-8116)
------------------------------
| showing all locks held by: |
| (testpi-3/4595 [f7becdd0, 59]):
------------------------------

#001: [f7abbc94] {futex}
... acquired at: futex_wait_robust+0x142/0x1f3
---------------------------------------------------------------------

futex.c -> futex_wait_robust

	if ((curval & FUTEX_PID) == current->pid) {
		ret = -EAGAIN;
		goto out_unlock;
	}

rt.c -> down_futex

	if (!owner_task || owner_task == current) {
		up(sem);
		up_read(&current->mm->mmap_sem);
		return -EAGAIN;
	}

I noticed that both of the above checks have been removed in your patch.
I do understand that the futex_wait_robust path has been made similar
to the futex_wait path, but I think we are not taking PI into
consideration.  Basically it looks like we still need to check if the
current task has become the owner.  Or are we missing a lock somewhere?

I added the down_futex check above and my test has been running for
hours without the oops.  Without this check it used to oops within
minutes.

Patch that works for me attached below.  Thoughts?

	-Dinakar

[-- Attachment #2: check_current.patch --]
[-- Type: text/plain, Size: 486 bytes --]

Index: linux-2.6.14-rt22-rayrt5/kernel/rt.c
===================================================================
--- linux-2.6.14-rt22-rayrt5.orig/kernel/rt.c	2005-12-15 02:15:13.000000000 +0530
+++ linux-2.6.14-rt22-rayrt5/kernel/rt.c	2005-12-15 02:18:29.000000000 +0530
@@ -3001,7 +3001,7 @@
 	 * if the owner can't be found return try again.
 	 */
-	if (!owner_task) {
+	if (!owner_task || owner_task == current) {
 		up(sem);
 		up_read(&current->mm->mmap_sem);
 		return -EAGAIN;

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-14 22:39 Recursion bug in -rt Dinakar Guniguntala
@ 2005-12-15  1:03 ` david singleton
  2005-12-15 19:44   ` Dinakar Guniguntala
  2005-12-15 19:00 ` David Singleton
  1 sibling, 1 reply; 37+ messages in thread

From: david singleton @ 2005-12-15 1:03 UTC (permalink / raw)
To: dino; +Cc: linux-kernel, Ingo Molnar

Dinakar,
     you may be correct, since reverting the change in down_futex seems
to work.  However, I'm wondering if you've hit a bug that Dave Carlson
is reporting that he's tracked down to an inline in the glibc patch.
I'll explain what the kernel code is intending to do and you can let me
know what you think.

The changes in the latest robust futex patch close an SMP race
condition that Dave Carlson reported.  The changes were intended to
make the robust wait/wake code behave like the regular futex_wait/wake
code, which doesn't see this race condition.

The change allowed the library to pass in the value that it saw in the
lock when it tried to lock the pthread_mutex.  The kernel then locks
the mmap_sem and robust_sem to protect the kernel lock and checks the
value in the pthread_mutex again.  If the value has not changed since
the time the library saw it, the kernel continues to wait for the lock.

Reverting the check in down_futex would mean that the library saw that
the thread trying to lock the lock actually owned the lock in user
space and still tried to wait for the lock.  The library then passed
this value into the kernel, which checked to see if the value had
changed since the kernel acquired its locks.  The kernel determined
that the value had not changed and called down_futex to block for the
lock.  The down_futex code used the value in the lock to find the
owning task.  The task it found was the current task!

Now there are two problems I see with the reversion:

1) Why did the library call into the kernel if the calling thread owned
   the lock?
2) The reversion would have the kernel pass back EAGAIN if it found
   that the thread trying to lock the lock was in fact the owning
   thread.  EAGAIN would cause the library to retry locking the lock it
   already owns!

So I'm wondering if the error you're seeing is caused by a bug in the
library.  I'm trying to get a glibc patch from Dave Carlson that he
says fixes his problem.

Can you run strace and see if you are getting any EAGAINs back from
mutex_lock?  EAGAIN leaking out to user space is an error; the library
should keep trying again as long as the value it gets back from
wait_robust is EAGAIN.

What do you think?

David

On Dec 14, 2005, at 2:39 PM, Dinakar Guniguntala wrote:

> Hi David,
>
> I hit this bug with -rt22-rf11
>
> ==========================================
> [ BUG: lock recursion deadlock detected! |
> ------------------------------------------
> already locked:  [f7abbc94] {futex}
> .. held by:      testpi-3: 4595 [f7becdd0, 59]
> ... acquired at: futex_wait_robust+0x142/0x1f3
> ------------------------------
> | showing all locks held by: |
> | (testpi-3/4595 [f7becdd0, 59]):
> ------------------------------
>
> #001: [f7abbc94] {futex}
> ... acquired at: futex_wait_robust+0x142/0x1f3
>
> -{current task's backtrace}----------------->
>  [<c0103e04>] dump_stack+0x1e/0x20 (20)
>  [<c0136bc2>] check_deadlock+0x2d7/0x334 (44)
>  [<c01379bc>] task_blocks_on_lock+0x2c/0x224 (36)
>  [<c03f29c5>] __down_interruptible+0x37c/0x95d (160)
>  [<c013aebf>] down_futex+0xa3/0xe7 (40)
>  [<c013ebc5>] futex_wait_robust+0x142/0x1f3 (72)
>  [<c013f35c>] do_futex+0x9a/0x109 (40)
>  [<c013f4dd>] sys_futex+0x112/0x11e (68)
>  [<c0102f03>] sysenter_past_esp+0x54/0x75 (-8116)
> ------------------------------
> | showing all locks held by: |
> | (testpi-3/4595 [f7becdd0, 59]):
> ------------------------------
>
> #001: [f7abbc94] {futex}
> ...
acquired at: futex_wait_robust+0x142/0x1f3
>
> ---------------------------------------------------------------------
>
> futex.c -> futex_wait_robust
>
> 	if ((curval & FUTEX_PID) == current->pid) {
> 		ret = -EAGAIN;
> 		goto out_unlock;
> 	}
>
> rt.c -> down_futex
>
> 	if (!owner_task || owner_task == current) {
> 		up(sem);
> 		up_read(&current->mm->mmap_sem);
> 		return -EAGAIN;
> 	}
>
> I noticed that both of the above checks have been removed in your
> patch.  I do understand that the futex_wait_robust path has been
> made similar to the futex_wait path, but I think we are not taking
> PI into consideration.  Basically it looks like we still need to check
> if the current task has become the owner.  Or are we missing a lock
> somewhere?
>
> I added the down_futex check above and my test has been
> running for hours without the oops.  Without this check it
> used to oops within minutes.
>
> Patch that works for me attached below.  Thoughts?
>
> -Dinakar
>
> <check_current.patch>

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-15  1:03 ` david singleton
@ 2005-12-15 19:44   ` Dinakar Guniguntala
  2005-12-15 20:40     ` David Singleton
  2005-12-16  0:02     ` david singleton
  0 siblings, 2 replies; 37+ messages in thread

From: Dinakar Guniguntala @ 2005-12-15 19:44 UTC (permalink / raw)
To: david singleton; +Cc: linux-kernel, Ingo Molnar

On Wed, Dec 14, 2005 at 05:03:13PM -0800, david singleton wrote:
> Dinakar,
>     you may be correct, since reverting the change in down_futex seems
> to work.

Well, it did not.  It ran for much longer than previously but still
hung up.

We have a very basic problem in the current implementation.

Currently if a thread calls pthread_mutex_lock on the same lock
(normal, non-recursive lock) twice without unlocking in between, the
application hangs, which is the right behaviour.  However, if the same
thing is done with a non-recursive robust mutex, it manages to acquire
the lock.

I see many problems here (I am assuming that the right behaviour with
robust mutexes is for the application to ultimately block indefinitely
in the kernel):

1. In down_futex we do the following check

	if (owner_task != current)
		down_try_futex(lock, owner_task->thread_info __EIP__);

   In the above scenario, the thread would have acquired the uncontended
   lock the first time around in userspace.  The second time it tries to
   acquire the same mutex, because of the above check, it does not call
   down_try_futex and hence will not initialize the lock structure in
   the kernel.  It then goes to __down_interruptible where it is granted
   the lock, which is wrong.

   So IMO the above check is not right.  However, removing this check is
   not the end of the story.  This time it gets to task_blocks_on_lock
   and tries to grab the task->pi_lock of the owner, which is itself,
   and results in a system hang (assuming CONFIG_DEBUG_DEADLOCKS is not
   set).  So it looks like we need to add some check below to prevent
   this in case lock_owner happens to be current:
	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);

> However, I'm wondering if you've hit a bug that Dave Carlson is
> reporting that he's tracked down to an inline in the glibc patch.

Yes, I noticed that.  Basically it looks like INLINE_SYSCALL, on error,
returns -1 with the error code in errno, whereas we expect the return
code to be the same as the kernel return code.  Are you referring to
this or something else?

However, even with all of the above fixes (remove the check in
down_futex as mentioned above, add a check in task_blocks_on_lock, and
the glibc changes) my test program continues to hang up the system,
though it takes a lot longer to recreate the problem now.

[snip]

> 1) Why did the library call into the kernel if the calling thread
> owned the lock?

This is something I still haven't figured out and leads me to believe
that we still have a tiny race somewhere.

	-Dinakar

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-15 19:44   ` Dinakar Guniguntala
@ 2005-12-15 20:40     ` David Singleton
  2005-12-16  0:02     ` david singleton
  1 sibling, 0 replies; 37+ messages in thread

From: David Singleton @ 2005-12-15 20:40 UTC (permalink / raw)
To: dino; +Cc: linux-kernel, Ingo Molnar, robustmutexes

Dinakar,
     two items.

1) Yes, the INLINE_SYSCALL is the problem that Dave Carlson is seeing.

2) Are you saying that a recursive pthread_mutex enters the kernel on
subsequent lock operations when it already owns the lock?  I believe
the library should take care of recursive pthread mutexes and not enter
the kernel on subsequent calls.  The futex_wait function is designed to
block on a lock owned by another thread.

Let me go spend some time looking at the library code and see what it
is doing for recursive mutexes.

David

> On Wed, Dec 14, 2005 at 05:03:13PM -0800, david singleton wrote:
>
>> Dinakar,
>> you may be correct, since reverting the change in down_futex seems
>> to work.
>
> Well, it did not.  It ran for much longer than previously but still
> hung up.
>
> We have a very basic problem in the current implementation.
>
> Currently if a thread calls pthread_mutex_lock on the same lock
> (normal, non-recursive lock) twice without unlocking in between, the
> application hangs, which is the right behaviour.  However, if the same
> thing is done with a non-recursive robust mutex, it manages to acquire
> the lock.
>
> I see many problems here (I am assuming that the right behaviour
> with robust mutexes is for the application to ultimately block
> indefinitely in the kernel):
>
> 1. In down_futex we do the following check
>
> 	if (owner_task != current)
> 		down_try_futex(lock, owner_task->thread_info __EIP__);
>
>    In the above scenario, the thread would have acquired the
>    uncontended lock the first time around in userspace.
>    The second time it tries to acquire the same mutex, because of the
>    above check, it does not call down_try_futex and hence will not
>    initialize the lock structure in the kernel.  It then goes to
>    __down_interruptible where it is granted the lock, which is wrong.
>
>    So IMO the above check is not right.  However, removing this check
>    is not the end of the story.  This time it gets to
>    task_blocks_on_lock and tries to grab the task->pi_lock of the
>    owner, which is itself, and results in a system hang (assuming
>    CONFIG_DEBUG_DEADLOCKS is not set).  So it looks like we need to
>    add some check below to prevent this in case lock_owner happens to
>    be current:
>
> 	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);
>
>> However, I'm wondering if you've hit a bug that Dave Carlson is
>> reporting that he's tracked down to an inline in the glibc patch.
>
> Yes, I noticed that.  Basically it looks like INLINE_SYSCALL, on
> error, returns -1 with the error code in errno, whereas we expect the
> return code to be the same as the kernel return code.  Are you
> referring to this or something else?
>
> However, even with all of the above fixes (remove the check in
> down_futex as mentioned above, add a check in task_blocks_on_lock, and
> the glibc changes) my test program continues to hang up the system,
> though it takes a lot longer to recreate the problem now.
>
> [snip]
>
>> 1) Why did the library call into the kernel if the calling thread
>> owned the lock?
>
> This is something I still haven't figured out and leads me to believe
> that we still have a tiny race somewhere.
>
> 	-Dinakar

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-15 19:44   ` Dinakar Guniguntala
  2005-12-15 20:40     ` David Singleton
@ 2005-12-16  0:02     ` david singleton
  2005-12-16 18:42       ` Dinakar Guniguntala
  1 sibling, 1 reply; 37+ messages in thread

From: david singleton @ 2005-12-16 0:02 UTC (permalink / raw)
To: dino; +Cc: Ingo Molnar, linux-kernel, robustmutexes

Dinakar,

     I believe the problem we have is that the library is not checking
to see if the mutex is a recursive mutex, and then checking to see if
the recursive mutex is already owned by the calling thread.  If a
recursive mutex is owned by the calling thread, the library should
increment the lock count (POSIX says recursive mutexes must be unlocked
as many times as they are locked) and return 0 (success) to the caller.

I don't see anywhere in the library where the check for a recursive
mutex is being made.  The mutex_trylock code just does a compare and
exchange on the lock to see if it can get it.  A recursive mutex will
fail this test and erroneously enter the kernel.

I believe we need the down_futex change and a patch to glibc in which
recursive mutexes are handled in the library.

I'm talking to the library people now to see if that's the way it's
supposed to work for recursive mutexes.  If it is, we'll have to
provide a new glibc patch along with the kernel patch.  I think I'll
have to provide a new glibc patch anyway to fix the INLINE_SYSCALL
problem.

Does this make sense?

David

On Dec 15, 2005, at 11:44 AM, Dinakar Guniguntala wrote:

> On Wed, Dec 14, 2005 at 05:03:13PM -0800, david singleton wrote:
>> Dinakar,
>> you may be correct, since reverting the change in down_futex seems
>> to work.
>
> Well, it did not.  It ran for much longer than previously but still
> hung up.
>
> We have a very basic problem in the current implementation.
>
> Currently if a thread calls pthread_mutex_lock on the same lock
> (normal, non-recursive lock) twice without unlocking in between, the
> application hangs, which is the right behaviour.
> However, if the same thing is done with a non-recursive robust mutex,
> it manages to acquire the lock.
>
> I see many problems here (I am assuming that the right behaviour
> with robust mutexes is for the application to ultimately block
> indefinitely in the kernel):
>
> 1. In down_futex we do the following check
>
> 	if (owner_task != current)
> 		down_try_futex(lock, owner_task->thread_info __EIP__);
>
>    In the above scenario, the thread would have acquired the
>    uncontended lock the first time around in userspace.  The second
>    time it tries to acquire the same mutex, because of the above
>    check, it does not call down_try_futex and hence will not
>    initialize the lock structure in the kernel.  It then goes to
>    __down_interruptible where it is granted the lock, which is wrong.
>
>    So IMO the above check is not right.  However, removing this check
>    is not the end of the story.  This time it gets to
>    task_blocks_on_lock and tries to grab the task->pi_lock of the
>    owner, which is itself, and results in a system hang (assuming
>    CONFIG_DEBUG_DEADLOCKS is not set).  So it looks like we need to
>    add some check below to prevent this in case lock_owner happens to
>    be current:
>
> 	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);
>
>> However, I'm wondering if you've hit a bug that Dave Carlson is
>> reporting that he's tracked down to an inline in the glibc patch.
>
> Yes, I noticed that.  Basically it looks like INLINE_SYSCALL, on
> error, returns -1 with the error code in errno, whereas we expect the
> return code to be the same as the kernel return code.  Are you
> referring to this or something else?
>
> However, even with all of the above fixes (remove the check in
> down_futex as mentioned above, add a check in task_blocks_on_lock, and
> the glibc changes) my test program continues to hang up the system,
> though it takes a lot longer to recreate the problem now.
>
> [snip]
>
>> 1) Why did the library call into the kernel if the calling thread
>> owned the lock?
>
> This is something I still haven't figured out and leads me to believe
> that we still have a tiny race somewhere.
>
> 	-Dinakar

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-16  0:02     ` david singleton
@ 2005-12-16 18:42       ` Dinakar Guniguntala
  2005-12-16 21:26         ` David Singleton
  0 siblings, 1 reply; 37+ messages in thread

From: Dinakar Guniguntala @ 2005-12-16 18:42 UTC (permalink / raw)
To: david singleton; +Cc: Ingo Molnar, linux-kernel, robustmutexes

On Thu, Dec 15, 2005 at 04:02:36PM -0800, david singleton wrote:
> Dinakar,
>
>      I believe the problem we have is that the library is not checking
> to see if the mutex is a recursive mutex, and then checking to see if
> the recursive mutex is already owned by the calling thread.  If a
> recursive mutex is owned by the calling thread, the library should
> increment the lock count (POSIX says recursive mutexes must be
> unlocked as many times as they are locked) and return 0 (success) to
> the caller.

Sorry, I seem to have confused you.  I am _not_ talking about recursive
mutexes here.

I have two testcases.  Here is a code snippet of testcase I:

	void* test_thread (void* arg)
	{
		pthread_mutex_lock(&child_mutex);
		printf("test_thread got lock\n");

		pthread_mutex_lock(&child_mutex);
		printf("test_thread got lock 2nd time !!\n");

		printf("test_thread exiting\n");
		return NULL;
	}

	main ()
	{
		...

		if (pthread_mutexattr_init(&mutexattr) != 0) {
			printf("Failed to init mutexattr\n");
		};
		if (pthread_mutexattr_setprotocol(&mutexattr,
					PTHREAD_PRIO_INHERIT) != 0) {
			printf("Can't set protocol prio inherit\n");
		}
		if (pthread_mutexattr_getprotocol(&mutexattr, &protocol) != 0) {
			printf("Can't get mutexattr protocol\n");
		} else {
			printf("protocol in mutexattr is %d\n", protocol);
		}
		if ((retc = pthread_mutex_init(&child_mutex, &mutexattr)) != 0) {
			printf("Failed to init mutex: %d\n", retc);
		}

		...
	}

Clearly what the application is doing here is wrong.  However:

1. In the case of normal (non-robust) non-recursive mutexes, the
   behaviour when we make the second pthread_mutex_lock call is for
   glibc to make a futex_wait call, which will block forever.
   (Which is the right behaviour.)

2.
   In the case of a robust/PI non-recursive mutex, the current
   behaviour is that glibc makes a futex_wait_robust call (which is
   right).  The kernel (2.6.15-rc5-rt1) rt_mutex lock is currently
   unowned, and since we do not call down_try_futex if current ==
   owner_task, we end up grabbing the lock in __down_interruptible and
   returning successfully!

3. Adding the check below in down_futex is also wrong

	if (!owner_task || owner_task == current) {
		up(sem);
		up_read(&current->mm->mmap_sem);
		return -EAGAIN;
	}

   This will result in glibc calling the kernel continuously in a loop
   and we will end up context switching to death.

I guess we need to clean up this path to ensure that the application
blocks forever.

I also have testcase II, which does not do anything illegal like the
above one; instead it exercises the PI boosting code path in the
kernel, and that is where I see the system hang up yet again.  This is
the race that I am currently investigating.

Hope that clears things up a bit.

	-Dinakar

>
> I don't see anywhere in the library where the check for a recursive
> mutex is being made.  The mutex_trylock code just does a compare
> and exchange on the lock to see if it can get it.  A recursive mutex
> will fail this test and erroneously enter the kernel.
>
> I believe we need the down_futex change and a patch
> to glibc in which recursive mutexes are handled in the library.
>
> I'm talking to the library people now to see if that's the way
> it's supposed to work for recursive mutexes.  If it is, we'll have
> to provide a new glibc patch along with the kernel patch.
>
> I think I'll have to provide a new glibc patch anyway to
> fix the INLINE_SYSCALL problem.
>
> Does this make sense?
>
> David
>
> On Dec 15, 2005, at 11:44 AM, Dinakar Guniguntala wrote:
>
>> On Wed, Dec 14, 2005 at 05:03:13PM -0800, david singleton wrote:
>>> Dinakar,
>>> you may be correct, since reverting the change in down_futex seems
>>> to work.
>>
>> Well, it did not.
>> It ran for much longer than previously but still hung up.
>>
>> We have a very basic problem in the current implementation.
>>
>> Currently if a thread calls pthread_mutex_lock on the same lock
>> (normal, non-recursive lock) twice without unlocking in between, the
>> application hangs, which is the right behaviour.  However, if the
>> same thing is done with a non-recursive robust mutex, it manages to
>> acquire the lock.
>>
>> I see many problems here (I am assuming that the right behaviour
>> with robust mutexes is for the application to ultimately block
>> indefinitely in the kernel):
>>
>> 1. In down_futex we do the following check
>>
>> 	if (owner_task != current)
>> 		down_try_futex(lock, owner_task->thread_info __EIP__);
>>
>>    In the above scenario, the thread would have acquired the
>>    uncontended lock the first time around in userspace.  The second
>>    time it tries to acquire the same mutex, because of the above
>>    check, it does not call down_try_futex and hence will not
>>    initialize the lock structure in the kernel.  It then goes to
>>    __down_interruptible where it is granted the lock, which is wrong.
>>
>>    So IMO the above check is not right.  However, removing this check
>>    is not the end of the story.  This time it gets to
>>    task_blocks_on_lock and tries to grab the task->pi_lock of the
>>    owner, which is itself, and results in a system hang (assuming
>>    CONFIG_DEBUG_DEADLOCKS is not set).  So it looks like we need to
>>    add some check below to prevent this in case lock_owner happens to
>>    be current:
>>
>> 	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);
>>
>>> However, I'm wondering if you've hit a bug that Dave Carlson is
>>> reporting that he's tracked down to an inline in the glibc patch.
>>
>> Yes, I noticed that.  Basically it looks like INLINE_SYSCALL, on
>> error, returns -1 with the error code in errno, whereas we expect the
>> return code to be the same as the kernel return code.  Are you
>> referring to this or something else?
>>
>> However, even with all of the above fixes (remove the check in
>> down_futex as mentioned above, add a check in task_blocks_on_lock,
>> and the glibc changes) my test program continues to hang up the
>> system, though it takes a lot longer to recreate the problem now.
>>
>> [snip]
>>
>>> 1) Why did the library call into the kernel if the calling thread
>>> owned the lock?
>>
>> This is something I still haven't figured out and leads me to believe
>> that we still have a tiny race somewhere.
>>
>> 	-Dinakar

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-16 18:42       ` Dinakar Guniguntala
@ 2005-12-16 21:26         ` David Singleton
  2005-12-19 11:56           ` Dinakar Guniguntala
  0 siblings, 1 reply; 37+ messages in thread

From: David Singleton @ 2005-12-16 21:26 UTC (permalink / raw)
To: dino; +Cc: Ingo Molnar, linux-kernel, robustmutexes

Dinakar,

     I believe this patch will give you the behavior you expect.

	http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf3

With this patch your app that tries to lock a non-recursive
pthread_mutex twice will not get EAGAIN back; it will hang.

I should point out that you will still get the new, and I had hoped
added, benefit of being able to detect when your app would deadlock
itself by turning on DEBUG_DEADLOCKS.  It will give you the same
behavior that you originally posted: the kernel will inform you of the
deadlock.

If you need the POSIX behavior of having an app hang if it tries to
lock the same non-recursive pthread_mutex twice, then you need to turn
off DEBUG_DEADLOCKS.

If you can send on the Priority Boost test I can start looking at it
as well.

David

> On Thu, Dec 15, 2005 at 04:02:36PM -0800, david singleton wrote:
>
>> Dinakar,
>>
>>      I believe the problem we have is that the library is not
>> checking to see if the mutex is a recursive mutex, and then checking
>> to see if the recursive mutex is already owned by the calling thread.
>> If a recursive mutex is owned by the calling thread, the library
>> should increment the lock count (POSIX says recursive mutexes must be
>> unlocked as many times as they are locked) and return 0 (success) to
>> the caller.
>
> Sorry, I seem to have confused you.  I am _not_ talking about
> recursive mutexes here.
>
> I have two testcases.
> Here is a code snippet of testcase I:
>
> 	void* test_thread (void* arg)
> 	{
> 		pthread_mutex_lock(&child_mutex);
> 		printf("test_thread got lock\n");
>
> 		pthread_mutex_lock(&child_mutex);
> 		printf("test_thread got lock 2nd time !!\n");
>
> 		printf("test_thread exiting\n");
> 		return NULL;
> 	}
>
> 	main ()
> 	{
> 		...
>
> 		if (pthread_mutexattr_init(&mutexattr) != 0) {
> 			printf("Failed to init mutexattr\n");
> 		};
> 		if (pthread_mutexattr_setprotocol(&mutexattr,
> 					PTHREAD_PRIO_INHERIT) != 0) {
> 			printf("Can't set protocol prio inherit\n");
> 		}
> 		if (pthread_mutexattr_getprotocol(&mutexattr, &protocol) != 0) {
> 			printf("Can't get mutexattr protocol\n");
> 		} else {
> 			printf("protocol in mutexattr is %d\n", protocol);
> 		}
> 		if ((retc = pthread_mutex_init(&child_mutex, &mutexattr)) != 0) {
> 			printf("Failed to init mutex: %d\n", retc);
> 		}
>
> 		...
> 	}
>
> Clearly what the application is doing here is wrong.  However:
>
> 1. In the case of normal (non-robust) non-recursive mutexes, the
>    behaviour when we make the second pthread_mutex_lock call is for
>    glibc to make a futex_wait call, which will block forever.
>    (Which is the right behaviour.)
>
> 2. In the case of a robust/PI non-recursive mutex, the current
>    behaviour is that glibc makes a futex_wait_robust call (which is
>    right).  The kernel (2.6.15-rc5-rt1) rt_mutex lock is currently
>    unowned, and since we do not call down_try_futex if current ==
>    owner_task, we end up grabbing the lock in __down_interruptible and
>    returning successfully!
>
> 3. Adding the check below in down_futex is also wrong
>
> 	if (!owner_task || owner_task == current) {
> 		up(sem);
> 		up_read(&current->mm->mmap_sem);
> 		return -EAGAIN;
> 	}
>
>    This will result in glibc calling the kernel continuously in a loop
>    and we will end up context switching to death.
>
> I guess we need to clean up this path to ensure that the application
> blocks forever.
>
> I also have testcase II, which does not do anything illegal like the
> above one; instead it exercises the PI boosting code path in the
> kernel, and that is where I see the system hang up yet again.  This
> is the race that I am currently investigating.
>
> Hope that clears things up a bit.
>
> 	-Dinakar

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-16 21:26         ` David Singleton
@ 2005-12-19 11:56           ` Dinakar Guniguntala
  2005-12-19 20:11             ` David Singleton
  0 siblings, 1 reply; 37+ messages in thread

From: Dinakar Guniguntala @ 2005-12-19 11:56 UTC (permalink / raw)
To: David Singleton; +Cc: Ingo Molnar, linux-kernel, robustmutexes

[-- Attachment #1: Type: text/plain, Size: 785 bytes --]

On Fri, Dec 16, 2005 at 01:26:44PM -0800, David Singleton wrote:
> Dinakar,
>
>     I believe this patch will give you the behavior you expect.
>
> 	http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf3

Not really.  The reason, quoting from my previous mail:

    So IMO the above check is not right.  However, removing this check
    is not the end of the story.  This time it gets to
    task_blocks_on_lock and tries to grab the task->pi_lock of the
    owner, which is itself, and results in a system hang (assuming
    CONFIG_DEBUG_DEADLOCKS is not set).  So it looks like we need to
    add some check below to prevent this in case lock_owner happens to
    be current:
	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);

The patch that works for me is attached below.

	-Dinakar

[-- Attachment #2: check_current.patch --]
[-- Type: text/plain, Size: 1171 bytes --]

Index: linux-2.6.14-rt22-rayrt5/kernel/rt.c
===================================================================
--- linux-2.6.14-rt22-rayrt5.orig/kernel/rt.c	2005-12-15 02:15:13.000000000 +0530
+++ linux-2.6.14-rt22-rayrt5/kernel/rt.c	2005-12-19 15:51:26.000000000 +0530
@@ -1042,7 +1042,8 @@
 		return;
 	}
 #endif
-	_raw_spin_lock(&lock_owner(lock)->task->pi_lock);
+	if (current != lock_owner(lock)->task)
+		_raw_spin_lock(&lock_owner(lock)->task->pi_lock);
 	plist_add(&waiter->pi_list, &lock_owner(lock)->task->pi_waiters);
 	/*
 	 * Add RT tasks to the head:
@@ -1055,7 +1056,8 @@
 	 */
 	if (task->prio < lock_owner(lock)->task->prio)
 		pi_setprio(lock, lock_owner(lock)->task, task->prio);
-	_raw_spin_unlock(&lock_owner(lock)->task->pi_lock);
+	if (current != lock_owner(lock)->task)
+		_raw_spin_unlock(&lock_owner(lock)->task->pi_lock);
 }

 /*
@@ -3016,8 +3018,7 @@
 	 * the first waiter and we'll just block on the down_interruptible.
 	 */

-	if (owner_task != current)
-		down_try_futex(lock, owner_task->thread_info __EIP__);
+	down_try_futex(lock, owner_task->thread_info __EIP__);

 	/*
 	 * we can now drop the locks because the rt_mutex is held.

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt
  2005-12-19 11:56           ` Dinakar Guniguntala
@ 2005-12-19 20:11             ` David Singleton
  0 siblings, 0 replies; 37+ messages in thread

From: David Singleton @ 2005-12-19 20:11 UTC (permalink / raw)
To: dino; +Cc: Ingo Molnar, linux-kernel, robustmutexes

Dinakar Guniguntala wrote:

Dinakar,
    the new patch is available at

	http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf4

Thanks for the patch and the testing.

David

> On Fri, Dec 16, 2005 at 01:26:44PM -0800, David Singleton wrote:
>
>> Dinakar,
>>
>>     I believe this patch will give you the behavior you expect.
>>
>> 	http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf3
>
> Not really.  The reason, quoting from my previous mail:
>
>     So IMO the above check is not right.  However, removing this check
>     is not the end of the story.  This time it gets to
>     task_blocks_on_lock and tries to grab the task->pi_lock of the
>     owner, which is itself, and results in a system hang (assuming
>     CONFIG_DEBUG_DEADLOCKS is not set).  So it looks like we need to
>     add some check below to prevent this in case lock_owner happens to
>     be current:
> > _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > >The patch that works for me is attached below > > -Dinakar > > > > >------------------------------------------------------------------------ > >Index: linux-2.6.14-rt22-rayrt5/kernel/rt.c >=================================================================== >--- linux-2.6.14-rt22-rayrt5.orig/kernel/rt.c 2005-12-15 02:15:13.000000000 +0530 >+++ linux-2.6.14-rt22-rayrt5/kernel/rt.c 2005-12-19 15:51:26.000000000 +0530 >@@ -1042,7 +1042,8 @@ > return; > } > #endif >- _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >+ if (current != lock_owner(lock)->task) >+ _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > plist_add(&waiter->pi_list, &lock_owner(lock)->task->pi_waiters); > /* > * Add RT tasks to the head: >@@ -1055,7 +1056,8 @@ > */ > if (task->prio < lock_owner(lock)->task->prio) > pi_setprio(lock, lock_owner(lock)->task, task->prio); >- _raw_spin_unlock(&lock_owner(lock)->task->pi_lock); >+ if (current != lock_owner(lock)->task) >+ _raw_spin_unlock(&lock_owner(lock)->task->pi_lock); > } > > /* >@@ -3016,8 +3018,7 @@ > * the first waiter and we'll just block on the down_interruptible. > */ > >- if (owner_task != current) >- down_try_futex(lock, owner_task->thread_info __EIP__); >+ down_try_futex(lock, owner_task->thread_info __EIP__); > > /* > * we can now drop the locks because the rt_mutex is held. > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-14 22:39 Recursion bug in -rt Dinakar Guniguntala 2005-12-15 1:03 ` david singleton @ 2005-12-15 19:00 ` David Singleton 2005-12-15 19:52 ` Dinakar Guniguntala 2005-12-20 13:19 ` Ingo Molnar 1 sibling, 2 replies; 37+ messages in thread From: David Singleton @ 2005-12-15 19:00 UTC (permalink / raw) To: dino; +Cc: linux-kernel, Ingo Molnar, robustmutexes Dinakar, after further testing and investigation I believe your original assessment was correct. The problem you are seeing is not a library problem. The changes to down_futex need to be reverted. There is a new patch at http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf2 that reverts the changes to down_futex. Thanks for testing this. David Dinakar Guniguntala wrote: >Hi David, > >I hit this bug with -rt22-rf11 > >========================================== >[ BUG: lock recursion deadlock detected! | >------------------------------------------ >already locked: [f7abbc94] {futex} >.. held by: testpi-3: 4595 [f7becdd0, 59] >... acquired at: futex_wait_robust+0x142/0x1f3 >------------------------------ >| showing all locks held by: | (testpi-3/4595 [f7becdd0, 59]): >------------------------------ > >#001: [f7abbc94] {futex} >... acquired at: futex_wait_robust+0x142/0x1f3 > >-{current task's backtrace}-----------------> > [<c0103e04>] dump_stack+0x1e/0x20 (20) > [<c0136bc2>] check_deadlock+0x2d7/0x334 (44) > [<c01379bc>] task_blocks_on_lock+0x2c/0x224 (36) > [<c03f29c5>] __down_interruptible+0x37c/0x95d (160) > [<c013aebf>] down_futex+0xa3/0xe7 (40) > [<c013ebc5>] futex_wait_robust+0x142/0x1f3 (72) > [<c013f35c>] do_futex+0x9a/0x109 (40) > [<c013f4dd>] sys_futex+0x112/0x11e (68) > [<c0102f03>] sysenter_past_esp+0x54/0x75 (-8116) >------------------------------ >| showing all locks held by: | (testpi-3/4595 [f7becdd0, 59]): >------------------------------ > >#001: [f7abbc94] {futex} >... 
acquired at: futex_wait_robust+0x142/0x1f3 > >--------------------------------------------------------------------- > >futex.c -> futex_wait_robust > > if ((curval & FUTEX_PID) == current->pid) { > ret = -EAGAIN; > goto out_unlock; > } > >rt.c -> down_futex > > if (!owner_task || owner_task == current) { > up(sem); > up_read(&current->mm->mmap_sem); > return -EAGAIN; > } > >I noticed that both of the above checks have been removed in your >patch. I do understand that the futex_wait_robust path has been >made similar to the futex_wait path, but I think we are not taking >PI into consideration. Basically it looks like we still need to check >if the current task has become the owner, or are we missing a lock somewhere? > >I added the down_futex check above and my test has been >running for hours without the oops. Without this check it >used to oops within minutes. > >Patch that works for me attached below. Thoughts? > > -Dinakar > > > > >------------------------------------------------------------------------ > >Index: linux-2.6.14-rt22-rayrt5/kernel/rt.c >=================================================================== >--- linux-2.6.14-rt22-rayrt5.orig/kernel/rt.c 2005-12-15 02:15:13.000000000 +0530 >+++ linux-2.6.14-rt22-rayrt5/kernel/rt.c 2005-12-15 02:18:29.000000000 +0530 >@@ -3001,7 +3001,7 @@ > * if the owner can't be found return try again. > */ > >- if (!owner_task) { >+ if (!owner_task || owner_task == current) { > up(sem); > up_read(&current->mm->mmap_sem); > return -EAGAIN; > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-15 19:00 ` David Singleton @ 2005-12-15 19:52 ` Dinakar Guniguntala 2005-12-20 13:19 ` Ingo Molnar 1 sibling, 0 replies; 37+ messages in thread From: Dinakar Guniguntala @ 2005-12-15 19:52 UTC (permalink / raw) To: David Singleton; +Cc: linux-kernel, Ingo Molnar, robustmutexes On Thu, Dec 15, 2005 at 11:00:49AM -0800, David Singleton wrote: > Dinakar, > after further testing and investigation I believe you original > assessment was correct. > The problem you are seeing is not a library problem. > The changes to down_futex need to be reverted. There is a new patch at > > http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf2. > > that reverts the changes to down_futex. > David, See my previous mail. IMO this patch is not right and besides it does not fix the hang that I am seeing either. I am continuing to debug the hang. Will keep you posted -Dinakar > Dinakar Guniguntala wrote: > > >Hi David, > > > >I hit this bug with -rt22-rf11 > > > >========================================== > >[ BUG: lock recursion deadlock detected! | > >------------------------------------------ > >already locked: [f7abbc94] {futex} > >.. held by: testpi-3: 4595 [f7becdd0, 59] > >... acquired at: futex_wait_robust+0x142/0x1f3 > >------------------------------ > >| showing all locks held by: | (testpi-3/4595 [f7becdd0, 59]): > >------------------------------ > > > >#001: [f7abbc94] {futex} > >... 
acquired at: futex_wait_robust+0x142/0x1f3 > > > >-{current task's backtrace}-----------------> > >[<c0103e04>] dump_stack+0x1e/0x20 (20) > >[<c0136bc2>] check_deadlock+0x2d7/0x334 (44) > >[<c01379bc>] task_blocks_on_lock+0x2c/0x224 (36) > >[<c03f29c5>] __down_interruptible+0x37c/0x95d (160) > >[<c013aebf>] down_futex+0xa3/0xe7 (40) > >[<c013ebc5>] futex_wait_robust+0x142/0x1f3 (72) > >[<c013f35c>] do_futex+0x9a/0x109 (40) > >[<c013f4dd>] sys_futex+0x112/0x11e (68) > >[<c0102f03>] sysenter_past_esp+0x54/0x75 (-8116) > >------------------------------ > >| showing all locks held by: | (testpi-3/4595 [f7becdd0, 59]): > >------------------------------ > > > >#001: [f7abbc94] {futex} > >... acquired at: futex_wait_robust+0x142/0x1f3 > > > >--------------------------------------------------------------------- > > > >futex.c -> futex_wait_robust > > > > if ((curval & FUTEX_PID) == current->pid) { > > ret = -EAGAIN; > > goto out_unlock; > > } > > > >rt.c -> down_futex > > > > if (!owner_task || owner_task == current) { > > up(sem); > > up_read(&current->mm->mmap_sem); > > return -EAGAIN; > > } > > > >I noticed that both the above checks below have been removed in your > >patch. I do understand that the futex_wait_robust path has been > >made similar to the futex_wait path, but I think we are not taking > >PI into consideration. Basically it looks like we still need to check > >if the current task has become owner. or are we missing a lock somewhere ? > > > >I added the down_futex check above and my test has been > >running for hours without the oops. Without this check it > >used to oops within minutes. > > > >Patch that works for me attached below. Thoughts? 
> > > > -Dinakar > > > > > > > > > >------------------------------------------------------------------------ > > > >Index: linux-2.6.14-rt22-rayrt5/kernel/rt.c > >=================================================================== > >--- linux-2.6.14-rt22-rayrt5.orig/kernel/rt.c 2005-12-15 > >02:15:13.000000000 +0530 > >+++ linux-2.6.14-rt22-rayrt5/kernel/rt.c 2005-12-15 > >02:18:29.000000000 +0530 > >@@ -3001,7 +3001,7 @@ > > * if the owner can't be found return try again. > > */ > > > >- if (!owner_task) { > >+ if (!owner_task || owner_task == current) { > > up(sem); > > up_read(&current->mm->mmap_sem); > > return -EAGAIN; > > > > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-15 19:00 ` David Singleton 2005-12-15 19:52 ` Dinakar Guniguntala @ 2005-12-20 13:19 ` Ingo Molnar 2005-12-20 15:50 ` Dinakar Guniguntala 1 sibling, 1 reply; 37+ messages in thread From: Ingo Molnar @ 2005-12-20 13:19 UTC (permalink / raw) To: David Singleton; +Cc: dino, linux-kernel, robustmutexes * David Singleton <dsingleton@mvista.com> wrote: > Dinakar, > after further testing and investigation I believe you original > assessment was correct. > The problem you are seeing is not a library problem. > The changes to down_futex need to be reverted. There is a new patch at > > http://source.mvista.com/~dsingleton/patch-2.6.15-rc5-rt2-rf2. > > that reverts the changes to down_futex. hm, i'm looking at -rf4 - these changes look fishy: - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); + if (current != lock_owner(lock)->task) + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); why is this done? Ingo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 13:19 ` Ingo Molnar @ 2005-12-20 15:50 ` Dinakar Guniguntala 2005-12-20 17:43 ` Esben Nielsen ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Dinakar Guniguntala @ 2005-12-20 15:50 UTC (permalink / raw) To: Ingo Molnar; +Cc: David Singleton, linux-kernel, robustmutexes On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: > > hm, i'm looking at -rf4 - these changes look fishy: > > - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > + if (current != lock_owner(lock)->task) > + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > why is this done? > Ingo, this is to prevent a kernel hang due to application error. Basically when an application does a pthread_mutex_lock twice on a _nonrecursive_ mutex with robust/PI attributes, the whole system hangs. Of course the application clearly should not be doing anything like that, but it should not end up hanging the system either. -Dinakar ^ permalink raw reply [flat|nested] 37+ messages in thread
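[Editorial note: the application error Dinakar describes — the owning thread calling pthread_mutex_lock() a second time on a non-recursive mutex — is easy to reproduce from userspace. The sketch below is illustrative only, not from the thread: it uses a plain PTHREAD_MUTEX_ERRORCHECK mutex, which reports the relock as EDEADLK instead of hanging, and omits the robust/PI attributes involved in the actual bug (a NORMAL-type mutex would simply deadlock the caller here, per POSIX).]

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Lock a mutex twice from the same thread and report what the second
 * pthread_mutex_lock() returns.  With the ERRORCHECK type the relock is
 * detected and fails with EDEADLK rather than blocking forever. */
static int relock_result(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int ret;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	pthread_mutex_lock(&m);		/* first lock: succeeds */
	ret = pthread_mutex_lock(&m);	/* relock by the owner */

	pthread_mutex_unlock(&m);	/* still held exactly once */
	pthread_mutex_destroy(&m);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

int main(void)
{
	int r = relock_result();

	printf("relock returned %d (EDEADLK is %d)\n", r, EDEADLK);
	assert(r == EDEADLK);
	return 0;
}

With the default PTHREAD_MUTEX_NORMAL type the second lock call never returns, which is exactly the hang the kernel must confine to the offending task.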
* Re: Recursion bug in -rt 2005-12-20 15:50 ` Dinakar Guniguntala @ 2005-12-20 17:43 ` Esben Nielsen 2005-12-20 19:33 ` Steven Rostedt 2006-01-03 1:54 ` david singleton 2006-01-05 2:14 ` david singleton 2 siblings, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2005-12-20 17:43 UTC (permalink / raw) To: Dinakar Guniguntala; +Cc: Ingo Molnar, robustmutexes, linux-kernel On Tue, 20 Dec 2005, Dinakar Guniguntala wrote: > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: > > > > hm, i'm looking at -rf4 - these changes look fishy: > > > > - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > + if (current != lock_owner(lock)->task) > > + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > > > why is this done? > > > > Ingo, this is to prevent a kernel hang due to application error. > > Basically when an application does a pthread_mutex_lock twice on a > _nonrecursive_ mutex with robust/PI attributes the whole system hangs. > Ofcourse the application clearly should not be doing anything like > that, but it should not end up hanging the system either > Hmm, reading the comment on the function, wouldn't it be more natural to use if(task != lock_owner(lock)->task) as it assumes that task->pi_lock is locked, not that current->pi_lock is locked. By the way: task->pi_lock is taken. lock_owner(lock)->task->pi_lock will be taken. What if the task lock_owner(lock)->task tries to lock another futex (lock2) which has lock_owner(lock2)->task == task? Can't you promote a user space futex deadlock into a kernel spin deadlock this way? Esben > -Dinakar > > ^ permalink raw reply [flat|nested] 37+ messages in thread
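[Editorial note: the scenario Esben raises is the classic ABBA pattern — CPU 1 holds task A's pi_lock and wants B's, while CPU 2 holds B's and wants A's. One standard cure, which Esben himself proposes later in the thread, is to impose a single global acquisition order, e.g. by address. The userspace sketch below is purely illustrative (my names, pthread mutexes standing in for the kernel's raw pi_locks); it shows that with address ordering, two threads naming the locks in opposite orders still cannot deadlock.]

#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two locks in a fixed global order: lowest address first.
 * A thread locking (A, B) and a thread locking (B, A) then take them
 * in the same physical order, so no ABBA cycle can form. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
	if ((uintptr_t)x > (uintptr_t)y) {
		pthread_mutex_t *t = x; x = y; y = t;
	}
	pthread_mutex_lock(x);
	if (y != x)
		pthread_mutex_lock(y);
}

static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
	if ((uintptr_t)x > (uintptr_t)y) {
		pthread_mutex_t *t = x; x = y; y = t;
	}
	if (y != x)
		pthread_mutex_unlock(y);
	pthread_mutex_unlock(x);
}

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;
static long counter;	/* protected by holding both a and b */

static void *hammer(void *arg)
{
	/* Each thread names the pair in the opposite order; the address
	 * sort inside lock_pair makes the acquisition order identical. */
	pthread_mutex_t *first = arg ? &a : &b;
	pthread_mutex_t *second = arg ? &b : &a;

	for (int i = 0; i < 100000; i++) {
		lock_pair(first, second);
		counter++;
		unlock_pair(first, second);
	}
	return NULL;
}

/* Run both threads to completion; returns the final count. */
static long run_hammer(void)
{
	pthread_t t1, t2;

	counter = 0;
	pthread_create(&t1, NULL, hammer, (void *)1);
	pthread_create(&t2, NULL, hammer, (void *)0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return counter;
}

int main(void)
{
	assert(run_hammer() == 200000);	/* both threads finished: no deadlock */
	return 0;
}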
* Re: Recursion bug in -rt 2005-12-20 17:43 ` Esben Nielsen @ 2005-12-20 19:33 ` Steven Rostedt 2005-12-20 20:42 ` Esben Nielsen 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2005-12-20 19:33 UTC (permalink / raw) To: Esben Nielsen Cc: linux-kernel, robustmutexes, Ingo Molnar, Dinakar Guniguntala On Tue, 2005-12-20 at 18:43 +0100, Esben Nielsen wrote: > > On Tue, 20 Dec 2005, Dinakar Guniguntala wrote: > > > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: > > > > > > hm, i'm looking at -rf4 - these changes look fishy: > > > > > > - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > > + if (current != lock_owner(lock)->task) > > > + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > > > > > why is this done? > > > > > > > Ingo, this is to prevent a kernel hang due to application error. > > > > Basically when an application does a pthread_mutex_lock twice on a > > _nonrecursive_ mutex with robust/PI attributes the whole system hangs. > > Ofcourse the application clearly should not be doing anything like > > that, but it should not end up hanging the system either > > > > Hmm, reading the comment on the function, wouldn't it be more natural to > use > if(task != lock_owner(lock)->task) > as it assumes that task->pi_lock is locked, not that current->pi_lock is > locked. > > By the way: > task->pi_lock is taken. lock_owner(lock)->task->pi_lock will be taken. > What if the task lock_owner(lock)->task tries to lock another futex, > (lock2) with which has lock_owner(lock2)->task==task. > Can't you promote a user space futex deadlock into a kernel spin deadlock > this way? Yes! The locking code of the pi locks in the rt.c code is VERY dependent on the order of locks taken. It works by assuming the order of locks taken will not themselves cause a deadlock. I just recently submitted a patch to Ingo because I found that mutex_trylock can cause a deadlock, since it is not bound to the order of locks. 
So, to answer your question more formal this time. If the futex code uses the pi_lock code in rt.c and the futex causes a deadlock, then the kernel can deadlock too. The benefit of this locking order is that we got rid of the global pi_lock, and that was worth the problems you face today. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 19:33 ` Steven Rostedt @ 2005-12-20 20:42 ` Esben Nielsen 2005-12-20 21:20 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2005-12-20 20:42 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, robustmutexes, Ingo Molnar, Dinakar Guniguntala On Tue, 20 Dec 2005, Steven Rostedt wrote: > On Tue, 2005-12-20 at 18:43 +0100, Esben Nielsen wrote: > > > > On Tue, 20 Dec 2005, Dinakar Guniguntala wrote: > > > > > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: > > > > > > > > hm, i'm looking at -rf4 - these changes look fishy: > > > > > > > > - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > > > + if (current != lock_owner(lock)->task) > > > > + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > > > > > > > > why is this done? > > > > > > > > > > Ingo, this is to prevent a kernel hang due to application error. > > > > > > Basically when an application does a pthread_mutex_lock twice on a > > > _nonrecursive_ mutex with robust/PI attributes the whole system hangs. > > > Ofcourse the application clearly should not be doing anything like > > > that, but it should not end up hanging the system either > > > > > > > Hmm, reading the comment on the function, wouldn't it be more natural to > > use > > if(task != lock_owner(lock)->task) > > as it assumes that task->pi_lock is locked, not that current->pi_lock is > > locked. > > > > By the way: > > task->pi_lock is taken. lock_owner(lock)->task->pi_lock will be taken. > > What if the task lock_owner(lock)->task tries to lock another futex, > > (lock2) with which has lock_owner(lock2)->task==task. > > Can't you promote a user space futex deadlock into a kernel spin deadlock > > this way? > > Yes! > > The locking code of the pi locks in the rt.c code is VERY dependent on > the order of locks taken. It works by assuming the order of locks taken > will not themselves cause a deadlock. 
I just recently submitted a patch > to Ingo because I found that mutex_trylock can cause a deadlock, since > it is not bound to the order of locks. > > So, to answer your question more formal this time. If the futex code > uses the pi_lock code in rt.c and the futex causes a deadlock, then the > kernel can deadlock too. > This is ok for kernel mutexes, which are supposed not to cause deadlocks. But user space deadlocks must not cause kernel deadlocks. Therefore the robust futex code _must_ be fixed. > The benefit of this locking order is that we got rid of the global > pi_lock, and that was worth the problems you face today. > I believe this problem can be solved in the pi_lock code - but it will require quite a bit of recoding. I started on it a little while ago but didn't at all get the time to get into anything to even compile :-( I don't have time to finish any code at all but I guess I can plant the idea instead: When resolving the mutex chain (task A locks mutex 1 owned by B blocked on 2 owned by C etc) for PI boosting and also when finding deadlocks, release _all_ locks before going to the next step in the chain. Use get_task_struct on the next task in the chain, release the locks, take the pi_locks in a fixed order (sort by address for instance), do the PI boosting, get_task_struct on the next lock, release all the locks, do the put_task_struct() on the previous task etc. This is a lot more expensive approach as it involves twice as many spinlock operations and get/put_task_struct() calls, but it has the added benefit of reducing the overall system latency for long locking chains as there are spots where interrupts and preemption will be enabled for each step in the chain. Remember also that deep locking chains should be considered an exception so the code doesn't have to be optimal wrt. performance. I added my feeble attempts to implement this below. 
I have no chance of ever getting time to finish it :-( > -- Steve > >

This is a "near" replacement for task_blocks_on_lock(). It leaves with all spinlocks unlocked. A lot of other code needs to be changed as well to accept the new right order of locking.

static void task_blocks_on_lock2(struct rt_mutex_waiter *waiter,
				 struct thread_info *ti,
				 struct rt_mutex *lock __EIP_DECL__)
{
	task_t *owner, *blocked_task = ti->task;

#ifdef CONFIG_RT_DEADLOCK_DETECT
	/* mark the current thread as blocked on the lock */
	waiter->eip = eip;
#endif
	blocked_task->blocked_on = waiter;
	waiter->lock = lock;
	waiter->ti = blocked_task->thread_info;
	waiter->on_pi_waiters = 0;
	plist_init(&waiter->pi_list, blocked_task->prio);
	plist_init(&waiter->list, blocked_task->prio);

	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&blocked_task->pi_lock));
	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));

	plist_add(&waiter->list, &lock->wait_list);
	set_lock_owner_pending(lock);
	owner = lock_owner_task(lock);

	/* Put code to grab the lock and return on success here */

	/* Code for waiting: */
	get_task_struct(owner);
	get_task_struct(blocked_task);
	blocked_task->blocked_on = waiter;
	_raw_spin_unlock(&lock->wait_lock);
	_raw_spin_unlock(&current->pi_lock);

	while (owner) {
		/* No spin-locks held here! */
		struct rt_mutex *this_lock;
		task_t *next_owner = NULL;

		if (owner - blocked_task < 0) {
			_raw_spin_lock(&owner->pi_lock);
			_raw_spin_lock(&blocked_task->pi_lock);
		} else {
			_raw_spin_lock(&blocked_task->pi_lock);
			_raw_spin_lock(&owner->pi_lock);
		}

		if (!blocked_task->blocked_on || !blocked_task->blocked_on->ti) {
			/* The lock has been released!! */
			_raw_spin_unlock(&blocked_task->pi_lock);
			_raw_spin_unlock(&owner->pi_lock);
			put_task_struct(owner);
			break;
		}

		waiter = blocked_task->blocked_on;
		this_lock = waiter->lock;
		_raw_spin_lock(&this_lock->wait_lock);
		if (lock_owner_task(this_lock) != owner) {
			/* Oops, owner has changed - redo it with the new owner */
			next_owner = lock_owner_task(this_lock);
			if (next_owner)
				get_task_struct(next_owner);
			_raw_spin_unlock(&this_lock->wait_lock);
			_raw_spin_unlock(&blocked_task->pi_lock);
			_raw_spin_unlock(&owner->pi_lock);
			put_task_struct(owner);
			owner = next_owner;
			continue;
		}

		/* Prio might have changed - it does so in the previous loop.
		   So the lists need to be resorted */
		plist_del(&waiter->list, &lock->wait_list);
		plist_init(&waiter->list, blocked_task->prio);
		plist_add(&waiter->list, &lock->wait_list);
		if (waiter->on_pi_waiters) {
			plist_del(&waiter->pi_list, &owner->pi_waiters);
			waiter->on_pi_waiters = 0;
		}
		/* We only need the highest priority waiter on owner->pi_list */
		if (plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
				      pi_list) == waiter) {
			plist_init(&waiter->pi_list, blocked_task->prio);
			plist_add(&waiter->pi_list, &owner->pi_waiters);
			waiter->on_pi_waiters = 1;
		}
		_raw_spin_unlock(&this_lock->wait_lock);
		_raw_spin_unlock(&blocked_task->pi_lock);

		if (DEADLOCK_DETECT || owner->prio > blocked_task->prio) {
			if (owner->prio > blocked_task->prio)
				mutex_setprio(owner, blocked_task->prio);
			if (owner->blocked_on) {
				waiter = owner->blocked_on;
				_raw_spin_lock(&waiter->lock->wait_lock);
				next_owner = lock_owner_task(waiter->lock);
				if (next_owner)
					get_task_struct(next_owner);
				_raw_spin_unlock(&waiter->lock->wait_lock);
			}
		}
		put_task_struct(blocked_task);
		blocked_task = owner;
		owner = next_owner;
	}
	BUG_ON(!blocked_task);
	put_task_struct(blocked_task);
}
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 20:42 ` Esben Nielsen @ 2005-12-20 21:20 ` Steven Rostedt 2005-12-20 21:55 ` david singleton 2005-12-20 22:43 ` Esben Nielsen 0 siblings, 2 replies; 37+ messages in thread From: Steven Rostedt @ 2005-12-20 21:20 UTC (permalink / raw) To: Esben Nielsen Cc: linux-kernel, robustmutexes, Ingo Molnar, Dinakar Guniguntala On Tue, 2005-12-20 at 21:42 +0100, Esben Nielsen wrote: > > > This is ok for kernel mutexes, which are supposed not to cause deadlocks. > But user space deadlocks must not cause kernel deadlocks. Therefore the > robust futex code _must_ be fixed. And it should be fixed in the futex side. Not in rt.c. > > > The benefit of this locking order is that we got rid of the global > > pi_lock, and that was worth the problems you face today. > > > > I believe this problem can be solved in the pi_lock code - but it will > require quite a bit of recoding. I started on it a little while ago but > didn't at all get the time to get into anything to even compile :-( > I don't have time to finish any code at all but I guess I can plant an > the idea instead: I removed the global pi_lock in two sleepless days, and it was all on the theory that the locks themselves would not deadlock. That was the only sanity I was able to hang on to. That was complex enough, and very scary to get right (scary since it _had_ to be done right). And it was complex enough to keep a highly skilled programmer living in Hungary from doing it himself (not to say he couldn't do it, just complex enough for him to put it off for quite some time). And one must also worry about the BKL which is a separate beast all together. So making it any more complex is IMO out of the question. > > When resolving the mutex chain (task A locks mutex 1 owned by B blocked > on 2 owned by C etc) for PI boosting and also when finding deadlocks, > release _all_ locks before going to the next step in the chain. 
Use > get_task_struct on the next task in the chain, release the locks, > take the pi_locks in a fixed order (sort by address forinstance), do the > PI boosting, get_task_struct on the next lock, release all the locks, do > the put_task_struct() on the previous task etc. > This a lot more expensive approach as it involves double as many spinlock > operations and get/put_task_struct() calls , but it has the added benifit > of reducing the overall system latency for long locking chains as there > are spots where interrupts and preemption will be enabled for each step in > the chain. Remember also that deep locking chains should be considered an > exeption so the code doesn't have to be optimal wrt. performance. The long lock holding is only by the lock being grabbed and the owner grabbing it. All other locks don't need to be held for long periods of time. There's lots of issues if you release these two locks. How do you deal with the mutex being released while going up the chain? > > I added my feeble attempts to implement this below. I have no chance of > ever getting time finish it :-( I doubt they were feeble, but just proof that this approach is far too complex. As I said, if you don't want futex to deadlock the kernel, the API for futex should have deadlocking checks, since the only way this can deadlock the system, is if two threads are in the kernel at the same time. Also, the ones who are paying me to help out the -rt kernel, don't care if we let the user side deadlock the system. They deliver a complete package, kernel and user apps, and nothing else is to be running on the system. This means that if the user app deadlocks, it doesn't matter if the kernel deadlocks or not, because the deadlocking of the user app means the system has failed. And I have a feeling that a lot of other users of -rt feel the same way. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 21:20 ` Steven Rostedt @ 2005-12-20 21:55 ` david singleton 2005-12-20 22:56 ` Esben Nielsen 2005-12-20 22:43 ` Esben Nielsen 1 sibling, 1 reply; 37+ messages in thread From: david singleton @ 2005-12-20 21:55 UTC (permalink / raw) To: Steven Rostedt; +Cc: robustmutexes, Esben Nielsen, linux-kernel, Ingo Molnar On Dec 20, 2005, at 1:20 PM, Steven Rostedt wrote: > On Tue, 2005-12-20 at 21:42 +0100, Esben Nielsen wrote: > >>> >> This is ok for kernel mutexes, which are supposed not to cause >> deadlocks. >> But user space deadlocks must not cause kernel deadlocks. Therefore >> the >> robust futex code _must_ be fixed. > > And it should be fixed in the futex side. Not in rt.c. Yes. I believe you are right about this. The bad news is POSIX specifies that an app that calls lock_mutex twice on the same mutex hangs. EINVAL or EWOULDBLOCK are not POSIX compliant. Let me think on this a while and see if I can come up with an idea that will let the app hang itself (and be POSIX compliant) and not hang the kernel. Perhaps I'll just detect it and hang them on a waitqueue waiting for the user to get impatient and hit control C . . . . David > >> >>> The benefit of this locking order is that we got rid of the global >>> pi_lock, and that was worth the problems you face today. >>> >> >> I believe this problem can be solved in the pi_lock code - but it will >> require quite a bit of recoding. I started on it a little while ago >> but >> didn't at all get the time to get into anything to even compile :-( >> I don't have time to finish any code at all but I guess I can plant an >> the idea instead: > > I removed the global pi_lock in two sleepless days, and it was all on > the theory that the locks themselves would not deadlock. That was the > only sanity I was able to hang on to. That was complex enough, and > very > scary to get right (scary since it _had_ to be done right). 
And it was > complex enough to keep a highly skilled programmer living in Hungary > from doing it himself (not to say he couldn't do it, just complex > enough > for him to put it off for quite some time). And one must also worry > about the BKL which is a separate beast all together. > > So making it any more complex is IMO out of the question. > >> >> When resolving the mutex chain (task A locks mutex 1 owned by B >> blocked >> on 2 owned by C etc) for PI boosting and also when finding deadlocks, >> release _all_ locks before going to the next step in the chain. Use >> get_task_struct on the next task in the chain, release the locks, >> take the pi_locks in a fixed order (sort by address forinstance), do >> the >> PI boosting, get_task_struct on the next lock, release all the locks, >> do >> the put_task_struct() on the previous task etc. >> This a lot more expensive approach as it involves double as many >> spinlock >> operations and get/put_task_struct() calls , but it has the added >> benifit >> of reducing the overall system latency for long locking chains as >> there >> are spots where interrupts and preemption will be enabled for each >> step in >> the chain. Remember also that deep locking chains should be >> considered an >> exeption so the code doesn't have to be optimal wrt. performance. > > The long lock holding is only by the lock being grabbed and the owner > grabbing it. All other locks don't need to be held for long periods of > time. There's lots of issues if you release these two locks. How do > you > deal with the mutex being released while going up the chain? > >> >> I added my feeble attempts to implement this below. I have no chance >> of >> ever getting time finish it :-( > > I doubt they were feeble, but just proof that this approach is far too > complex. 
As I said, if you don't want futex to deadlock the kernel, > the > API for futex should have deadlocking checks, since the only way this > can deadlock the system, is if two threads are in the kernel at the > same > time. > > Also, the ones who are paying me to help out the -rt kernel, don't care > if we let the user side deadlock the system. They deliver a complete > package, kernel and user apps, and nothing else is to be running on the > system. This means that if the user app deadlocks, it doesn't matter > if > the kernel deadlocks or not, because the deadlocking of the user app > means the system has failed. And I have a feeling that a lot of other > users of -rt feel the same way. > > -- Steve > > > _______________________________________________ > robustmutexes mailing list > robustmutexes@lists.osdl.org > https://lists.osdl.org/mailman/listinfo/robustmutexes ^ permalink raw reply [flat|nested] 37+ messages in thread
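[Editorial note: David's idea above — detect the self-relock and park the offending task on a waitqueue where a signal can still reach it, so the POSIX-mandated hang stays confined to the application — can be sketched as a userspace analogue. Everything below is hypothetical (my own types and names, condition variable standing in for a kernel waitqueue), not the patch that was eventually written.]

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* A lock that detects a relock by its current owner and parks that
 * caller on a "waitqueue" (condition variable nobody ever signals),
 * instead of feeding the recursion into the lock chain.  The caller
 * hangs -- as POSIX requires for a NORMAL mutex -- but nothing else
 * does; in the kernel the wait would be interruptible so Ctrl-C works. */
typedef struct {
	pthread_mutex_t state;	/* protects owner/held */
	pthread_cond_t parked;	/* self-deadlocked callers sleep here */
	pthread_t owner;
	int held;
} hang_lock;

static void hang_lock_acquire(hang_lock *l)
{
	pthread_mutex_lock(&l->state);
	if (l->held && pthread_equal(l->owner, pthread_self())) {
		/* Self-deadlock: park forever instead of recursing. */
		for (;;)
			pthread_cond_wait(&l->parked, &l->state);
	}
	while (l->held)
		pthread_cond_wait(&l->parked, &l->state);
	l->held = 1;
	l->owner = pthread_self();
	pthread_mutex_unlock(&l->state);
}

struct probe {
	hang_lock l;
	atomic_int first_done, escaped;
};

static void *self_deadlocker(void *arg)
{
	struct probe *p = arg;

	hang_lock_acquire(&p->l);
	atomic_store(&p->first_done, 1);
	hang_lock_acquire(&p->l);	/* relock: parks forever */
	atomic_store(&p->escaped, 1);	/* never reached */
	return NULL;
}

/* Returns 1 if the relocking thread got the lock once and then stayed
 * parked (i.e. it hung itself without hanging anyone else). */
static int relock_parks(void)
{
	struct probe *p = calloc(1, sizeof(*p));
	pthread_t t;

	pthread_mutex_init(&p->l.state, NULL);
	pthread_cond_init(&p->l.parked, NULL);
	pthread_create(&t, NULL, self_deadlocker, p);
	for (int i = 0; i < 200 && !atomic_load(&p->first_done); i++)
		usleep(10000);		/* wait for the first lock */
	usleep(200000);			/* give the relock time to park */
	pthread_detach(t);		/* the thread is parked for good */
	return atomic_load(&p->first_done) && !atomic_load(&p->escaped);
}

int main(void)
{
	assert(relock_parks() == 1);
	return 0;			/* process exit reaps the parked thread */
}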
* Re: Recursion bug in -rt 2005-12-20 21:55 ` david singleton @ 2005-12-20 22:56 ` Esben Nielsen 2005-12-20 23:12 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2005-12-20 22:56 UTC (permalink / raw) To: david singleton; +Cc: Steven Rostedt, robustmutexes, linux-kernel On Tue, 20 Dec 2005, david singleton wrote: > > On Dec 20, 2005, at 1:20 PM, Steven Rostedt wrote: > > > On Tue, 2005-12-20 at 21:42 +0100, Esben Nielsen wrote: > > > >>> > >> This is ok for kernel mutexes, which are supposed not to cause > >> deadlocks. > >> But user space deadlocks must not cause kernel deadlocks. Therefore > >> the > >> robust futex code _must_ be fixed. > > > > And it should be fixed in the futex side. Not in rt.c. > > Yes. I believe you are right about this. The bad news is POSIX > specifies > that an app that calls lock_mutex twice on the same mutex hangs. > EINVAL or EWOULDBLOCK are not POSIX compliant. > > Let me think on this a while and see if I can come up with an idea that > will let the app hang itself (and be POSIX compliant) and not hang the > kernel. > > Perhaps I'll just detect it and hang them on a waitqueue waiting > for the user to get impatient and hit control C . . . . > The same lock taken twice is just a special case of deadlocking. It would be very hard to check for the general case in the futex code without "fixing" the rt_mutex. Not that the rt_mutex code is broken - it just doesn't handle deadlocks very well as it wasn't supposed to. But as the futex indirectly exposes the rt_mutex to userspace it becomes a problem. The only _hack_ I can see is to force all robust futex calls to go through one global lock to prevent the futex deadlocks becoming rt_mutex deadlocks which again can turn into spin-lock deadlocks. I instead argue for altering the premises for the rt_mutex such that they can handle deadlocks without turning them into spin-lock deadlocks blocking the whole system. 
Then a futex deadlock will become an rt_mutex deadlock which can be handled. Esben > > > David > > > > >> > >>> The benefit of this locking order is that we got rid of the global > >>> pi_lock, and that was worth the problems you face today. > >>> > >> > >> I believe this problem can be solved in the pi_lock code - but it will > >> require quite a bit of recoding. I started on it a little while ago > >> but > >> didn't at all get the time to get into anything to even compile :-( > >> I don't have time to finish any code at all but I guess I can plant > >> the idea instead: > > > > I removed the global pi_lock in two sleepless days, and it was all on > > the theory that the locks themselves would not deadlock. That was the > > only sanity I was able to hang on to. That was complex enough, and > > very > > scary to get right (scary since it _had_ to be done right). And it was > > complex enough to keep a highly skilled programmer living in Hungary > > from doing it himself (not to say he couldn't do it, just complex > > enough > > for him to put it off for quite some time). And one must also worry > > about the BKL which is a separate beast altogether. > > > > So making it any more complex is IMO out of the question. > > > >> > >> When resolving the mutex chain (task A locks mutex 1 owned by B > >> blocked > >> on 2 owned by C etc) for PI boosting and also when finding deadlocks, > >> release _all_ locks before going to the next step in the chain. Use > >> get_task_struct on the next task in the chain, release the locks, > >> take the pi_locks in a fixed order (sort by address for instance), do > >> the > >> PI boosting, get_task_struct on the next lock, release all the locks, > >> do > >> the put_task_struct() on the previous task etc.
> >> This is a lot more expensive approach as it involves twice as many > >> spinlock > >> operations and get/put_task_struct() calls, but it has the added > >> benefit > >> of reducing the overall system latency for long locking chains as > >> there > >> are spots where interrupts and preemption will be enabled for each > >> step in > >> the chain. Remember also that deep locking chains should be > >> considered an > >> exception so the code doesn't have to be optimal wrt. performance. > > > > The long lock holding is only by the lock being grabbed and the owner > > grabbing it. All other locks don't need to be held for long periods of > > time. There are lots of issues if you release these two locks. How do > > you > > deal with the mutex being released while going up the chain? > > > >> > >> I added my feeble attempts to implement this below. I have no chance > >> of > >> ever getting time to finish it :-( > > > > I doubt they were feeble, but just proof that this approach is far too > > complex.
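The fixed-order acquisition Esben suggests ("sort by address for instance") is the standard way to take two locks at each chain step without risking an ABBA deadlock between two CPUs walking different chains. A minimal userspace sketch of just that ordering rule, with pthread mutexes standing in for the raw pi_locks (the function names are hypothetical, not from the -rt patch):

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Acquire two locks in a globally consistent order (lowest address
 * first), so that two threads calling lock_pair() with the arguments
 * swapped can never end up holding them in opposite order.
 */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;       /* sort by address */
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    if (a != b)
        pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

The same discipline applied to the two pi_locks taken at each step of the chain walk is what would make the "release everything, retake in sorted order" scheme deadlock-free between walkers.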
* Re: Recursion bug in -rt 2005-12-20 22:56 ` Esben Nielsen @ 2005-12-20 23:12 ` Steven Rostedt 2005-12-20 23:55 ` Esben Nielsen 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2005-12-20 23:12 UTC (permalink / raw) To: Esben Nielsen; +Cc: david singleton, robustmutexes, linux-kernel, Ingo Molnar On Tue, 20 Dec 2005, Esben Nielsen wrote: > > > > The same lock taken twice is just a special case of deadlocking. It would > be very hard to check for the general case in the futex code without > "fixing" the rt_mutex. Not that the rt_mutex code is broken - it just > doesn't handle deadlocks very well as it wasn't supposed to. But as the > futex indirectly exposes the rt_mutex to userspace it becomes a problem. > > The only _hack_ I can see is to force all robust futex calls to go through > one global lock to prevent the futex deadlocks becoming rt_mutex > deadlocks which again can turn into spin-lock deadlocks. > > I instead argue for altering the premises for the rt_mutex such that > they can handle deadlocks without turning them into spin-lock deadlocks > blocking the whole system. Then a futex deadlock will become an rt_mutex > deadlock which can be handled. > The type of deadlock you are talking about is the following: P1 -- grabs futex A (no system call) P2 -- grabs futex B (no system call) P1 -- tries to grab futex B (system call to block and boost P2) But holds no other kernel rt_mutex! P2 -- tries to grab futex A (system call to block and boost P1) spinning deadlock here. So, before P2 blocks on P1, can there be a circular check to see if this is a deadlock? You don't need to worry about other kernel rt_mutexes, you only need to worry about blocked processes. Is this feasible? -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
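The circular check Steve asks about can be sketched in plain C. This is a userspace model with hypothetical structures (the real -rt kernel keeps the equivalent state in task_struct and the rt_mutex), not the actual futex code: starting from the contended lock's owner, follow each owner's blocked-on lock; if the walk reaches the task that is about to block, blocking would close a cycle.

```c
#include <stddef.h>

struct task;

struct lock {
    struct task *owner;      /* task holding the lock, NULL if free */
};

struct task {
    struct lock *blocked_on; /* lock this task is sleeping on, or NULL */
};

/*
 * Before 'waiter' blocks on 'l', walk the chain
 * l->owner -> owner->blocked_on->owner -> ... ; if it leads back to
 * 'waiter', blocking would complete a circular wait.
 * Returns 1 if blocking would deadlock, 0 otherwise.
 */
static int would_deadlock(struct task *waiter, struct lock *l)
{
    struct task *owner = l->owner;

    while (owner) {
        if (owner == waiter)
            return 1;            /* chain loops back: deadlock */
        if (!owner->blocked_on)
            return 0;            /* chain ends at a runnable task */
        owner = owner->blocked_on->owner;
    }
    return 0;                    /* some lock in the chain is free */
}
```

In Steve's scenario, once P1 has blocked on futex B, running this walk for P2 against futex A goes A's owner P1, P1's blocked-on lock B, B's owner P2, and reports the deadlock before P2 ever blocks.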
* Re: Recursion bug in -rt 2005-12-20 23:12 ` Steven Rostedt @ 2005-12-20 23:55 ` Esben Nielsen 2005-12-22 4:37 ` david singleton 0 siblings, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2005-12-20 23:55 UTC (permalink / raw) To: Steven Rostedt; +Cc: david singleton, robustmutexes, linux-kernel, Ingo Molnar On Tue, 20 Dec 2005, Steven Rostedt wrote: > > On Tue, 20 Dec 2005, Esben Nielsen wrote: > > > > > > > > The same lock taken twice is just a special case of deadlocking. It would > > be very hard to check for the general case in the futex code without > > "fixing" the rt_mutex. Not that the rt_mutex code is broken - it just > > doesn't handle deadlocks very well as it wasn't supposed to. But as the > > futex indirectly exposes the rt_mutex to userspace it becomes a problem. > > > > The only _hack_ I can see is to force all robust futex calls to go through > > one global lock to prevent the futex deadlocks becoming rt_mutex > > deadlocks which again can turn into spin-lock deadlocks. > > > > I instead argue for altering the premises for the rt_mutex such that > > they can handle deadlocks without turning them into spin-lock deadlocks > > blocking the whole system. Then a futex deadlock will become an rt_mutex > > deadlock which can be handled. > > > > The type of deadlock you are talking about is the following: > > > > P1 -- grabs futex A (no system call) > > P2 -- grabs futex B (no system call) > > > > P1 -- tries to grab futex B (system call to block and boost P2) > > But holds no other kernel rt_mutex! > > P2 -- tries to grab futex A (system call to block and boost P1) > > spinning deadlock here, > > > > So, before P2 blocks on P1, can there be a circular check to see if this is > > a deadlock. You don't need to worry about other kernel rt_mutexes, you > > only need to worry about blocked processes. > > > > Is this feasible? Of course it is - but it is the exact same kind of traversal you do in the rt_mutex part to resolve the PI boosting.
Thus by making the futex code do this on its own you essentially just move the complexity in there and make the total more complex. Notice that the futex code can't rely on a user-space flag - it can't be trusted for this (opening a local DOS attack). So it has to rely on the rt_mutex structure to do this - and therefore also on the locks in there to protect against simultaneous unlocking. Esben > > -- Steve > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 23:55 ` Esben Nielsen @ 2005-12-22 4:37 ` david singleton 0 siblings, 0 replies; 37+ messages in thread From: david singleton @ 2005-12-22 4:37 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, Steven Rostedt, linux-kernel, robustmutexes On Dec 20, 2005, at 3:55 PM, Esben Nielsen wrote: > On Tue, 20 Dec 2005, Steven Rostedt wrote: > >> >> On Tue, 20 Dec 2005, Esben Nielsen wrote: >> >>>> >>> >>> The same lock taken twice is just a special case of deadlocking. It >>> would >>> be very hard to check for the general case in the futex code without >>> "fixing" the rt_mutex. Not that the rt_mutex code is broken - it just >>> doesn't handle deadlocks very well as it wasn't supposed to. But as >>> the >>> futex indirectly exposes the rt_mutex to userspace it becomes a >>> problem. >>> >>> The only _hack_ I can see is to force all robust futex calls to go >>> through >>> one global lock to prevent the futex deadlocks becomming rt_mutex >>> deadlocks which again can turn into spin-lock deadlocks. >>> >>> I instead argue for altering the premisses for the rt_mutex such >>> they can handle deadlocks without turning them into spin-lock >>> deadlocks >>> blocking the whole system. Then a futex deadlock will become a >>> rt_mutex >>> deadlock which can be handled. >>> >> >> For the type of deadlock you are talking about is the following: >> >> P1 -- grabs futex A (no system call) >> P2 -- grabs futex B (no system call) >> >> P1 -- tries to grab futex B (system call to block and boost P2) >> But holds no other kernel rt_mutex! >> P2 -- tries to grab futex A (system call to block and boost P1) >> spinning deadlock here, >> >> So, before P2 blocks on P1, can there be a circular check t see if >> this is >> a deadlock. You don't need to worry about other kernel rt_mutexes, >> you >> only need to worry about blocked process. >> >> Is this feasible? 
> Of course it is - but it is the exact same kind of traversal you do in the > rt_mutex part to resolve the PI boosting. Thus by making the futex > code do > this on its own you essentially just move the complexity in there and > make the total more complex. Notice that the futex code can't rely on > a user-space flag - it can't be trusted for this (opening a local DOS > attack). So it has to rely on the rt_mutex structure to do this - and > therefore also on the locks in there to protect against simultaneous > unlocking. Correct. And it's one of the reasons why I chose the rt_mutex to back the pthread_mutex. First we add robustness and then with the rt_mutex we get: 1) priority queuing. Tasks blocking on pthread_mutexes are queued in priority order, and the highest priority task is the first to be woken, with or without priority inheritance. 2) priority inheritance, if so desired. 3) With DEBUG_DEADLOCKS on we get deadlock detection. I was hoping people would use the deadlock detection to fix incorrect apps, not file bugs against the kernel. The real difficulty lies with POSIX stating that a non-recursive mutex will hang if locked twice. I would prefer to return -EWOULDDEADLOCK and not hang the app or the kernel; we could do that for robust mutexes, but POSIX priority-inheriting mutexes could not and still be POSIX compliant. It would be simple to detect a simple deadlock; it's much more difficult to detect circular deadlocks, which is why I left it up to the rt_mutex's deadlock detect code. I understand the position that misbehaved apps should not be able to hang the kernel though. And this is where I ask two things: 1) That the really smart people help out with a brilliant solution. 2) Does anyone have a nice 4-way or 8-way for debugging purposes? I think I've pretty much exhausted UP debugging. David > >> >> -- Steve >> > ^ permalink raw reply [flat|nested] 37+ messages in thread
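The "simple deadlock" David distinguishes from circular ones - a thread locking the same non-recursive mutex twice - needs no chain traversal at all: it is essentially the owner check from Dinakar's down_futex patch. A userspace sketch of just that check (hypothetical names; -EDEADLK stands in here for the proposed -EWOULDDEADLOCK, which has no standard errno value):

```c
#include <errno.h>
#include <stddef.h>

struct mutex {
    void *owner;             /* opaque task pointer, NULL if free */
};

/*
 * Try-lock that refuses the POSIX "hang on self-deadlock" behaviour:
 * if the caller already owns the mutex, report the deadlock instead
 * of blocking forever. Returns 0 on success, -EDEADLK on self-lock,
 * -EBUSY if another task holds it (the caller would block).
 */
static int robust_trylock(struct mutex *m, void *self)
{
    if (m->owner == NULL) {
        m->owner = self;
        return 0;
    }
    if (m->owner == self)
        return -EDEADLK;     /* locked twice by the same task */
    return -EBUSY;
}
```

The circular case is the hard one, which is why David defers it to the rt_mutex's own deadlock-detect traversal.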
* Re: Recursion bug in -rt 2005-12-20 21:20 ` Steven Rostedt 2005-12-20 21:55 ` david singleton @ 2005-12-20 22:43 ` Esben Nielsen 2005-12-20 22:59 ` Steven Rostedt 1 sibling, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2005-12-20 22:43 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, robustmutexes, Ingo Molnar, Dinakar Guniguntala On Tue, 20 Dec 2005, Steven Rostedt wrote: > On Tue, 2005-12-20 at 21:42 +0100, Esben Nielsen wrote: > > > > > > This is ok for kernel mutexes, which are supposed not to cause deadlocks. > > But user space deadlocks must not cause kernel deadlocks. Therefore the > > robust futex code _must_ be fixed. > > And it should be fixed in the futex side. Not in rt.c. > > > > > > The benefit of this locking order is that we got rid of the global > > > pi_lock, and that was worth the problems you face today. > > > > > > > I believe this problem can be solved in the pi_lock code - but it will > > require quite a bit of recoding. I started on it a little while ago but > > didn't at all get the time to get into anything to even compile :-( > > I don't have time to finish any code at all but I guess I can plant an > > the idea instead: > > I removed the global pi_lock in two sleepless days, and it was all on > the theory that the locks themselves would not deadlock. That was the > only sanity I was able to hang on to. That was complex enough, and very > scary to get right (scary since it _had_ to be done right). Gosh I wish I had such nights available.... :-( And then some equipment to test the result on. > And it was > complex enough to keep a highly skilled programmer living in Hungary > from doing it himself (not to say he couldn't do it, just complex enough > for him to put it off for quite some time). My guess is that he focuses on other areas. The global pi_lock worked - it just didn't scale. > And one must also worry > about the BKL which is a separate beast all together. The BKL is the scary point from my point of view. 
The rest of it is really not very Linux specific and is really just elaborating the textbook implementations of mutexes. > > So making it any more complex is IMO out of the question. Well the lock grabbing was making it very complex, too :-) > > > > > When resolving the mutex chain (task A locks mutex 1 owned by B blocked > > on 2 owned by C etc) for PI boosting and also when finding deadlocks, > > release _all_ locks before going to the next step in the chain. Use > > get_task_struct on the next task in the chain, release the locks, > > take the pi_locks in a fixed order (sort by address forinstance), do the > > PI boosting, get_task_struct on the next lock, release all the locks, do > > the put_task_struct() on the previous task etc. > > This a lot more expensive approach as it involves double as many spinlock > > operations and get/put_task_struct() calls , but it has the added benifit > > of reducing the overall system latency for long locking chains as there > > are spots where interrupts and preemption will be enabled for each step in > > the chain. Remember also that deep locking chains should be considered an > > exeption so the code doesn't have to be optimal wrt. performance. > > The long lock holding is only by the lock being grabbed and the owner > grabbing it. All other locks don't need to be held for long periods of > time. Preemption is disabled all along because you always hold at least one spinlock. > There's lots of issues if you release these two locks. How do you > deal with the mutex being released while going up the chain? > You have to recheck your conditions again once you have retaken the necesary locks. I do that in the code below. > > > > I added my feeble attempts to implement this below. I have no chance of > > ever getting time finish it :-( > > I doubt they were feeble, but just proof that this approach is far too > complex. 
As I said, if you don't want futex to deadlock the kernel, > the > API for futex should have deadlocking checks, since the only way this > can deadlock the system, is if two threads are in the kernel at the same > time. That code needs to traverse the locks. The PI code needs to traverse the locks. It would be simpler to do it in one go... Of course, all robust futex calls could take one global mutex akin to the old pi_lock you removed, to fix it now, but that is a hack. > > Also, the ones who are paying me to help out the -rt kernel, don't care > if we let the user side deadlock the system. They deliver a complete > package, kernel and user apps, and nothing else is to be running on the > system. This means that if the user app deadlocks, it doesn't matter if > the kernel deadlocks or not, because the deadlocking of the user app > means the system has failed. And I have a feeling that a lot of other > users of -rt feel the same way. 1) If PREEMPT_RT ever goes into the mainline kernel this kind of approach will _not_ work. 2) A reason for me to use Linux 2.6-rt over some specialized RTOS would be that I could have my RT task running without any risk of being destroyed by even buggy normal tasks. What if you run, say apache, compiled with robust futexes and it deadlocks? Normally I would simply restart apache. Customers could easily accept that I restart apache once in a while - they wouldn't even notice. But they can't accept that the system should reboot. Esben > > -- Steve > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 22:43 ` Esben Nielsen @ 2005-12-20 22:59 ` Steven Rostedt 0 siblings, 0 replies; 37+ messages in thread From: Steven Rostedt @ 2005-12-20 22:59 UTC (permalink / raw) To: Esben Nielsen Cc: linux-kernel, robustmutexes, Ingo Molnar, Dinakar Guniguntala On Tue, 20 Dec 2005, Esben Nielsen wrote: > On Tue, 20 Dec 2005, Steven Rostedt wrote: > > On Tue, 2005-12-20 at 21:42 +0100, Esben Nielsen wrote: > > I removed the global pi_lock in two sleepless days, and it was all on > > the theory that the locks themselves would not deadlock. That was the > > only sanity I was able to hang on to. That was complex enough, and very > > scary to get right (scary since it _had_ to be done right). > Gosh I wish I had such nights available.... :-( And then some equipment to > test the result on. My wife was nice enough to handle the kids during that time ;) > > > And it was > > complex enough to keep a highly skilled programmer living in Hungary > > from doing it himself (not to say he couldn't do it, just complex enough > > for him to put it off for quite some time). > My guess is that he focuses on other areas. The global pi_lock worked - it > just didn't scale. True. > > > And one must also worry > > about the BKL which is a separate beast all together. > > The BKL is the scary point from my point of view. The rest of it is really > not very Linux specific and is really just elaborating the textbook > implementations of mutexes. Yeah, but I always found that a lot of text book implementations don't apply very well to the real world without some sort of hack. > > > > > So making it any more complex is IMO out of the question. > > Well the lock grabbing was making it very complex, too :-) True, and that was part of my point. 
> > > > > > > > > When resolving the mutex chain (task A locks mutex 1 owned by B blocked > > > on 2 owned by C etc) for PI boosting and also when finding deadlocks, > > > release _all_ locks before going to the next step in the chain. Use > > > get_task_struct on the next task in the chain, release the locks, > > > take the pi_locks in a fixed order (sort by address forinstance), do the > > > PI boosting, get_task_struct on the next lock, release all the locks, do > > > the put_task_struct() on the previous task etc. > > > This a lot more expensive approach as it involves double as many spinlock > > > operations and get/put_task_struct() calls , but it has the added benifit > > > of reducing the overall system latency for long locking chains as there > > > are spots where interrupts and preemption will be enabled for each step in > > > the chain. Remember also that deep locking chains should be considered an > > > exeption so the code doesn't have to be optimal wrt. performance. > > > > The long lock holding is only by the lock being grabbed and the owner > > grabbing it. All other locks don't need to be held for long periods of > > time. > > Preemption is disabled all along because you always hold at least one > spinlock. But you still need to worry about SMP. > > > There's lots of issues if you release these two locks. How do you > > deal with the mutex being released while going up the chain? > > > > You have to recheck your conditions again once you have retaken the > necesary locks. I do that in the code below. Sorry, I didn't quite take the time to thoroughly go through your code. I really should before responding, but I'm a tad busy at the moment, which I'm sure you understand. > > > > > > > I added my feeble attempts to implement this below. I have no chance of > > > ever getting time finish it :-( > > > > I doubt they were feeble, but just proof that this approach is far too > > complex. 
As I said, if you don't want futex to deadlock the kernel, the > > API for futex should have deadlocking checks, since the only way this > > can deadlock the system, is if two threads are in the kernel at the same > > time. > > That that code needs to traverse the locks. The PI code need to traverse > the locks. It would be simpler to do it in one go... Ofcourse, all > robust futex calls could take one global mutex akin to the old pi_lock you > removed, to fix it now, but that is a hack. > > > > > Also, the ones who are paying me to help out the -rt kernel, don't care > > if we let the user side deadlock the system. They deliver a complete > > package, kernel and user apps, and nothing else is to be running on the > > system. This means that if the user app deadlocks, it doesn't matter if > > the kernel deadlocks or not, because the deadlocking of the user app > > means the system has failed. And I have a feeling that a lot of other > > users of -rt feel the same way. > > 1) If PREEMPT_RT ever goes into the main line kernel this kind of aproach > will _not_ work. :) I know! That last paragraph came out when I was in the middle of debugging my kernel. It suddenly dawned on me that the one who pays me must be satisfied first. And you got me in one of my working moments. There's a lot I do just to help out -rt that doesn't directly go along with what I'm paid to do, and if at the time I was doing that, I would not have responded as I did ;) > 2) A for me for using Linux 2.6-rt over some specialized RTOS would be > that I could have my RT task running without any risk of being destroyed > by even buggy normal tasks. > What if you run, say apache, compiled with robust futexes and it > deadlocks? Normally I would simply restart apache. Customers could easily > accept that I restart apache once in a while - they wouldn't even notice. > But they can't accept that the system should reboot. > Point taken. 
But as far as I'm concerned, the code to deal with this really needs to be satisfied by futex. I'm not saying it's "not my problem", I would gladly modify rt.c to help. But I would not sacrifice performance, or add much more complexity to do so. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 15:50 ` Dinakar Guniguntala 2005-12-20 17:43 ` Esben Nielsen @ 2006-01-03 1:54 ` david singleton 2006-01-05 2:14 ` david singleton 2 siblings, 0 replies; 37+ messages in thread From: david singleton @ 2006-01-03 1:54 UTC (permalink / raw) To: dino; +Cc: Ingo Molnar, linux-kernel, robustmutexes Dinakar, can you try patch-2.6.15-rc7-rt3-rf1 on http://source.mvista.com/~dsingleton/ and see if it works for your tests? This new patch creates a 'futex_deadlock' semaphore on which we hang applications that are deadlocking themselves. This method will only hang the application, not the system, as no other locks are held, like the mmap_sem - just the futex_deadlock semaphore. NOTE: for pthread_mutexes that are robust but NOT POSIX priority inheriting I return -EWOULDDEADLOCK, since there is no POSIX specification for robust pthread_mutexes yet. POSIX PI pthread_mutexes will hang on the futex_deadlock semaphore. Let me know how it works. David On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote: > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: >> >> hm, i'm looking at -rf4 - these changes look fishy: >> >> - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >> + if (current != lock_owner(lock)->task) >> + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >> >> why is this done? >> > > Ingo, this is to prevent a kernel hang due to application error. > > Basically when an application does a pthread_mutex_lock twice on a > _nonrecursive_ mutex with robust/PI attributes the whole system hangs. > Of course the application clearly should not be doing anything like > that, but it should not end up hanging the system either > > -Dinakar > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2005-12-20 15:50 ` Dinakar Guniguntala 2005-12-20 17:43 ` Esben Nielsen 2006-01-03 1:54 ` david singleton @ 2006-01-05 2:14 ` david singleton 2006-01-05 9:43 ` Esben Nielsen 2 siblings, 1 reply; 37+ messages in thread From: david singleton @ 2006-01-05 2:14 UTC (permalink / raw) To: dino; +Cc: Ingo Molnar, linux-kernel, robustmutexes Dinakar, I've got a more complete patch for deadlock detection for robust futexes. The patch is at http://source.mvista.com/~dsingleton/patch-2.6.15-rc7-rt3-rf2 This patch handles pthread_mutex deadlocks in two ways: 1) POSIX states that non-recursive pthread_mutexes hang if locked twice. This patch hangs the deadlocking thread on a waitqueue. It releases all other kernel locks, like the mmap_sem and robust_sem before waiting on the waitqueue so as not to hang the kernel. 2) pthread_mutexes that are only robust, not PRIO_INHERIT, do not hang. They have a new error code returned to them, -EWOULDDEADLOCK. Since there is no POSIX specification for robust pthread_mutexes yet, returning EWOULDDEADLOCK to a robust mutex is more in the spirit of robustness. For robust pthread_mutexes we clean up if the holding thread dies and we return EWOULDDEADLOCK to inform the application that it is trying to deadlock itself. The patch handles both simple and circular deadlocks. This is something I've been wanting to do for a while, export deadlock detection to all flavors of kernels. The patch provides the correct deadlock behavior while not hanging the system. It's also easier to see if a POSIX compliant app has deadlocked itself. the 'ps' command will show that the wait channel of a deadlocked application is waiting at 'futex_deadlock'. Let me know if it passes all your tests. 
David On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote: > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: >> >> hm, i'm looking at -rf4 - these changes look fishy: >> >> - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >> + if (current != lock_owner(lock)->task) >> + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >> >> why is this done? >> > > Ingo, this is to prevent a kernel hang due to application error. > > Basically when an application does a pthread_mutex_lock twice on a > _nonrecursive_ mutex with robust/PI attributes the whole system hangs. > Ofcourse the application clearly should not be doing anything like > that, but it should not end up hanging the system either > > -Dinakar > ^ permalink raw reply [flat|nested] 37+ messages in thread
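The two-way disposition David describes can be written down as a tiny policy function. This is purely illustrative: the flag names and the errno value for -EWOULDDEADLOCK are invented here (there is no standard value, as the thread notes), and the real patch makes this decision inside the futex code after dropping mmap_sem and friends:

```c
#include <errno.h>

/* Hypothetical errno for the proposed -EWOULDDEADLOCK; not standard. */
#define EWOULDDEADLOCK 1035

/* Hypothetical attribute flags for the mutex backing the futex. */
#define FUTEX_ROBUST 0x1
#define FUTEX_PI     0x2

/*
 * On a detected self-deadlock: POSIX PI mutexes must exhibit the POSIX
 * "hang" (the caller is parked on the futex_deadlock waitqueue, having
 * first released all other kernel locks), while robust-only mutexes
 * get an error code back instead of hanging.
 * Returns 0 for "park on futex_deadlock", or a negative errno.
 */
static int deadlock_disposition(unsigned int flags)
{
    if (flags & FUTEX_PI)
        return 0;                  /* POSIX-compliant hang */
    if (flags & FUTEX_ROBUST)
        return -EWOULDDEADLOCK;    /* robustness: report, don't hang */
    return -EDEADLK;               /* fallback for plain mutexes */
}
```

Because only the application is parked, not a kernel lock chain, `ps` can show the hung thread waiting at futex_deadlock, as David notes below.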
* Re: Recursion bug in -rt 2006-01-05 2:14 ` david singleton @ 2006-01-05 9:43 ` Esben Nielsen 2006-01-05 17:11 ` david singleton 0 siblings, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2006-01-05 9:43 UTC (permalink / raw) To: david singleton; +Cc: dino, Ingo Molnar, linux-kernel, robustmutexes On Wed, 4 Jan 2006, david singleton wrote: > Dinakar, > I've got a more complete patch for deadlock detection for robust > futexes. > > The patch is at > http://source.mvista.com/~dsingleton/patch-2.6.15-rc7-rt3-rf2 > > This patch handles pthread_mutex deadlocks in two ways: > > 1) POSIX states that non-recursive pthread_mutexes hang if locked twice. > This patch hangs the deadlocking thread on a waitqueue. It releases > all other kernel locks, like the mmap_sem and robust_sem before waiting > on the waitqueue so as not to hang the kernel. > > 2) pthread_mutexes that are only robust, not PRIO_INHERIT, do not > hang. They have a new error code returned to them, -EWOULDDEADLOCK. > Since there is no POSIX specification for robust pthread_mutexes yet, > returning EWOULDDEADLOCK to a robust mutex is more in the spirit > of robustness. For robust pthread_mutexes we clean up if the holding > thread dies and we return EWOULDDEADLOCK to inform the > application that it is trying to deadlock itself. > > The patch handles both simple and circular deadlocks. This is > something I've been wanting to do for a while, export deadlock > detection to all flavors of kernels. The patch provides the correct > deadlock behavior while not hanging the system. Have you tested this on SMP - without CONFIG_DEBUG_DEADLOCKS? As far as I can see check_futex_deadlock() can easily crash the system as there are _no_ spinlocks protecting access to task->blocked_on, task->blocked_on->lock etc. As far as I can see the tasks and locks structures can be freed at any point while you are in check_futex_deadlock(). Furthermore, check_futex_deadlock is recursive. 
You can easily from userspace trick the system into a kernel stack overflow by making many nested futexes. I am working on a new way to do priority inheritance for nested locks in rt_mutex such that you do not risk deadlocking the system on raw-spinlocks when you have an rt_mutex deadlock. But it won't have deadlock detection without CONFIG_DEBUG_DEADLOCKS. On the other hand it would be possible to make a deadlock scanner that finds deadlocks in the system after they have happened. With a suitable /proc interface it could even be done in userspace. My patch to the rt_mutex is far from finished. I haven't even compiled a kernel with it yet. I spend the little time I have between when my family goes to bed and when I simply have to go to bed myself writing a unittest framework for the rt_mutex and have both the original and the patched rt_mutex passing all my tests. But I need more tests to hammer out the details about task->state for instance. If anyone is interested I would be happy to send what I've got right now. Esben > > It's also easier to see if a POSIX compliant app has deadlocked itself. > the 'ps' command will show that the wait channel of a deadlocked > application is waiting at 'futex_deadlock'. > > Let me know if it passes all your tests. > > David > > > > > On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote: > > > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: > >> > >> hm, i'm looking at -rf4 - these changes look fishy: > >> > >> - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > >> + if (current != lock_owner(lock)->task) > >> + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > >> > >> why is this done? > >> > > > > Ingo, this is to prevent a kernel hang due to application error. > > > > Basically when an application does a pthread_mutex_lock twice on a > > _nonrecursive_ mutex with robust/PI attributes the whole system hangs.
> > Ofcourse the application clearly should not be doing anything like > > that, but it should not end up hanging the system either > > > > -Dinakar > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2006-01-05 9:43 ` Esben Nielsen @ 2006-01-05 17:11 ` david singleton 2006-01-05 17:47 ` Esben Nielsen 0 siblings, 1 reply; 37+ messages in thread From: david singleton @ 2006-01-05 17:11 UTC (permalink / raw) To: Esben Nielsen; +Cc: linux-kernel, dino, robustmutexes, Ingo Molnar On Jan 5, 2006, at 1:43 AM, Esben Nielsen wrote: > > > On Wed, 4 Jan 2006, david singleton wrote: > >> Dinakar, >> I've got a more complete patch for deadlock detection for robust >> futexes. >> >> The patch is at >> http://source.mvista.com/~dsingleton/patch-2.6.15-rc7-rt3-rf2 >> >> This patch handles pthread_mutex deadlocks in two ways: >> >> 1) POSIX states that non-recursive pthread_mutexes hang if locked >> twice. >> This patch hangs the deadlocking thread on a waitqueue. It releases >> all other kernel locks, like the mmap_sem and robust_sem before >> waiting >> on the waitqueue so as not to hang the kernel. >> >> 2) pthread_mutexes that are only robust, not PRIO_INHERIT, do not >> hang. They have a new error code returned to them, -EWOULDDEADLOCK. >> Since there is no POSIX specification for robust pthread_mutexes yet, >> returning EWOULDDEADLOCK to a robust mutex is more in the spirit >> of robustness. For robust pthread_mutexes we clean up if the holding >> thread dies and we return EWOULDDEADLOCK to inform the >> application that it is trying to deadlock itself. >> >> The patch handles both simple and circular deadlocks. This is >> something I've been wanting to do for a while, export deadlock >> detection to all flavors of kernels. The patch provides the correct >> deadlock behavior while not hanging the system. > > > Have you tested this on SMP - without CONFIG_DEBUG_DEADLOCKS? As far > as I > can see check_futex_deadlock() can easily crash the system as there are > _no_ spinlocks protecting access to task->blocked_on, > task->blocked_on->lock etc. 
As far as I can see the tasks and locks > structures can be freed at any point while you are in > check_futex_deadlock(). > You are right. I didn't extract enough of the deadlock check routines to get the locking. I'll fix that. > Furthermore, check_futex_deadlock is recursive. You can easily from > userspace trick the system into a kernel stack overflow by making > many > nested futexes. The rt deadlock check is also recursive, but it stops at a depth of 20, deciding something must be corrupt to have a task blocked on more than 20 locks. We should also set a limit. We can either just hang an ill-behaved app on the waitqueue or return an error code to the application. Any suggestions on which would be best, hang an ill-behaved app or just return it an error? David > > I am working on a new way to do priority inheritance for nested locks > in > rt_mutex such that you do not risk deadlocking the system on raw-spinlocks > when you have a rt_mutex deadlock. But it won't have deadlock detection > without > CONFIG_DEBUG_DEADLOCKS. On the other hand it would be possible to make > a > deadlock scanner finding deadlocks in the system after they have > happened. > With a suitable /proc interface it could even be done in userspace. > > My patch to the rt_mutex is far from finished. I haven't even compiled > a > kernel with it yet. I spend the little time I have between when my > family goes to bed and when I simply have to go to bed myself writing a > unittest framework for the rt_mutex and have both the original and the > patched rt_mutex passing all my tests. But I need more tests to hammer > out the details about task->state, for instance. If anyone is > interested I > would be happy to send what I got right now. > > Esben > >> >> It's also easier to see if a POSIX compliant app has deadlocked >> itself. >> the 'ps' command will show that the wait channel of a deadlocked >> application is waiting at 'futex_deadlock'. >> >> Let me know if it passes all your tests.
>> >> David >> >> >> >> >> On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote: >> >>> On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: >>>> >>>> hm, i'm looking at -rf4 - these changes look fishy: >>>> >>>> - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >>>> + if (current != lock_owner(lock)->task) >>>> + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >>>> >>>> why is this done? >>>> >>> >>> Ingo, this is to prevent a kernel hang due to application error. >>> >>> Basically when an application does a pthread_mutex_lock twice on a >>> _nonrecursive_ mutex with robust/PI attributes the whole system >>> hangs. >>> Of course the application clearly should not be doing anything like >>> that, but it should not end up hanging the system either >>> >>> -Dinakar >>> >> >> - >> To unsubscribe from this list: send the line "unsubscribe >> linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2006-01-05 17:11 ` david singleton @ 2006-01-05 17:47 ` Esben Nielsen 2006-01-05 18:26 ` david singleton 2006-01-07 2:40 ` robust futex deadlock detection patch david singleton 0 siblings, 2 replies; 37+ messages in thread From: Esben Nielsen @ 2006-01-05 17:47 UTC (permalink / raw) To: david singleton; +Cc: linux-kernel, dino, robustmutexes, Ingo Molnar On Thu, 5 Jan 2006, david singleton wrote: > > On Jan 5, 2006, at 1:43 AM, Esben Nielsen wrote: > > > > > > > On Wed, 4 Jan 2006, david singleton wrote: > > > >> Dinakar, > >> I've got a more complete patch for deadlock detection for robust > >> futexes. > >> > >> The patch is at > >> http://source.mvista.com/~dsingleton/patch-2.6.15-rc7-rt3-rf2 > >> > >> This patch handles pthread_mutex deadlocks in two ways: > >> > >> 1) POSIX states that non-recursive pthread_mutexes hang if locked > >> twice. > >> This patch hangs the deadlocking thread on a waitqueue. It releases > >> all other kernel locks, like the mmap_sem and robust_sem before > >> waiting > >> on the waitqueue so as not to hang the kernel. > >> > >> 2) pthread_mutexes that are only robust, not PRIO_INHERIT, do not > >> hang. They have a new error code returned to them, -EWOULDDEADLOCK. > >> Since there is no POSIX specification for robust pthread_mutexes yet, > >> returning EWOULDDEADLOCK to a robust mutex is more in the spirit > >> of robustness. For robust pthread_mutexes we clean up if the holding > >> thread dies and we return EWOULDDEADLOCK to inform the > >> application that it is trying to deadlock itself. > >> > >> The patch handles both simple and circular deadlocks. This is > >> something I've been wanting to do for a while, export deadlock > >> detection to all flavors of kernels. The patch provides the correct > >> deadlock behavior while not hanging the system. > > > > > > Have you tested this on SMP - without CONFIG_DEBUG_DEADLOCKS? 
As far > > as I > > can see check_futex_deadlock() can easily crash the system as there are > > _no_ spinlocks protecting access to task->blocked_on, > > task->blocked_on->lock etc. As far as I can see the tasks and locks > > structures can be freed at any point while you are in > > check_futex_deadlock(). > > > > You are right. I didn't extract enough of the deadlock check routines > to get > the locking. I'll fix that. > > > Furthermore, check_futex_deadlock is recursive. You can easily from > > userspace trick the system into a kernel stack overflow by making > > many > > nested futexes. > > The rt deadlock check is also recursive, but it stops at a depth of 20, > deciding something must be corrupt to have a task blocked on more > than 20 locks. > > We should also set a limit. We can either just hang an ill-behaved > app on the waitqueue or return an error code to the application. > Any suggestions on which would be best, hang an ill-behaved > app > or just return it an error? > Per default make the tasks block. You can always make the return code an option on each futex, but it has to be off as default. Here is why I think so: Having a return code would require you to do the deadlock detection "up front" in the down() operation. That takes a lot of CPU cycles. Of course, if you return an error code, the application could use the info for something constructive, but I am afraid most applications won't do anything constructive about it anyway (i.e. the application just continues to run) - such error handling code would be hard to write. What is needed in most applications is that the stuff simply deadlocks with the tasks blocked on the various locks. Then you can go in and trace the locks "postmortem". With the current setup, where the deadlock detection has to be performed "up front" because the rt_mutex can make spinlock-deadlocks, the error code will be a natural thing.
But when rt_mutex is "fixed" or replaced with something else, this feature will force the kernel to run deadlock detection "up front" even though it isn't used for anything useful. Esben > David > > > > > I am working on a new way to do priority inheritance for nested locks > > in > > rt_mutex such that you do not risk deadlocking the system on raw-spinlocks > > when you have a rt_mutex deadlock. But it won't have deadlock detection > > without > > CONFIG_DEBUG_DEADLOCKS. On the other hand it would be possible to make > > a > > deadlock scanner finding deadlocks in the system after they have > > happened. > > With a suitable /proc interface it could even be done in userspace. > > > > My patch to the rt_mutex is far from finished. I haven't even compiled > > a > > kernel with it yet. I spend the little time I have between when my > > family goes to bed and when I simply have to go to bed myself writing a > > unittest framework for the rt_mutex and have both the original and the > > patched rt_mutex passing all my tests. But I need more tests to hammer > > out the details about task->state, for instance. If anyone is > > interested I > > would be happy to send what I got right now. > > > > Esben > > > >> > >> It's also easier to see if a POSIX compliant app has deadlocked > >> itself. > >> the 'ps' command will show that the wait channel of a deadlocked > >> application is waiting at 'futex_deadlock'. > >> > >> Let me know if it passes all your tests. > >> > >> David > >> > >> > >> > >> > >> On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote: > >> > >>> On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: > >>>> > >>>> hm, i'm looking at -rf4 - these changes look fishy: > >>>> > >>>> - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > >>>> + if (current != lock_owner(lock)->task) > >>>> + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); > >>>> > >>>> why is this done? > >>>> > >>> > >>> Ingo, this is to prevent a kernel hang due to application error.
> >>> > >>> Basically when an application does a pthread_mutex_lock twice on a > >>> _nonrecursive_ mutex with robust/PI attributes the whole system > >>> hangs. > >>> Of course the application clearly should not be doing anything like > >>> that, but it should not end up hanging the system either > >>> > >>> -Dinakar > >>> > >> > >> - > >> To unsubscribe from this list: send the line "unsubscribe > >> linux-kernel" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> Please read the FAQ at http://www.tux.org/lkml/ > >> > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Recursion bug in -rt 2006-01-05 17:47 ` Esben Nielsen @ 2006-01-05 18:26 ` david singleton 2006-01-07 2:40 ` robust futex deadlock detection patch david singleton 1 sibling, 0 replies; 37+ messages in thread From: david singleton @ 2006-01-05 18:26 UTC (permalink / raw) To: Esben Nielsen; +Cc: dino, robustmutexes, linux-kernel, Ingo Molnar >> The rt deadlock check is also recursive, but it stops at a depth of >> 20, >> deciding something must be corrupt to have a task blocked on more >> than 20 locks. >> >> We should also set a limit. We can either just hang an ill-behaved >> app on the waitqueue or return an error code to the application. >> Any suggestions on which would be best, hang an ill-behaved app >> or just return it an error? >> > > Per default make the tasks block. You can always make the return code > an > option on each futex, but it has to be off as default. Here is why I > think > so: > > Having a return code would require you to do the deadlock detection "up > front" in the down() operation. That takes a lot of CPU cycles. > Of course, > if you return an error code, the application could use the info for > something > constructive, but I am afraid most applications won't do anything > constructive > about it anyway (i.e. the application just continues to run) - such > error > handling code would be hard to write. Yes, I agree. I'll hang the app on a new waitqueue that will let the user see with 'ps' that they are hung in futex_ill_behaved waitqueue or some other name that makes it easy to see where the app is stopped. thanks. David > > What is needed in most applications is that the stuff simply > deadlocks with the tasks blocked on the various locks. Then you can go > in and > trace the locks "postmortem". > > With the current setup, where the deadlock detection has to be > performed > "up front" because the rt_mutex can make spinlock-deadlocks, the error > code will be a natural thing.
But when rt_mutex is "fixed" or > replaced with something else, this feature will force the kernel to > run > deadlock detection "up front" even though it isn't used for anything > useful. > > Esben > >> David >> >>> >>> I am working on a new way to do priority inheritance for nested locks >>> in >>> rt_mutex such that you do not risk deadlocking the system on raw-spinlocks >>> when you have a rt_mutex deadlock. But it won't have deadlock >>> detection >>> without >>> CONFIG_DEBUG_DEADLOCKS. On the other hand it would be possible to >>> make >>> a >>> deadlock scanner finding deadlocks in the system after they have >>> happened. >>> With a suitable /proc interface it could even be done in userspace. >>> >>> My patch to the rt_mutex is far from finished. I haven't even >>> compiled >>> a >>> kernel with it yet. I spend the little time I have between when my >>> family goes to bed and when I simply have to go to bed myself writing a >>> unittest framework for the rt_mutex and have both the original and >>> the >>> patched rt_mutex passing all my tests. But I need more tests to >>> hammer >>> out the details about task->state, for instance. If anyone is >>> interested I >>> would be happy to send what I got right now. >>> >>> Esben >>> >>>> >>>> It's also easier to see if a POSIX compliant app has deadlocked >>>> itself. >>>> the 'ps' command will show that the wait channel of a deadlocked >>>> application is waiting at 'futex_deadlock'. >>>> >>>> Let me know if it passes all your tests. >>>> >>>> David >>>> >>>> >>>> >>>> >>>> On Dec 20, 2005, at 7:50 AM, Dinakar Guniguntala wrote: >>>> >>>>> On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote: >>>>>> >>>>>> hm, i'm looking at -rf4 - these changes look fishy: >>>>>> >>>>>> - _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >>>>>> + if (current != lock_owner(lock)->task) >>>>>> + _raw_spin_lock(&lock_owner(lock)->task->pi_lock); >>>>>> >>>>>> why is this done?
>>>>>> >>>>> >>>>> Ingo, this is to prevent a kernel hang due to application error. >>>>> >>>>> Basically when an application does a pthread_mutex_lock twice on a >>>>> _nonrecursive_ mutex with robust/PI attributes the whole system >>>>> hangs. >>>>> Of course the application clearly should not be doing anything like >>>>> that, but it should not end up hanging the system either >>>>> >>>>> -Dinakar >>>>> >>>> >>>> - >>>> To unsubscribe from this list: send the line "unsubscribe >>>> linux-kernel" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> Please read the FAQ at http://www.tux.org/lkml/ >>>> >>> >> >> - >> To unsubscribe from this list: send the line "unsubscribe >> linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> > ^ permalink raw reply [flat|nested] 37+ messages in thread
* robust futex deadlock detection patch 2006-01-05 17:47 ` Esben Nielsen 2006-01-05 18:26 ` david singleton @ 2006-01-07 2:40 ` david singleton [not found] ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com> 2006-01-09 9:23 ` Esben Nielsen 1 sibling, 2 replies; 37+ messages in thread From: david singleton @ 2006-01-07 2:40 UTC (permalink / raw) To: Esben Nielsen; +Cc: dino, robustmutexes, linux-kernel, Ingo Molnar Here is a new patch that provides both futex deadlock detection and prevents ill-behaved and malicious apps from deadlocking the kernel through the robust futex interface. http://source.mvista.com/~dsingleton/patch-2.6.15-rt2-rf1 Deadlock detection is done 'up front' for both POSIX and robust pthread_mutexes. Non-recursive POSIX mutexes will hang if deadlocked, as defined by the POSIX spec. The wait channel they are hung on is 'futex_deadlock'. This wait channel makes it easy to spot that your POSIX app has deadlocked itself via the 'ps' command. Robust futexes will have -EDEADLK returned to them since there is no POSIX specification for robust mutexes, yet, and returning -EDEADLK is more in the spirit of robustness. Robust mutexes are cleaned up by the kernel after a thread dies and they also report to the app if it is deadlocking itself. Deadlock detection is something I have wanted to provide for both debug and production kernels for a while. It was previously available through DEBUG_DEADLOCKS. I needed to add the deadlock detection code for both production and debug kernels to prevent applications hanging the kernel. David ^ permalink raw reply [flat|nested] 37+ messages in thread
[parent not found: <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com>]
* Re: robust futex deadlock detection patch [not found] ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com> @ 2006-01-07 19:49 ` Ulrich Drepper 0 siblings, 0 replies; 37+ messages in thread From: Ulrich Drepper @ 2006-01-07 19:49 UTC (permalink / raw) To: dsingleton; +Cc: robustmutexes, linux-kernel, Ingo Molnar [-- Attachment #1: Type: text/plain, Size: 652 bytes --] > Robust futexes will have -EDEADLK returned to them since there is no > POSIX specification for > robust mutexes, yet, and returning -EDEADLK is more in the spirit of > robustness. I disagree. Robustness is only an additional mutex attribute on top of the mutex type. I.e., there can be robust normal/error-checking/recursive mutexes. What kind it is can be told to the kernel at the time the robust mutex is registered with the kernel. This is the way I'll write up the proposal for robust mutexes for inclusion in the next POSIX revision. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 251 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: robust futex deadlock detection patch 2006-01-07 2:40 ` robust futex deadlock detection patch david singleton [not found] ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com> @ 2006-01-09 9:23 ` Esben Nielsen 2006-01-09 20:01 ` David Singleton 1 sibling, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2006-01-09 9:23 UTC (permalink / raw) To: david singleton; +Cc: dino, robustmutexes, linux-kernel, Ingo Molnar On Fri, 6 Jan 2006, david singleton wrote: > > Here is a new patch that provides both futex deadlock detection and > prevents ill-behaved and > malicious apps from deadlocking the kernel through the robust futex > interface. > > http://source.mvista.com/~dsingleton/patch-2.6.15-rt2-rf1 > > Deadlock detection is done 'up front' for both POSIX and robust > pthread_mutexes. Non-recursive > POSIX mutexes will hang if deadlocked, as defined by the POSIX spec. > The wait channel they > are hung on is 'futex_deadlock'. This wait channel makes it easy to > spot that your POSIX app > has deadlocked itself via the 'ps' command. > > Robust futexes will have -EDEADLK returned to them since there is no > POSIX specification for > robust mutexes, yet, and returning -EDEADLK is more in the spirit of > robustness. Robust > mutexes are cleaned up by the kernel after a thread dies and they also > report to the app if > it is deadlocking itself. > > Deadlock detection is something I have wanted to provide for both debug > and production kernels > for a while. It was previously available through DEBUG_DEADLOCKS. I > needed to add the > deadlock detection code for both production and debug kernels to prevent > applications hanging > the kernel. > I am a little bit confused when I read check_futex_deadlock(): It takes a parameter struct thread_info *ti and immediately does struct task_struct *task = ti->task. Now we have the usual pair (thread_info *ti, task_t *task) corresponding to the same process.
Later on in the function you do ti = lock_owner(lock), but do not update task. Was this intended? Anyway, I can't see that you have locked the necessary raw_spin_locks. For instance lock_owner(lock) must be called with the lock->wait_lock taken and task->blocked_on needs task->pi_lock locked. To avoid deadlocks in all the deadlock detection you have to do the loop something like

for (owner = current; owner; ) {
        raw_spin_lock(&owner->pi_lock);
        if (owner->task->blocked_on) {
                lock = owner->task->blocked_on->lock;
                raw_spin_lock(&lock->wait_lock);
                owner2 = lock_owner(lock);
                if (owner2) {
                        get_task_struct(owner2->task);
                        raw_spin_unlock(&lock->wait_lock);
                        raw_spin_unlock(&owner->pi_lock);
                }
                put_task_struct(owner->task);
                owner = owner2;
                if (owner2 == current)
                        DEADLOCK;
        }
}

Esben > David > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: robust futex deadlock detection patch 2006-01-09 9:23 ` Esben Nielsen @ 2006-01-09 20:01 ` David Singleton 2006-01-09 20:16 ` Esben Nielsen 0 siblings, 1 reply; 37+ messages in thread From: David Singleton @ 2006-01-09 20:01 UTC (permalink / raw) To: Esben Nielsen; +Cc: dino, robustmutexes, linux-kernel, Ingo Molnar Esben Nielsen wrote: > >I am a little bit confused when I read check_futex_deadlock(): >It takes a parameter struct thread_info *ti and immediately does >struct task_struct *task = ti->task. Now we have the usual pair >(thread_info *ti, task_t *task) corresponding to the same process. Later >on in the function you do ti = lock_owner(lock), but do not update task. >Was this intended? > > Whoops. You are right. I've fixed the update to the task structure. The check_futex_deadlock() code now mirrors the existing check_deadlock() code. >Anyway, I can't see that you have locked the necessary raw_spin_locks. >For instance lock_owner(lock) must be called with the lock->wait_lock taken >and task->blocked_on needs task->pi_lock locked. > > Actually those locks are grabbed in the down_try_futex code. I hold both of those locks across the check for deadlocks and into __down_interruptible. Those locks need to be held for the check for deadlocks and holding them from down_try_futex to down_interruptible guarantees that a thread that enters the kernel to block on a lock will block on the lock. There is no window any more between dropping the robust_sem and mmap_sem and calling down_interruptible. This also fixed an SMP problem that Dave Carlson has been seeing on earlier patches.
The new patch is at http://source.mvista.com/~dsingleton/patch-2.6.15-rt2-rf2 David >To avoid deadlocks in all the deadlock detection you have to do the loop >something like > >for(owner = current; owner; ) { > raw_spin_lock(&owner->pi_lock); > if(owner->task->blocked_on) { > lock = owner->task->blocked_on->lock; > raw_spin_lock(&lock->wait_lock); > owner2 = lock_owner(lock); > if(owner2) { > get_task_struct(owner2->task); > raw_spin_unlock(&lock->wait_lock) > raw_spin_unlock(&owner->pi_lock); > } > put_task_struct(owner->task); > owner = owner2; > if(owner2==current) DEADLOCK >} > > >Esben > > > > >>David >> >>- >>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >>Please read the FAQ at http://www.tux.org/lkml/ >> >> >> > > > > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: robust futex deadlock detection patch 2006-01-09 20:01 ` David Singleton @ 2006-01-09 20:16 ` Esben Nielsen 2006-01-09 21:08 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Esben Nielsen @ 2006-01-09 20:16 UTC (permalink / raw) To: David Singleton; +Cc: dino, robustmutexes, linux-kernel, Ingo Molnar On Mon, 9 Jan 2006, David Singleton wrote: > Esben Nielsen wrote: > > > > >I am a little bit confused when I read check_futex_deadlock(): > >It takes a parameter struct thread_info *ti and immediately does > >struct task_struct *task = ti->task. Now we have the usual pair > >(thread_info *ti, task_t *task) corresponding to the same process. Later > >on in the function you do ti = lock_owner(lock), but do not update task. > >Was this intended? > > > > > > Whoops. You are right. I've fixed the update to the task structure. > The check_futex_deadlock() > code now mirrors the existing check_deadlock() code. > > >Anyway, I can't see that you have locked the necessary raw_spin_locks. > >For instance lock_owner(lock) must be called with the lock->wait_lock taken > >and task->blocked_on needs task->pi_lock locked. > > > > > > Actually those locks are grabbed in the down_try_futex code. I hold > both of those > locks across the check for deadlocks and into __down_interruptible. > Those locks > need to be held for the check for deadlocks and holding > them from down_try_futex to down_interruptible guarantees that a thread > that enters the kernel to block on a lock will block on the lock. There > is no > window any more between dropping the robust_sem and mmap_sem > and calling down_interruptible. > You only take the spinlocks corresponding to the current lock. What about the next locks in the chain? Remember those locks might not be futexes but a lock inside the kernel, taken in system calls. I.e. the robust_sem might not protect you. If there are n locks you need to lock n lock->wait_lock and n owner->task->pi_lock as you traverse the locks.
That is what I tried to sketch in the example below. Esben > This also fixed an SMP problem that Dave Carlson has been seeing on earlier > patches. > > The new patch is at > > http://source.mvista.com/~dsingleton/patch-2.6.15-rt2-rf2 > > David > > >To avoid deadlocks in all the deadlock detection you have to do the loop > >something like > > > >for(owner = current; owner; ) { > > raw_spin_lock(&owner->pi_lock); > > if(owner->task->blocked_on) { > > lock = owner->task->blocked_on->lock; > > raw_spin_lock(&lock->wait_lock); > > owner2 = lock_owner(lock); > > if(owner2) { > > get_task_struct(owner2->task); > > raw_spin_unlock(&lock->wait_lock) > > raw_spin_unlock(&owner->pi_lock); > > } > > put_task_struct(owner->task); > > owner = owner2; > > if(owner2==current) DEADLOCK > >} > > > > > >Esben > > > > > > > > > >>David > >> > >>- > >>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >>the body of a message to majordomo@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > >>Please read the FAQ at http://www.tux.org/lkml/ > >> > >> > >> > > > > > > > > > > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: robust futex deadlock detection patch 2006-01-09 20:16 ` Esben Nielsen @ 2006-01-09 21:08 ` Steven Rostedt 2006-01-09 21:19 ` Esben Nielsen 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2006-01-09 21:08 UTC (permalink / raw) To: Esben Nielsen Cc: Ingo Molnar, linux-kernel, robustmutexes, dino, David Singleton On Mon, 2006-01-09 at 21:16 +0100, Esben Nielsen wrote: > You only take the spinlocks corresponding to the current lock. What about > the next locks in the chain? Remember those locks might not be > futexes but a lock inside the kernel, taken in system calls. I.e. the > robust_sem might not protect you. > If there are n locks you need to lock n lock->wait_lock and n > owner->task->pi_lock as you traverse the locks. That is what I tried to > sketch in the example below. The thing about this is that you can't (if the kernel is not buggy) deadlock on the kernel spin locks. As long as you protect the locks in the futex from deadlocking you are fine. This is because you don't grab a futex after grabbing a kernel spin lock. All kernel spin locks __must__ be released before going back to user land, and that's where you grab the futexes. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: robust futex deadlock detection patch 2006-01-09 21:08 ` Steven Rostedt @ 2006-01-09 21:19 ` Esben Nielsen 0 siblings, 0 replies; 37+ messages in thread From: Esben Nielsen @ 2006-01-09 21:19 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, linux-kernel, robustmutexes, dino, David Singleton On Mon, 9 Jan 2006, Steven Rostedt wrote: > On Mon, 2006-01-09 at 21:16 +0100, Esben Nielsen wrote: > > > You only take the spinlocks corresponding to the current lock. What about > > the next locks in the chain? Remember those locks might not be > > futexes but a lock inside the kernel, taken in system calls. I.e. the > > robust_sem might not protect you. > > If there are n locks you need to lock n lock->wait_lock and n > > owner->task->pi_lock as you traverse the locks. That is what I tried to > > sketch in the example below. > > The thing about this is that you can't (if the kernel is not buggy) > deadlock on the kernel spin locks. As long as you protect the locks in > the futex from deadlocking you are fine. This is because you don't grab > a futex after grabbing a kernel spin lock. All kernel spin locks > __must__ be released before going back to user land, and that's where > you grab the futexes. > Yes, but the deadlock-detect code doesn't know what is a futex and what isn't. Therefore when it starts to traverse rt_mutexes which aren't futexes it has to be _very_ careful taking the right spinlocks. On the other hand when traversing the futexes it has to be careful not to deadlock on those very same spinlocks. And yes, you _can_ traverse the lock chain without deadlocking the kernel. The code I sketched in the previous mail should do it. Anyway, I am trying to rewrite rt_mutex to remove the spinlock deadlock altogether. My girlfriend is away with our daughter a few days so I might get time to finish it :-) > -- Steve > > Esben ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2006-01-09 21:19 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-12-14 22:39 Recursion bug in -rt Dinakar Guniguntala 2005-12-15 1:03 ` david singleton 2005-12-15 19:44 ` Dinakar Guniguntala 2005-12-15 20:40 ` David Singleton 2005-12-16 0:02 ` david singleton 2005-12-16 18:42 ` Dinakar Guniguntala 2005-12-16 21:26 ` David Singleton 2005-12-19 11:56 ` Dinakar Guniguntala 2005-12-19 20:11 ` David Singleton 2005-12-15 19:00 ` David Singleton 2005-12-15 19:52 ` Dinakar Guniguntala 2005-12-20 13:19 ` Ingo Molnar 2005-12-20 15:50 ` Dinakar Guniguntala 2005-12-20 17:43 ` Esben Nielsen 2005-12-20 19:33 ` Steven Rostedt 2005-12-20 20:42 ` Esben Nielsen 2005-12-20 21:20 ` Steven Rostedt 2005-12-20 21:55 ` david singleton 2005-12-20 22:56 ` Esben Nielsen 2005-12-20 23:12 ` Steven Rostedt 2005-12-20 23:55 ` Esben Nielsen 2005-12-22 4:37 ` david singleton 2005-12-20 22:43 ` Esben Nielsen 2005-12-20 22:59 ` Steven Rostedt 2006-01-03 1:54 ` david singleton 2006-01-05 2:14 ` david singleton 2006-01-05 9:43 ` Esben Nielsen 2006-01-05 17:11 ` david singleton 2006-01-05 17:47 ` Esben Nielsen 2006-01-05 18:26 ` david singleton 2006-01-07 2:40 ` robust futex deadlock detection patch david singleton [not found] ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com> 2006-01-07 19:49 ` Ulrich Drepper 2006-01-09 9:23 ` Esben Nielsen 2006-01-09 20:01 ` David Singleton 2006-01-09 20:16 ` Esben Nielsen 2006-01-09 21:08 ` Steven Rostedt 2006-01-09 21:19 ` Esben Nielsen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).