linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix deadlock in RPC scheduling code.
@ 2006-03-09 10:35 Aurelien Degremont
  2006-03-09 14:40 ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Aurelien Degremont @ 2006-03-09 10:35 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2629 bytes --]

Hello,

Here is a small patch which fixes a deadlock issue in RPC scheduling
code (rpc_wake_up_task() is mainly concerned). This problem happens if a
RPC task, waiting for the response of a sync request, is woken up by a
signal, and, in the same time, the kernel tries to wake up some RPC
tasks. If this is done, a deadlock could occurs due to an error in RPC
workqueue lock management.

This error was added by linux-2.6.8.1-49-rpc_workqueue.patch (included
in 2.6.11). This code was not changed since. The issue was detected on
linux-2.6.12.


**DESCRIPTION**

This deadlock is due to the wrong management of two locks: queue->lock
and the bit RPC_TASK_WAKEUP in task->tk_runstate.
When dealing with RPC workqueues, the code must take care of locking the
associated queue lock before doing any modifications on it or its tasks.
And it does it... except for one function. Nested locks should always be
taken in the same order.

As a consequence, in some circumstances (in fact, I noticed it several
times), this code deadlocks. Moreover, when this code is called, by
example by nfs_file_flush(), the lock_kernel() is held and so all the
system hangs.


**PATCH**

To fix the problem, we only need to invert the lock hierarchy in
rpc_wake_up_task() and taking care of checking the task is really queued
before trying to grab the lock.

This code becomes :

void rpc_wake_up_task(struct rpc_task *task)
{
           struct rpc_wait_queue *queue = NULL;


           /*
            * The task must always be queued because __rpc_do_wake_up_task()
            * modify it.
            */
           if (RPC_IS_QUEUED(task)) {


                   /*
                    * The queue lock must be taken BEFORE rpc_start_wakeup()
                    * is called.
                    */
                   queue = task->u.tk_wait.rpc_waitq;
                   spin_lock_bh(&queue->lock);


                   /* Really wake up the specified RPC task */
                   if (rpc_start_wakeup(task)) {
                           __rpc_do_wake_up_task(task);
                           rpc_finish_wakeup(task);
                   }


                   spin_unlock_bh(&queue->lock);
           }
}

This patch was done with version 2.6.15.4. It works correctly with
2.6.15.6.


**NOTES**

"The smaller the patch is, the better it is". So I kept it short. But,
in fact, I think the groups of functions rpc_wake_up_task(),
__rpc_wake_up_task() and __rpc_do_wake_up_task() could be simplified.


**AUTHORS**

The bug was worked out by Bruno Faccini bruno(dot)faccini(at)bull(dot)net
I did the corresponding patch.

-- 
Aurelien Degremont



[-- Attachment #2: rpc_deadlock-2.6.15.6.patch --]
[-- Type: text/x-patch, Size: 1450 bytes --]

--- linux-2.6.15.4/net/sunrpc/sched.c.orig	2006-03-08 16:54:36.515804799 +0100
+++ linux-2.6.15.4/net/sunrpc/sched.c		2006-03-08 17:00:37.724637607 +0100
@@ -353,7 +353,7 @@
  */
 static void __rpc_do_wake_up_task(struct rpc_task *task)
 {
-	dprintk("RPC: %4d __rpc_wake_up_task (now %ld)\n", task->tk_pid, jiffies);
+	dprintk("RPC: %4d __rpc_do_wake_up_task (now %ld)\n", task->tk_pid, jiffies);
 
 #ifdef RPC_DEBUG
 	BUG_ON(task->tk_magic != RPC_TASK_MAGIC_ID);
@@ -369,7 +369,7 @@
 
 	rpc_make_runnable(task);
 
-	dprintk("RPC:      __rpc_wake_up_task done\n");
+	dprintk("RPC:      __rpc_do_wake_up_task done\n");
 }
 
 /*
@@ -400,15 +400,28 @@
  */
 void rpc_wake_up_task(struct rpc_task *task)
 {
-	if (rpc_start_wakeup(task)) {
-		if (RPC_IS_QUEUED(task)) {
-			struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
+	struct rpc_wait_queue *queue = NULL;
+
+	/* 
+	 * The task must always be queued because __rpc_do_wake_up_task() 
+	 * modify it.
+	 */
+	if (RPC_IS_QUEUED(task)) {
 
-			spin_lock_bh(&queue->lock);
+		/* 
+		 * The queue lock must be taken BEFORE rpc_start_wakeup() 
+		 * is called.
+		 */
+		queue = task->u.tk_wait.rpc_waitq;
+		spin_lock_bh(&queue->lock);
+	
+		/* Really wake up the specified RPC task */
+		if (rpc_start_wakeup(task)) {
 			__rpc_do_wake_up_task(task);
-			spin_unlock_bh(&queue->lock);
+			rpc_finish_wakeup(task);
 		}
-		rpc_finish_wakeup(task);
+		
+		spin_unlock_bh(&queue->lock);
 	}
 }
 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-09 10:35 [PATCH] Fix deadlock in RPC scheduling code Aurelien Degremont
@ 2006-03-09 14:40 ` Trond Myklebust
  2006-03-10 15:10   ` Aurelien Degremont
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2006-03-09 14:40 UTC (permalink / raw)
  To: Aurelien Degremont
  Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

On Thu, 2006-03-09 at 11:35 +0100, Aurelien Degremont wrote:
> Hello,
> 
> Here is a small patch which fixes a deadlock issue in RPC scheduling
> code (rpc_wake_up_task() is mainly concerned). This problem happens if a
> RPC task, waiting for the response of a sync request, is woken up by a
> signal, and, in the same time, the kernel tries to wake up some RPC
> tasks. If this is done, a deadlock could occurs due to an error in RPC
> workqueue lock management.
> 
> This error was added by linux-2.6.8.1-49-rpc_workqueue.patch (included
> in 2.6.11). This code was not changed since. The issue was detected on
> linux-2.6.12.
> 
> 
> **DESCRIPTION**
> 
> This deadlock is due to the wrong management of two locks: queue->lock
> and the bit RPC_TASK_WAKEUP in task->tk_runstate.
> When dealing with RPC workqueues, the code must take care of locking the
> associated queue lock before doing any modifications on it or its tasks.
> And it does it... except for one function. Nested locks should always be
> taken in the same order.
> 
> As a consequence, in some circumstances (in fact, I noticed it several
> times), this code deadlocks. Moreover, when this code is called, by
> example by nfs_file_flush(), the lock_kernel() is held and so all the
> system hangs.
> 
> 
> **PATCH**
> 
> To fix the problem, we only need to invert the lock hierarchy in
> rpc_wake_up_task() and taking care of checking the task is really queued
> before trying to grab the lock.

No you can't. You have absolutely _no_ guarantee that
task->u.tk_wait.rpc_waitq is valid here.

The real fix is the one I posted in response to this thread last week.
See

http://client.linux-nfs.org/Linux-2.6.x/2.6.16-rc5/linux-2.6.16-89-fix_race_in_rpc_wakeup.dif

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-09 14:40 ` Trond Myklebust
@ 2006-03-10 15:10   ` Aurelien Degremont
  2006-03-10 15:24     ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Aurelien Degremont @ 2006-03-10 15:10 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

Trond Myklebust wrote:

> The real fix is the one I posted in response to this thread last week.

Oops, I missed it.

Ok for the patch, the list iteration will be better, but I don't
understand how this will prevent the race condition?

I do not think it is not a good idea to keep this lock order in
rpc_wake_up_task() anyway. I must be missing something but I
think this function should be modified in order to be in accordance with
the lock hierarchy in rpc code. It seems to me that the potential race 
is still there.

Even if we cannot certify task->u.tk_wait.rpc_waitq is valid, the
current kernel code cannot either (err... I think it can't). So let's 
try at least to improve it, even if we cannot set it totally harmless.
Warn me if I'm wrong :
    When rpc_wake_up_task() is called, the calling context is helpless. 
So we have absolutely no information on the task queue. We must 
atomically check the "queued-ness" of the task and grab the queue lock 
to prevent any error? Hmmm... So the matter is : the queue mustn't be 
modified between the test and the lock? Have we some "magical" lock 
somewhere which could help up? I didn't find it.
   With the patch I propose (quite the same than Simon Derr's last week
proposal), the insertion of the RPC task is not a problem since the
QUEUED flag is set when the task is totally enqueued (in 
__rpc_add_wait_queue(): task->u.tk_wait.list is modified and 
task->u.tk_wait.rpc_waitq set), so if the test is
true, the task does have a queue. The task removal is more problematic.
The task could be executed and removed from the queue between the bit
test and the lock grabbing. I've checked the code responsible for this,
and it seems that the task->u.tk_wait.rpc_waitq pointer is still valid,
the task is just removed from wait queue list. So, we could still take
the queue lock and, with the lock taken, we just need to do one more
test to verify the task was not woken up and removed from the queue
since the test.

I do not really like this solution, that's not really clean, but, at
least, I hope my analysis is not too far from correct?
Please let me now where I wrong.

Cordially

-- 
Aurelien Degremont


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-10 15:10   ` Aurelien Degremont
@ 2006-03-10 15:24     ` Trond Myklebust
  2006-03-13 10:07       ` Aurelien Degremont
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2006-03-10 15:24 UTC (permalink / raw)
  To: Aurelien Degremont
  Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

On Fri, 2006-03-10 at 16:10 +0100, Aurelien Degremont wrote:
> Trond Myklebust wrote:
> 
> > The real fix is the one I posted in response to this thread last week.
> 
> Oops, I missed it.
> 
> Ok for the patch, the list iteration will be better, but I don't
> understand how this will prevent the race condition?
> 
> I do not think it is not a good idea to keep this lock order in
> rpc_wake_up_task() anyway. I must be missing something but I
> think this function should be modified in order to be in accordance with
> the lock hierarchy in rpc code. It seems to me that the potential race 
> is still there.
> 
> Even if we cannot certify task->u.tk_wait.rpc_waitq is valid, the
> current kernel code cannot either (err... I think it can't). So let's 
> try at least to improve it, even if we cannot set it totally harmless.
> Warn me if I'm wrong :
>     When rpc_wake_up_task() is called, the calling context is helpless. 
> So we have absolutely no information on the task queue. We must 
> atomically check the "queued-ness" of the task and grab the queue lock 
> to prevent any error? Hmmm... So the matter is : the queue mustn't be 
> modified between the test and the lock? Have we some "magical" lock 
> somewhere which could help up? I didn't find it.

Yes. The RPC_TASK_QUEUED bit can only be cleared when both the
RPC_TASK_WAKEUP bit _and_ the queue spinlock are held.
If you are holding either one of those two, then it is safe to test for
RPC_IS_QUEUED(). If the latter is true, then it is also safe to
dereference the value of task->u.tk_wait.rpc_waitq.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-10 15:24     ` Trond Myklebust
@ 2006-03-13 10:07       ` Aurelien Degremont
  2006-03-13 14:28         ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Aurelien Degremont @ 2006-03-13 10:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

Trond Myklebust wrote:
> Yes. The RPC_TASK_QUEUED bit can only be cleared when both the
> RPC_TASK_WAKEUP bit _and_ the queue spinlock are held.
> If you are holding either one of those two, then it is safe to test for
> RPC_IS_QUEUED(). If the latter is true, then it is also safe to
> dereference the value of task->u.tk_wait.rpc_waitq.

Hmmm... With those constraints, it seems difficult to be able to modify 
the current rpc_wake_up_task() function...

But, are you sure the patch you provided is sufficient to remove the 
potential deadlock we faced ? I do not see how, could you explain ?

-- 
Aurelien Degremont


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-13 10:07       ` Aurelien Degremont
@ 2006-03-13 14:28         ` Trond Myklebust
  2006-03-13 15:16           ` Aurelien Degremont
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2006-03-13 14:28 UTC (permalink / raw)
  To: Aurelien Degremont
  Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

On Mon, 2006-03-13 at 11:07 +0100, Aurelien Degremont wrote:
> Trond Myklebust wrote:
> > Yes. The RPC_TASK_QUEUED bit can only be cleared when both the
> > RPC_TASK_WAKEUP bit _and_ the queue spinlock are held.
> > If you are holding either one of those two, then it is safe to test for
> > RPC_IS_QUEUED(). If the latter is true, then it is also safe to
> > dereference the value of task->u.tk_wait.rpc_waitq.
> 
> Hmmm... With those constraints, it seems difficult to be able to modify 
> the current rpc_wake_up_task() function...

That is the price of optimisation in this case.

> But, are you sure the patch you provided is sufficient to remove the 
> potential deadlock we faced ? I do not see how, could you explain ?

Your deadlock problem resulted in __rpc_wake_up_task() iterating forever
on the same task since the while() loop would not ever exit before it
was empty. By changing the iteration scheme into one where we only try
to wake up each task once, we allow rpc_wake_up()/rpc_wake_up_status()
to complete.

Cheers,
 Trond


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-13 14:28         ` Trond Myklebust
@ 2006-03-13 15:16           ` Aurelien Degremont
  2006-03-13 16:08             ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Aurelien Degremont @ 2006-03-13 15:16 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

Trond Myklebust wrote:
>>Hmmm... With those constraints, it seems difficult to be able to modify 
>>the current rpc_wake_up_task() function...
> That is the price of optimisation in this case.

Optimisation against potential race (see below) ?

> Your deadlock problem resulted in __rpc_wake_up_task() iterating forever
> on the same task since the while() loop would not ever exit before it
> was empty. By changing the iteration scheme into one where we only try
> to wake up each task once, we allow rpc_wake_up()/rpc_wake_up_status()
> to complete.

IMO, the code will still hang because, the list_for_each_entry_safe() 
loop will never complete, even with the new scheme. Indeed, the loop 
call __rpc_wake_up_task() will try to set RPC_TASK_WAKEUP bit, but it is 
already set by rpc_wake_up_task(), so it hangs...

A: run rpc_wake_up_task(task foo)
A: set RPC_TASK_WAKEUP bit
--interrupt--
B: run rpc_wake_up()
B: grab queue->lock
B: enter the list_for_each_entry_safe() loop
B: run __rpc_wake_up_task(task foo)
B: wait for RPC_TASK_WAKEUP bit
A: wait for queue->lock

  -> deadlock ?

-- 
Aurelien Degremont


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix deadlock in RPC scheduling code.
  2006-03-13 15:16           ` Aurelien Degremont
@ 2006-03-13 16:08             ` Trond Myklebust
  0 siblings, 0 replies; 8+ messages in thread
From: Trond Myklebust @ 2006-03-13 16:08 UTC (permalink / raw)
  To: Aurelien Degremont
  Cc: Jacques-Charles Lafoucriere, Bruno Faccini, linux-kernel

On Mon, 2006-03-13 at 16:16 +0100, Aurelien Degremont wrote:

> IMO, the code will still hang because, the list_for_each_entry_safe() 
> loop will never complete, even with the new scheme. Indeed, the loop 
> call __rpc_wake_up_task() will try to set RPC_TASK_WAKEUP bit, but it is 
> already set by rpc_wake_up_task(), so it hangs...

Look again. There is nothing that loops in __rpc_wake_up_task(). If the
RPC_TASK_WAKEUP is already set, then the function exits.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-03-13 16:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-09 10:35 [PATCH] Fix deadlock in RPC scheduling code Aurelien Degremont
2006-03-09 14:40 ` Trond Myklebust
2006-03-10 15:10   ` Aurelien Degremont
2006-03-10 15:24     ` Trond Myklebust
2006-03-13 10:07       ` Aurelien Degremont
2006-03-13 14:28         ` Trond Myklebust
2006-03-13 15:16           ` Aurelien Degremont
2006-03-13 16:08             ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).