* krb5 problems in 2.6.36
@ 2010-08-28 17:09 J. Bruce Fields
  2010-08-30 17:57 ` J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: J. Bruce Fields @ 2010-08-28 17:09 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
cred to struct rpc_rqst" the NFS server crashes when using krb5.

I don't have good errors--I'll get some--but what I've seen suggests
maybe a use-after-free of an rpc client on rpc_pipefs operations by
gssd?

I need the following to compile commits in that range.

--b.

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5eccea1..4a7c3d9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1382,7 +1382,7 @@ static const struct rpc_call_ops nfs_commit_ops = {
 	.rpc_release = nfs_commit_release,
 };
 
-static int nfs_commit_inode(struct inode *inode, int how)
+int nfs_commit_inode(struct inode *inode, int how)
 {
 	LIST_HEAD(head);
 	int may_wait = how & FLUSH_SYNC;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a9d8026..92926bf 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -504,6 +504,7 @@ extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 extern struct nfs_write_data *nfs_commitdata_alloc(void);
 extern void nfs_commit_free(struct nfs_write_data *wdata);
+extern int nfs_commit_inode(struct inode *inode, int how);
 #endif
 
 static inline int


* Re: krb5 problems in 2.6.36
  2010-08-28 17:09 krb5 problems in 2.6.36 J. Bruce Fields
@ 2010-08-30 17:57 ` J. Bruce Fields
  2010-09-07  5:01   ` [PATCH] Fix null dereference in call_allocate J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: J. Bruce Fields @ 2010-08-30 17:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sat, Aug 28, 2010 at 01:09:53PM -0400, J. Bruce Fields wrote:
> As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
> cred to struct rpc_rqst" the NFS server crashes when using krb5.
> 
> I don't have good errors--I'll get some--but what I've seen suggests
> maybe a use-after-free of an rpc client on rpc_pipefs operations by
> gssd?

Here's an example.

--b.

Aug 30 13:55:07 plink1 kernel: ------------[ cut here ]------------
Aug 30 13:55:07 plink1 kernel: WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0()
Aug 30 13:55:07 plink1 kernel: Hardware name: Bochs
Aug 30 13:55:07 plink1 kernel: list_add corruption. prev->next should be next (ffff88001b8db440), but was (null). (prev=ffff88001f7d84b8).
Aug 30 13:55:07 plink1 kernel: Modules linked in: [last unloaded: scsi_wait_scan]
Aug 30 13:55:07 plink1 kernel: Pid: 390, comm: rpciod/0 Not tainted 2.6.35-rc3-00041-g4d019ca #31
Aug 30 13:55:07 plink1 kernel: Call Trace:
Aug 30 13:55:07 plink1 kernel: [<ffffffff81038d5f>] warn_slowpath_common+0x7f/0xc0
Aug 30 13:55:07 plink1 kernel: [<ffffffff81038e56>] warn_slowpath_fmt+0x46/0x50
Aug 30 13:55:07 plink1 kernel: [<ffffffff814f441f>] __list_add+0x8f/0xa0
Aug 30 13:55:07 plink1 kernel: [<ffffffff8190f255>] ? rpc_queue_upcall+0x35/0x110
Aug 30 13:55:07 plink1 kernel: [<ffffffff8190f281>] rpc_queue_upcall+0x61/0x110
Aug 30 13:55:07 plink1 kernel: [<ffffffff81913fcc>] gss_setup_upcall+0x2cc/0x420
Aug 30 13:55:07 plink1 kernel: [<ffffffff819146b3>] gss_refresh+0x93/0x2c0
Aug 30 13:55:07 plink1 kernel: [<ffffffff810682ad>] ? trace_hardirqs_on_caller+0x14d/0x190
Aug 30 13:55:07 plink1 kernel: [<ffffffff819006c8>] rpcauth_refreshcred+0x48/0x1c0
Aug 30 13:55:07 plink1 kernel: [<ffffffff81913cdd>] ? gss_release_msg+0x5d/0x80
Aug 30 13:55:07 plink1 kernel: [<ffffffff818f6143>] call_refresh+0x43/0x70
Aug 30 13:55:07 plink1 kernel: [<ffffffff818ff252>] __rpc_execute+0xa2/0x230
Aug 30 13:55:07 plink1 kernel: [<ffffffff818ff410>] ? rpc_async_schedule+0x0/0x20
Aug 30 13:55:07 plink1 kernel: [<ffffffff818ff425>] rpc_async_schedule+0x15/0x20
Aug 30 13:55:07 plink1 kernel: [<ffffffff81053105>] worker_thread+0x225/0x410
Aug 30 13:55:07 plink1 kernel: [<ffffffff810530b5>] ? worker_thread+0x1d5/0x410
Aug 30 13:55:07 plink1 kernel: [<ffffffff8102f8d1>] ? get_parent_ip+0x11/0x50
Aug 30 13:55:07 plink1 kernel: [<ffffffff810579b0>] ? autoremove_wake_function+0x0/0x40
Aug 30 13:55:07 plink1 kernel: [<ffffffff81052ee0>] ? worker_thread+0x0/0x410
Aug 30 13:55:07 plink1 kernel: [<ffffffff81057516>] kthread+0x96/0xa0
Aug 30 13:55:07 plink1 kernel: [<ffffffff810030b4>] kernel_thread_helper+0x4/0x10
Aug 30 13:55:07 plink1 kernel: [<ffffffff8196587e>] ? restore_args+0x0/0x30
Aug 30 13:55:07 plink1 kernel: [<ffffffff81057480>] ? kthread+0x0/0xa0
Aug 30 13:55:07 plink1 kernel: [<ffffffff810030b0>] ? kernel_thread_helper+0x0/0x10
Aug 30 13:55:07 plink1 kernel: ---[ end trace 71a47b9c9b9b77dc ]---
Aug 30 13:55:07 plink1 kernel: general protection fault: 0000 [#1] PREEMPT
Aug 30 13:55:07 plink1 kernel: last sysfs file: /sys/devices/virtual/block/dm-0/dev
Aug 30 13:55:07 plink1 kernel: CPU 0
Aug 30 13:55:07 plink1 kernel: Modules linked in: [last unloaded: scsi_wait_scan]
Aug 30 13:55:07 plink1 kernel:
Aug 30 13:55:07 plink1 kernel: Pid: 3604, comm: rpc.gssd Tainted: G        W   2.6.35-rc3-00041-g4d019ca #31 /Bochs
Aug 30 13:55:07 plink1 kernel: RIP: 0010:[<ffffffff814f430b>]  [<ffffffff814f430b>] list_del+0x1b/0xa0
Aug 30 13:55:07 plink1 kernel: RSP: 0018:ffff88001d567e28  EFLAGS: 00010246
Aug 30 13:55:07 plink1 kernel: RAX: 6b6b6b6b6b6b6b6b RBX: ffff88001f7fd9f0 RCX: 00000000fffffff5
Aug 30 13:55:07 plink1 kernel: RDX: ffffffff819141a0 RSI: ffff88001d567e88 RDI: ffff88001f7fd9f0
Aug 30 13:55:07 plink1 kernel: RBP: ffff88001d567e38 R08: ffff88001f7fd9f0 R09: 0000000000000000
Aug 30 13:55:07 plink1 kernel: R10: 0000000000000246 R11: 0000000000000299 R12: ffff88001d567e88
Aug 30 13:55:07 plink1 kernel: R13: ffffffff819141a0 R14: ffff88001f7fd9f0 R15: 00000000fffffff5
Aug 30 13:55:07 plink1 kernel: FS:  00007f85d61417c0(0000) GS:ffffffff81e1c000(0000) knlGS:0000000000000000
Aug 30 13:55:07 plink1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 30 13:55:07 plink1 kernel: CR2: 00007f85d614c000 CR3: 000000001e41c000 CR4: 00000000000006f0
Aug 30 13:55:07 plink1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 30 13:55:07 plink1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 30 13:55:07 plink1 kernel: Process rpc.gssd (pid: 3604, threadinfo ffff88001d566000, task ffff88001ebc0090)
Aug 30 13:55:07 plink1 kernel: Stack:
Aug 30 13:55:07 plink1 kernel: ffff88001b8db128 ffff88001b8db048 ffff88001d567e78 ffffffff8190e860
Aug 30 13:55:07 plink1 kernel: <0> ffff88001b8db0f8 ffff88001b8db048 ffff88001b8db128 ffff88001d567e88
Aug 30 13:55:07 plink1 kernel: <0> ffff88001b8db0f8 ffff88001e245078 ffff88001d567ec8 ffffffff8190eb13
Aug 30 13:55:07 plink1 kernel: Call Trace:
Aug 30 13:55:07 plink1 kernel: [<ffffffff8190e860>] rpc_purge_list+0x40/0x90
Aug 30 13:55:07 plink1 kernel: [<ffffffff8190eb13>] rpc_pipe_release+0x183/0x1a0
Aug 30 13:55:07 plink1 kernel: [<ffffffff810ea2d2>] fput+0x132/0x2c0
Aug 30 13:55:07 plink1 kernel: [<ffffffff810e6ccd>] filp_close+0x5d/0x90
Aug 30 13:55:07 plink1 kernel: [<ffffffff810e6db2>] sys_close+0xb2/0x110
Aug 30 13:55:07 plink1 kernel: [<ffffffff81002498>] system_call_fastpath+0x16/0x1b
Aug 30 13:55:07 plink1 kernel: Code: ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 4c 8b 00 4c 39 c7 75 39 48 8b 03 <4c> 8b 40 08 4c 39 c3 75 4c 48 8b 53 08 48 89 50 08 48 89 02 48
Aug 30 13:55:07 plink1 kernel: RIP  [<ffffffff814f430b>] list_del+0x1b/0xa0
Aug 30 13:55:07 plink1 kernel: RSP <ffff88001d567e28>
Aug 30 13:55:07 plink1 kernel: Slab corruption: size-1024 start=ffff88001f7fd9e8, len=1024
Aug 30 13:55:07 plink1 kernel: Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Aug 30 13:55:07 plink1 kernel: Last user: [<ffffffff81837870>](skb_release_data+0xd0/0xe0)
Aug 30 13:55:07 plink1 kernel: 010: 88 7e 56 1d 00 88 ff ff 6b 6b 6b 6b 6b 6b 6b 6b
Aug 30 13:55:07 plink1 kernel: Prev obj: start=ffff88001f7fd5d0, len=1024
Aug 30 13:55:07 plink1 kernel: Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Aug 30 13:55:07 plink1 kernel: Last user: [<ffffffff810f1a1f>](alloc_pipe_info+0x6f/0x1f0)
Aug 30 13:55:07 plink1 kernel: 000: 30 ec 5c 00 00 ea ff ff 00 10 00 00 00 00 00 00
Aug 30 13:55:07 plink1 kernel: 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Aug 30 13:55:07 plink1 kernel: ---[ end trace 71a47b9c9b9b77dd ]---



* [PATCH] Fix null dereference in call_allocate
  2010-08-30 17:57 ` J. Bruce Fields
@ 2010-09-07  5:01   ` J. Bruce Fields
  2010-09-07  5:12     ` [PATCH] Fix race corrupting rpc upcall list J. Bruce Fields
  2010-09-07 23:03     ` [PATCH] SUNRPC: cleanup state-machine ordering J. Bruce Fields
  0 siblings, 2 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-07  5:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

From: J. Bruce Fields <bfields@redhat.com>

In call_allocate we need to reach the auth in order to factor au_cslack
into the allocation.

As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
cred to struct rpc_rqst", call_allocate attempts to do this by
dereferencing tk_client->cl_auth; however, this is not guaranteed to be
defined: cl_auth can be NULL in the case of gss context destruction (see
rpc_free_auth).

Reorder the client state machine to bind credentials before allocating,
so that we can instead reach the auth through the cred.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

---

On Mon, Aug 30, 2010 at 01:57:28PM -0400, J. Bruce Fields wrote:
> On Sat, Aug 28, 2010 at 01:09:53PM -0400, J. Bruce Fields wrote:
> > As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
> > cred to struct rpc_rqst" the NFS server crashes when using krb5.
> > 
> > I don't have good errors--I'll get some--but what I've seen suggests
> > maybe a use-after-free of an rpc client on rpc_pipefs operations by
> > gssd?

As it turns out, there were two problems causing the crashes I saw.
This one was the regression introduced by the commit named above.

If this is actually the right approach, then we should probably also
reorder these functions and fix the comments to reflect the new
state-machine order.

--b.
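
In other words, the reordering is, schematically (a sketch based on the
hunks below; the surrounding states are untouched):

    before:  call_reserveresult -> call_allocate -> call_refresh
                                -> call_refreshresult -> call_bind -> ...
    after:   call_reserveresult -> call_refresh -> call_refreshresult
                                -> call_allocate -> call_bind -> ...

With allocation running after the cred has been bound, call_allocate can
take au_cslack from task->tk_rqstp->rq_cred->cr_auth instead of going
through the possibly-NULL tk_client->cl_auth.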

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 2388d83..657aac6 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -931,7 +931,7 @@ call_reserveresult(struct rpc_task *task)
 	task->tk_status = 0;
 	if (status >= 0) {
 		if (task->tk_rqstp) {
-			task->tk_action = call_allocate;
+			task->tk_action = call_refresh;
 			return;
 		}
 
@@ -972,7 +972,7 @@ call_reserveresult(struct rpc_task *task)
 static void
 call_allocate(struct rpc_task *task)
 {
-	unsigned int slack = task->tk_client->cl_auth->au_cslack;
+	unsigned int slack = task->tk_rqstp->rq_cred->cr_auth->au_cslack;
 	struct rpc_rqst *req = task->tk_rqstp;
 	struct rpc_xprt *xprt = task->tk_xprt;
 	struct rpc_procinfo *proc = task->tk_msg.rpc_proc;
@@ -980,7 +980,7 @@ call_allocate(struct rpc_task *task)
 	dprint_status(task);
 
 	task->tk_status = 0;
-	task->tk_action = call_refresh;
+	task->tk_action = call_bind;
 
 	if (req->rq_buffer)
 		return;
@@ -1042,7 +1042,7 @@ call_refreshresult(struct rpc_task *task)
 	dprint_status(task);
 
 	task->tk_status = 0;
-	task->tk_action = call_bind;
+	task->tk_action = call_allocate;
 	if (status >= 0 && rpcauth_uptodatecred(task))
 		return;
 	switch (status) {


* [PATCH] Fix race corrupting rpc upcall list
  2010-09-07  5:01   ` [PATCH] Fix null dereference in call_allocate J. Bruce Fields
@ 2010-09-07  5:12     ` J. Bruce Fields
  2010-09-07  5:13       ` J. Bruce Fields
                         ` (2 more replies)
  2010-09-07 23:03     ` [PATCH] SUNRPC: cleanup state-machine ordering J. Bruce Fields
  1 sibling, 3 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-07  5:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

From: J. Bruce Fields <bfields@redhat.com>

If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
after rpc_pipe_release calls rpc_purge_list(), but before it calls
gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
will free a message without deleting it from the rpci->pipe list.

We will be left with a freed object on the rpci->pipe list.  Most
frequent symptoms are kernel crashes in rpc.gssd system calls on the
pipe in question.

We could just add a list_del(&gss_msg->msg.list) here.  But I can see no
reason for doing all this cleanup here; the preceding rpc_purge_list()
should have done the job, except possibly for any newly queued upcalls
as above, which can safely be left to wait for another opener.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
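
The race, schematically (the column layout and interleaving are
illustrative; rpc_queue_upcall() adds to rpci->pipe under inode->i_lock):

    rpc.gssd closing the pipe            rpciod queueing an upcall
    -------------------------            -------------------------
    rpc_pipe_release()
      rpc_purge_list()         <-- empties rpci->pipe
                                         rpc_queue_upcall()
                                           adds a new msg to rpci->pipe
      rpci->ops->release_pipe(inode)
        gss_pipe_release()
          gss_release_msg()    <-- frees the message while it is still
                                   linked on rpci->pipe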

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 36eee66..8ad9a34 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -744,23 +744,6 @@ static int gss_pipe_open_v1(struct inode *inode)
 static void
 gss_pipe_release(struct inode *inode)
 {
-	struct rpc_inode *rpci = RPC_I(inode);
-	struct gss_upcall_msg *gss_msg;
-
-	spin_lock(&inode->i_lock);
-	while (!list_empty(&rpci->in_downcall)) {
-
-		gss_msg = list_entry(rpci->in_downcall.next,
-				struct gss_upcall_msg, list);
-		gss_msg->msg.errno = -EPIPE;
-		atomic_inc(&gss_msg->count);
-		__gss_unhash_msg(gss_msg);
-		spin_unlock(&inode->i_lock);
-		gss_release_msg(gss_msg);
-		spin_lock(&inode->i_lock);
-	}
-	spin_unlock(&inode->i_lock);
-
 	put_pipe_version();
 }
 


* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-07  5:12     ` [PATCH] Fix race corrupting rpc upcall list J. Bruce Fields
@ 2010-09-07  5:13       ` J. Bruce Fields
  2010-09-07 18:23         ` Trond Myklebust
  2010-09-08 22:05         ` J. Bruce Fields
  2010-09-07 17:24       ` J. Bruce Fields
  2010-09-12 21:07       ` Trond Myklebust
  2 siblings, 2 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-07  5:13 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

After those two patches I can finally pass connectathon tests on 2.6.36.
(Argh.)

--b.


* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-07  5:12     ` [PATCH] Fix race corrupting rpc upcall list J. Bruce Fields
  2010-09-07  5:13       ` J. Bruce Fields
@ 2010-09-07 17:24       ` J. Bruce Fields
  2010-09-12 21:07       ` Trond Myklebust
  2 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-07 17:24 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Oh, and I have no idea why this problem is suddenly easier to reproduce
on 2.6.36.  Maybe there's some obvious patch I overlooked, but I didn't
see anything on a quick skim of the history.  I made an attempt at
bisecting, but couldn't get a test case that reproduced the problem
reliably enough.

--b.



* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-07  5:13       ` J. Bruce Fields
@ 2010-09-07 18:23         ` Trond Myklebust
  2010-09-08 22:05         ` J. Bruce Fields
  1 sibling, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2010-09-07 18:23 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Tue, 2010-09-07 at 01:13 -0400, J. Bruce Fields wrote:
> After those two patches I can finally pass connectathon tests on 2.6.36.
> (Argh.)
> 
> --b.

OK. Both applied.


* [PATCH] SUNRPC: cleanup state-machine ordering
  2010-09-07  5:01   ` [PATCH] Fix null dereference in call_allocate J. Bruce Fields
  2010-09-07  5:12     ` [PATCH] Fix race corrupting rpc upcall list J. Bruce Fields
@ 2010-09-07 23:03     ` J. Bruce Fields
  1 sibling, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-07 23:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

From: J. Bruce Fields <bfields@redhat.com>

This is just a minor cleanup: net/sunrpc/clnt.c clarifies the rpc client
state machine by commenting each state and by laying out the functions
implementing each state in the order that each state is normally
executed (in the absence of errors).

The previous patch "Fix null dereference in call_allocate" changed the
order of the states.  Move the functions and update the comments to
reflect the change.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 net/sunrpc/clnt.c |   84 ++++++++++++++++++++++++++--------------------------
 1 files changed, 42 insertions(+), 42 deletions(-)

On Tue, Sep 07, 2010 at 01:01:42AM -0400, J. Bruce Fields wrote:
> If this is actually the right approach, then we should probably also
> reorder these functions and fix the comments to reflect the new
> state-machine order.

Here you go.  Trivial cut-and-paste, but compile-tested only.

--b.

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 657aac6..68e653f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -966,7 +966,48 @@ call_reserveresult(struct rpc_task *task)
 }
 
 /*
- * 2.	Allocate the buffer. For details, see sched.c:rpc_malloc.
+ * 2.	Bind and/or refresh the credentials
+ */
+static void
+call_refresh(struct rpc_task *task)
+{
+	dprint_status(task);
+
+	task->tk_action = call_refreshresult;
+	task->tk_status = 0;
+	task->tk_client->cl_stats->rpcauthrefresh++;
+	rpcauth_refreshcred(task);
+}
+
+/*
+ * 2a.	Process the results of a credential refresh
+ */
+static void
+call_refreshresult(struct rpc_task *task)
+{
+	int status = task->tk_status;
+
+	dprint_status(task);
+
+	task->tk_status = 0;
+	task->tk_action = call_allocate;
+	if (status >= 0 && rpcauth_uptodatecred(task))
+		return;
+	switch (status) {
+	case -EACCES:
+		rpc_exit(task, -EACCES);
+		return;
+	case -ENOMEM:
+		rpc_exit(task, -ENOMEM);
+		return;
+	case -ETIMEDOUT:
+		rpc_delay(task, 3*HZ);
+	}
+	task->tk_action = call_refresh;
+}
+
+/*
+ * 2b.	Allocate the buffer. For details, see sched.c:rpc_malloc.
  *	(Note: buffer memory is freed in xprt_release).
  */
 static void
@@ -1017,47 +1058,6 @@ call_allocate(struct rpc_task *task)
 	rpc_exit(task, -ERESTARTSYS);
 }
 
-/*
- * 2a.	Bind and/or refresh the credentials
- */
-static void
-call_refresh(struct rpc_task *task)
-{
-	dprint_status(task);
-
-	task->tk_action = call_refreshresult;
-	task->tk_status = 0;
-	task->tk_client->cl_stats->rpcauthrefresh++;
-	rpcauth_refreshcred(task);
-}
-
-/*
- * 2b.	Process the results of a credential refresh
- */
-static void
-call_refreshresult(struct rpc_task *task)
-{
-	int status = task->tk_status;
-
-	dprint_status(task);
-
-	task->tk_status = 0;
-	task->tk_action = call_allocate;
-	if (status >= 0 && rpcauth_uptodatecred(task))
-		return;
-	switch (status) {
-	case -EACCES:
-		rpc_exit(task, -EACCES);
-		return;
-	case -ENOMEM:
-		rpc_exit(task, -ENOMEM);
-		return;
-	case -ETIMEDOUT:
-		rpc_delay(task, 3*HZ);
-	}
-	task->tk_action = call_refresh;
-}
-
 static inline int
 rpc_task_need_encode(struct rpc_task *task)
 {
-- 
1.7.0.4



* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-07  5:13       ` J. Bruce Fields
  2010-09-07 18:23         ` Trond Myklebust
@ 2010-09-08 22:05         ` J. Bruce Fields
  2010-09-08 23:07           ` Trond Myklebust
  2010-09-09 15:58           ` J. Bruce Fields
  1 sibling, 2 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-08 22:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Tue, Sep 07, 2010 at 01:13:36AM -0400, J. Bruce Fields wrote:
> After those two patches I can finally pass connectathon tests on 2.6.36.
> (Argh.)

Arrrrrrrrgh!

One more: rpc_shutdown_client() is getting called on a client which is
corrupt; looking at the client in kgdb:

0xffff880037fcd2b0: 0x9df20000 0xd490796c 0x65005452 0x0008d144
0xffff880037fcd2c0: 0x42000045 0x0040a275 0x514f1140 0x657aa8c0
0xffff880037fcd2d0: 0x017aa8c0 0x3500b786 0xeac22e00 0x0001f626
0xffff880037fcd2e0: 0x00000100 0x00000000 0x30013001 0x30013001
0xffff880037fcd2f0: 0x2d6e6907 0x72646461 0x70726104 0x0c000061
0xffff880037fcd300: 0x5a5a0100 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd310: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd320: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd330: 0x00000000 0x00000000 0x00000000 0x00000000
0xffff880037fcd340: 0x00000000 0x00000000 0x00000000 0x00000000
0xffff880037fcd350: 0x00000000 0x00000000 0x00000001 0x5a5a5a5a
0xffff880037fcd360: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd370: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd380: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd390: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd3a0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd3b0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd3c0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd3d0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd3e0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd3f0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd400: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd410: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd420: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd430: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd440: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd450: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
0xffff880037fcd460: 0x5a5a5a5a 0x5a5a5a5a

So it's mostly (but not exclusively) POISON_INUSE.  (Which is what the
allocator fills an object with before handing back to someone; so
apparently someone allocated it but didn't initialize most of it.)
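
For reference, the slab poison byte values are (values from
include/linux/poison.h; the annotations here are just a summary):

    #define POISON_INUSE    0x5a    /* written into an object at allocation,
                                       before the caller initializes it */
    #define POISON_FREE     0x6b    /* written at free; the 0x6b6b... RAX in
                                       the earlier oops is this pattern */
    #define POISON_END      0xa5    /* last poison byte of an object */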

I can't see how the rpc code would return a client that looked like
that.  It allocates clients with kzalloc, for one thing.

So all I can think is that we freed the client while it was still
in use, and that memory got handed to someone else.

There's only one place in the kernel code that frees rpc clients, in
nfsd4_set_callback_client().  It is always called under the global state
lock, and does essentially:

        *old = clp->cl_cb_client;
        clp->cl_cb_client = new;
        if (old)
                rpc_shutdown_client(old);

where "new" is always either NULL or something just returned from rpc_create().

So I don't see any possible way that can call rpc_shutdown_client on the same
thing twice.

It could be a double-free inside the rpc code somewhere, but I haven't found
one.

This happened during the pynfs DELEG9 test over krb5i, but I can't reproduce it
reliably.

Bah.  Anyone have debugging advice?

--b.


* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-08 22:05         ` J. Bruce Fields
@ 2010-09-08 23:07           ` Trond Myklebust
  2010-09-09  1:23             ` J. Bruce Fields
  2010-09-09 15:58           ` J. Bruce Fields
  1 sibling, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2010-09-08 23:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Wed, 2010-09-08 at 18:05 -0400, J. Bruce Fields wrote:
> On Tue, Sep 07, 2010 at 01:13:36AM -0400, J. Bruce Fields wrote:
> > After those two patches I can finally pass connectathon tests on 2.6.36.
> > (Argh.)
> 
> Arrrrrrrrgh!
> 
> One more: rpc_shutdown_client() is getting called on a client which is
> corrupt; looking at the client in kgdb:
> 
> 0xffff880037fcd2b0: 0x9df20000 0xd490796c 0x65005452 0x0008d144
> 0xffff880037fcd2c0: 0x42000045 0x0040a275 0x514f1140 0x657aa8c0
> 0xffff880037fcd2d0: 0x017aa8c0 0x3500b786 0xeac22e00 0x0001f626
> 0xffff880037fcd2e0: 0x00000100 0x00000000 0x30013001 0x30013001
> 0xffff880037fcd2f0: 0x2d6e6907 0x72646461 0x70726104 0x0c000061
> 0xffff880037fcd300: 0x5a5a0100 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd310: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd320: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd330: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xffff880037fcd340: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xffff880037fcd350: 0x00000000 0x00000000 0x00000001 0x5a5a5a5a
> 0xffff880037fcd360: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd370: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd380: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd390: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd3a0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd3b0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd3c0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd3d0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd3e0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd3f0: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd400: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd410: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd420: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd430: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd440: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd450: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
> 0xffff880037fcd460: 0x5a5a5a5a 0x5a5a5a5a
> 
> So it's mostly (but not exclusively) POISON_INUSE.  (Which is what the
> allocator fills an object with before handing back to someone; so
> apparently someone allocated it but didn't initialize most of it.)
> 
> I can't see how the rpc code would return a client that looked like
> that.  It allocates clients with kzalloc, for one thing.
> 
> So all I can think is that we freed the client while it was still
> in use, and that memory got handed to someone else.
> 
> There's only one place in the kernel code that frees rpc clients, in
> nfsd4_set_callback_client().  It is always called under the global state
> lock, and does essentially:
> 
>         *old = clp->cl_cb_client;
>         clp->cl_cb_client = new;

	flush_workqueue(callback_wq);

>         if (old)
>                 rpc_shutdown_client(old);
> 
> where "new" is always either NULL or something just returned from rpc_create().
> 
> So I don't see any possible way that can call rpc_shutdown_client on the same
> thing twice.

A use-after-free rpc call will do just that, since it takes a reference
to the (freed up) rpc_client and releases it after it is done.

Any chance you might be doing an rpc call that circumvents the
callback_wq flush above?

Cheers
  Trond


* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-08 23:07           ` Trond Myklebust
@ 2010-09-09  1:23             ` J. Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-09  1:23 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Wed, Sep 08, 2010 at 07:07:55PM -0400, Trond Myklebust wrote:
> On Wed, 2010-09-08 at 18:05 -0400, J. Bruce Fields wrote:
> > On Tue, Sep 07, 2010 at 01:13:36AM -0400, J. Bruce Fields wrote:
> > > After those two patches I can finally pass connectathon tests on 2.6.36.
> > > (Argh.)
> > 
> > Arrrrrrrrgh!
> > 
> > One more: rpc_shutdown_client() is getting called on a client which is
> > corrupt; looking at the client in kgdb:
> > 
> > [...]
> > 
> > So it's mostly (but not exclusively) POISON_INUSE.  (Which is what the
> > allocator fills an object with before handing back to someone; so
> > apparently someone allocated it but didn't initialize most of it.)
> > 
> > I can't see how the rpc code would return a client that looked like
> > that.  It allocates clients with kzalloc, for one thing.
> > 
> > So all I can think is that we freed the client while it was still
> > in use, and that memory got handed to someone else.
> > 
> > There's only one place in the kernel code that frees rpc clients, in
> > nfsd4_set_callback_client().  It is always called under the global state
> > lock, and does essentially:
> > 
> >         *old = clp->cl_cb_client;
> >         clp->cl_cb_client = new;
> 
> 	flush_workqueue(callback_wq);
> 
> >         if (old)
> >                 rpc_shutdown_client(old);
> > 
> > where "new" is always either NULL or something just returned from rpc_create().
> > 
> > So I don't see any possible way that can call rpc_shutdown_client on the same
> > thing twice.
> 
> A use-after-free rpc call will do just that, since it takes a reference
> to the (freed up) rpc_client and releases it after it is done.
> 
> Any chance you might be doing an rpc call that circumvents the
> callback_wq flush above?

That does seem the more likely source of problems, but the backtrace is

#0  0xffffffff818ee35e in rpc_release_client (clnt=0x5a5a5a5a5a5a5a5a) at net/sunrpc/clnt.c:526

that bogus clnt is cl_parent, so:

#1  0xffffffff818ee739 in rpc_free_client (kref=0xffff880037fcd2b0) at net/sunrpc/clnt.c:479

i.e., rpc_free_client was given a pointer to a client that (as above) was already corrupted.

#2  0xffffffff814e0806 in kref_put (kref=0xffff880037fcd2b0, release=0xffffffff818ee6f0 <rpc_free_client>) at lib/kref.c:59
#3  0xffffffff818ee826 in rpc_free_auth (kref=0xffff880037fcd2b0) at net/sunrpc/clnt.c:515
#4  0xffffffff814e0806 in kref_put (kref=0xffff880037fcd2b0, release=0xffffffff818ee7e0 <rpc_free_auth>) at lib/kref.c:59
#5  0xffffffff818ee373 in rpc_release_client (clnt=0xffff880037fcd2b0) at net/sunrpc/clnt.c:528
#6  0xffffffff818ee896 in rpc_shutdown_client (clnt=0xffff880037fcd2b0) at net/sunrpc/clnt.c:460
#7  0xffffffff81276ac5 in nfsd4_set_callback_client (clp=<value optimized out>, new=<value optimized out>) at fs/nfsd/nfs4callback.c:748

And this is the code above.

So it seems the rpc_clnt was already freed before we got here.

I'm stumped for now.  I guess I'll work on finding a reliable reproducer.

--b.

#8  0xffffffff81276c71 in setup_callback_client (clp=0xffff8800329380d8, cb=<value optimized out>) at fs/nfsd/nfs4callback.c:508
#9  0xffffffff81276cf0 in nfsd4_probe_callback (clp=<value optimized out>, cb=<value optimized out>) at fs/nfsd/nfs4callback.c:571
#10 0xffffffff81272969 in nfsd4_setclientid_confirm (rqstp=0xffff88003e5c8000, cstate=<value optimized out>, setclientid_confirm=0xffff880037fe4080) at fs/nfsd/nfs4state.c:1810
#11 0xffffffff81262b51 in nfsd4_proc_compound (rqstp=0xffff88003e5c8000, args=0xffff880037fe4000, resp=0xffff880037fe5000) at fs/nfsd/nfs4proc.c:1092
#12 0xffffffff8124fcce in nfsd_dispatch (rqstp=0xffff88003e5c8000, statp=0xffff88003c97d03c) at fs/nfsd/nfssvc.c:608
#13 0xffffffff818f8d85 in svc_process_common (rqstp=0xffff88003e5c8000, argv=0xffff88003e5c8108, resv=0xffff88003e5c8148) at net/sunrpc/svc.c:1120
#14 0xffffffff818f93f0 in svc_process (rqstp=<value optimized out>) at net/sunrpc/svc.c:1246
#15 0xffffffff81250412 in nfsd (vrqstp=0xffff88003e5c8000) at fs/nfsd/nfssvc.c:535
#16 0xffffffff81059306 in kthread (_create=0xffff88003c939cc8) at kernel/kthread.c:95
#17 0xffffffff810030f4 in ?? () at arch/x86/kernel/entry_64.S:1176
#18 0x0000000000000000 in ?? ()



* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-08 22:05         ` J. Bruce Fields
  2010-09-08 23:07           ` Trond Myklebust
@ 2010-09-09 15:58           ` J. Bruce Fields
  1 sibling, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-09 15:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Wed, Sep 08, 2010 at 06:05:28PM -0400, J. Bruce Fields wrote:
> This happened during the pynfs DELEG9 test over krb5i, but I can't reproduce it
> reliably.

I was wrong, this isn't one of the delegation tests.

Running my default pynfs tests over krb5i a few times gets it in a few
minutes.

One factor may be that the callback address the pynfs client gives us is
0.0.0.0, which may be exercising an error path somewhere.

Anyway, I think I'm close to having this.

--b.


* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-07  5:12     ` [PATCH] Fix race corrupting rpc upcall list J. Bruce Fields
  2010-09-07  5:13       ` J. Bruce Fields
  2010-09-07 17:24       ` J. Bruce Fields
@ 2010-09-12 21:07       ` Trond Myklebust
  2010-09-12 23:47         ` J. Bruce Fields
  2 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2010-09-12 21:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Tue, 2010-09-07 at 01:12 -0400, J. Bruce Fields wrote:
> From: J. Bruce Fields <bfields@redhat.com>
> 
> If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
> after rpc_pipe_release calls rpc_purge_list(), but before it calls
> gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
> will free a message without deleting it from the rpci->pipe list.
> 
> We will be left with a freed object on the rpci->pipe list.  Most
> frequent symptoms are kernel crashes in rpc.gssd system calls on the
> pipe in question.
> 
> We could just add a list_del(&gss_msg->msg.list) here.  But I can see no
> reason for doing all this cleanup here; the preceding rpc_purge_list()
> should have done the job, except possibly for any newly queued upcalls
> as above, which can safely be left to wait for another opener.

Hi Bruce,

Looking again at this issue, I think I see why the ->release_pipe() is
needed. While the call to rpc_purge_list() does indeed clear the list of
all those messages that are waiting for their upcall to complete, it
does nothing for the messages that have successfully been read by the
daemon, but are now waiting for a reply.
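
Roughly, two lists are involved per pipe (field names as in the sunrpc
code of that era; the summary is mine):

    rpci->pipe         rpc_pipe_msgs queued for rpc.gssd to read; these
                       are what rpc_purge_list() drops on the final close
    rpci->in_downcall  gss upcalls waiting for a downcall reply; once the
                       daemon has read the message off rpci->pipe, only
                       ->release_pipe() (gss_pipe_release) can still
                       error them out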

How about something like the patch below instead?

Cheers
  Trond

-------------------------------------------------------------------------------------------
SUNRPC: Fix race corrupting rpc upcall

From: Trond Myklebust <Trond.Myklebust@netapp.com>

If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
after rpc_pipe_release calls rpc_purge_list(), but before it calls
gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
will free a message without deleting it from the rpci->pipe list.

We will be left with a freed object on the rpci->pipe list.  Most
frequent symptoms are kernel crashes in rpc.gssd system calls on the
pipe in question.

Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/auth_gss/auth_gss.c |    9 +++++----
 net/sunrpc/rpc_pipe.c          |    6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)


diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index dcfc66b..12c4859 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -745,17 +745,18 @@ gss_pipe_release(struct inode *inode)
 	struct rpc_inode *rpci = RPC_I(inode);
 	struct gss_upcall_msg *gss_msg;
 
+restart:
 	spin_lock(&inode->i_lock);
-	while (!list_empty(&rpci->in_downcall)) {
+	list_for_each_entry(gss_msg, &rpci->in_downcall, list) {
 
-		gss_msg = list_entry(rpci->in_downcall.next,
-				struct gss_upcall_msg, list);
+		if (!list_empty(&gss_msg->msg.list))
+			continue;
 		gss_msg->msg.errno = -EPIPE;
 		atomic_inc(&gss_msg->count);
 		__gss_unhash_msg(gss_msg);
 		spin_unlock(&inode->i_lock);
 		gss_release_msg(gss_msg);
-		spin_lock(&inode->i_lock);
+		goto restart;
 	}
 	spin_unlock(&inode->i_lock);
 
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 95ccbcf..41a762f 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -48,7 +48,7 @@ static void rpc_purge_list(struct rpc_inode *rpci, struct list_head *head,
 		return;
 	do {
 		msg = list_entry(head->next, struct rpc_pipe_msg, list);
-		list_del(&msg->list);
+		list_del_init(&msg->list);
 		msg->errno = err;
 		destroy_msg(msg);
 	} while (!list_empty(head));
@@ -208,7 +208,7 @@ rpc_pipe_release(struct inode *inode, struct file *filp)
 	if (msg != NULL) {
 		spin_lock(&inode->i_lock);
 		msg->errno = -EAGAIN;
-		list_del(&msg->list);
+		list_del_init(&msg->list);
 		spin_unlock(&inode->i_lock);
 		rpci->ops->destroy_msg(msg);
 	}
@@ -268,7 +268,7 @@ rpc_pipe_read(struct file *filp, char __user *buf, size_t len, loff_t *offset)
 	if (res < 0 || msg->len == msg->copied) {
 		filp->private_data = NULL;
 		spin_lock(&inode->i_lock);
-		list_del(&msg->list);
+		list_del_init(&msg->list);
 		spin_unlock(&inode->i_lock);
 		rpci->ops->destroy_msg(msg);
 	}
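
(The list_del() -> list_del_init() conversions above are what let the new
list_empty(&gss_msg->msg.list) test in gss_pipe_release() work: plain
list_del() leaves the entry poisoned, while list_del_init() leaves it
pointing at itself, roughly as in include/linux/list.h:)

    static inline void list_del_init(struct list_head *entry)
    {
            __list_del(entry->prev, entry->next);
            INIT_LIST_HEAD(entry);  /* entry->next = entry->prev = entry */
    }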



* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-12 21:07       ` Trond Myklebust
@ 2010-09-12 23:47         ` J. Bruce Fields
  2010-09-13 17:49           ` J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-12 23:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sun, Sep 12, 2010 at 05:07:46PM -0400, Trond Myklebust wrote:
> On Tue, 2010-09-07 at 01:12 -0400, J. Bruce Fields wrote:
> > From: J. Bruce Fields <bfields@redhat.com>
> > 
> > If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
> > after rpc_pipe_release calls rpc_purge_list(), but before it calls
> > gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
> > will free a message without deleting it from the rpci->pipe list.
> > 
> > We will be left with a freed object on the rpci->pipe list.  Most
> > frequent symptoms are kernel crashes in rpc.gssd system calls on the
> > pipe in question.
> > 
> > We could just add a list_del(&gss_msg->msg.list) here.  But I can see no
> > reason for doing all this cleanup here; the preceding rpc_purge_list()
> > should have done the job, except possibly for any newly queued upcalls
> > as above, which can safely be left to wait for another opener.
> 
> Hi Bruce,
> 
> Looking again at this issue, I think I see why the ->release_pipe() is
> needed. While the call to rpc_purge_list() does indeed clear the list of
> all those messages that are waiting for their upcall to complete, it
> does nothing for the messages that have successfully been read by the
> daemon, but are now waiting for a reply.

Doh!

> How about something like the patch below instead?

I read it over, and it looks sensible to me.

It's also survived a few testing iterations.  I'll give it a few more
just out of paranoia, but would be surprised if they find the problem
isn't resolved.

--b.


* Re: [PATCH] Fix race corrupting rpc upcall list
  2010-09-12 23:47         ` J. Bruce Fields
@ 2010-09-13 17:49           ` J. Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2010-09-13 17:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sun, Sep 12, 2010 at 07:47:48PM -0400, J. Bruce Fields wrote:
> On Sun, Sep 12, 2010 at 05:07:46PM -0400, Trond Myklebust wrote:
> > On Tue, 2010-09-07 at 01:12 -0400, J. Bruce Fields wrote:
> > > [...]
> > 
> > Hi Bruce,
> > 
> > Looking again at this issue, I think I see why the ->release_pipe() is
> > needed. While the call to rpc_purge_list() does indeed clear the list of
> > all those messages that are waiting for their upcall to complete, it
> > does nothing for the messages that have successfully been read by the
> > daemon, but are now waiting for a reply.
> 
> Doh!
> 
> > How about something like the patch below instead?
> 
> I read it over, and it looks sensible to me.
> 
> It's also survived a few testing iterations.  I'll give it a few more
> just out of paranoia, but would be surprised if they find the problem
> isn't resolved.

Indeed, no surprises; please pass those patches along whenever you're
ready.

--b.

