All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
       [not found]           ` <20100921111657.330e7838@corrin.poochiereds.net>
@ 2010-09-21 19:20             ` J. Bruce Fields
  2010-09-21 19:36               ` Trond Myklebust
  2010-09-21 19:42               ` [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: " J. Bruce Fields
  0 siblings, 2 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-21 19:20 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Trond Myklebust, linux-nfs

Added linux-nfs to cc:

On Tue, Sep 21, 2010 at 11:16:57AM -0700, Jeff Layton wrote:
> On Tue, 21 Sep 2010 13:56:55 -0400
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > On Mon, 2010-09-20 at 09:10 -0400, J. Bruce Fields wrote:
> > > On Mon, Sep 20, 2010 at 09:01:06AM -0400, J. Bruce Fields wrote:
> > > > On Sun, Sep 19, 2010 at 02:45:49PM -0400, J. Bruce Fields wrote:
> > > > > On Sat, Sep 18, 2010 at 01:41:20PM -0400, Trond Myklebust wrote:
> > > > > > Argh, yes... It is that very last commit from Chuck's new patch series
> > > > > > that conflicts with Jeff Layton's unlink fixes. I've now reverted that
> > > > > > commit...
> > > > > 
> > > > > Great, thanks.--b.
> > > > 
> > > > Unfortunately, now I'm getting some sort of crash.  I'll try to get
> > > > console and get more of the logs, but it looks like a bad pointer in
> > > > nfs4_xdr_enc_rename, with worker_thread, etc. at the other end of the
> > > > stack, so I assume it's in rpciod.
> > > 
> > > general protection fault: 0000 [#1] PREEMPT 
> > > last sysfs file: /sys/devices/virtio-pci/virtio2/net/eth0/broadcast
> > > CPU 0 
> > > Modules linked in: [last unloaded: scsi_wait_scan]
> > > 
> > > Pid: 2089, comm: kworker/0:2 Not tainted 2.6.36-rc4-00191-g5baeab4 #262 /Bochs
> > > RIP: 0010:[<ffffffff8124335a>]  [<ffffffff8124335a>] nfs4_xdr_enc_rename+0x2a/0x150
> > > RSP: 0018:ffff88001d723c50  EFLAGS: 00010206
> > > RAX: ffff88001e0fa07c RBX: ffff88001e9fb618 RCX: 5a5a5a5a5a5a5a5a
> >                                                    ^^^^^^^^^^^^^^^^
> > Looks like a use-after-free issue.
> > 
> > > RDX: 0000000000000000 RSI: ffff88001e0fa07c RDI: ffff88001e898320
> > > RBP: ffff88001d723cd0 R08: ffff88001e60e908 R09: 0000000000000000
> > > R10: 0000000000000002 R11: 0000000000000001 R12: ffff88001e0fa07c
> > > R13: ffff88001e60e908 R14: ffff88001e898320 R15: ffff88001dcf2668
> > > FS:  0000000000000000(0000) GS:ffffffff81e1c000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00007fb75a46c360 CR3: 000000001e532000 CR4: 00000000000006f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Process kworker/0:2 (pid: 2089, threadinfo ffff88001d722000, task ffff88001ddfe050)
> > > Stack:
> > >  ffff88001ddfe050 ffffffff8103faf3 ffff88001ea1d000 ffff88001ea1d658
> > > <0> ffff88001d723c90 ffffffff8106a4dd ffffffff818fce45 ffffffff8190ec98
> > > <0> ffff88001e0fa024 ffff88001dcf2668 ffff88001e0fa028 ffff88001e9fb618
> > > Call Trace:
> > >  [<ffffffff8103faf3>] ? local_bh_enable_ip+0x83/0x110
> > >  [<ffffffff8106a4dd>] ? trace_hardirqs_on_caller+0x14d/0x190
> > >  [<ffffffff818fce45>] ? xprt_prepare_transmit+0x75/0xb0
> > >  [<ffffffff8190ec98>] ? xdr_encode_opaque_fixed+0x48/0x90
> > >  [<ffffffff81243330>] ? nfs4_xdr_enc_rename+0x0/0x150
> > >  [<ffffffff81903f94>] rpcauth_wrap_req+0x84/0xb0
> > >  [<ffffffff818fa637>] call_transmit+0x1a7/0x2c0
> > >  [<ffffffff81903112>] __rpc_execute+0xa2/0x230
> > >  [<ffffffff81903305>] rpc_async_schedule+0x15/0x20
> > >  [<ffffffff81054bf2>] process_one_work+0x192/0x520
> > >  [<ffffffff81054b85>] ? process_one_work+0x125/0x520
> > >  [<ffffffff819032f0>] ? rpc_async_schedule+0x0/0x20
> > >  [<ffffffff81055270>] worker_thread+0x140/0x390
> > >  [<ffffffff81055130>] ? worker_thread+0x0/0x390
> > >  [<ffffffff81059376>] kthread+0x96/0xa0
> > >  [<ffffffff810030f4>] kernel_thread_helper+0x4/0x10
> > >  [<ffffffff8196d89c>] ? kprobe_flush_task+0xbc/0xe0
> > >  [<ffffffff8196857e>] ? restore_args+0x0/0x30
> > >  [<ffffffff810592e0>] ? kthread+0x0/0xa0
> > >  [<ffffffff810030f0>] ? kernel_thread_helper+0x0/0x10
> > > Code: 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 58 0f 1f 44 00 00 48 8b 4a 28 49 89 d5 31 d2 49 89 fe 48 89 f0 48 85 c9 74 10 <48> 8b 91 30 03 00 00 48 8b 92 20 03 00 00 8b 12 48 8d 5d b0 4c 
> > > RIP  [<ffffffff8124335a>] nfs4_xdr_enc_rename+0x2a/0x150
> > >  RSP <ffff88001d723c50>
> > > 
> > > --b.
> > 
> > I can see one double free in the nfs_async_rename() error case: the code
> > calls nfs_async_rename_release() after it has already been called by
> > rpc_run_task().
> > I don't know if that is sufficient to explain the above trace, though.
> > 
> > Cheers
> >   Trond
> > 
> > --------------------------------------------------------------------------------------------------------
> > NFS: Fix a use-after-free case in nfs_async_rename()
> > 
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > The call to nfs_async_rename_release() after rpc_run_task() is incorrect.
> > The rpc_run_task() is always guaranteed to call the ->rpc_release() method.
> > 
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > ---
> > 
> >  fs/nfs/unlink.c |    9 ++-------
> >  1 files changed, 2 insertions(+), 7 deletions(-)
> > 
> > 
> > diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
> > index 698b3e6..47530aa 100644
> > --- a/fs/nfs/unlink.c
> > +++ b/fs/nfs/unlink.c
> > @@ -426,7 +426,6 @@ nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
> >  		.rpc_client = NFS_CLIENT(old_dir),
> >  		.flags = RPC_TASK_ASYNC,
> >  	};
> > -	struct rpc_task *task;
> >  
> >  	data = kmalloc(sizeof(*data), GFP_KERNEL);
> >  	if (data == NULL)
> > @@ -435,7 +434,7 @@ nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
> >  
> >  	data->cred = rpc_lookup_cred();
> >  	if (IS_ERR(data->cred)) {
> > -		task = (struct rpc_task *)data->cred;
> > +		struct rpc_task *task = ERR_CAST(data->cred);
> >  		kfree(data);
> >  		return task;
> >  	}
> > @@ -468,11 +467,7 @@ nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
> >  
> >  	NFS_PROTO(data->old_dir)->rename_setup(&msg, old_dir);
> >  
> > -	task = rpc_run_task(&task_setup_data);
> > -	if (IS_ERR(task))
> > -		nfs_async_rename_release(data);
> > -
> > -	return task;
> > +	return rpc_run_task(&task_setup_data);
> >  }
> >  
> >  /**
> > 
> 
> Thanks Trond. Sorry I missed that. Good catch.
> 
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> 
> That said, I agree -- that doesn't seem sufficient to explain the
> problem. Bruce, is this reproducible?

All I need to do is mount nfs4.1 and run cthon -s.  The last output I
see from the test is:

check for proper open/unlink operation
nfsjunk files before unlink:
  

That's on a kernel without Trond's fix above.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 19:20             ` [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results] J. Bruce Fields
@ 2010-09-21 19:36               ` Trond Myklebust
  2010-09-21 20:00                 ` Trond Myklebust
  2010-09-21 19:42               ` [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: " J. Bruce Fields
  1 sibling, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2010-09-21 19:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> see from the test is:
> 
> check for proper open/unlink operation
> nfsjunk files before unlink:
>   

Oh... I bet I see what it is.

It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
initialisation junk that's biting us in the arse again...

I'll fix it...

Trond

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 19:20             ` [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results] J. Bruce Fields
  2010-09-21 19:36               ` Trond Myklebust
@ 2010-09-21 19:42               ` J. Bruce Fields
  1 sibling, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-21 19:42 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Trond Myklebust, linux-nfs

On Tue, Sep 21, 2010 at 03:20:05PM -0400, J. Bruce Fields wrote:
> All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> see from the test is:
> 
> check for proper open/unlink operation
> nfsjunk files before unlink:
>   
> 
> That's on a kernel without Trond's fix above.

But, as expected, that fix makes no difference.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 19:36               ` Trond Myklebust
@ 2010-09-21 20:00                 ` Trond Myklebust
       [not found]                   ` <1285099232.30222.25.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2010-09-21 20:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
> On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> > All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> > see from the test is:
> > 
> > check for proper open/unlink operation
> > nfsjunk files before unlink:
> >   
> 
> Oh... I bet I see what it is.
> 
> It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
> initialisation junk that's biting us in the arse again...
> 
> I'll fix it...

Does this work?

Cheers
  Trond
-------------------------------------------------------------------------------------
NFSv4.1: Fix the slotid initialisation in nfs_async_rename()

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c46e45e..72aa706 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2550,6 +2550,7 @@ static void nfs4_proc_unlink_setup(struct rpc_message *msg, struct inode *dir)
 
 	args->bitmask = server->cache_consistency_bitmask;
 	res->server = server;
+	res->seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_REMOVE];
 }
 
@@ -2575,6 +2576,7 @@ static void nfs4_proc_rename_setup(struct rpc_message *msg, struct inode *dir)
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_RENAME];
 	arg->bitmask = server->attr_bitmask;
 	res->server = server;
+	res->seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 }
 
 static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
       [not found]                   ` <1285099232.30222.25.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2010-09-21 20:24                     ` J. Bruce Fields
  2010-09-21 20:46                       ` [bfields@pop.test.fieldses.org: " Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-21 20:24 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jeff Layton, linux-nfs

On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
> On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
> > On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> > > All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> > > see from the test is:
> > > 
> > > check for proper open/unlink operation
> > > nfsjunk files before unlink:
> > >   
> > 
> > Oh... I bet I see what it is.
> > 
> > It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
> > initialisation junk that's biting us in the arse again...
> > 
> > I'll fix it...
> 
> Does this work?

Yup.

--b.

> 
> Cheers
>   Trond
> -------------------------------------------------------------------------------------
> NFSv4.1: Fix the slotid initialisation in nfs_async_rename()
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> 
>  fs/nfs/nfs4proc.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index c46e45e..72aa706 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2550,6 +2550,7 @@ static void nfs4_proc_unlink_setup(struct rpc_message *msg, struct inode *dir)
>  
>  	args->bitmask = server->cache_consistency_bitmask;
>  	res->server = server;
> +	res->seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
>  	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_REMOVE];
>  }
>  
> @@ -2575,6 +2576,7 @@ static void nfs4_proc_rename_setup(struct rpc_message *msg, struct inode *dir)
>  	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_RENAME];
>  	arg->bitmask = server->attr_bitmask;
>  	res->server = server;
> +	res->seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
>  }
>  
>  static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 20:24                     ` [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: " J. Bruce Fields
@ 2010-09-21 20:46                       ` Trond Myklebust
  2010-09-21 20:58                         ` J. Bruce Fields
  2010-09-21 21:58                         ` Jeff Layton
  0 siblings, 2 replies; 13+ messages in thread
From: Trond Myklebust @ 2010-09-21 20:46 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
> On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
> > On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
> > > On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> > > > All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> > > > see from the test is:
> > > > 
> > > > check for proper open/unlink operation
> > > > nfsjunk files before unlink:
> > > >   
> > > 
> > > Oh... I bet I see what it is.
> > > 
> > > It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
> > > initialisation junk that's biting us in the arse again...
> > > 
> > > I'll fix it...
> > 
> > Does this work?
> 
> Yup.

It seems to fix things for me too... Once you mentioned NFSv4.1 it was
easy to reproduce the bug.

OK. I'll merge these into nfs-for-2.6.37...

Cheers
  Trond


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 20:46                       ` [bfields@pop.test.fieldses.org: " Trond Myklebust
@ 2010-09-21 20:58                         ` J. Bruce Fields
  2010-09-21 21:58                         ` Jeff Layton
  1 sibling, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-21 20:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jeff Layton, linux-nfs

On Tue, Sep 21, 2010 at 04:46:21PM -0400, Trond Myklebust wrote:
> On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
> > On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
> > > On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
> > > > On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> > > > > All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> > > > > see from the test is:
> > > > > 
> > > > > check for proper open/unlink operation
> > > > > nfsjunk files before unlink:
> > > > >   
> > > > 
> > > > Oh... I bet I see what it is.
> > > > 
> > > > It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
> > > > initialisation junk that's biting us in the arse again...
> > > > 
> > > > I'll fix it...
> > > 
> > > Does this work?
> > 
> > Yup.
> 
> It seems to fix things for me too... Once you mentioned NFSv4.1 it was
> easy to reproduce the bug.

OK, good.  I've a slight fear I may have volunteered as Mr. Upstream NFS
QA, which I didn't really intend, though of course I want to see stuff
caught sooner rather than later....

Anyway, I should have noticed which test it failed on sooner.

--b.

> 
> OK. I'll merge these into nfs-for-2.6.37...
> 
> Cheers
>   Trond
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 20:46                       ` [bfields@pop.test.fieldses.org: " Trond Myklebust
  2010-09-21 20:58                         ` J. Bruce Fields
@ 2010-09-21 21:58                         ` Jeff Layton
  2010-09-23 17:20                           ` Benny Halevy
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2010-09-21 21:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: J. Bruce Fields, linux-nfs

On Tue, 21 Sep 2010 16:46:21 -0400
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
> > On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
> > > On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
> > > > On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> > > > > All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> > > > > see from the test is:
> > > > > 
> > > > > check for proper open/unlink operation
> > > > > nfsjunk files before unlink:
> > > > >   
> > > > 
> > > > Oh... I bet I see what it is.
> > > > 
> > > > It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
> > > > initialisation junk that's biting us in the arse again...
> > > > 
> > > > I'll fix it...
> > > 
> > > Does this work?
> > 
> > Yup.
> 
> It seems to fix things for me too... Once you mentioned NFSv4.1 it was
> easy to reproduce the bug.
> 
> OK. I'll merge these into nfs-for-2.6.37...

Thanks for fixing it. I should have mentioned that the 4.1 parts were
written using the "cargo-cult" programming method of copying what
unlink does, and that they needed careful review. Mea culpa!

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-21 21:58                         ` Jeff Layton
@ 2010-09-23 17:20                           ` Benny Halevy
  2010-09-23 17:29                             ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Benny Halevy @ 2010-09-23 17:20 UTC (permalink / raw)
  To: Jeff Layton, Trond Myklebust; +Cc: J. Bruce Fields, linux-nfs

On 2010-09-21 23:58, Jeff Layton wrote:
> On Tue, 21 Sep 2010 16:46:21 -0400
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> 
>> On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
>>> On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
>>>> On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
>>>>> On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
>>>>>> All I need to do is mount nfs4.1 and run cthon -s.  The last output I
>>>>>> see from the test is:
>>>>>>
>>>>>> check for proper open/unlink operation
>>>>>> nfsjunk files before unlink:
>>>>>>   
>>>>>
>>>>> Oh... I bet I see what it is.
>>>>>
>>>>> It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
>>>>> initialisation junk that's biting us in the arse again...
>>>>>
>>>>> I'll fix it...
>>>>
>>>> Does this work?
>>>
>>> Yup.
>>
>> It seems to fix things for me too... Once you mentioned NFSv4.1 it was
>> easy to reproduce the bug.
>>
>> OK. I'll merge these into nfs-for-2.6.37...
> 
> Thanks for fixing it. I should have mentioned that the 4.1 parts were
> written using the "cargo-cult" programming method of copying what
> unlink does, and that they needed careful review. Mea culpa!
> 

I still hate it that the sr_slotid requires explicit, non-trivial initialization.
Once upon a time, the equivalent of sr_slotid used to be a pointer to
struct slot.  Using a pointer, implicitly initialized to NULL is significantly
more straight forward as in the patch below.
(passes cthon tests over your nfs-for-2.6.37 branch + the patch I
sent earlier that fixes __put_nfs_open_context. w/o it I readily reproduce the
crash reported on this thread)

>From 50221fb5bd8fbd46f6cabe1fc795db7efc71bff7 Mon Sep 17 00:00:00 2001
From: Benny Halevy <bhalevy@panasas.com>
Date: Thu, 23 Sep 2010 19:08:21 +0200
Subject: [PATCH] NFSv4.1: keep seq_res.sr_slot as pointer rather than an index

Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
resulted in numerous bugs.  Keeping the current slot as a pointer
to the slot table is more straight forward and robust as it's
implicitly set up to NULL wherever the seq_res member is initialized
to zeroes.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4proc.c       |   52 +++++++++++++++++++----------------------------
 fs/nfs/nfs4xdr.c        |    6 +---
 fs/nfs/read.c           |    1 -
 fs/nfs/unlink.c         |    3 +-
 fs/nfs/write.c          |    2 -
 include/linux/nfs_xdr.h |    2 +-
 6 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 72aa706..7f8cc33 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -334,10 +334,12 @@ static void renew_lease(const struct nfs_server *server, unsigned long timestamp
  * Must be called while holding tbl->slot_tbl_lock
  */
 static void
-nfs4_free_slot(struct nfs4_slot_table *tbl, u8 free_slotid)
+nfs4_free_slot(struct nfs4_slot_table *tbl, struct nfs4_slot *free_slot)
 {
+	int free_slotid = free_slot - tbl->slots;
 	int slotid = free_slotid;
 
+	BUG_ON(slotid < 0 || slotid >= NFS4_MAX_SLOT_TABLE); 
 	/* clear used bit in bitmap */
 	__clear_bit(slotid, tbl->used_slots);
 
@@ -379,7 +381,7 @@ static void nfs41_sequence_free_slot(struct nfs4_sequence_res *res)
 	struct nfs4_slot_table *tbl;
 
 	tbl = &res->sr_session->fc_slot_table;
-	if (res->sr_slotid == NFS4_MAX_SLOT_TABLE) {
+	if (!res->sr_slot) {
 		/* just wake up the next guy waiting since
 		 * we may have not consumed a slot after all */
 		dprintk("%s: No slot\n", __func__);
@@ -387,17 +389,15 @@ static void nfs41_sequence_free_slot(struct nfs4_sequence_res *res)
 	}
 
 	spin_lock(&tbl->slot_tbl_lock);
-	nfs4_free_slot(tbl, res->sr_slotid);
+	nfs4_free_slot(tbl, res->sr_slot);
 	nfs41_check_drain_session_complete(res->sr_session);
 	spin_unlock(&tbl->slot_tbl_lock);
-	res->sr_slotid = NFS4_MAX_SLOT_TABLE;
+	res->sr_slot = NULL;
 }
 
 static int nfs41_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
 {
 	unsigned long timestamp;
-	struct nfs4_slot_table *tbl;
-	struct nfs4_slot *slot;
 	struct nfs_client *clp;
 
 	/*
@@ -410,17 +410,14 @@ static int nfs41_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *
 		res->sr_status = NFS_OK;
 
 	/* -ERESTARTSYS can result in skipping nfs41_sequence_setup */
-	if (res->sr_slotid == NFS4_MAX_SLOT_TABLE)
+	if (!res->sr_slot)
 		goto out;
 
-	tbl = &res->sr_session->fc_slot_table;
-	slot = tbl->slots + res->sr_slotid;
-
 	/* Check the SEQUENCE operation status */
 	switch (res->sr_status) {
 	case 0:
 		/* Update the slot's sequence and clientid lease timer */
-		++slot->seq_nr;
+		++res->sr_slot->seq_nr;
 		timestamp = res->sr_renewal_time;
 		clp = res->sr_session->clp;
 		do_renew_lease(clp, timestamp);
@@ -433,12 +430,14 @@ static int nfs41_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *
 		 * returned NFS4ERR_DELAY as per Section 2.10.6.2
 		 * of RFC5661.
 		 */
-		dprintk("%s: slot=%d seq=%d: Operation in progress\n",
-				__func__, res->sr_slotid, slot->seq_nr);
+		dprintk("%s: slot=%ld seq=%d: Operation in progress\n",
+			__func__,
+			res->sr_slot - res->sr_session->fc_slot_table.slots,
+			res->sr_slot->seq_nr);
 		goto out_retry;
 	default:
 		/* Just update the slot sequence no. */
-		++slot->seq_nr;
+		++res->sr_slot->seq_nr;
 	}
 out:
 	/* The session may be reset by one of the error handlers. */
@@ -505,10 +504,9 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
 
 	dprintk("--> %s\n", __func__);
 	/* slot already allocated? */
-	if (res->sr_slotid != NFS4_MAX_SLOT_TABLE)
+	if (res->sr_slot != NULL)
 		return 0;
 
-	res->sr_slotid = NFS4_MAX_SLOT_TABLE;
 	tbl = &session->fc_slot_table;
 
 	spin_lock(&tbl->slot_tbl_lock);
@@ -550,7 +548,7 @@ static int nfs41_setup_sequence(struct nfs4_session *session,
 	dprintk("<-- %s slotid=%d seqid=%d\n", __func__, slotid, slot->seq_nr);
 
 	res->sr_session = session;
-	res->sr_slotid = slotid;
+	res->sr_slot = slot;
 	res->sr_renewal_time = jiffies;
 	res->sr_status_flags = 0;
 	/*
@@ -576,8 +574,9 @@ int nfs4_setup_sequence(const struct nfs_server *server,
 		goto out;
 	}
 
-	dprintk("--> %s clp %p session %p sr_slotid %d\n",
-		__func__, session->clp, session, res->sr_slotid);
+	dprintk("--> %s clp %p session %p sr_slot %ld\n",
+		__func__, session->clp, session, res->sr_slot ?
+			res->sr_slot - session->fc_slot_table.slots : -1);
 
 	ret = nfs41_setup_sequence(session, args, res, cache_reply,
 				   task);
@@ -650,7 +649,7 @@ static int nfs4_call_sync_sequence(struct nfs_server *server,
 		.callback_data = &data
 	};
 
-	res->sr_slotid = NFS4_MAX_SLOT_TABLE;
+	res->sr_slot = NULL;
 	if (privileged)
 		task_setup.callback_ops = &nfs41_call_priv_sync_ops;
 	task = rpc_run_task(&task_setup);
@@ -735,7 +734,6 @@ static void nfs4_init_opendata_res(struct nfs4_opendata *p)
 	p->o_res.server = p->o_arg.server;
 	nfs_fattr_init(&p->f_attr);
 	nfs_fattr_init(&p->dir_attr);
-	p->o_res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 }
 
 static struct nfs4_opendata *nfs4_opendata_alloc(struct path *path,
@@ -1975,7 +1973,6 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_mask, i
 	calldata->res.fattr = &calldata->fattr;
 	calldata->res.seqid = calldata->arg.seqid;
 	calldata->res.server = server;
-	calldata->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	path_get(path);
 	calldata->path = *path;
 
@@ -2550,7 +2547,7 @@ static void nfs4_proc_unlink_setup(struct rpc_message *msg, struct inode *dir)
 
 	args->bitmask = server->cache_consistency_bitmask;
 	res->server = server;
-	res->seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
+	res->seq_res.sr_slot = NULL;
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_REMOVE];
 }
 
@@ -2576,7 +2573,6 @@ static void nfs4_proc_rename_setup(struct rpc_message *msg, struct inode *dir)
 	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_RENAME];
 	arg->bitmask = server->attr_bitmask;
 	res->server = server;
-	res->seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 }
 
 static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
@@ -3646,7 +3642,6 @@ static int _nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, co
 	memcpy(&data->stateid, stateid, sizeof(data->stateid));
 	data->res.fattr = &data->fattr;
 	data->res.server = server;
-	data->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	nfs_fattr_init(data->res.fattr);
 	data->timestamp = jiffies;
 	data->rpc_status = 0;
@@ -3799,7 +3794,6 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
 	p->arg.fl = &p->fl;
 	p->arg.seqid = seqid;
 	p->res.seqid = seqid;
-	p->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	p->arg.stateid = &lsp->ls_stateid;
 	p->lsp = lsp;
 	atomic_inc(&lsp->ls_count);
@@ -3979,7 +3973,6 @@ static struct nfs4_lockdata *nfs4_alloc_lockdata(struct file_lock *fl,
 	p->arg.lock_owner.clientid = server->nfs_client->cl_clientid;
 	p->arg.lock_owner.id = lsp->ls_id.id;
 	p->res.lock_seqid = p->arg.lock_seqid;
-	p->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	p->lsp = lsp;
 	p->server = server;
 	atomic_inc(&lsp->ls_count);
@@ -4612,7 +4605,6 @@ int nfs4_proc_get_lease_time(struct nfs_client *clp, struct nfs_fsinfo *fsinfo)
 	};
 	int status;
 
-	res.lr_seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	dprintk("--> %s\n", __func__);
 	task = rpc_run_task(&task_setup);
 
@@ -5105,12 +5097,11 @@ static struct rpc_task *_nfs41_proc_sequence(struct nfs_client *clp, struct rpc_
 
 	if (!atomic_inc_not_zero(&clp->cl_count))
 		return ERR_PTR(-EIO);
-	calldata = kmalloc(sizeof(*calldata), GFP_NOFS);
+	calldata = kzalloc(sizeof(*calldata), GFP_NOFS);
 	if (calldata == NULL) {
 		nfs_put_client(clp);
 		return ERR_PTR(-ENOMEM);
 	}
-	calldata->res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	msg.rpc_argp = &calldata->args;
 	msg.rpc_resp = &calldata->res;
 	calldata->clp = clp;
@@ -5242,7 +5233,6 @@ static int nfs41_proc_reclaim_complete(struct nfs_client *clp)
 		goto out;
 	calldata->clp = clp;
 	calldata->arg.one_fs = 0;
-	calldata->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 
 	msg.rpc_argp = &calldata->arg;
 	msg.rpc_resp = &calldata->res;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index b0bd7ef..ccf09c4 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -4668,7 +4668,6 @@ static int decode_sequence(struct xdr_stream *xdr,
 			   struct rpc_rqst *rqstp)
 {
 #if defined(CONFIG_NFS_V4_1)
-	struct nfs4_slot *slot;
 	struct nfs4_sessionid id;
 	u32 dummy;
 	int status;
@@ -4700,15 +4699,14 @@ static int decode_sequence(struct xdr_stream *xdr,
 		goto out_overflow;
 
 	/* seqid */
-	slot = &res->sr_session->fc_slot_table.slots[res->sr_slotid];
 	dummy = be32_to_cpup(p++);
-	if (dummy != slot->seq_nr) {
+	if (dummy != res->sr_slot->seq_nr) {
 		dprintk("%s Invalid sequence number\n", __func__);
 		goto out_err;
 	}
 	/* slot id */
 	dummy = be32_to_cpup(p++);
-	if (dummy != res->sr_slotid) {
+	if (dummy != res->sr_slot - res->sr_session->fc_slot_table.slots) {
 		dprintk("%s Invalid slot id\n", __func__);
 		goto out_err;
 	}
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 87adc27..79859c8 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -46,7 +46,6 @@ struct nfs_read_data *nfs_readdata_alloc(unsigned int pagecount)
 		memset(p, 0, sizeof(*p));
 		INIT_LIST_HEAD(&p->pages);
 		p->npages = pagecount;
-		p->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 		if (pagecount <= ARRAY_SIZE(p->page_array))
 			p->pagevec = p->page_array;
 		else {
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 47530aa..9a16bad 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -262,7 +262,6 @@ nfs_async_unlink(struct inode *dir, struct dentry *dentry)
 		status = PTR_ERR(data->cred);
 		goto out_free;
 	}
-	data->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	data->res.dir_attr = &data->dir_attr;
 
 	status = -EBUSY;
@@ -427,7 +426,7 @@ nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
 		.flags = RPC_TASK_ASYNC,
 	};
 
-	data = kmalloc(sizeof(*data), GFP_KERNEL);
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (data == NULL)
 		return ERR_PTR(-ENOMEM);
 	task_setup_data.callback_data = data,
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 874972d..4d6d35d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -55,7 +55,6 @@ struct nfs_write_data *nfs_commitdata_alloc(void)
 	if (p) {
 		memset(p, 0, sizeof(*p));
 		INIT_LIST_HEAD(&p->pages);
-		p->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 	}
 	return p;
 }
@@ -75,7 +74,6 @@ struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 		memset(p, 0, sizeof(*p));
 		INIT_LIST_HEAD(&p->pages);
 		p->npages = pagecount;
-		p->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE;
 		if (pagecount <= ARRAY_SIZE(p->page_array))
 			p->pagevec = p->page_array;
 		else {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 172df83..5772b2c 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -170,7 +170,7 @@ struct nfs4_sequence_args {
 
 struct nfs4_sequence_res {
 	struct nfs4_session	*sr_session;
-	u8			sr_slotid;	/* slot used to send request */
+	struct nfs4_slot	*sr_slot;	/* slot used to send request */
 	int			sr_status;	/* sequence operation status */
 	unsigned long		sr_renewal_time;
 	u32			sr_status_flags;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-23 17:20                           ` Benny Halevy
@ 2010-09-23 17:29                             ` Trond Myklebust
  2010-09-23 18:52                               ` Chuck Lever
  2010-09-24 13:49                               ` Andy Adamson
  0 siblings, 2 replies; 13+ messages in thread
From: Trond Myklebust @ 2010-09-23 17:29 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Jeff Layton, J. Bruce Fields, linux-nfs

On Thu, 2010-09-23 at 19:20 +0200, Benny Halevy wrote:
> On 2010-09-21 23:58, Jeff Layton wrote:
> > On Tue, 21 Sep 2010 16:46:21 -0400
> > Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > 
> >> On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
> >>> On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
> >>>> On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
> >>>>> On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
> >>>>>> All I need to do is mount nfs4.1 and run cthon -s.  The last output I
> >>>>>> see from the test is:
> >>>>>>
> >>>>>> check for proper open/unlink operation
> >>>>>> nfsjunk files before unlink:
> >>>>>>   
> >>>>>
> >>>>> Oh... I bet I see what it is.
> >>>>>
> >>>>> It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
> >>>>> initialisation junk that's biting us in the arse again...
> >>>>>
> >>>>> I'll fix it...
> >>>>
> >>>> Does this work?
> >>>
> >>> Yup.
> >>
> >> It seems to fix things for me too... Once you mentioned NFSv4.1 it was
> >> easy to reproduce the bug.
> >>
> >> OK. I'll merge these into nfs-for-2.6.37...
> > 
> > Thanks for fixing it. I should have mentioned that the 4.1 parts were
> > written using the "cargo-cult" programming method of copying what
> > unlink does, and that they needed careful review. Mea culpa!
> > 
> 
> I still hate it that the sr_slotid requires explicit, non-trivial initialization.
> Once upon a time, the equivalent of sr_slotid used to be a pointer to
> struct slot.  Using a pointer, implicitly initialized to NULL is significantly
> more straight forward as in the patch below.
> (passes cthon tests over your nfs-for-2.6.37 branch + the patch I
> sent earlier that fixes __put_nfs_open_context. w/o it I readily reproduce the
> crash reported on this thread)
> 
> From 50221fb5bd8fbd46f6cabe1fc795db7efc71bff7 Mon Sep 17 00:00:00 2001
> From: Benny Halevy <bhalevy@panasas.com>
> Date: Thu, 23 Sep 2010 19:08:21 +0200
> Subject: [PATCH] NFSv4.1: keep seq_res.sr_slot as pointer rather than an index
> 
> Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
> resulted in numerous bugs.  Keeping the current slot as a pointer
> to the slot table is more straight forward and robust as it's
> implicitly set up to NULL wherever the seq_res member is initialized
> to zeroes.
> 
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>

Looks fine to me, and I prefer this approach to the one we have today. 

Note, however, that I've already applied commit
d688e11007419fd060ae74d8d952a5c4ece735aa (NFSv4.1: Fix the slotid
initialisation in nfs_async_rename()) to the nfs-for-2.6.37 branch.

Are people happy with me rebasing nfs-for-2.6.37 to remove the above
commit, and apply this one instead?

Cheers
  Trond

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-23 17:29                             ` Trond Myklebust
@ 2010-09-23 18:52                               ` Chuck Lever
  2010-09-23 21:06                                 ` J. Bruce Fields
  2010-09-24 13:49                               ` Andy Adamson
  1 sibling, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2010-09-23 18:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Benny Halevy, Jeff Layton, J. Bruce Fields, linux-nfs


On Sep 23, 2010, at 1:29 PM, Trond Myklebust wrote:

> On Thu, 2010-09-23 at 19:20 +0200, Benny Halevy wrote:
>> On 2010-09-21 23:58, Jeff Layton wrote:
>>> On Tue, 21 Sep 2010 16:46:21 -0400
>>> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>>> 
>>>> On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
>>>>> On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
>>>>>> On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
>>>>>>> On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
>>>>>>>> All I need to do is mount nfs4.1 and run cthon -s.  The last output I
>>>>>>>> see from the test is:
>>>>>>>> 
>>>>>>>> check for proper open/unlink operation
>>>>>>>> nfsjunk files before unlink:
>>>>>>>> 
>>>>>>> 
>>>>>>> Oh... I bet I see what it is.
>>>>>>> 
>>>>>>> It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
>>>>>>> initialisation junk that's biting us in the arse again...
>>>>>>> 
>>>>>>> I'll fix it...
>>>>>> 
>>>>>> Does this work?
>>>>> 
>>>>> Yup.
>>>> 
>>>> It seems to fix things for me too... Once you mentioned NFSv4.1 it was
>>>> easy to reproduce the bug.
>>>> 
>>>> OK. I'll merge these into nfs-for-2.6.37...
>>> 
>>> Thanks for fixing it. I should have mentioned that the 4.1 parts were
>>> written using the "cargo-cult" programming method of copying what
>>> unlink does, and that they needed careful review. Mea culpa!
>>> 
>> 
>> I still hate it that the sr_slotid requires explicit, non-trivial initialization.
>> Once upon a time, the equivalent of sr_slotid used to be a pointer to
>> struct slot.  Using a pointer, implicitly initialized to NULL is significantly
>> more straight forward as in the patch below.
>> (passes cthon tests over your nfs-for-2.6.37 branch + the patch I
>> sent earlier that fixes __put_nfs_open_context. w/o it I readily reproduce the
>> crash reported on this thread)
>> 
>> From 50221fb5bd8fbd46f6cabe1fc795db7efc71bff7 Mon Sep 17 00:00:00 2001
>> From: Benny Halevy <bhalevy@panasas.com>
>> Date: Thu, 23 Sep 2010 19:08:21 +0200
>> Subject: [PATCH] NFSv4.1: keep seq_res.sr_slot as pointer rather than an index
>> 
>> Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
>> resulted in numerous bugs.  Keeping the current slot as a pointer
>> to the slot table is more straight forward and robust as it's
>> implicitly set up to NULL wherever the seq_res member is initialized
>> to zeroes.
>> 
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> 
> Looks fine to me, and I prefer this approach to the one we have today. 
> 
> Note, however, that I've already applied commit
> d688e11007419fd060ae74d8d952a5c4ece735aa (NFSv4.1: Fix the slotid
> initialisation in nfs_async_rename()) to the nfs-for-2.6.37 branch.
> 
> Are people happy with me rebasing nfs-for-2.6.37 to remove the above
> commit, and apply this one instead?

I don't have a problem with rebasing what is still ostensibly a "development and testing" branch.  And warning us first is nice too, thanks! :-)

I haven't started looking at nfs-for-2.6.37 yet, as I've been waiting for the dust to settle a bit.  I know the merge window approacheth.

-- 
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-23 18:52                               ` Chuck Lever
@ 2010-09-23 21:06                                 ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-23 21:06 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Benny Halevy, Jeff Layton, linux-nfs

> On Sep 23, 2010, at 1:29 PM, Trond Myklebust wrote:
> > Looks fine to me, and I prefer this approach to the one we have today. 
> > 
> > Note, however, that I've already applied commit
> > d688e11007419fd060ae74d8d952a5c4ece735aa (NFSv4.1: Fix the slotid
> > initialisation in nfs_async_rename()) to the nfs-for-2.6.37 branch.
> > 
> > Are people happy with me rebasing nfs-for-2.6.37 to remove the above
> > commit, and apply this one instead?

I'm happier with incremental patches and no rebasing--I like having the
history around--but the rebase won't cause me serious practical
problems.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results]
  2010-09-23 17:29                             ` Trond Myklebust
  2010-09-23 18:52                               ` Chuck Lever
@ 2010-09-24 13:49                               ` Andy Adamson
  1 sibling, 0 replies; 13+ messages in thread
From: Andy Adamson @ 2010-09-24 13:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Benny Halevy, Jeff Layton, J. Bruce Fields, linux-nfs


On Sep 23, 2010, at 1:29 PM, Trond Myklebust wrote:

> On Thu, 2010-09-23 at 19:20 +0200, Benny Halevy wrote:
>> On 2010-09-21 23:58, Jeff Layton wrote:
>>> On Tue, 21 Sep 2010 16:46:21 -0400
>>> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>>> 
>>>> On Tue, 2010-09-21 at 16:24 -0400, J. Bruce Fields wrote:
>>>>> On Tue, Sep 21, 2010 at 04:00:32PM -0400, Trond Myklebust wrote:
>>>>>> On Tue, 2010-09-21 at 15:36 -0400, Trond Myklebust wrote:
>>>>>>> On Tue, 2010-09-21 at 15:20 -0400, J. Bruce Fields wrote:
>>>>>>>> All I need to do is mount nfs4.1 and run cthon -s.  The last output I
>>>>>>>> see from the test is:
>>>>>>>> 
>>>>>>>> check for proper open/unlink operation
>>>>>>>> nfsjunk files before unlink:
>>>>>>>> 
>>>>>>> 
>>>>>>> Oh... I bet I see what it is.
>>>>>>> 
>>>>>>> It's that damned res->sr_slotid = NFS4_MAX_SLOT_TABLE and related
>>>>>>> initialisation junk that's biting us in the arse again...
>>>>>>> 
>>>>>>> I'll fix it...
>>>>>> 
>>>>>> Does this work?
>>>>> 
>>>>> Yup.
>>>> 
>>>> It seems to fix things for me too... Once you mentioned NFSv4.1 it was
>>>> easy to reproduce the bug.
>>>> 
>>>> OK. I'll merge these into nfs-for-2.6.37...
>>> 
>>> Thanks for fixing it. I should have mentioned that the 4.1 parts were
>>> written using the "cargo-cult" programming method of copying what
>>> unlink does, and that they needed careful review. Mea culpa!
>>> 
>> 
>> I still hate it that the sr_slotid requires explicit, non-trivial initialization.
>> Once upon a time, the equivalent of sr_slotid used to be a pointer to
>> struct slot.  Using a pointer, implicitly initialized to NULL is significantly
>> more straight forward as in the patch below.
>> (passes cthon tests over your nfs-for-2.6.37 branch + the patch I
>> sent earlier that fixes __put_nfs_open_context. w/o it I readily reproduce the
>> crash reported on this thread)
>> 
>> From 50221fb5bd8fbd46f6cabe1fc795db7efc71bff7 Mon Sep 17 00:00:00 2001
>> From: Benny Halevy <bhalevy@panasas.com>
>> Date: Thu, 23 Sep 2010 19:08:21 +0200
>> Subject: [PATCH] NFSv4.1: keep seq_res.sr_slot as pointer rather than an index
>> 
>> Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
>> resulted in numerous bugs.  Keeping the current slot as a pointer
>> to the slot table is more straight forward and robust as it's
>> implicitly set up to NULL wherever the seq_res member is initialized
>> to zeroes.
>> 
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> 
> Looks fine to me, and I prefer this approach to the one we have today. 
> 
> Note, however, that I've already applied commit
> d688e11007419fd060ae74d8d952a5c4ece735aa (NFSv4.1: Fix the slotid
> initialisation in nfs_async_rename()) to the nfs-for-2.6.37 branch.
> 
> Are people happy with me rebasing nfs-for-2.6.37 to remove the above
> commit, and apply this one instead?

Yes, I like this approach as well and think it should be applied.

-->Andy

> 
> Cheers
>  Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-09-24 13:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20100918171546.GA9859@fieldses.org>
     [not found] ` <1284831680.2787.1.camel@heimdal.trondhjem.org>
     [not found]   ` <20100919184549.GD32071@fieldses.org>
     [not found]     ` <20100920130106.GB4580@fieldses.org>
     [not found]       ` <20100920131025.GC4580@fieldses.org>
     [not found]         ` <1285091815.30222.19.camel@heimdal.trondhjem.org>
     [not found]           ` <20100921111657.330e7838@corrin.poochiereds.net>
2010-09-21 19:20             ` [bfields@pop.test.fieldses.org: all f08d6374 Merge branch 'nfs-for-next' into linux-next results] J. Bruce Fields
2010-09-21 19:36               ` Trond Myklebust
2010-09-21 20:00                 ` Trond Myklebust
     [not found]                   ` <1285099232.30222.25.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2010-09-21 20:24                     ` [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: " J. Bruce Fields
2010-09-21 20:46                       ` [bfields@pop.test.fieldses.org: " Trond Myklebust
2010-09-21 20:58                         ` J. Bruce Fields
2010-09-21 21:58                         ` Jeff Layton
2010-09-23 17:20                           ` Benny Halevy
2010-09-23 17:29                             ` Trond Myklebust
2010-09-23 18:52                               ` Chuck Lever
2010-09-23 21:06                                 ` J. Bruce Fields
2010-09-24 13:49                               ` Andy Adamson
2010-09-21 19:42               ` [bfields-btOuF8jVhiP3NsPE3w5/ayCwEArCW2h5@public.gmane.org: " J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.