* [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout
@ 2013-03-18 14:38 Benny Halevy
2013-03-18 14:45 ` Benny Halevy
2013-03-18 15:48 ` Myklebust, Trond
0 siblings, 2 replies; 6+ messages in thread
From: Benny Halevy @ 2013-03-18 14:38 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: linux-nfs, Benny Halevy
When writing a large file over pNFS, the MDS rebooted and then the client crashed with the following stack trace over RHEL 2.6.32-358.el6.x86_64
<4>RIP: 0010:[<ffffffffa0474f35>] [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
<4>RSP: 0018:ffff8801bab4db60 EFLAGS: 00010286
<4>RAX: ffff8802290755a8 RBX: ffff8802297756a0 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880229775750
<4>RBP: ffff8801bab4dba0 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8802297754d8
<4>R13: ffff880229075400 R14: 0000000000000000 R15: ffff880229775568
<4>FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000028 CR3: 000000020158a000 CR4: 00000000000027e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process flush-0:30 (pid: 3806, threadinfo ffff8801bab4c000, task ffff880228c76ae0)
<4>Stack:
<4> ffff8802290755a8 ffff880229775750 ffffffffa047ae20 ffff8802297756a0
<4><d> ffff8801bab4dd20 0000000000000000 ffff8801bab4dd20 ffff8802297757c0
<4><d> ffff8801bab4dbd0 ffffffffa0451217 ffff8801bab4dd20 ffff8802297757c0
<4>Call Trace:
<4> [<ffffffffa0451217>] nfs_write_inode+0x87/0x100 [nfs]
<4> [<ffffffff811ac81c>] writeback_single_inode+0x20c/0x290
<4> [<ffffffff811acafe>] writeback_sb_inodes+0xce/0x180
<4> [<ffffffff811acc5b>] writeback_inodes_wb+0xab/0x1b0
<4> [<ffffffff811acffb>] wb_writeback+0x29b/0x3f0
<4> [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
<4> [<ffffffff81081ac2>] ? del_timer_sync+0x22/0x30
<4> [<ffffffff811ad2e9>] wb_do_writeback+0x199/0x240
<4> [<ffffffff811ad3f3>] bdi_writeback_task+0x63/0x1b0
<4> [<ffffffff81096b47>] ? bit_waitqueue+0x17/0xd0
<4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
<4> [<ffffffff8113cac6>] bdi_start_fn+0x86/0x100
<4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
<4> [<ffffffff81096916>] kthread+0x96/0xa0
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffff81096880>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
<4>Code: 01 00 00 48 89 d7 48 89 55 c8 48 89 45 c0 e8 93 b0 09 e1 f0 41 0f ba 37 09 19 d2 85 d2 48 8b 45 c0 0f 84 af 01 00 00 48 8b 53 f0 <4c> 8b 7a 28 48 8d 4a 28 49 39 cf 75 26 e9 e1 00 00 00 66 0f 1f
<1>RIP [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
<4> RSP <ffff8801bab4db60>
<4>CR2: 0000000000000028
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
fs/nfs/pnfs.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 48ac5aa..483bd94 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -497,6 +497,8 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
pnfs_get_layout_hdr(lo);
pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
+ clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags);
+ clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags);
spin_unlock(&nfsi->vfs_inode.i_lock);
pnfs_free_lseg_list(&tmp_list);
pnfs_put_layout_hdr(lo);
@@ -1813,11 +1815,16 @@ void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data)
loff_t end_pos;
int status = 0;
- dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
+ if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
+ dprintk("%s: inode %lu: not required\n", __func__, inode->i_ino);
+ return 0;
+ }
- if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
+ if (WARN_ON(!nfsi->layout))
return 0;
+ dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
+
/* Note kzalloc ensures data->res.seq_res.sr_slot == NULL */
data = kzalloc(sizeof(*data), GFP_NOFS);
if (!data) {
--
1.7.11.7
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout
2013-03-18 14:38 [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout Benny Halevy
@ 2013-03-18 14:45 ` Benny Halevy
2013-03-18 15:48 ` Myklebust, Trond
1 sibling, 0 replies; 6+ messages in thread
From: Benny Halevy @ 2013-03-18 14:45 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: linux-nfs
On 2013-03-18 16:38, Benny Halevy wrote:
> When writing a large file over pNFS, the MDS rebooted and then the client crashed with the following stack trace over RHEL 2.6.32-358.el6.x86_64
>
> <4>RIP: 0010:[<ffffffffa0474f35>] [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> <4>RSP: 0018:ffff8801bab4db60 EFLAGS: 00010286
> <4>RAX: ffff8802290755a8 RBX: ffff8802297756a0 RCX: 0000000000000000
> <4>RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880229775750
> <4>RBP: ffff8801bab4dba0 R08: 0000000000000000 R09: 0000000000000000
> <4>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8802297754d8
> <4>R13: ffff880229075400 R14: 0000000000000000 R15: ffff880229775568
> <4>FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
> <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <4>CR2: 0000000000000028 CR3: 000000020158a000 CR4: 00000000000027e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process flush-0:30 (pid: 3806, threadinfo ffff8801bab4c000, task ffff880228c76ae0)
> <4>Stack:
> <4> ffff8802290755a8 ffff880229775750 ffffffffa047ae20 ffff8802297756a0
> <4><d> ffff8801bab4dd20 0000000000000000 ffff8801bab4dd20 ffff8802297757c0
> <4><d> ffff8801bab4dbd0 ffffffffa0451217 ffff8801bab4dd20 ffff8802297757c0
> <4>Call Trace:
> <4> [<ffffffffa0451217>] nfs_write_inode+0x87/0x100 [nfs]
> <4> [<ffffffff811ac81c>] writeback_single_inode+0x20c/0x290
> <4> [<ffffffff811acafe>] writeback_sb_inodes+0xce/0x180
> <4> [<ffffffff811acc5b>] writeback_inodes_wb+0xab/0x1b0
> <4> [<ffffffff811acffb>] wb_writeback+0x29b/0x3f0
> <4> [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
> <4> [<ffffffff81081ac2>] ? del_timer_sync+0x22/0x30
> <4> [<ffffffff811ad2e9>] wb_do_writeback+0x199/0x240
> <4> [<ffffffff811ad3f3>] bdi_writeback_task+0x63/0x1b0
> <4> [<ffffffff81096b47>] ? bit_waitqueue+0x17/0xd0
> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff8113cac6>] bdi_start_fn+0x86/0x100
> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff81096916>] kthread+0x96/0xa0
> <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
> <4> [<ffffffff81096880>] ? kthread+0x0/0xa0
> <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> <4>Code: 01 00 00 48 89 d7 48 89 55 c8 48 89 45 c0 e8 93 b0 09 e1 f0 41 0f ba 37 09 19 d2 85 d2 48 8b 45 c0 0f 84 af 01 00 00 48 8b 53 f0 <4c> 8b 7a 28 48 8d 4a 28 49 39 cf 75 26 e9 e1 00 00 00 66 0f 1f
> <1>RIP [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> <4> RSP <ffff8801bab4db60>
> <4>CR2: 0000000000000028
>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Sorry, forgot Cc: stable@kernel.org [>= 3.2]
> ---
> fs/nfs/pnfs.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 48ac5aa..483bd94 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -497,6 +497,8 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
> pnfs_get_layout_hdr(lo);
> pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
> pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
> + clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags);
> + clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags);
> spin_unlock(&nfsi->vfs_inode.i_lock);
> pnfs_free_lseg_list(&tmp_list);
> pnfs_put_layout_hdr(lo);
> @@ -1813,11 +1815,16 @@ void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data)
> loff_t end_pos;
> int status = 0;
>
> - dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> + if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
> + dprintk("%s: inode %lu: not required\n", __func__, inode->i_ino);
> + return 0;
> + }
>
> - if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
> + if (WARN_ON(!nfsi->layout))
> return 0;
>
> + dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> +
> /* Note kzalloc ensures data->res.seq_res.sr_slot == NULL */
> data = kzalloc(sizeof(*data), GFP_NOFS);
> if (!data) {
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout
2013-03-18 14:38 [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout Benny Halevy
2013-03-18 14:45 ` Benny Halevy
@ 2013-03-18 15:48 ` Myklebust, Trond
2013-03-18 16:40 ` Benny Halevy
[not found] ` <1363786272-17385-1-git-send-email-bhalevy@tonian.com>
1 sibling, 2 replies; 6+ messages in thread
From: Myklebust, Trond @ 2013-03-18 15:48 UTC (permalink / raw)
To: Benny Halevy; +Cc: linux-nfs
Hi Benny,
On Mon, 2013-03-18 at 16:38 +0200, Benny Halevy wrote:
> When writing a large file over pNFS, the MDS rebooted and then the client crashed with the following stack trace over RHEL 2.6.32-358.el6.x86_64
>
You might want to note that this Oops applies to upstream too. :-)
> <4>RIP: 0010:[<ffffffffa0474f35>] [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> <4>RSP: 0018:ffff8801bab4db60 EFLAGS: 00010286
> <4>RAX: ffff8802290755a8 RBX: ffff8802297756a0 RCX: 0000000000000000
> <4>RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880229775750
> <4>RBP: ffff8801bab4dba0 R08: 0000000000000000 R09: 0000000000000000
> <4>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8802297754d8
> <4>R13: ffff880229075400 R14: 0000000000000000 R15: ffff880229775568
> <4>FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
> <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <4>CR2: 0000000000000028 CR3: 000000020158a000 CR4: 00000000000027e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process flush-0:30 (pid: 3806, threadinfo ffff8801bab4c000, task ffff880228c76ae0)
> <4>Stack:
> <4> ffff8802290755a8 ffff880229775750 ffffffffa047ae20 ffff8802297756a0
> <4><d> ffff8801bab4dd20 0000000000000000 ffff8801bab4dd20 ffff8802297757c0
> <4><d> ffff8801bab4dbd0 ffffffffa0451217 ffff8801bab4dd20 ffff8802297757c0
> <4>Call Trace:
> <4> [<ffffffffa0451217>] nfs_write_inode+0x87/0x100 [nfs]
> <4> [<ffffffff811ac81c>] writeback_single_inode+0x20c/0x290
> <4> [<ffffffff811acafe>] writeback_sb_inodes+0xce/0x180
> <4> [<ffffffff811acc5b>] writeback_inodes_wb+0xab/0x1b0
> <4> [<ffffffff811acffb>] wb_writeback+0x29b/0x3f0
> <4> [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
> <4> [<ffffffff81081ac2>] ? del_timer_sync+0x22/0x30
> <4> [<ffffffff811ad2e9>] wb_do_writeback+0x199/0x240
> <4> [<ffffffff811ad3f3>] bdi_writeback_task+0x63/0x1b0
> <4> [<ffffffff81096b47>] ? bit_waitqueue+0x17/0xd0
> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff8113cac6>] bdi_start_fn+0x86/0x100
> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff81096916>] kthread+0x96/0xa0
> <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
> <4> [<ffffffff81096880>] ? kthread+0x0/0xa0
> <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> <4>Code: 01 00 00 48 89 d7 48 89 55 c8 48 89 45 c0 e8 93 b0 09 e1 f0 41 0f ba 37 09 19 d2 85 d2 48 8b 45 c0 0f 84 af 01 00 00 48 8b 53 f0 <4c> 8b 7a 28 48 8d 4a 28 49 39 cf 75 26 e9 e1 00 00 00 66 0f 1f
> <1>RIP [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> <4> RSP <ffff8801bab4db60>
> <4>CR2: 0000000000000028
>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> ---
> fs/nfs/pnfs.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 48ac5aa..483bd94 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -497,6 +497,8 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
> pnfs_get_layout_hdr(lo);
> pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
> pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
> + clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags);
> + clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags);
> spin_unlock(&nfsi->vfs_inode.i_lock);
> pnfs_free_lseg_list(&tmp_list);
> pnfs_put_layout_hdr(lo);
> @@ -1813,11 +1815,16 @@ void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data)
> loff_t end_pos;
> int status = 0;
>
> - dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> + if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
> + dprintk("%s: inode %lu: not required\n", __func__, inode->i_ino);
> + return 0;
> + }
>
> - if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
> + if (WARN_ON(!nfsi->layout))
> return 0;
This looks like the wrong thing to do. We know how the system got here:
someone called fsync() or sync(), and NFS_INO_LAYOUTCOMMIT was set.
What we don't know is why the layout was freed before
pnfs_layoutcommit_inode() was called.
I suggest that we rather put a
WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags));
WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags));
inside pnfs_detach_layout_hdr(). That should stand a better chance of
catching the sort of bug you found above.
BTW: shouldn't pnfs_return_layout() always call
pnfs_layoutcommit_inode()?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout
2013-03-18 15:48 ` Myklebust, Trond
@ 2013-03-18 16:40 ` Benny Halevy
[not found] ` <1363786272-17385-1-git-send-email-bhalevy@tonian.com>
1 sibling, 0 replies; 6+ messages in thread
From: Benny Halevy @ 2013-03-18 16:40 UTC (permalink / raw)
To: Myklebust, Trond; +Cc: linux-nfs
On 2013-03-18 17:48, Myklebust, Trond wrote:
> Hi Benny,
>
> On Mon, 2013-03-18 at 16:38 +0200, Benny Halevy wrote:
>> When writing a large file over pNFS, the MDS rebooted and then the client crashed with the following stack trace over RHEL 2.6.32-358.el6.x86_64
>>
>
> You might want to note that this Oops applies to upstream too. :-)
>
True.
>> <4>RIP: 0010:[<ffffffffa0474f35>] [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
>> <4>RSP: 0018:ffff8801bab4db60 EFLAGS: 00010286
>> <4>RAX: ffff8802290755a8 RBX: ffff8802297756a0 RCX: 0000000000000000
>> <4>RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880229775750
>> <4>RBP: ffff8801bab4dba0 R08: 0000000000000000 R09: 0000000000000000
>> <4>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8802297754d8
>> <4>R13: ffff880229075400 R14: 0000000000000000 R15: ffff880229775568
>> <4>FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
>> <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> <4>CR2: 0000000000000028 CR3: 000000020158a000 CR4: 00000000000027e0
>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> <4>Process flush-0:30 (pid: 3806, threadinfo ffff8801bab4c000, task ffff880228c76ae0)
>> <4>Stack:
>> <4> ffff8802290755a8 ffff880229775750 ffffffffa047ae20 ffff8802297756a0
>> <4><d> ffff8801bab4dd20 0000000000000000 ffff8801bab4dd20 ffff8802297757c0
>> <4><d> ffff8801bab4dbd0 ffffffffa0451217 ffff8801bab4dd20 ffff8802297757c0
>> <4>Call Trace:
>> <4> [<ffffffffa0451217>] nfs_write_inode+0x87/0x100 [nfs]
>> <4> [<ffffffff811ac81c>] writeback_single_inode+0x20c/0x290
>> <4> [<ffffffff811acafe>] writeback_sb_inodes+0xce/0x180
>> <4> [<ffffffff811acc5b>] writeback_inodes_wb+0xab/0x1b0
>> <4> [<ffffffff811acffb>] wb_writeback+0x29b/0x3f0
>> <4> [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
>> <4> [<ffffffff81081ac2>] ? del_timer_sync+0x22/0x30
>> <4> [<ffffffff811ad2e9>] wb_do_writeback+0x199/0x240
>> <4> [<ffffffff811ad3f3>] bdi_writeback_task+0x63/0x1b0
>> <4> [<ffffffff81096b47>] ? bit_waitqueue+0x17/0xd0
>> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
>> <4> [<ffffffff8113cac6>] bdi_start_fn+0x86/0x100
>> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
>> <4> [<ffffffff81096916>] kthread+0x96/0xa0
>> <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
>> <4> [<ffffffff81096880>] ? kthread+0x0/0xa0
>> <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>> <4>Code: 01 00 00 48 89 d7 48 89 55 c8 48 89 45 c0 e8 93 b0 09 e1 f0 41 0f ba 37 09 19 d2 85 d2 48 8b 45 c0 0f 84 af 01 00 00 48 8b 53 f0 <4c> 8b 7a 28 48 8d 4a 28 49 39 cf 75 26 e9 e1 00 00 00 66 0f 1f
>> <1>RIP [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
>> <4> RSP <ffff8801bab4db60>
>> <4>CR2: 0000000000000028
>>
>> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
>> ---
>> fs/nfs/pnfs.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 48ac5aa..483bd94 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -497,6 +497,8 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
>> pnfs_get_layout_hdr(lo);
>> pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RO_FAILED);
>> pnfs_layout_clear_fail_bit(lo, NFS_LAYOUT_RW_FAILED);
>> + clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags);
>> + clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags);
>> spin_unlock(&nfsi->vfs_inode.i_lock);
>> pnfs_free_lseg_list(&tmp_list);
>> pnfs_put_layout_hdr(lo);
>> @@ -1813,11 +1815,16 @@ void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data)
>> loff_t end_pos;
>> int status = 0;
>>
>> - dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
>> + if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
>> + dprintk("%s: inode %lu: not required\n", __func__, inode->i_ino);
>> + return 0;
>> + }
>>
>> - if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
>> + if (WARN_ON(!nfsi->layout))
>> return 0;
>
>
> This looks like the wrong thing to do. We know how the system got here:
> someone called fsync() or sync(), and NFS_INO_LAYOUTCOMMIT was set.
> What we don't know is why the layout was freed before
> pnfs_layoutcommit_inode() was called.
>
I believe that this happens by pnfs_destroy_all_layouts called from
nfs4_establish_lease.
> I suggest that we rather put a
>
> WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags));
> WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags));
>
> inside pnfs_detach_layout_hdr(). That should stand a better chance of
> catching the sort of bug you found above.
>
Sounds good.
> BTW: shouldn't pnfs_return_layout() always call
> pnfs_layoutcommit_inode()?
>
Good point.
Benny
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on pnfs_detach_layout_hdr
[not found] ` <1363786272-17385-1-git-send-email-bhalevy@tonian.com>
@ 2013-03-20 15:32 ` Myklebust, Trond
2013-03-20 15:56 ` Myklebust, Trond
0 siblings, 1 reply; 6+ messages in thread
From: Myklebust, Trond @ 2013-03-20 15:32 UTC (permalink / raw)
To: Benny Halevy; +Cc: linux-nfs
On Wed, 2013-03-20 at 15:31 +0200, Benny Halevy wrote:
> When writing a large file over pNFS, the MDS rebooted and then the client crashed
> with the following stack trace over RHEL 2.6.32-358.el6.x86_64.
> This also applies to upstream since v3.2
>
> <4>RIP: 0010:[<ffffffffa0474f35>] [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> <4>RSP: 0018:ffff8801bab4db60 EFLAGS: 00010286
> <4>RAX: ffff8802290755a8 RBX: ffff8802297756a0 RCX: 0000000000000000
> <4>RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880229775750
> <4>RBP: ffff8801bab4dba0 R08: 0000000000000000 R09: 0000000000000000
> <4>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8802297754d8
> <4>R13: ffff880229075400 R14: 0000000000000000 R15: ffff880229775568
> <4>FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
> <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <4>CR2: 0000000000000028 CR3: 000000020158a000 CR4: 00000000000027e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process flush-0:30 (pid: 3806, threadinfo ffff8801bab4c000, task ffff880228c76ae0)
> <4>Stack:
> <4> ffff8802290755a8 ffff880229775750 ffffffffa047ae20 ffff8802297756a0
> <4><d> ffff8801bab4dd20 0000000000000000 ffff8801bab4dd20 ffff8802297757c0
> <4><d> ffff8801bab4dbd0 ffffffffa0451217 ffff8801bab4dd20 ffff8802297757c0
> <4>Call Trace:
> <4> [<ffffffffa0451217>] nfs_write_inode+0x87/0x100 [nfs]
> <4> [<ffffffff811ac81c>] writeback_single_inode+0x20c/0x290
> <4> [<ffffffff811acafe>] writeback_sb_inodes+0xce/0x180
> <4> [<ffffffff811acc5b>] writeback_inodes_wb+0xab/0x1b0
> <4> [<ffffffff811acffb>] wb_writeback+0x29b/0x3f0
> <4> [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
> <4> [<ffffffff81081ac2>] ? del_timer_sync+0x22/0x30
> <4> [<ffffffff811ad2e9>] wb_do_writeback+0x199/0x240
> <4> [<ffffffff811ad3f3>] bdi_writeback_task+0x63/0x1b0
> <4> [<ffffffff81096b47>] ? bit_waitqueue+0x17/0xd0
> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff8113cac6>] bdi_start_fn+0x86/0x100
> <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff81096916>] kthread+0x96/0xa0
> <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
> <4> [<ffffffff81096880>] ? kthread+0x0/0xa0
> <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> <4>Code: 01 00 00 48 89 d7 48 89 55 c8 48 89 45 c0 e8 93 b0 09 e1 f0 41 0f ba 37 09 19 d2 85 d2 48 8b 45 c0 0f 84 af 01 00 00 48 8b 53 f0 <4c> 8b 7a 28 48 8d 4a 28 49 39 cf 75 26 e9 e1 00 00 00 66 0f 1f
> <1>RIP [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> <4> RSP <ffff8801bab4db60>
> <4>CR2: 0000000000000028
>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> Cc: stable@kernel.org [>= 3.2]
> ---
> fs/nfs/pnfs.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 94db433..3fdaae4 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -225,6 +225,8 @@
> {
> struct nfs_inode *nfsi = NFS_I(lo->plh_inode);
> dprintk("%s: freeing layout cache %p\n", __func__, lo);
> + WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags));
> + WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags));
> nfsi->layout = NULL;
> /* Reset MDS Threshold I/O counters */
> nfsi->write_io = 0;
> @@ -1815,11 +1817,15 @@ void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data)
> loff_t end_pos;
> int status = 0;
>
> - dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> + if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
> + dprintk("%s: inode %lu: not required\n", __func__, inode->i_ino);
> + return 0;
> + }
>
> - if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
> + if (WARN_ON(!nfsi->layout))
> return 0;
>
> + dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> /* Note kzalloc ensures data->res.seq_res.sr_slot == NULL */
> data = kzalloc(sizeof(*data), GFP_NOFS);
> if (!data) {
Hi Benny,
The more I look at this, the more trouble I have seeing how it can
happen. AFAICS the layout is pinned by the layout segments, which again
are pinned by the reference held by the NFS_LSEG_LAYOUTCOMMIT bit.
In other words, as long as the data->lseg_list is not empty when we exit
the spin locked section, the layout should be guaranteed to be pinned
until nfs4_layoutcommit_release().
Are you sure that you've seen this Oops in recent 3.8.x or 3.9-rc
kernels?
Cheers
Trond
PS: note that I do see several places where we clear
NFS_INO_LAYOUTCOMMIT without clearing the NFS_LSEG_LAYOUTCOMMITs. I'm
writing up a patch to fix that...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on pnfs_detach_layout_hdr
2013-03-20 15:32 ` [PATCH v2] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on pnfs_detach_layout_hdr Myklebust, Trond
@ 2013-03-20 15:56 ` Myklebust, Trond
0 siblings, 0 replies; 6+ messages in thread
From: Myklebust, Trond @ 2013-03-20 15:56 UTC (permalink / raw)
To: Benny Halevy; +Cc: linux-nfs
On Wed, 2013-03-20 at 15:32 +0000, Myklebust, Trond wrote:
> On Wed, 2013-03-20 at 15:31 +0200, Benny Halevy wrote:
> > When writing a large file over pNFS, the MDS rebooted and then the client crashed
> > with the following stack trace over RHEL 2.6.32-358.el6.x86_64.
> > This also applies to upstream since v3.2
> >
> > <4>RIP: 0010:[<ffffffffa0474f35>] [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> > <4>RSP: 0018:ffff8801bab4db60 EFLAGS: 00010286
> > <4>RAX: ffff8802290755a8 RBX: ffff8802297756a0 RCX: 0000000000000000
> > <4>RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880229775750
> > <4>RBP: ffff8801bab4dba0 R08: 0000000000000000 R09: 0000000000000000
> > <4>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8802297754d8
> > <4>R13: ffff880229075400 R14: 0000000000000000 R15: ffff880229775568
> > <4>FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
> > <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > <4>CR2: 0000000000000028 CR3: 000000020158a000 CR4: 00000000000027e0
> > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > <4>Process flush-0:30 (pid: 3806, threadinfo ffff8801bab4c000, task ffff880228c76ae0)
> > <4>Stack:
> > <4> ffff8802290755a8 ffff880229775750 ffffffffa047ae20 ffff8802297756a0
> > <4><d> ffff8801bab4dd20 0000000000000000 ffff8801bab4dd20 ffff8802297757c0
> > <4><d> ffff8801bab4dbd0 ffffffffa0451217 ffff8801bab4dd20 ffff8802297757c0
> > <4>Call Trace:
> > <4> [<ffffffffa0451217>] nfs_write_inode+0x87/0x100 [nfs]
> > <4> [<ffffffff811ac81c>] writeback_single_inode+0x20c/0x290
> > <4> [<ffffffff811acafe>] writeback_sb_inodes+0xce/0x180
> > <4> [<ffffffff811acc5b>] writeback_inodes_wb+0xab/0x1b0
> > <4> [<ffffffff811acffb>] wb_writeback+0x29b/0x3f0
> > <4> [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
> > <4> [<ffffffff81081ac2>] ? del_timer_sync+0x22/0x30
> > <4> [<ffffffff811ad2e9>] wb_do_writeback+0x199/0x240
> > <4> [<ffffffff811ad3f3>] bdi_writeback_task+0x63/0x1b0
> > <4> [<ffffffff81096b47>] ? bit_waitqueue+0x17/0xd0
> > <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> > <4> [<ffffffff8113cac6>] bdi_start_fn+0x86/0x100
> > <4> [<ffffffff8113ca40>] ? bdi_start_fn+0x0/0x100
> > <4> [<ffffffff81096916>] kthread+0x96/0xa0
> > <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
> > <4> [<ffffffff81096880>] ? kthread+0x0/0xa0
> > <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> > <4>Code: 01 00 00 48 89 d7 48 89 55 c8 48 89 45 c0 e8 93 b0 09 e1 f0 41 0f ba 37 09 19 d2 85 d2 48 8b 45 c0 0f 84 af 01 00 00 48 8b 53 f0 <4c> 8b 7a 28 48 8d 4a 28 49 39 cf 75 26 e9 e1 00 00 00 66 0f 1f
> > <1>RIP [<ffffffffa0474f35>] pnfs_layoutcommit_inode+0x115/0x310 [nfs]
> > <4> RSP <ffff8801bab4db60>
> > <4>CR2: 0000000000000028
> >
> > Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> > Cc: stable@kernel.org [>= 3.2]
> > ---
> > fs/nfs/pnfs.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index 94db433..3fdaae4 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -225,6 +225,8 @@
> > {
> > struct nfs_inode *nfsi = NFS_I(lo->plh_inode);
> > dprintk("%s: freeing layout cache %p\n", __func__, lo);
> > + WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags));
> > + WARN_ON_ONCE(test_and_clear_bit(NFS_INO_LAYOUTCOMMITTING, &nfsi->flags));
> > nfsi->layout = NULL;
> > /* Reset MDS Threshold I/O counters */
> > nfsi->write_io = 0;
> > @@ -1815,11 +1817,15 @@ void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data)
> > loff_t end_pos;
> > int status = 0;
> >
> > - dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> > + if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
> > + dprintk("%s: inode %lu: not required\n", __func__, inode->i_ino);
> > + return 0;
> > + }
> >
> > - if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
> > + if (WARN_ON(!nfsi->layout))
> > return 0;
> >
> > + dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
> > /* Note kzalloc ensures data->res.seq_res.sr_slot == NULL */
> > data = kzalloc(sizeof(*data), GFP_NOFS);
> > if (!data) {
>
> Hi Benny,
>
> The more I look at this, the more trouble I have seeing how it can
> happen. AFAICS the layout is pinned by the layout segments, which again
> are pinned by the reference held by the NFS_LSEG_LAYOUTCOMMIT bit.
>
> In other words, as long as the data->lseg_list is not empty when we exit
> the spin locked section, the layout should be guaranteed to be pinned
> until nfs4_layoutcommit_release().
>
> Are you sure that you've seen this Oops in recent 3.8.x or 3.9-rc
> kernels?
Never mind, I think that I've understood what's going on.
> PS: note that I do see several places where we clear
> NFS_INO_LAYOUTCOMMIT without clearing the NFS_LSEG_LAYOUTCOMMITs. I'm
> writing up a patch to fix that...
...and it is related to this. Patch forthcoming...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-03-20 15:56 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-18 14:38 [PATCH] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on destroy_layout Benny Halevy
2013-03-18 14:45 ` Benny Halevy
2013-03-18 15:48 ` Myklebust, Trond
2013-03-18 16:40 ` Benny Halevy
[not found] ` <1363786272-17385-1-git-send-email-bhalevy@tonian.com>
2013-03-20 15:32 ` [PATCH v2] pnfs: clear NFS_INO_LAYOUTCOMMIT* flags on pnfs_detach_layout_hdr Myklebust, Trond
2013-03-20 15:56 ` Myklebust, Trond
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.