All of lore.kernel.org
* NFS4 BAD_STATEID loop (kernel 3.0)
@ 2011-10-24 10:40 David Flynn
  2011-10-24 11:22 ` Trond Myklebust
  2011-10-24 13:43 ` NFS4 BAD_STATEID loop (kernel 3.0) Chuck Lever
  0 siblings, 2 replies; 28+ messages in thread
From: David Flynn @ 2011-10-24 10:40 UTC (permalink / raw)
  To: linux-nfs

Dear All,

On a system running kernel 3.0, mounting a Solaris NFS4 export, we
observe a continuous 20 Mbit/sec exchange between client and server that
has been going on for 10 days.

From /proc/mounts:
home:/home/ /home nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.29.190.21,minorversion=0,local_lock=none,addr=172.29.120.140 0 0

Reconciling logs, we find that around the time the exchange started,
the kernel reports a process having blocked:

[787321.680029] INFO: task bash:17799 blocked for more than 120 seconds.
[787321.680067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[787321.680104] bash            D 0000000000000002     0 17799      1 0x00000000
[787321.680131]  ffff880173ad9ca8 0000000000000086 ffff88017216b008 0000000000012a40
[787321.680171]  ffff880173ad9fd8 0000000000012a40 ffff880173ad8000 0000000000012a40
[787321.680211]  0000000000012a40 0000000000012a40 ffff880173ad9fd8 0000000000012a40
[787321.680251] Call Trace:
[787321.680275]  [<ffffffff8110e900>] ? __lock_page+0x70/0x70
[787321.680299]  [<ffffffff815fea2c>] io_schedule+0x8c/0xd0
[787321.680321]  [<ffffffff8110e90e>] sleep_on_page+0xe/0x20
[787321.680344]  [<ffffffff815ff2af>] __wait_on_bit+0x5f/0x90
[787321.680366]  [<ffffffff8110ead3>] wait_on_page_bit+0x73/0x80
[787321.680390]  [<ffffffff81085980>] ? autoremove_wake_function+0x40/0x40
[787321.680416]  [<ffffffff8111aeb5>] ? pagevec_lookup_tag+0x25/0x40
[787321.680439]  [<ffffffff8110ed06>] filemap_fdatawait_range+0xf6/0x1a0
[787321.680483]  [<ffffffffa022e740>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
[787321.680507]  [<ffffffff8111a3b1>] ? do_writepages+0x21/0x40
[787321.680530]  [<ffffffff8110ff8b>] ? __filemap_fdatawrite_range+0x5b/0x60
[787321.680555]  [<ffffffff81110000>] filemap_write_and_wait_range+0x70/0x80
[787321.680580]  [<ffffffff8119b45a>] vfs_fsync_range+0x5a/0x90
[787321.680602]  [<ffffffff8119b4fc>] vfs_fsync+0x1c/0x20
[787321.680628]  [<ffffffffa0222be4>] nfs_file_flush+0x54/0x80 [nfs]
[787321.680653]  [<ffffffff8116d66f>] filp_close+0x3f/0x90
[787321.680675]  [<ffffffff8116e097>] sys_close+0xb7/0x120
[787321.680698]  [<ffffffff816090c2>] system_call_fastpath+0x16/0x1b

A network trace showed the exchange below being continually repeated.
Unfortunately, this system too has since been rebooted.
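
For scale, a back-of-the-envelope sketch (illustrative only; the frame
sizes are taken from the trace below) of how a single request/reply pair
being retried back-to-back accounts for roughly 20 Mbit/sec:

```python
# Rough sanity check: one retried COMPOUND round trip in the capture is
# a 1122-byte call plus a 122-byte reply (Ethernet frame sizes).
call_bytes, reply_bytes = 1122, 122
round_trip_bits = (call_bytes + reply_bytes) * 8  # 9952 bits

link_rate = 20e6  # the observed ~20 Mbit/sec on the wire
retries_per_sec = link_rate / round_trip_bits

print(round(retries_per_sec))  # ~2010 retries per second
```

So the observed bandwidth corresponds to on the order of 2000 failed
WRITE attempts per second, consistent with the sub-millisecond reply
latency in the trace.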

Any thoughts on the matter would be most welcome,

Kind regards,

..david


No.     Time            Source                Destination           Protocol Size  Info
   9880 11:40:12.833617 172.29.190.21         172.29.120.140        NFS      1122  V4 COMPOUND Call (Reply In 9881) <EMPTY> PUTFH;WRITE;GETATTR

Frame 9880: 1122 bytes on wire (8976 bits), 1122 bytes captured (8976 bits)
    Arrival Time: Oct 17, 2011 11:40:12.833617000 BST
    Frame Length: 1122 bytes (8976 bits)
    Capture Length: 1122 bytes (8976 bits)
Ethernet II, Src: ChelsioC_06:68:f9 (00:07:43:06:68:f9), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
Internet Protocol, Src: 172.29.190.21 (172.29.190.21), Dst: 172.29.120.140 (172.29.120.140)
Transmission Control Protocol, Src Port: 816 (816), Dst Port: nfs (2049), Seq: 5199745, Ack: 275801, Len: 1056
Remote Procedure Call, Type:Call XID:0x5daa6e93
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 0
    Operations (count: 3)
        Opcode: PUTFH (22)
            filehandle
                length: 36
                [hash (CRC-32): 0x6e4b15f3]
                decode type as: unknown
                filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
        Opcode: WRITE (38)
            stateid
                seqid: 0x00000000
                Data: 4e06f15b800f82e300000000
            offset: 11392
            stable: FILE_SYNC4 (2)
            Write length: 814
            Data: <DATA>
                length: 814
                contents: <DATA>
                fill bytes: opaque data
        Opcode: GETATTR (9)
            GETATTR4args
                attr_request
                    bitmap[0] = 0x00000018
                        [2 attributes requested]
                        mand_attr: FATTR4_CHANGE (3)
                        mand_attr: FATTR4_SIZE (4)
                    bitmap[1] = 0x00300000
                        [2 attributes requested]
                        recc_attr: FATTR4_TIME_METADATA (52)
                        recc_attr: FATTR4_TIME_MODIFY (53)

No.     Time            Source                Destination           Protocol Size  Info
   9881 11:40:12.833956 172.29.120.140        172.29.190.21         NFS      122   V4 COMPOUND Reply (Call In 9880) <EMPTY> PUTFH;WRITE

Frame 9881: 122 bytes on wire (976 bits), 122 bytes captured (976 bits)
    Arrival Time: Oct 17, 2011 11:40:12.833956000 BST
    [Time delta from previous captured frame: 0.000339000 seconds]
    Frame Length: 122 bytes (976 bits)
    Capture Length: 122 bytes (976 bits)
Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_06:68:f9 (00:07:43:06:68:f9)
Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.21 (172.29.190.21)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 816 (816), Seq: 275801, Ack: 5200801, Len: 56
Remote Procedure Call, Type:Reply XID:0x5daa6e93
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_BAD_STATEID (10025)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 2)
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: WRITE (38)
            Status: NFS4ERR_BAD_STATEID (10025)
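
For reference, the stateid in the WRITE above is an RFC 3530 stateid4:
a 4-byte seqid followed by 12 opaque "other" bytes. A minimal sketch
(not part of any capture tooling) splitting the value shown in the
decode:

```python
import struct

# The WRITE's stateid from the decode above: seqid 0x00000000 followed
# by the 12 opaque "other" bytes 4e06f15b800f82e300000000.
raw = bytes.fromhex("00000000" + "4e06f15b800f82e300000000")

# RFC 3530 stateid4: uint32 seqid + opaque[12] other.
seqid = struct.unpack(">I", raw[:4])[0]
other = raw[4:]

print(seqid)        # 0
print(other.hex())  # 4e06f15b800f82e300000000
```

The server keys its state table on the "other" field; it is this value
that it is rejecting with NFS4ERR_BAD_STATEID on every retry.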

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0)
  2011-10-24 10:40 NFS4 BAD_STATEID loop (kernel 3.0) David Flynn
@ 2011-10-24 11:22 ` Trond Myklebust
  2011-10-24 13:17   ` David Flynn
  2011-10-24 13:43 ` NFS4 BAD_STATEID loop (kernel 3.0) Chuck Lever
  1 sibling, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2011-10-24 11:22 UTC (permalink / raw)
  To: David Flynn; +Cc: linux-nfs

On Mon, 2011-10-24 at 10:40 +0000, David Flynn wrote: 
> Dear All,
> 
> On a system running kernel 3.0, mounting a Solaris NFS4 export, we
> observe a continuous 20Mbit/sec exchange between client and server that had
> been occurring for 10 days.
<snip> 
> No.     Time            Source                Destination           Protocol Size  Info
>    9880 11:40:12.833617 172.29.190.21         172.29.120.140        NFS      1122  V4 COMPOUND Call (Reply In 9881) <EMPTY> PUTFH;WRITE;GETATTR
> 
> Frame 9880: 1122 bytes on wire (8976 bits), 1122 bytes captured (8976 bits)
>     Arrival Time: Oct 17, 2011 11:40:12.833617000 BST
>     Frame Length: 1122 bytes (8976 bits)
>     Capture Length: 1122 bytes (8976 bits)
> Ethernet II, Src: ChelsioC_06:68:f9 (00:07:43:06:68:f9), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
> Internet Protocol, Src: 172.29.190.21 (172.29.190.21), Dst: 172.29.120.140 (172.29.120.140)
> Transmission Control Protocol, Src Port: 816 (816), Dst Port: nfs (2049), Seq: 5199745, Ack: 275801, Len: 1056
> Remote Procedure Call, Type:Call XID:0x5daa6e93
> Network File System
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 0
>     Operations (count: 3)
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 36
>                 [hash (CRC-32): 0x6e4b15f3]
>                 decode type as: unknown
>                 filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
>         Opcode: WRITE (38)
>             stateid
>                 seqid: 0x00000000
>                 Data: 4e06f15b800f82e300000000
>             offset: 11392
>             stable: FILE_SYNC4 (2)
>             Write length: 814
>             Data: <DATA>
>                 length: 814
>                 contents: <DATA>
>                 fill bytes: opaque data
>         Opcode: GETATTR (9)
>             GETATTR4args
>                 attr_request
>                     bitmap[0] = 0x00000018
>                         [2 attributes requested]
>                         mand_attr: FATTR4_CHANGE (3)
>                         mand_attr: FATTR4_SIZE (4)
>                     bitmap[1] = 0x00300000
>                         [2 attributes requested]
>                         recc_attr: FATTR4_TIME_METADATA (52)
>                         recc_attr: FATTR4_TIME_MODIFY (53)
> 
> No.     Time            Source                Destination           Protocol Size  Info
>    9881 11:40:12.833956 172.29.120.140        172.29.190.21         NFS      122   V4 COMPOUND Reply (Call In 9880) <EMPTY> PUTFH;WRITE
> 
> Frame 9881: 122 bytes on wire (976 bits), 122 bytes captured (976 bits)
>     Arrival Time: Oct 17, 2011 11:40:12.833956000 BST
>     [Time delta from previous captured frame: 0.000339000 seconds]
>     Frame Length: 122 bytes (976 bits)
>     Capture Length: 122 bytes (976 bits)
> Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_06:68:f9 (00:07:43:06:68:f9)
> Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.21 (172.29.190.21)
> Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 816 (816), Seq: 275801, Ack: 5200801, Len: 56
> Remote Procedure Call, Type:Reply XID:0x5daa6e93
> Network File System
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Status: NFS4ERR_BAD_STATEID (10025)
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     Operations (count: 2)
>         Opcode: PUTFH (22)
>             Status: NFS4_OK (0)
>         Opcode: WRITE (38)
>             Status: NFS4ERR_BAD_STATEID (10025)

We should in principle be able to recover from a BAD_STATEID error by running
the state recovery thread. It's a shame that the machine was rebooted,
but does your syslog trace perhaps show any state recovery thread
errors?

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0)
  2011-10-24 11:22 ` Trond Myklebust
@ 2011-10-24 13:17   ` David Flynn
  2011-10-24 13:32     ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-24 13:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs

* Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> We should in principle be able to recover from a BAD_STATEID error by running
> the state recovery thread. It's a shame that the machine was rebooted,
> but does your syslog trace perhaps show any state recovery thread
> errors?

There were no other nfs related messages reported between the initial
blocked task and rebooting the machine later.  Additionally, there were
no nfs related messages from bootup of the machine until the blocked
task.

One thing that may be of interest is that the user in question with the
blocked task had hit their quota hard limit.  (It was the same user that
had the issue I reported earlier, also on the same filesystem).

Kind regards,
..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0)
  2011-10-24 13:17   ` David Flynn
@ 2011-10-24 13:32     ` Trond Myklebust
  2011-10-24 14:50       ` NFS4 BAD_STATEID loop (kernel 3.0.4) David Flynn
  0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2011-10-24 13:32 UTC (permalink / raw)
  To: David Flynn; +Cc: linux-nfs

On Mon, 2011-10-24 at 13:17 +0000, David Flynn wrote: 
> * Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> > We should in principle be able to recover from a BAD_STATEID error by running
> > the state recovery thread. It's a shame that the machine was rebooted,
> > but does your syslog trace perhaps show any state recovery thread
> > errors?
> 
> There were no other nfs related messages reported between the initial
> blocked task and rebooting the machine later.  Additionally, there were
> no nfs related messages from bootup of the machine until the blocked
> task.
> 
> One thing that may be of interest is that the user in question with the
> blocked task had hit their quota hard limit.  (It was the same user that
> had the issue I reported earlier, also on the same filesystem).
> 
> Kind regards,
> ..david

I'm assuming then that your network trace showed no sign of any OPEN
calls for that particular file, just retries of the WRITE?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0)
  2011-10-24 10:40 NFS4 BAD_STATEID loop (kernel 3.0) David Flynn
  2011-10-24 11:22 ` Trond Myklebust
@ 2011-10-24 13:43 ` Chuck Lever
  1 sibling, 0 replies; 28+ messages in thread
From: Chuck Lever @ 2011-10-24 13:43 UTC (permalink / raw)
  To: David Flynn; +Cc: linux-nfs


On Oct 24, 2011, at 6:40 AM, David Flynn wrote:

> Dear All,
> 
> On a system running kernel 3.0, mounting a Solaris NFS4 export, we
> observe a continuous 20Mbit/sec exchange between client and server that had
> been occurring for 10 days.

Can you tell us a little more about the server?  Which release of Solaris?  What hardware?

> From /proc/mounts:
> home:/home/ /home nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.29.190.21,minorversion=0,local_lock=none,addr=172.29.120.140 0 0
> 
> Reconciling logs, we find that around the time the exchange started,
> the kernel reports a process having blocked:
> 
> [787321.680029] INFO: task bash:17799 blocked for more than 120 seconds.
> [787321.680067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [787321.680104] bash            D 0000000000000002     0 17799      1 0x00000000
> [787321.680131]  ffff880173ad9ca8 0000000000000086 ffff88017216b008 0000000000012a40
> [787321.680171]  ffff880173ad9fd8 0000000000012a40 ffff880173ad8000 0000000000012a40
> [787321.680211]  0000000000012a40 0000000000012a40 ffff880173ad9fd8 0000000000012a40
> [787321.680251] Call Trace:
> [787321.680275]  [<ffffffff8110e900>] ? __lock_page+0x70/0x70
> [787321.680299]  [<ffffffff815fea2c>] io_schedule+0x8c/0xd0
> [787321.680321]  [<ffffffff8110e90e>] sleep_on_page+0xe/0x20
> [787321.680344]  [<ffffffff815ff2af>] __wait_on_bit+0x5f/0x90
> [787321.680366]  [<ffffffff8110ead3>] wait_on_page_bit+0x73/0x80
> [787321.680390]  [<ffffffff81085980>] ? autoremove_wake_function+0x40/0x40
> [787321.680416]  [<ffffffff8111aeb5>] ? pagevec_lookup_tag+0x25/0x40
> [787321.680439]  [<ffffffff8110ed06>] filemap_fdatawait_range+0xf6/0x1a0
> [787321.680483]  [<ffffffffa022e740>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
> [787321.680507]  [<ffffffff8111a3b1>] ? do_writepages+0x21/0x40
> [787321.680530]  [<ffffffff8110ff8b>] ? __filemap_fdatawrite_range+0x5b/0x60
> [787321.680555]  [<ffffffff81110000>] filemap_write_and_wait_range+0x70/0x80
> [787321.680580]  [<ffffffff8119b45a>] vfs_fsync_range+0x5a/0x90
> [787321.680602]  [<ffffffff8119b4fc>] vfs_fsync+0x1c/0x20
> [787321.680628]  [<ffffffffa0222be4>] nfs_file_flush+0x54/0x80 [nfs]
> [787321.680653]  [<ffffffff8116d66f>] filp_close+0x3f/0x90
> [787321.680675]  [<ffffffff8116e097>] sys_close+0xb7/0x120
> [787321.680698]  [<ffffffff816090c2>] system_call_fastpath+0x16/0x1b
> 
> A network trace showed the following exchange below being continually
> repeated.  Unfortunately this system too has since been rebooted.
> 
> Any thoughts on the matter would be most welcome,
> 
> Kind regards,
> 
> ..david
> 
> 
> No.     Time            Source                Destination           Protocol Size  Info
>   9880 11:40:12.833617 172.29.190.21         172.29.120.140        NFS      1122  V4 COMPOUND Call (Reply In 9881) <EMPTY> PUTFH;WRITE;GETATTR
> 
> Frame 9880: 1122 bytes on wire (8976 bits), 1122 bytes captured (8976 bits)
>    Arrival Time: Oct 17, 2011 11:40:12.833617000 BST
>    Frame Length: 1122 bytes (8976 bits)
>    Capture Length: 1122 bytes (8976 bits)
> Ethernet II, Src: ChelsioC_06:68:f9 (00:07:43:06:68:f9), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
> Internet Protocol, Src: 172.29.190.21 (172.29.190.21), Dst: 172.29.120.140 (172.29.120.140)
> Transmission Control Protocol, Src Port: 816 (816), Dst Port: nfs (2049), Seq: 5199745, Ack: 275801, Len: 1056
> Remote Procedure Call, Type:Call XID:0x5daa6e93
> Network File System
>    [Program Version: 4]
>    [V4 Procedure: COMPOUND (1)]
>    Tag: <EMPTY>
>        length: 0
>        contents: <EMPTY>
>    minorversion: 0
>    Operations (count: 3)
>        Opcode: PUTFH (22)
>            filehandle
>                length: 36
>                [hash (CRC-32): 0x6e4b15f3]
>                decode type as: unknown
>                filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
>        Opcode: WRITE (38)
>            stateid
>                seqid: 0x00000000
>                Data: 4e06f15b800f82e300000000
>            offset: 11392
>            stable: FILE_SYNC4 (2)
>            Write length: 814
>            Data: <DATA>
>                length: 814
>                contents: <DATA>
>                fill bytes: opaque data
>        Opcode: GETATTR (9)
>            GETATTR4args
>                attr_request
>                    bitmap[0] = 0x00000018
>                        [2 attributes requested]
>                        mand_attr: FATTR4_CHANGE (3)
>                        mand_attr: FATTR4_SIZE (4)
>                    bitmap[1] = 0x00300000
>                        [2 attributes requested]
>                        recc_attr: FATTR4_TIME_METADATA (52)
>                        recc_attr: FATTR4_TIME_MODIFY (53)
> 
> No.     Time            Source                Destination           Protocol Size  Info
>   9881 11:40:12.833956 172.29.120.140        172.29.190.21         NFS      122   V4 COMPOUND Reply (Call In 9880) <EMPTY> PUTFH;WRITE
> 
> Frame 9881: 122 bytes on wire (976 bits), 122 bytes captured (976 bits)
>    Arrival Time: Oct 17, 2011 11:40:12.833956000 BST
>    [Time delta from previous captured frame: 0.000339000 seconds]
>    Frame Length: 122 bytes (976 bits)
>    Capture Length: 122 bytes (976 bits)
> Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_06:68:f9 (00:07:43:06:68:f9)
> Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.21 (172.29.190.21)
> Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 816 (816), Seq: 275801, Ack: 5200801, Len: 56
> Remote Procedure Call, Type:Reply XID:0x5daa6e93
> Network File System
>    [Program Version: 4]
>    [V4 Procedure: COMPOUND (1)]
>    Status: NFS4ERR_BAD_STATEID (10025)
>    Tag: <EMPTY>
>        length: 0
>        contents: <EMPTY>
>    Operations (count: 2)
>        Opcode: PUTFH (22)
>            Status: NFS4_OK (0)
>        Opcode: WRITE (38)
>            Status: NFS4ERR_BAD_STATEID (10025)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0.4)
  2011-10-24 13:32     ` Trond Myklebust
@ 2011-10-24 14:50       ` David Flynn
  2011-10-24 15:31         ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-24 14:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs, Chuck Lever

* Chuck Lever (chuck.lever@oracle.com) wrote:
> Can you tell us a little more about the server?  Which release of
> Solaris?  What hardware?

SunOS 5.10 Generic_141444-09
(sparc)

* Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> I'm assuming then that your network trace showed no sign of any OPEN
> calls for that particular file, just retries of the WRITE?

Correct.

However, the good news is that it has just happened again (certainly
not quota-related).

The blocked task:
[179068.773206] INFO: task bash:3293 blocked for more than 120 seconds.
[179068.779660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[179068.787701] bash            D 0000000000000004     0  3293      1 0x00000000
[179068.795173]  ffff88001f97fca8 0000000000000086 ffff880426876008 0000000000012a40
[179068.802992]  ffff88001f97ffd8 0000000000012a40 ffff88001f97e000 0000000000012a40
[179068.810745]  0000000000012a40 0000000000012a40 ffff88001f97ffd8 0000000000012a40
[179068.818810] Call Trace:
[179068.821496]  [<ffffffff81110030>] ? __lock_page+0x70/0x70
[179068.827204]  [<ffffffff8160007c>] io_schedule+0x8c/0xd0
[179068.832952]  [<ffffffff8111003e>] sleep_on_page+0xe/0x20
[179068.838823]  [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
[179068.844734]  [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
[179068.850798]  [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
[179068.857879]  [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
[179068.864173]  [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
[179068.870721]  [<ffffffffa02167d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
[179068.877963]  [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
[179068.883744]  [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
[179068.890867]  [<ffffffff81111730>] filemap_write_and_wait_range+0x70/0x80
[179068.898025]  [<ffffffff8119cc6a>] vfs_fsync_range+0x5a/0x90
[179068.904197]  [<ffffffff8119cd0c>] vfs_fsync+0x1c/0x20
[179068.909721]  [<ffffffffa020ac74>] nfs_file_flush+0x54/0x80 [nfs]
[179068.916069]  [<ffffffff8116ee7f>] filp_close+0x3f/0x90
[179068.921611]  [<ffffffff8116f8a7>] sys_close+0xb7/0x120
[179068.927328]  [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b

$ echo 0 >/proc/sys/sunrpc/rpc_debug
[180179.009328] -pid- flgs status -client- --rqstp- -timeout ---ops--
[180179.015540] 40304 0801      0 ffff8804241ae800   (null)        0 ffffffffa023cd40 nfsv4 WRITE a:call_start q:NFS client

and our pingpong (more details at end):
14:07:07.307191 IP vc-fs1.rd.bbc.co.uk.1837702678 > home.rd.bbc.co.uk.nfs: 300 getattr fh 0,0/22
14:07:07.307471 IP home.rd.bbc.co.uk.nfs > vc-fs1.rd.bbc.co.uk.1837702678: reply ok 52 getattr ERROR: unk 10025
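
tcpdump has no NFSv4 status-name table, hence "ERROR: unk 10025"; that
value is NFS4ERR_BAD_STATEID. A small illustrative lookup of the RFC
3530 status codes relevant to this thread:

```python
# NFSv4.0 status codes (RFC 3530) relevant here; tcpdump prints the raw
# number ("unk 10025") instead of the symbolic name.
NFS4_ERRORS = {
    0:     "NFS4_OK",
    10011: "NFS4ERR_EXPIRED",
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
    10024: "NFS4ERR_OLD_STATEID",
    10025: "NFS4ERR_BAD_STATEID",
}

print(NFS4_ERRORS[10025])  # NFS4ERR_BAD_STATEID
```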

This system is up at the moment; if there is further detail you
require, I can provide it.

NB, the system this occurred on is running kernel 3.0.4
Mount options as per earlier.

Kind regards,

..david

No.     Time            Source                Destination           Protocol Size  Info
     39 15:33:59.077143 172.29.190.28         172.29.120.140        NFS      370   V4 COMPOUND Call (Reply In 40) <EMPTY> PUTFH;WRITE;GETATTR

Frame 39: 370 bytes on wire (2960 bits), 370 bytes captured (2960 bits)
Ethernet II, Src: ChelsioC_07:49:6f (00:07:43:07:49:6f), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
Internet Protocol, Src: 172.29.190.28 (172.29.190.28), Dst: 172.29.120.140 (172.29.120.140)
Transmission Control Protocol, Src Port: omginitialrefs (900), Dst Port: nfs (2049), Seq: 40433, Ack: 7449, Len: 304
Remote Procedure Call, Type:Call XID:0x43ce4e16
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 0
    Operations (count: 3)
        Opcode: PUTFH (22)
            filehandle
                length: 36
                [hash (CRC-32): 0x6e4b15f3]
                decode type as: unknown
                filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
        Opcode: WRITE (38)
            stateid
            offset: 11474
            stable: FILE_SYNC4 (2)
            Write length: 68
            Data: <DATA>
                length: 68
                contents: <DATA>
        Opcode: GETATTR (9)
            GETATTR4args
                attr_request
                    bitmap[0] = 0x00000018
                        [2 attributes requested]
                        mand_attr: FATTR4_CHANGE (3)
                        mand_attr: FATTR4_SIZE (4)
                    bitmap[1] = 0x00300000
                        [2 attributes requested]
                        recc_attr: FATTR4_TIME_METADATA (52)
                        recc_attr: FATTR4_TIME_MODIFY (53)

No.     Time            Source                Destination           Protocol Size  Info
     40 15:33:59.077433 172.29.120.140        172.29.190.28         NFS      122   V4 COMPOUND Reply (Call In 39) <EMPTY> PUTFH;WRITE

Frame 40: 122 bytes on wire (976 bits), 122 bytes captured (976 bits)
Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_07:49:6f (00:07:43:07:49:6f)
Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.28 (172.29.190.28)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: omginitialrefs (900), Seq: 7449, Ack: 40737, Len: 56
Remote Procedure Call, Type:Reply XID:0x43ce4e16
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_BAD_STATEID (10025)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 2)
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: WRITE (38)
            Status: NFS4ERR_BAD_STATEID (10025)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0.4)
  2011-10-24 14:50       ` NFS4 BAD_STATEID loop (kernel 3.0.4) David Flynn
@ 2011-10-24 15:31         ` Trond Myklebust
  2011-10-24 15:55           ` David Flynn
  2011-10-27 22:17           ` David Flynn
  0 siblings, 2 replies; 28+ messages in thread
From: Trond Myklebust @ 2011-10-24 15:31 UTC (permalink / raw)
  To: David Flynn; +Cc: linux-nfs, Chuck Lever

On Mon, 2011-10-24 at 14:50 +0000, David Flynn wrote: 
> No.     Time            Source                Destination           Protocol Size  Info
>      39 15:33:59.077143 172.29.190.28         172.29.120.140        NFS      370   V4 COMPOUND Call (Reply In 40) <EMPTY> PUTFH;WRITE;GETATTR
> 
> Frame 39: 370 bytes on wire (2960 bits), 370 bytes captured (2960 bits)
> Ethernet II, Src: ChelsioC_07:49:6f (00:07:43:07:49:6f), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
> Internet Protocol, Src: 172.29.190.28 (172.29.190.28), Dst: 172.29.120.140 (172.29.120.140)
> Transmission Control Protocol, Src Port: omginitialrefs (900), Dst Port: nfs (2049), Seq: 40433, Ack: 7449, Len: 304
> Remote Procedure Call, Type:Call XID:0x43ce4e16
> Network File System
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 0
>     Operations (count: 3)
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 36
>                 [hash (CRC-32): 0x6e4b15f3]
>                 decode type as: unknown
>                 filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
>         Opcode: WRITE (38)
>             stateid

Do you have an example of the stateid argument's value? Does it change
at all between separate WRITE attempts?

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0.4)
  2011-10-24 15:31         ` Trond Myklebust
@ 2011-10-24 15:55           ` David Flynn
  2011-10-27 22:17           ` David Flynn
  1 sibling, 0 replies; 28+ messages in thread
From: David Flynn @ 2011-10-24 15:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs, Chuck Lever

* Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> Date: Mon, 24 Oct 2011 17:31:42 +0200
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> To: David Flynn <davidf@rd.bbc.co.uk>
> Cc: linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
> X-Mailer: Evolution 3.0.3 (3.0.3-1.fc15) 
> Organization: NetApp Inc
> Subject: Re: NFS4 BAD_STATEID loop (kernel 3.0.4)
> 
> On Mon, 2011-10-24 at 14:50 +0000, David Flynn wrote: 
> > No.     Time            Source                Destination           Protocol Size  Info
> >      39 15:33:59.077143 172.29.190.28         172.29.120.140        NFS      370   V4 COMPOUND Call (Reply In 40) <EMPTY> PUTFH;WRITE;GETATTR
> > 
> > Frame 39: 370 bytes on wire (2960 bits), 370 bytes captured (2960 bits)
> > Ethernet II, Src: ChelsioC_07:49:6f (00:07:43:07:49:6f), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
> > Internet Protocol, Src: 172.29.190.28 (172.29.190.28), Dst: 172.29.120.140 (172.29.120.140)
> > Transmission Control Protocol, Src Port: omginitialrefs (900), Dst Port: nfs (2049), Seq: 40433, Ack: 7449, Len: 304
> > Remote Procedure Call, Type:Call XID:0x43ce4e16
> > Network File System
> >     [Program Version: 4]
> >     [V4 Procedure: COMPOUND (1)]
> >     Tag: <EMPTY>
> >         length: 0
> >         contents: <EMPTY>
> >     minorversion: 0
> >     Operations (count: 3)
> >         Opcode: PUTFH (22)
> >             filehandle
> >                 length: 36
> >                 [hash (CRC-32): 0x6e4b15f3]
> >                 decode type as: unknown
> >                 filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
> >         Opcode: WRITE (38)
> >             stateid
> 
> Do you have an example of the stateid argument's value? Does it change
> at all between separate WRITE attempts?

Looking at two captures taken an hour apart: no, I see no change.
(A packet capture has been supplied privately.)

Regards,

..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4 BAD_STATEID loop (kernel 3.0.4)
  2011-10-24 15:31         ` Trond Myklebust
  2011-10-24 15:55           ` David Flynn
@ 2011-10-27 22:17           ` David Flynn
  2011-10-29  0:25             ` NFS4ERR_STALE_CLIENTID loop David Flynn
  1 sibling, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-27 22:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs, Chuck Lever

* Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> Do you have an example of the stateid argument's value? Does it change
> at all between separate WRITE attempts?

Further to all this, I've just had a similar fault on another machine,
producing huge amounts of:

 [463795.630702] nfs4_reclaim_open_state: Lock reclaim failed!
 [463795.637446] nfs4_reclaim_open_state: Lock reclaim failed!
 [463795.643113] nfs4_reclaim_open_state: Lock reclaim failed!

A network capture is available:
  ftp://ftp.kw.bbc.co.uk/davidf/priv/uekahrae.pcap

$ echo 0 | sudo tee /proc/sys/sunrpc/rpc_debug
[468024.010036] -pid- flgs status -client- --rqstp- -timeout ---ops--
[468024.010051]  6289 0801      0 ffff8801f3e37e00   (null)        0 ffffffffa0229d40 nfsv4 WRITE a:call_start q:NFS client
[468024.010057]  6290 0801      0 ffff8801f3e37e00   (null)        0 ffffffffa0229d40 nfsv4 WRITE a:call_start q:NFS client

blocked task:

 [464304.799306] INFO: task rrdtool:28506 blocked for more than 120 seconds.
 [464304.799309] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [464304.799311] rrdtool         D 0000000000000001     0 28506   4189 0x00000000
 [464304.799315]  ffff880073bd5ca8 0000000000000082 ffff8804232c5408 0000000000012a40
 [464304.799318]  ffff880073bd5fd8 0000000000012a40 ffff880073bd4000 0000000000012a40
 [464304.799320]  0000000000012a40 0000000000012a40 ffff880073bd5fd8 0000000000012a40
 [464304.799322] Call Trace:
 [464304.799332]  [<ffffffff81110030>] ? __lock_page+0x70/0x70
 [464304.799335]  [<ffffffff8160007c>] io_schedule+0x8c/0xd0
 [464304.799337]  [<ffffffff8111003e>] sleep_on_page+0xe/0x20
 [464304.799339]  [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
 [464304.799341]  [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
 [464304.799345]  [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
 [464304.799347]  [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
 [464304.799349]  [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
 [464304.799363]  [<ffffffffa02037d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
 [464304.799365]  [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
 [464304.799367]  [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
 [464304.799368]  [<ffffffff81111730>] filemap_write_and_wait_range+0x70/0x80
 [464304.799371]  [<ffffffff8119cc6a>] vfs_fsync_range+0x5a/0x90
 [464304.799373]  [<ffffffff8119cd0c>] vfs_fsync+0x1c/0x20
 [464304.799377]  [<ffffffffa01f7c74>] nfs_file_flush+0x54/0x80 [nfs]
 [464304.799380]  [<ffffffff8116ee7f>] filp_close+0x3f/0x90
 [464304.799382]  [<ffffffff8116f8a7>] sys_close+0xb7/0x120
 [464304.799384]  [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b

Regards,
..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-27 22:17           ` David Flynn
@ 2011-10-29  0:25             ` David Flynn
  2011-10-29 17:29               ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-29  0:25 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs, Chuck Lever

* David Flynn (davidf@rd.bbc.co.uk) wrote:
> * Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> > Do you have an example of the stateid argument's value? Does it change
> > at all between separate WRITE attempts?
> 
> Further to all this, i've just had a similar fault on another machine,

Using the same kernel, same mountpoint as before, we're currently
experiencing a loop involving NFS4ERR_STALE_CLIENTID.
Trace:
  ftp://ftp.kw.bbc.co.uk/davidf/priv/saesheil.pcap

Unfortunately, this is resulting in about 40 nodes doing their best to
kill the poor Solaris server, generating a combined total of
250Mbit/sec towards the NFS server (and collecting a little under
200Mbit/sec of replies).

Have we not heard of exponential backoff?

This seems to require major attention, given that this amounted to a
site-wide DoS: going round all the machines and killing the processes
that were having major problems brought the situation back under
control.  Frankly, I'd rather you panicked the kernel than this.

Regards,
..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29  0:25             ` NFS4ERR_STALE_CLIENTID loop David Flynn
@ 2011-10-29 17:29               ` Trond Myklebust
  2011-10-29 18:02                 ` David Flynn
  2011-10-29 18:15                 ` J. Bruce Fields
  0 siblings, 2 replies; 28+ messages in thread
From: Trond Myklebust @ 2011-10-29 17:29 UTC (permalink / raw)
  To: David Flynn; +Cc: linux-nfs, Chuck Lever

On Sat, 2011-10-29 at 00:25 +0000, David Flynn wrote: 
> * David Flynn (davidf@rd.bbc.co.uk) wrote:
> > * Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> > > Do you have an example of the stateid argument's value? Does it change
> > > at all between separate WRITE attempts?
> > 
> > Further to all this, i've just had a similar fault on another machine,
> 
> Using the same kernel, same mountpoint as before, we're currently
> experiencing a loop involving NFS4ERR_STALE_CLIENTID.
> Trace:
>   ftp://ftp.kw.bbc.co.uk/davidf/priv/saesheil.pcap
> 
> Unfortunately, this is resulting in about 40 nodes doing their best to
> kill the poor solaris server.  Generating a combined total of
> 250Mbit/sec towards the NFS server (collecting a little under
> 200Mbit/sec of replies).
> 
> Have we not heard of exponential backoff?
> 
> This seems to require major attention, given that this amounted to a
> site wide DoS: going round all the machines and killing the processes
> that were having major problems brought the situation back under
> control.  Frankly i'd rather that you panicked the kernel than this.

OK. This is the first time I've seen this tcpdump.

The problem seems like a split-brain issue on the server... On the one
hand, it is happily telling us that our lease is OK when we RENEW. Then
when we try to use said lease in an OPEN, it is replying with
STALE_CLIENTID.

IOW: This isn't a problem I can fix on the client whether or not I add
exponential backoff. The problem needs to be addressed on the server by
the Solaris folks....
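For what it's worth, the trap Trond describes can be sketched as a toy model (hypothetical Python, not kernel code; the class and function names here are illustrative assumptions based on the pcap description, where RENEW succeeds but OPEN with the same clientid fails):

```python
# Toy model of the split-brain loop: the server's lease subsystem and its
# open-state subsystem disagree about whether the clientid is valid.
NFS4_OK = 0
NFS4ERR_STALE_CLIENTID = 10022  # value per RFC 3530

class SplitBrainServer:
    """Models the observed server behaviour."""
    def renew(self, clientid):
        return NFS4_OK                 # lease side: "your lease is fine"
    def open(self, clientid, path):
        return NFS4ERR_STALE_CLIENTID  # open side: "never heard of you"

def client_open(server, clientid, path, max_attempts=5):
    """Simplified client logic: on STALE_CLIENTID, validate the lease and
    retry.  Against a consistent server this converges; here it spins
    until the artificial attempt cap (the real client has no such cap)."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if server.open(clientid, path) == NFS4_OK:
            return attempts
        # Recovery probe: the lease checks out, so the client sees no
        # reason to establish a new clientid -- it simply retries the OPEN.
        assert server.renew(clientid) == NFS4_OK
    return attempts  # only the cap stops the loop

print(client_open(SplitBrainServer(), 0x1234, "/home/f"))  # prints 5
```

The point of the sketch is that no client-side cleverness terminates the loop: every recovery probe tells the client its state is fine, so retrying is the only "correct" move.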

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 17:29               ` Trond Myklebust
@ 2011-10-29 18:02                 ` David Flynn
  2011-10-29 18:22                   ` Myklebust, Trond
  2011-10-29 18:15                 ` J. Bruce Fields
  1 sibling, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-29 18:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs, Chuck Lever

* Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> > Using the same kernel, same mountpoint as before, we're currently
> > experiencing a loop involving NFS4ERR_STALE_CLIENTID.
...
> The problem seems like a split-brain issue on the server... On the one
> hand, it is happily telling us that our lease is OK when we RENEW. Then
> when we try to use said lease in an OPEN, it is replying with
> STALE_CLIENTID.

Thank you for the quick update, especially at the weekend.  I'm
wondering if it is possible that the STALE_CLIENTID issue is a by-product
of the BAD_STATEID issue from earlier.  We have observed the BAD_STATEID
loop several times, but the CLIENTID problem only seemed to occur when
all 40+ nodes were showing problems.

After killing off sufficient processes, some of the machines then
recovered of their own accord.  So your conclusion that there is a
server issue sounds reasonable.

On any such possible backoff, the previous case was with quite small
requests in quite a tight loop that seemed to cause the server grief.
This morning, a machine with a 10GbE interface had a BAD_STATEID issue
but involving some much larger writes[1], resulting in 1.6Gbit/sec from
that machine alone.  Thankfully there was only a second machine with
1GbE interfaces bringing the total up to 2.5Gbit/sec.

It is this ability for a group of clients to make matters worse that
is just as bad as any fault with Solaris.

(In a similar vein, it can be just as frustrating trying to get a client
to stop looping like this - it is often impossible to kill the process
that triggered the problem; for these, we had to resort to deleting
the files using NFSv3 (which was working quite happily))

Thank you again,
..david

[1] Capture: ftp://ftp.kw.bbc.co.uk/davidf/priv/waquahso.pcap

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 17:29               ` Trond Myklebust
  2011-10-29 18:02                 ` David Flynn
@ 2011-10-29 18:15                 ` J. Bruce Fields
  2011-10-29 18:21                   ` Myklebust, Trond
  1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2011-10-29 18:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Flynn, linux-nfs, Chuck Lever

On Sat, Oct 29, 2011 at 07:29:36PM +0200, Trond Myklebust wrote:
> OK. This is the first time I've seen this tcpdump.
> 
> The problem seems like a split-brain issue on the server... On the one
> hand, it is happily telling us that our lease is OK when we RENEW. Then
> when we try to use said lease in an OPEN, it is replying with
> STALE_CLIENTID.
> 
> IOW: This isn't a problem I can fix on the client whether or not I add
> exponential backoff. The problem needs to be addressed on the server by
> the Solaris folks....

Is there any simple thing we could do on the client to reduce the impact
of these sorts of loops?

Given that we know there are bad servers out there it might be nice to
do if it's not complicated.

(Though as a server implementer my purely selfish impulse is to leave
things as they are since it ensures I'll get bug reports if I screw
up....)

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:15                 ` J. Bruce Fields
@ 2011-10-29 18:21                   ` Myklebust, Trond
  2011-10-29 18:47                     ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: Myklebust, Trond @ 2011-10-29 18:21 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Flynn, linux-nfs, Chuck Lever

> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Saturday, October 29, 2011 8:15 PM
> To: Myklebust, Trond
> Cc: David Flynn; linux-nfs@vger.kernel.org; Chuck Lever
> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> 
> On Sat, Oct 29, 2011 at 07:29:36PM +0200, Trond Myklebust wrote:
> > OK. This is the first time I've seen this tcpdump.
> >
> > The problem seems like a split-brain issue on the server... On the one
> > hand, it is happily telling us that our lease is OK when we RENEW.
> > Then when we try to use said lease in an OPEN, it is replying with
> > STALE_CLIENTID.
> >
> > IOW: This isn't a problem I can fix on the client whether or not I add
> > exponential backoff. The problem needs to be addressed on the server
> > by the Solaris folks....
> 
> Is there any simple thing we could do on the client to reduce the
> impact of these sorts of loops?

WHY? Those loops aren't supposed to happen if the server works according
to spec.

> Given that we know there are bad servers out there it might be nice to
> do if it's not complicated.

Again WHY? We can't fix a broken server by backing off correctly formed
requests. If a server wants us to back off, it will send us
NFS4ERR_DELAY.

> (Though as a server implementer my purely selfish impulse is to leave
> things as they are since it ensures I'll get bug reports if I screw
> up....)

You would have gotten the same bug report. That said, I refuse to change
the client to accommodate buggy servers. The solution to the problem of
buggy servers is to fix those servers.

Trond


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:02                 ` David Flynn
@ 2011-10-29 18:22                   ` Myklebust, Trond
  2011-10-29 18:23                     ` Chuck Lever
  0 siblings, 1 reply; 28+ messages in thread
From: Myklebust, Trond @ 2011-10-29 18:22 UTC (permalink / raw)
  To: David Flynn; +Cc: linux-nfs, Chuck Lever

> -----Original Message-----
> From: David Flynn [mailto:davidf@rd.bbc.co.uk]
> Sent: Saturday, October 29, 2011 8:02 PM
> To: Myklebust, Trond
> Cc: David Flynn; linux-nfs@vger.kernel.org; Chuck Lever
> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> 
> * Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> > > Using the same kernel, same mountpoint as before, we're currently
> > > experiencing a loop involving NFS4ERR_STALE_CLIENTID.
> ...
> > The problem seems like a split-brain issue on the server... On the one
> > hand, it is happily telling us that our lease is OK when we RENEW.
> > Then when we try to use said lease in an OPEN, it is replying with
> > STALE_CLIENTID.
> 
> Thank you for the quick update, especially at the weekend.  I'm
> wondering if it is possible that the STALE_CLIENTID issue is a
> by-product of the BAD_STATEID issue from earlier.  We have observed
> the BAD_STATEID loop several times, but the CLIENTID problem only
> seemed to occur when all 40+ nodes were showing problems.
> 
> After killing off sufficient processes, some of the machines then
> recovered of their own accord.  So your conclusion that there is a
> server issue sounds reasonable.
> 
> On any such possible backoff, the previous case was with quite small
> requests in quite a tight loop that seemed to cause the server grief.
> This morning, a machine with a 10GbE interface had a BAD_STATEID issue
> but involving some much larger writes[1], resulting in 1.6Gbit/sec
> from that machine alone.  Thankfully there was only a second machine
> with 1GbE interfaces bringing the total up to 2.5Gbit/sec.
> 
> It is this ability for a group of clients to make matters worse that
> is just as bad as any fault with Solaris.

Sure, but gone are the days when NFS had "reference implementations"
that everyone had to interoperate with. NFSv4 is a fully documented
protocol which describes how both clients and servers are supposed to
work. If either the client or server fail to work according to that
documentation, then bad things will happen.

While I could litter the client code with lots of little tricks to be
"defensive" in the face of buggy servers, that isn't going to solve
anything: the server will still be buggy, and the client will still be
faced with a situation that it cannot resolve.

> (In a similar vein, it can be just as frustrating trying to get a
> client to stop looping like this - it is often impossible to kill the
> process that triggered the problem; for these, we had to resort to
> deleting the files using NFSv3 (which was working quite happily))

'kill -9' should in principle work to kill off the process. Was that
failing to work?

Cheers
   Trond

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:22                   ` Myklebust, Trond
@ 2011-10-29 18:23                     ` Chuck Lever
  2011-10-29 18:26                       ` Myklebust, Trond
  0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2011-10-29 18:23 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: David Flynn, linux-nfs


On Oct 29, 2011, at 2:22 PM, Myklebust, Trond wrote:

>> -----Original Message-----
>> From: David Flynn [mailto:davidf@rd.bbc.co.uk]
>> Sent: Saturday, October 29, 2011 8:02 PM
>> To: Myklebust, Trond
>> Cc: David Flynn; linux-nfs@vger.kernel.org; Chuck Lever
>> Subject: Re: NFS4ERR_STALE_CLIENTID loop
>> 
>> * Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
>>>> Using the same kernel, same mountpoint as before, we're currently
>>>> experiencing a loop involving NFS4ERR_STALE_CLIENTID.
>> ...
>>> The problem seems like a split-brain issue on the server... On the one
>>> hand, it is happily telling us that our lease is OK when we RENEW.
>>> Then when we try to use said lease in an OPEN, it is replying with
>>> STALE_CLIENTID.
>> 
>> Thank you for the quick update, especially at the weekend.  I'm
>> wondering if it is possible that the STALE_CLIENTID issue is a
>> by-product of the BAD_STATEID issue from earlier.  We have observed
>> the BAD_STATEID loop several times, but the CLIENTID problem only
>> seemed to occur when all 40+ nodes were showing problems.
>> 
>> After killing off sufficient processes, some of the machines then
>> recovered of their own accord.  So your conclusion that there is a
>> server issue sounds reasonable.
>> 
>> On any such possible backoff, the previous case was with quite small
>> requests in quite a tight loop that seemed to cause the server grief.
>> This morning, a machine with a 10GbE interface had a BAD_STATEID issue
>> but involving some much larger writes[1], resulting in 1.6Gbit/sec
>> from that machine alone.  Thankfully there was only a second machine
>> with 1GbE interfaces bringing the total up to 2.5Gbit/sec.
>> 
>> It is this ability for a group of clients to make matters worse that
>> is just as bad as any fault with Solaris.
> 
> Sure, but gone are the days when NFS had "reference implementations"
> that everyone had to interoperate with. NFSv4 is a fully documented
> protocol which describes how both clients and servers are supposed to
> work. If either the client or server fail to work according to that
> documentation, then bad things will happen.
> 
> While I could litter the client code with lots of little tricks to be
> "defensive" in the face of buggy servers, that isn't going to solve
> anything: the server will still be buggy, and the client will still be
> faced with a situation that it cannot resolve.
> 
>> (In a similar vein, it can be just as frustrating trying to get a
>> client to stop looping like this - it is often impossible to kill the
>> process that triggered the problem; for these, we had to resort to
>> deleting the files using NFSv3 (which was working quite happily))
> 
> 'kill -9' should in principle work to kill off the process. Was that
> failing to work?

The trick is knowing which process to kill.  Generally you have to kill the state manager thread in this case.


-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:23                     ` Chuck Lever
@ 2011-10-29 18:26                       ` Myklebust, Trond
  2011-10-29 18:29                         ` David Flynn
  0 siblings, 1 reply; 28+ messages in thread
From: Myklebust, Trond @ 2011-10-29 18:26 UTC (permalink / raw)
  To: Chuck Lever; +Cc: David Flynn, linux-nfs

> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Saturday, October 29, 2011 8:23 PM
> To: Myklebust, Trond
> Cc: David Flynn; linux-nfs@vger.kernel.org
> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> 
> 
> On Oct 29, 2011, at 2:22 PM, Myklebust, Trond wrote:
> 
> >> -----Original Message-----
> >> From: David Flynn [mailto:davidf@rd.bbc.co.uk]
> >> Sent: Saturday, October 29, 2011 8:02 PM
> >> To: Myklebust, Trond
> >> Cc: David Flynn; linux-nfs@vger.kernel.org; Chuck Lever
> >> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> >>
> >> * Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> >>>> Using the same kernel, same mountpoint as before, we're currently
> >>>> experiencing a loop involving NFS4ERR_STALE_CLIENTID.
> >> ...
> >>> The problem seems like a split-brain issue on the server... On the
> >>> one hand, it is happily telling us that our lease is OK when we
> >>> RENEW.  Then when we try to use said lease in an OPEN, it is
> >>> replying with STALE_CLIENTID.
> >>
> >> Thank you for the quick update, especially at the weekend.  I'm
> >> wondering if it is possible that the STALE_CLIENTID issue is a
> >> by-product of the BAD_STATEID issue from earlier.  We have observed
> >> the BAD_STATEID loop several times, but the CLIENTID problem only
> >> seemed to occur when all 40+ nodes were showing problems.
> >>
> >> After killing off sufficient processes, some of the machines then
> >> recovered of their own accord.  So your conclusion that there is a
> >> server issue sounds reasonable.
> >>
> >> On any such possible backoff, the previous case was with quite small
> >> requests in quite a tight loop that seemed to cause the server
> >> grief.  This morning, a machine with a 10GbE interface had a
> >> BAD_STATEID issue but involving some much larger writes[1],
> >> resulting in 1.6Gbit/sec from that machine alone.  Thankfully there
> >> was only a second machine with 1GbE interfaces bringing the total up
> >> to 2.5Gbit/sec.
> >>
> >> It is this ability for a group of clients to make matters worse that
> >> is just as bad as any fault with Solaris.
> >
> > Sure, but gone are the days when NFS had "reference implementations"
> > that everyone had to interoperate with. NFSv4 is a fully documented
> > protocol which describes how both clients and servers are supposed
> > to work. If either the client or server fail to work according to
> > that documentation, then bad things will happen.
> >
> > While I could litter the client code with lots of little tricks to
> > be "defensive" in the face of buggy servers, that isn't going to
> > solve anything: the server will still be buggy, and the client will
> > still be faced with a situation that it cannot resolve.
> >
> >> (In a similar vein, it can be just as frustrating trying to get a
> >> client to stop looping like this - it is often impossible to kill
> >> the process that triggered the problem; for these, we had to resort
> >> to deleting the files using NFSv3 (which was working quite happily))
> >
> > 'kill -9' should in principle work to kill off the process. Was that
> > failing to work?
> 
> The trick is knowing which process to kill.  Generally you have to
> kill the state manager thread in this case.
> 

No. Not in this case: the state manager was doing RENEW, then exiting
because it was told all is A_OK by the server.
The open() system call would be the one that was looping.

Trond


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:26                       ` Myklebust, Trond
@ 2011-10-29 18:29                         ` David Flynn
  0 siblings, 0 replies; 28+ messages in thread
From: David Flynn @ 2011-10-29 18:29 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Chuck Lever, David Flynn, linux-nfs

* Myklebust, Trond (Trond.Myklebust@netapp.com) wrote:
> No. Not in this case: the state manager was doing RENEW, then exiting
> because it was told all is A_OK by the server.
> The open() system call would be the one that was looping.

Sometimes it worked, sometimes not.  From memory:
 - kill -9 worked in situations that suffered from NFS4ERR_STALE_CLIENTID
 - kill -9 failed in situations that suffered from BAD_STATEID

..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:21                   ` Myklebust, Trond
@ 2011-10-29 18:47                     ` J. Bruce Fields
  2011-10-29 18:50                       ` Chuck Lever
  0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2011-10-29 18:47 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: David Flynn, linux-nfs, Chuck Lever

On Sat, Oct 29, 2011 at 11:21:13AM -0700, Myklebust, Trond wrote:
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > On Sat, Oct 29, 2011 at 07:29:36PM +0200, Trond Myklebust wrote:
> > > OK. This is the first time I've seen this tcpdump.
> > >
> > > The problem seems like a split-brain issue on the server... On the one
> > > hand, it is happily telling us that our lease is OK when we RENEW.
> > > Then when we try to use said lease in an OPEN, it is replying with
> > > STALE_CLIENTID.
> > >
> > > IOW: This isn't a problem I can fix on the client whether or not I add
> > > exponential backoff. The problem needs to be addressed on the server
> > > by the Solaris folks....
> > 
> > Is there any simple thing we could do on the client to reduce the
> > impact of these sorts of loops?
> 
> WHY? Those loops aren't supposed to happen if the server works according
> to spec.

Yes, and it's not something I care that strongly about, really, my only
observation is that this sort of failure (an implementation bug on one
side or another resulting in a loop) seems to have been common (based on
no hard data, just my vague memories of list threads), and the results
fairly obnoxious (possibly even for unrelated hosts on the network).
So if there's some simple way to fail more gracefully it might be
helpful.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:47                     ` J. Bruce Fields
@ 2011-10-29 18:50                       ` Chuck Lever
  2011-10-29 19:19                         ` Myklebust, Trond
  0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2011-10-29 18:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Myklebust, Trond, David Flynn, linux-nfs


On Oct 29, 2011, at 2:47 PM, J. Bruce Fields wrote:

> On Sat, Oct 29, 2011 at 11:21:13AM -0700, Myklebust, Trond wrote:
>>> From: J. Bruce Fields [mailto:bfields@fieldses.org]
>>> On Sat, Oct 29, 2011 at 07:29:36PM +0200, Trond Myklebust wrote:
>>>> OK. This is the first time I've seen this tcpdump.
>>>> 
>>>> The problem seems like a split-brain issue on the server... On the one
>>>> hand, it is happily telling us that our lease is OK when we RENEW.
>>>> Then when we try to use said lease in an OPEN, it is replying with
>>>> STALE_CLIENTID.
>>>> 
>>>> IOW: This isn't a problem I can fix on the client whether or not I add
>>>> exponential backoff. The problem needs to be addressed on the server
>>>> by the Solaris folks....
>>> 
>>> Is there any simple thing we could do on the client to reduce the
>>> impact of these sorts of loops?
>> 
>> WHY? Those loops aren't supposed to happen if the server works according
>> to spec.
> 
> Yes, and it's not something I care that strongly about, really, my only
> observation is that this sort of failure (an implementation bug on one
> side or another resulting in a loop) seems to have been common (based on
> no hard data, just my vague memories of list threads), and the results
> fairly obnoxious (possibly even for unrelated hosts on the network).
> So if there's some simple way to fail more gracefully it might be
> helpful.

For what it's worth, I agree that client implementations should attempt to behave more gracefully in the face of server problems, be they the result of bugs or the result of other issues specific to that server.  Problems like this make NFSv4 as a protocol look bad.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 18:50                       ` Chuck Lever
@ 2011-10-29 19:19                         ` Myklebust, Trond
  2011-10-29 19:52                           ` David Flynn
  0 siblings, 1 reply; 28+ messages in thread
From: Myklebust, Trond @ 2011-10-29 19:19 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: David Flynn, linux-nfs

> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Saturday, October 29, 2011 8:51 PM
> To: J. Bruce Fields
> Cc: Myklebust, Trond; David Flynn; linux-nfs@vger.kernel.org
> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> 
> 
> On Oct 29, 2011, at 2:47 PM, J. Bruce Fields wrote:
> 
> > On Sat, Oct 29, 2011 at 11:21:13AM -0700, Myklebust, Trond wrote:
> >>> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> >>> On Sat, Oct 29, 2011 at 07:29:36PM +0200, Trond Myklebust wrote:
> >>>> OK. This is the first time I've seen this tcpdump.
> >>>>
> >>>> The problem seems like a split-brain issue on the server... On the
> >>>> one hand, it is happily telling us that our lease is OK when we
> >>>> RENEW.  Then when we try to use said lease in an OPEN, it is
> >>>> replying with STALE_CLIENTID.
> >>>>
> >>>> IOW: This isn't a problem I can fix on the client whether or not
> >>>> I add exponential backoff. The problem needs to be addressed on
> >>>> the server by the Solaris folks....
> >>>
> >>> Is there any simple thing we could do on the client to reduce the
> >>> impact of these sorts of loops?
> >>
> >> WHY? Those loops aren't supposed to happen if the server works
> >> according to spec.
> >
> > Yes, and it's not something I care that strongly about, really, my
> > only observation is that this sort of failure (an implementation bug
> > on one side or another resulting in a loop) seems to have been common
> > (based on no hard data, just my vague memories of list threads), and
> > the results fairly obnoxious (possibly even for unrelated hosts on
> > the network).
> > So if there's some simple way to fail more gracefully it might be
> > helpful.
> 
> For what it's worth, I agree that client implementations should
> attempt to behave more gracefully in the face of server problems, be
> they the result of bugs or the result of other issues specific to that
> server.  Problems like this make NFSv4 as a protocol look bad.

I can't see what a client can do in this situation except possibly just
give up after a while and throw a SERVER_BROKEN error (which means data
loss). That still won't make NFSv4 look good...

  Trond


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 19:19                         ` Myklebust, Trond
@ 2011-10-29 19:52                           ` David Flynn
  2011-10-29 20:42                             ` Myklebust, Trond
  2011-10-31 13:07                             ` Chuck Lever
  0 siblings, 2 replies; 28+ messages in thread
From: David Flynn @ 2011-10-29 19:52 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Chuck Lever, J. Bruce Fields, David Flynn, linux-nfs

* Myklebust, Trond (Trond.Myklebust@netapp.com) wrote:
> > -----Original Message-----
> > From: Chuck Lever [mailto:chuck.lever@oracle.com]
> > On Oct 29, 2011, at 2:47 PM, J. Bruce Fields wrote:
> > > Yes, and it's not something I care that strongly about, really, my
> > > only observation is that this sort of failure (an implementation
> > > bug on one side or another resulting in a loop) seems to have been
> > > common (based on no hard data, just my vague memories of list
> > > threads), and the results fairly obnoxious (possibly even for
> > > unrelated hosts on the network).
> > > So if there's some simple way to fail more gracefully it might be
> > > helpful.
> >
> > For what it's worth, I agree that client implementations should
> > attempt to behave more gracefully in the face of server problems, be
> > they the result of bugs or the result of other issues specific to
> > that server.  Problems like this make NFSv4 as a protocol look bad.
> 
> I can't see what a client can do in this situation except possibly just
> give up after a while and throw a SERVER_BROKEN error (which means data
> loss). That still won't make NFSv4 look good...

Indeed, it is quite the dilemma.

I agree that giving up and guaranteeing unattended data loss is bad
(data loss at the behest of an operator is OK; after all, they can
always fence a broken machine).

Looking at some of the logs again, even going back to the very original
case, it appears to be about 600us between retries (RTT=400us).  Is
there any way to make that less aggressive, e.g. 1s?  That would reduce
the impact by three orders of magnitude.  What would be the down-side?
How often do you expect to get a BAD_STATEID error?
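As an illustration of the kind of damping being asked about, a capped exponential backoff with jitter might look like the following (hypothetical Python sketch, not the kernel's actual retry path; the 600us and 1s figures are only the ones quoted above):

```python
import random

def backoff_delays(initial=0.0006, cap=1.0, factor=2.0, attempts=12):
    """Return the delay (in seconds) to sleep before each retry: start at
    the observed ~600us interval and double up to a 1s cap, with jitter
    so that 40+ looping clients do not hammer the server in lockstep."""
    delay = initial
    delays = []
    for _ in range(attempts):
        delays.append(delay * random.uniform(0.5, 1.0))  # jittered sleep
        delay = min(delay * factor, cap)
    return delays

# After ~11 doublings the per-retry delay saturates at the cap, i.e. the
# retry rate drops by roughly three orders of magnitude from 600us.
print(max(backoff_delays()) <= 1.0)  # prints True
```

Whether such damping belongs in the client at all is, of course, exactly the disagreement in this thread.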

..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 19:52                           ` David Flynn
@ 2011-10-29 20:42                             ` Myklebust, Trond
  2011-10-29 21:07                               ` David Flynn
  2011-10-31 13:07                             ` Chuck Lever
  1 sibling, 1 reply; 28+ messages in thread
From: Myklebust, Trond @ 2011-10-29 20:42 UTC (permalink / raw)
  To: David Flynn; +Cc: Chuck Lever, J. Bruce Fields, linux-nfs

> -----Original Message-----
> From: David Flynn [mailto:davidf@rd.bbc.co.uk]
> Sent: Saturday, October 29, 2011 9:53 PM
> To: Myklebust, Trond
> Cc: Chuck Lever; J. Bruce Fields; David Flynn;
linux-nfs@vger.kernel.org
> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> 
> * Myklebust, Trond (Trond.Myklebust@netapp.com) wrote:
> > > -----Original Message-----
> > > From: Chuck Lever [mailto:chuck.lever@oracle.com] On Oct 29, 2011,
> > > at 2:47 PM, J. Bruce Fields wrote:
> > > > Yes, and it's not something I care that strongly about, really, my
> > > > only observation is that this sort of failure (an implementation
> > > > bug on one side or another resulting in a loop) seems to have been
> > > > common (based on no hard data, just my vague memories of list
> > > > threads), and the results fairly obnoxious (possibly even for
> > > > unrelated hosts on the network).
> > > > So if there's some simple way to fail more gracefully it might be
> > > > helpful.
> > >
> > > For what it's worth, I agree that client implementations should
> > > attempt to behave more gracefully in the face of server problems, be
> > > they the result of bugs or the result of other issues specific to
> > > that server.  Problems like this make NFSv4 as a protocol look bad.
> >
> > I can't see what a client can do in this situation except possibly
> > just give up after a while and throw a SERVER_BROKEN error (which
> > means data loss). That still won't make NFSv4 look good...
> 
> Indeed, it is quite the dilemma.
> 
> I agree that giving up and guaranteeing unattended data loss is bad
> (data loss at the behest of an operator is OK; after all, they can
> always fence a broken machine).
> 
> Looking at some of the logs again, even going back to the very
> original case, it appears to be about 600us between retries
> (RTT=400us).  Is there any way to make that less aggressive, e.g. 1s?
> That would reduce the impact by three orders of magnitude.  What would
> be the down-side?  How often do you expect to get a BAD_STATEID error?

BAD_STATEID is a different matter, and is one that we should have
resolved in the NFS client in the upstream kernel. At least on newer
clients, we should be trying to reopen the file and re-establish all
locks when we get a BAD_STATEID. Can you please remind us which kernel
you are using?

That said... Even on new clients, the recovery attempt may fail due to
the STALE_CLIENTID bug. That will still hit us when we call OPEN in
order to get a new stateid.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 20:42                             ` Myklebust, Trond
@ 2011-10-29 21:07                               ` David Flynn
  2011-10-29 21:12                                 ` Myklebust, Trond
  0 siblings, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-29 21:07 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: David Flynn, Chuck Lever, J. Bruce Fields, linux-nfs

* Myklebust, Trond (Trond.Myklebust@netapp.com) wrote:
> BAD_STATEID is a different matter, and is one that we should have
> resolved in the NFS client in the upstream kernel. At least on newer
> clients, we should be trying to reopen the file and re-establish all
> locks when we get a BAD_STATEID. Can you please remind us which kernel
> you are using?

Ah, I see.  This was all on 3.0.0 and 3.0.4 (a quick check didn't reveal
any relevant changes between the two).

Are there any stable patches that can be applied to 3.0.y?

> That said... Even on new clients, the recovery attempt may fail due to
> the STALE_CLIENTID bug. That will still hit us when we call OPEN in
> order to get a new stateid.

The interval between retries on that was ~1-1.5ms; could this be made
slower?  Same questions as before, really.

Regards,

..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 21:07                               ` David Flynn
@ 2011-10-29 21:12                                 ` Myklebust, Trond
  0 siblings, 0 replies; 28+ messages in thread
From: Myklebust, Trond @ 2011-10-29 21:12 UTC (permalink / raw)
  To: David Flynn; +Cc: Chuck Lever, J. Bruce Fields, linux-nfs

> -----Original Message-----
> From: David Flynn [mailto:davidf@rd.bbc.co.uk]
> Sent: Saturday, October 29, 2011 11:08 PM
> To: Myklebust, Trond
> Cc: David Flynn; Chuck Lever; J. Bruce Fields; linux-nfs@vger.kernel.org
> Subject: Re: NFS4ERR_STALE_CLIENTID loop
> 
> * Myklebust, Trond (Trond.Myklebust@netapp.com) wrote:
> > BAD_STATEID is a different matter, and is one that we should have
> > resolved in the NFS client in the upstream kernel. At least on newer
> > clients, we should be trying to reopen the file and re-establish all
> > locks when we get a BAD_STATEID. Can you please remind us which
> > kernel you are using?
> 
> Ah, I see.  This was all on 3.0.0 and 3.0.4 (a quick check didn't
> reveal any relevant changes between the two).
> 
> Are there any stable patches that can be applied to 3.0.y?
> 
> > That said... Even on new clients, the recovery attempt may fail due
> > to the STALE_CLIENTID bug. That will still hit us when we call OPEN
> > in order to get a new stateid.
> 
> The interval between retries on that was ~1-1.5ms; could this be made
> slower?  Same questions as before, really.

Given that things such as reboot recovery are subject to a recovery
window (grace period), I'm, again, very reluctant to add an artificial
backoff as that may have consequences for behaviour in the bug-free
case.

If the server wants us to back off, it can say so itself by using the
NFS4ERR_DELAY error mechanism.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-29 19:52                           ` David Flynn
  2011-10-29 20:42                             ` Myklebust, Trond
@ 2011-10-31 13:07                             ` Chuck Lever
  2011-10-31 13:21                               ` David Flynn
  1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2011-10-31 13:07 UTC (permalink / raw)
  To: David Flynn; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs


On Oct 29, 2011, at 3:52 PM, David Flynn wrote:

> * Myklebust, Trond (Trond.Myklebust@netapp.com) wrote:
>>> -----Original Message-----
>>> From: Chuck Lever [mailto:chuck.lever@oracle.com]
>>> On Oct 29, 2011, at 2:47 PM, J. Bruce Fields wrote:
>>>> Yes, and it's not something I care that strongly about, really, my
>>>> only observation is that this sort of failure (an implementation
>>>> bug on one side or another resulting in a loop) seems to have been
>>>> common (based on no hard data, just my vague memories of list
>>>> threads), and the results fairly obnoxious (possibly even for
>>>> unrelated hosts on the network).
>>>> So if there's some simple way to fail more gracefully it might be
>>>> helpful.
>>> 
>>> For what it's worth, I agree that client implementations should
>>> attempt to behave more gracefully in the face of server problems, be
>>> they the result of bugs or the result of other issues specific to
>>> that server.  Problems like this make NFSv4 as a protocol look bad.
>> 
>> I can't see what a client can do in this situation except possibly just
>> give up after a while and throw a SERVER_BROKEN error (which means data
>> loss). That still won't make NFSv4 look good...
> 
> Indeed, it is quite the dilemma.
> 
> I agree that giving up and guaranteeing unattended data loss is bad
> (data loss at the behest of an operator is OK; after all, they can
> always fence a broken machine).

David, what would help immensely is if you can find a reliable way of reproducing this.  So far we have been unable to find a reproducer.

> Looking at some of the logs again, even going back to the very original
> case, it appears to be about 600us between retries (RTT=400us).  Is
> there any way to make that less aggressive?, eg 1s? -- that'd reduce the
> impact by three orders of magnitude.  What would be the down-side?  How
> often do you expect to get a BAD_STATEID error?
> 
> ..david

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-31 13:07                             ` Chuck Lever
@ 2011-10-31 13:21                               ` David Flynn
  2011-10-31 13:39                                 ` Chuck Lever
  0 siblings, 1 reply; 28+ messages in thread
From: David Flynn @ 2011-10-31 13:21 UTC (permalink / raw)
  To: Chuck Lever; +Cc: David Flynn, Myklebust, Trond, J. Bruce Fields, linux-nfs

* Chuck Lever (chuck.lever@oracle.com) wrote:
> David, what would help immensely is if you can find a reliable way of
> reproducing this.  So far we have been unable to find a reproducer.

While I've managed to have problems with individual machines, which seem
to fail at some random point of their own choosing, the most reliable
way to produce the problem for us is to have a number of nodes updating
various RRD files frequently.

Given that I haven't found a reliable and short method for reproducing
it either, if we were to re-run the known case and capture all network
traffic, would you be able to extract the relevant detail to generate a
simulation?

Regards,

..david

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: NFS4ERR_STALE_CLIENTID loop
  2011-10-31 13:21                               ` David Flynn
@ 2011-10-31 13:39                                 ` Chuck Lever
  0 siblings, 0 replies; 28+ messages in thread
From: Chuck Lever @ 2011-10-31 13:39 UTC (permalink / raw)
  To: David Flynn; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs


On Oct 31, 2011, at 9:21 AM, David Flynn wrote:

> * Chuck Lever (chuck.lever@oracle.com) wrote:
>> David, what would help immensely is if you can find a reliable way of
>> reproducing this.  So far we have been unable to find a reproducer.
> 
> While I've managed to have problems with individual machines, which
> seem to fail at some random point of their own choosing, the most
> reliable way to produce the problem for us is to have a number of
> nodes updating various RRD files frequently.
> 
> Given that I haven't found a reliable and short method for reproducing
> it either, if we were to re-run the known case and capture all network
> traffic, would you be able to extract the relevant detail to generate
> a simulation?

A reproducer would be better for us [*], but I understand the arbitrary nature of the problem.  A network trace would be an excellent start.

Now, it would be interesting if in fact the problem occurs only when multiple clients interact with a server.  In that case, capture a full network trace with snoop on your server.  We'll worry about pruning the size of the trace once you have a clean capture.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

* - A reproducer allows us to perform internal-only tests at will, and it also can confirm we've got the problem properly fixed.



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2011-10-31 13:40 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-24 10:40 NFS4 BAD_STATEID loop (kernel 3.0) David Flynn
2011-10-24 11:22 ` Trond Myklebust
2011-10-24 13:17   ` David Flynn
2011-10-24 13:32     ` Trond Myklebust
2011-10-24 14:50       ` NFS4 BAD_STATEID loop (kernel 3.0.4) David Flynn
2011-10-24 15:31         ` Trond Myklebust
2011-10-24 15:55           ` David Flynn
2011-10-27 22:17           ` David Flynn
2011-10-29  0:25             ` NFS4ERR_STALE_CLIENTID loop David Flynn
2011-10-29 17:29               ` Trond Myklebust
2011-10-29 18:02                 ` David Flynn
2011-10-29 18:22                   ` Myklebust, Trond
2011-10-29 18:23                     ` Chuck Lever
2011-10-29 18:26                       ` Myklebust, Trond
2011-10-29 18:29                         ` David Flynn
2011-10-29 18:15                 ` J. Bruce Fields
2011-10-29 18:21                   ` Myklebust, Trond
2011-10-29 18:47                     ` J. Bruce Fields
2011-10-29 18:50                       ` Chuck Lever
2011-10-29 19:19                         ` Myklebust, Trond
2011-10-29 19:52                           ` David Flynn
2011-10-29 20:42                             ` Myklebust, Trond
2011-10-29 21:07                               ` David Flynn
2011-10-29 21:12                                 ` Myklebust, Trond
2011-10-31 13:07                             ` Chuck Lever
2011-10-31 13:21                               ` David Flynn
2011-10-31 13:39                                 ` Chuck Lever
2011-10-24 13:43 ` NFS4 BAD_STATEID loop (kernel 3.0) Chuck Lever
