* NFS4 patch 08/20 (BAD_SEQID recovery)
From: Ben Taylor @ 2014-03-07  9:41 UTC
  To: linux-nfs

Hi

We've been getting occasional, puzzling failures on our NFS systems where
our processing grid nodes gradually grind to a halt (we lose a couple of
machines a day, each needing a reboot, and a hard reboot if left long
enough). Hunting through Wireshark dumps, we found that the NFS client
repeatedly sends OPEN requests for the same file to our file server, and
every one has the same open owner ID and a sequence ID of 0, which the
server rejects each time as a bad sequence ID (NFS4ERR_BAD_SEQID). I've
got a dump I can give you if you want it.

I am convinced the problem is the one described in patch 08/20 from
Chuck Lever (see http://www.spinics.net/lists/linux-nfs/msg29413.html):
after the server rejects the request, the client retries with the same
open owner ID, so the server treats it as the same request and rejects
it again. In that patch Chuck added a uniquifier to the owner ID to
avoid this problem.
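The uniquifier idea can be sketched the same way. This is a minimal Python illustration, assuming a client that can abandon an open-owner after NFS4ERR_BAD_SEQID and mint a fresh name by appending a counter; `OwnerNamer`, `open_with_recovery` and the toy server are hypothetical and are not taken from Chuck's patch or from the kernel. Because the new name is one the server holds no state for, the retry is no longer judged against the old owner's sequence.

```python
import itertools
from typing import Callable, Dict, Tuple

SendOpen = Callable[[bytes, int], str]


class OwnerNamer:
    """Hypothetical helper: owner names carry a uniquifier and are never reused."""

    def __init__(self, base: str) -> None:
        self.base = base
        self._uniquifier = itertools.count()

    def new_owner(self) -> bytes:
        return f"{self.base}/{next(self._uniquifier)}".encode()


def open_with_recovery(send_open: SendOpen, namer: OwnerNamer, owner: bytes,
                       seqid: int, max_retries: int = 3) -> Tuple[str, bytes]:
    """Send an OPEN; on BAD_SEQID, switch to a brand-new owner and retry."""
    status = send_open(owner, seqid)
    retries = 0
    while status == "NFS4ERR_BAD_SEQID" and retries < max_retries:
        # The server's idea of this owner's seqid no longer matches ours, so
        # give up on the old name instead of resending it.
        owner, seqid = namer.new_owner(), 0
        status = send_open(owner, seqid)
        retries += 1
    return status, owner


def make_toy_server() -> Tuple[SendOpen, Dict[bytes, int]]:
    """Toy server: accepts unknown owners, or the next seqid for known ones."""
    last: Dict[bytes, int] = {}

    def send_open(owner: bytes, seqid: int) -> str:
        if owner not in last or seqid == last[owner] + 1:
            last[owner] = seqid
            return "NFS4_OK"
        return "NFS4ERR_BAD_SEQID"

    return send_open, last


if __name__ == "__main__":
    namer = OwnerNamer("uid1000")
    send_open, server_state = make_toy_server()
    stale = namer.new_owner()   # b"uid1000/0"
    server_state[stale] = 5     # the server still remembers this owner
    print(open_with_recovery(send_open, namer, stale, 0))
    # ('NFS4_OK', b'uid1000/1')
```

The retry bound is a design choice of the sketch: a server that rejects every name would otherwise loop forever, which is essentially the symptom described above.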

The problem is that we can't find any kernel version that includes that
patch. An easy way to check is to look for the comment text "...therefore
safely retry using a new one. We should still warn the user though...":
if the "warn the user" part is there, the kernel hasn't been patched (we
checked other parts of the patch too). We're running both Fedora 17 and
Fedora 19 at the moment (yes, I know 17 is EOL), and neither includes the
patch. We also can't see it in the NFS client or server trees at

http://git.linux-nfs.org/?p=trondmy/nfs-2.6.git;a=blob;f=fs/nfs/nfs4proc.c;h=2da6a698b8f7719c14eefec65e6148a48d030bb3;hb=HEAD#l2327

...and Chuck doesn't appear to have it in his merging tree either:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=blob;f=fs/nfs/nfs4proc.c;h=15052b81df4245e4f797adb0d0b2e523338b23cc;hb=HEAD#l2327
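The source check described above can be scripted. This is a minimal Python sketch, assuming a kernel source tree checked out locally; the marker is the comment text quoted above, whose presence means the patch is not applied, and `looks_unpatched`, `normalise_comment_text` and the script name are hypothetical. Because a C block comment may split the phrase across lines with a leading ` * `, the text is normalised before searching.

```python
import re
import sys
from pathlib import Path

# Comment text quoted in this thread; if present, the patch has not been applied.
UNPATCHED_MARKER = "We should still warn the user"


def normalise_comment_text(source: str) -> str:
    """Strip leading ' * ' comment decoration and collapse whitespace."""
    no_stars = re.sub(r"^\s*\*\s?", "", source, flags=re.MULTILINE)
    return re.sub(r"\s+", " ", no_stars)


def looks_unpatched(source: str) -> bool:
    return UNPATCHED_MARKER in normalise_comment_text(source)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_seqid_patch.py path/to/linux/fs/nfs/nfs4proc.c
    text = Path(sys.argv[1]).read_text(errors="replace")
    if looks_unpatched(text):
        print("warning comment present: patch not applied")
    else:
        print("warning comment not found")
```

This is only a heuristic: comment wording can change between kernel versions, so checking other hunks of the patch as well is more reliable.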

Can anyone tell me what happened to this patch please? Was it lost or
superseded?

TIA
Ben

-- 
Ben Taylor <benj@pml.ac.uk>, http://rsg.pml.ac.uk/
Remote Sensing Group, Plymouth Marine Laboratory
Tel: +44 (0)1752 633432, Fax: +44 (0)1752 633101


Please visit our new website at www.pml.ac.uk and follow us on Twitter  @PlymouthMarine

Plymouth Marine Laboratory (PML) is a company limited by guarantee registered in England & Wales, company number 4178503. Registered Charity No. 1091222. Registered Office: Prospect Place, The Hoe, Plymouth  PL1 3DH, UK. 

This message is private and confidential. If you have received this message in error, please notify the sender and remove it from your system. You are reminded that e-mail communications are not secure and may contain viruses; PML accepts no liability for any loss or damage which may be caused by viruses.



* Re: NFS4 patch 08/20 (BAD_SEQID recovery)
From: Trond Myklebust @ 2014-03-07 13:05 UTC
  To: Ben Taylor; +Cc: linux-nfs


On Mar 7, 2014, at 4:41, Ben Taylor <benj@pml.ac.uk> wrote:

> Can anyone tell me what happened to this patch please? Was it lost or
> superseded?

It was superseded by commit 95b72eb0bdef6 ("NFSv4: Ensure we do not reuse open owner names"), which is available in Linux 3.4 and newer.
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com



* Re: NFS4 patch 08/20 (BAD_SEQID recovery)
From: Ben Taylor @ 2014-03-10 17:26 UTC
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond

On 07/03/14 13:05, Trond Myklebust wrote:
>> Can anyone tell me what happened to this patch please? Was it lost or
>> superseded?
> It was superseded by commit 95b72eb0bdef6 (NFSv4: Ensure we do not reuse open owner names), which is available in linux 3.4 and newer.

Many thanks. That's a puzzle then, because we're running 3.9 and up and
already have that patch (I've checked).

I've attached my Wireshark dump (or at least a subset of it;
unfortunately I don't have the original call, and the rest of the dump
is the same exchange repeated). I don't know if it tells you anything it
doesn't tell me; I'm not exactly experienced at reading these things!

Thanks
Ben



[-- Attachment #2: nfs_bad_seqid_trimmed.dmp --]
[-- Type: application/vnd.tcpdump.pcap, Size: 1452 bytes --]


* Re: NFS4 patch 08/20 (BAD_SEQID recovery)
From: Trond Myklebust @ 2014-03-10 23:23 UTC
  To: Ben Taylor; +Cc: linux-nfs


On Mar 10, 2014, at 13:26, Ben Taylor <benj@pml.ac.uk> wrote:

> Hi Trond
> 
> On 07/03/14 13:05, Trond Myklebust wrote:
>>> Can anyone tell me what happened to this patch please? Was it lost or
>>> superseded?
>> It was superseded by commit 95b72eb0bdef6 (NFSv4: Ensure we do not reuse open owner names), which is available in linux 3.4 and newer.
> 
> Many thanks. That's a puzzle then, because we're running 3.9 and up and
> already have that patch (I've checked).
> 
> I've attached my Wireshark dump (or at least a subset of it -
> unfortunately I don't have the original call, the dump I've got is all
> the same) - don't know if this tells you anything it doesn't tell me?
> I'm not exactly experienced at reading these things!
> 

It looks as if the client is trying to convert a delegation into an open stateid as part of returning that delegation, but the server is disputing the sequence id value.
What server is this?




* Re: NFS4 patch 08/20 (BAD_SEQID recovery)
From: Ben Taylor @ 2014-03-11  8:49 UTC
  To: Trond Myklebust; +Cc: linux-nfs

On 10/03/14 23:23, Trond Myklebust wrote:
> It looks as if the client is trying to convert a delegation into an open stateid as part of returning that delegation, but the server is disputing the sequence id value.
> What server is this?

It's our main user-space file server, running CentOS 6.5 with kernel
version... ah. Kernel version 2.6. I had only checked the client kernel
version previously.

That's probably the issue then. Sorry, and thanks for your help!

Regards
Ben


