All of lore.kernel.org
 help / color / mirror / Atom feed
* hang on xfstests generic/074
@ 2015-02-10 15:43 J. Bruce Fields
  2015-02-10 15:45 ` J. Bruce Fields
  2015-02-11 12:34 ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: J. Bruce Fields @ 2015-02-10 15:43 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

I finally got around to running xfstests as part of my regular testing
and ran across a reproduceable hang on generic/074:

[110040.300055] INFO: task fstest:22762 blocked for more than 120 seconds.
[110040.300571]       Tainted: G        W      3.19.0-rc4-00206-g53ea83c #16
[110040.301082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110040.301656] fstest          D ffff88005246bc98 11320 22762  22761 0x00000000
[110040.302334]  ffff88005246bc98 ffff88005246bc58 0000000000009000 ffff880057c4d7d0
[110040.303185]  ffff88005246bfd8 ffff880052f84b90 ffff880057c4d7d0 0000000000000000
[110040.304041]  0000000000000000 0000000000000001 0000000000000000 ffff88005246bc58
[110040.305096] Call Trace:
[110040.305331]  [<ffffffff810b0ed7>] ? prepare_to_wait+0x27/0x90
[110040.305770]  [<ffffffffa000f492>] ? rpc_make_runnable+0xc2/0xd0 [sunrpc]
[110040.306274]  [<ffffffff81a9550d>] ?  _raw_spin_unlock_irqrestore+0x5d/0x80
[110040.306799]  [<ffffffff81a90910>] ? bit_wait+0x60/0x60
[110040.307258]  [<ffffffff810bad1d>] ?  trace_hardirqs_on_caller+0x15d/0x200
[110040.307798]  [<ffffffff810badcd>] ? trace_hardirqs_on+0xd/0x10
[110040.308283]  [<ffffffff81a90910>] ? bit_wait+0x60/0x60
[110040.308678]  [<ffffffff81a8ffc9>] schedule+0x29/0x70
[110040.309054]  [<ffffffff81a90295>] io_schedule+0x55/0x80
[110040.309510]  [<ffffffff81a90944>] bit_wait_io+0x34/0x60
[110040.309909]  [<ffffffff81a905b7>] __wait_on_bit+0x67/0x90
[110040.310313]  [<ffffffff8114e72d>] ? find_get_pages_tag+0xd/0x210
[110040.310756]  [<ffffffff8114d2c6>] wait_on_page_bit+0xb6/0xc0
[110040.311177]  [<ffffffff810b1410>] ?  autoremove_wake_function+0x40/0x40
[110040.311654]  [<ffffffff8114d422>] filemap_fdatawait_range+0xf2/0x190
[110040.312131]  [<ffffffff8114f062>] filemap_write_and_wait_range+0x42/0x70
[110040.312627]  [<ffffffffa017866f>] nfs4_file_fsync+0x5f/0xb0 [nfsv4]
[110040.313090]  [<ffffffffa0178905>] ?  nfs4_do_check_delegation+0x5/0xc0 [nfsv4]
[110040.313616]  [<ffffffff811d05a9>] vfs_fsync+0x29/0x40
[110040.313991]  [<ffffffffa010fdea>] nfs_file_flush+0x7a/0xb0 [nfs]
[110040.314421]  [<ffffffff8119b093>] filp_close+0x33/0x80
[110040.314801]  [<ffffffff811bce42>] __close_fd+0x82/0xa0
[110040.315178]  [<ffffffff8119b103>] SyS_close+0x23/0x50
[110040.315551]  [<ffffffff81a960d2>] system_call_fastpath+0x12/0x17
[110040.316247] no locks held by fstest/22762.
[110063.328025] nfs: server f21-1 not responding, still trying ...

Is this anything known?

Client and server were running some version of my tree, 19-rc4 based.

--b.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2015-02-10 15:43 hang on xfstests generic/074 J. Bruce Fields
@ 2015-02-10 15:45 ` J. Bruce Fields
  2015-02-11 12:34 ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: J. Bruce Fields @ 2015-02-10 15:45 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

On Tue, Feb 10, 2015 at 10:43:06AM -0500, bfields wrote:
> I finally got around to running xfstests as part of my regular testing
> and ran across a reproduceable hang on generic/074:

By the way, running ./check -nfs -g auto, I also see failures on:

generic/005 generic/017 generic/031 generic/032 generic/033 generic/035
generic/037 generic/053 generic/062 generic/068 generic/088 generic/089
generic/105 generic/126 generic/133 generic/184 generic/225 generic/277
generic/294 generic/306

and another 94 tests were skipped.  I haven't looked into any of those
yet.

All of this is over 4.1.

--b.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2015-02-10 15:43 hang on xfstests generic/074 J. Bruce Fields
  2015-02-10 15:45 ` J. Bruce Fields
@ 2015-02-11 12:34 ` Christoph Hellwig
  2015-03-01 15:56   ` Trond Myklebust
  2016-04-01 14:57   ` Benjamin Coddington
  1 sibling, 2 replies; 8+ messages in thread
From: Christoph Hellwig @ 2015-02-11 12:34 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
> I finally got around to running xfstests as part of my regular testing
> and ran across a reproduceable hang on generic/074:

Yes, I reported this about half a year ago.  It was caused (or at least
unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
that commit fixes the issue for me.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2015-02-11 12:34 ` Christoph Hellwig
@ 2015-03-01 15:56   ` Trond Myklebust
  2015-03-01 15:58     ` Trond Myklebust
  2016-04-01 14:57   ` Benjamin Coddington
  1 sibling, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2015-03-01 15:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: J. Bruce Fields, Anna Schumaker, Linux NFS Mailing List

On Wed, Feb 11, 2015 at 7:34 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
>> I finally got around to running xfstests as part of my regular testing
>> and ran across a reproduceable hang on generic/074:
>
> Yes, I reported this about half a year ago.  It was caused (or at least
> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
> that commit fixes the issue for me.
>

Can you please recheck with the new 'devel' branch on
git://git.linux-nfs.org/projects/trondmy/linux-nfs.git ? That fixes
the test for me.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2015-03-01 15:56   ` Trond Myklebust
@ 2015-03-01 15:58     ` Trond Myklebust
  0 siblings, 0 replies; 8+ messages in thread
From: Trond Myklebust @ 2015-03-01 15:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: J. Bruce Fields, Anna Schumaker, Linux NFS Mailing List

On Sun, Mar 1, 2015 at 10:56 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Wed, Feb 11, 2015 at 7:34 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
>>> I finally got around to running xfstests as part of my regular testing
>>> and ran across a reproduceable hang on generic/074:
>>
>> Yes, I reported this about half a year ago.  It was caused (or at least
>> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
>> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
>> that commit fixes the issue for me.
>>
>
> Can you please recheck with the new 'devel' branch on
> git://git.linux-nfs.org/projects/trondmy/linux-nfs.git ? That fixes
> the test for me.
>

Oops. Never mind. I was thinking of a different issue, but also with
generic/074.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2015-02-11 12:34 ` Christoph Hellwig
  2015-03-01 15:56   ` Trond Myklebust
@ 2016-04-01 14:57   ` Benjamin Coddington
  2016-04-01 15:26     ` Trond Myklebust
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Coddington @ 2016-04-01 14:57 UTC (permalink / raw)
  To: Trond Myklebust, Christoph Hellwig
  Cc: J. Bruce Fields, Anna Schumaker, linux-nfs

On Wed, 11 Feb 2015, Christoph Hellwig wrote:

> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
> > I finally got around to running xfstests as part of my regular testing
> > and ran across a reproduceable hang on generic/074:
>
> Yes, I reported this about half a year ago.  It was caused (or at least
> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
> that commit fixes the issue for me.

I just ran into this.

Now that we have SO_REUSEPORT, can we get rid of
RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT?

Ben

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2016-04-01 14:57   ` Benjamin Coddington
@ 2016-04-01 15:26     ` Trond Myklebust
  2016-04-08 18:24       ` Benjamin Coddington
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2016-04-01 15:26 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Christoph Hellwig, J. Bruce Fields, Anna Schumaker,
	Linux NFS Mailing List

On Fri, Apr 1, 2016 at 10:57 AM, Benjamin Coddington
<bcodding@redhat.com> wrote:
> On Wed, 11 Feb 2015, Christoph Hellwig wrote:
>
>> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
>> > I finally got around to running xfstests as part of my regular testing
>> > and ran across a reproduceable hang on generic/074:
>>
>> Yes, I reported this about half a year ago.  It was caused (or at least
>> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
>> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
>> that commit fixes the issue for me.
>
> I just ran into this.
>
> Now that we have SO_REUSEPORT, can we get rid of
> RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT?

They are unrelated.

If you are hitting this hang, then you have borked server that is
dropping NFSv4 RPC requests. The old behaviour of having the client
break the connection is not actually sanctioned by the NFSv4 protocol.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: hang on xfstests generic/074
  2016-04-01 15:26     ` Trond Myklebust
@ 2016-04-08 18:24       ` Benjamin Coddington
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Coddington @ 2016-04-08 18:24 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, J. Bruce Fields, Anna Schumaker,
	Linux NFS Mailing List

On Fri, 1 Apr 2016, Trond Myklebust wrote:

> On Fri, Apr 1, 2016 at 10:57 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Wed, 11 Feb 2015, Christoph Hellwig wrote:
> >
> >> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
> >> > I finally got around to running xfstests as part of my regular testing
> >> > and ran across a reproduceable hang on generic/074:
> >>
> >> Yes, I reported this about half a year ago.  It was caused (or at least
> >> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
> >> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
> >> that commit fixes the issue for me.
> >
> > I just ran into this.
> >
> > Now that we have SO_REUSEPORT, can we get rid of
> > RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT?
>
> They are unrelated.
>
> If you are hitting this hang, then you have borked server that is
> dropping NFSv4 RPC requests. The old behaviour of having the client
> break the connection is not actually sanctioned by the NFSv4 protocol.
>
> Cheers
>   Trond

Ah, thanks for pointing that out.  It is a server bug, I think.  I'm trying
to find out more, and I'll write about it under separate cover.

Ben

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-04-08 18:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-10 15:43 hang on xfstests generic/074 J. Bruce Fields
2015-02-10 15:45 ` J. Bruce Fields
2015-02-11 12:34 ` Christoph Hellwig
2015-03-01 15:56   ` Trond Myklebust
2015-03-01 15:58     ` Trond Myklebust
2016-04-01 14:57   ` Benjamin Coddington
2016-04-01 15:26     ` Trond Myklebust
2016-04-08 18:24       ` Benjamin Coddington

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.