From: Jiufei Xue <jiufei.xue@linux.alibaba.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
	"aglo@umich.edu" <aglo@umich.edu>
Cc: "bfields@fieldses.org" <bfields@fieldses.org>,
	"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"joseph.qi@linux.alibaba.com" <joseph.qi@linux.alibaba.com>
Subject: Re: [bug report] task hang while testing xfstests generic/323
Date: Fri, 1 Mar 2019 13:19:54 +0800
Message-ID: <c4b68217-275d-d40c-1b7e-71fddf91f830@linux.alibaba.com>
In-Reply-To: <dae18b965a55ed36071b5296d6b1466a57878d16.camel@hammerspace.com>



On 2019/3/1 7:56 AM, Trond Myklebust wrote:
> On Thu, 2019-02-28 at 17:26 -0500, Olga Kornievskaia wrote:
>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
>> jiufei.xue@linux.alibaba.com> wrote:
>>> Hi,
>>>
>>> when I tested xfstests generic/323 with NFSv4.1 and v4.2, the task
>>> occasionally became a zombie while one of its threads hung with the
>>> following stack:
>>>
>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>> [<0>] __fput+0xa2/0x1c0
>>> [<0>] task_work_run+0x82/0xa0
>>> [<0>] do_exit+0x2ac/0xc20
>>> [<0>] do_group_exit+0x39/0xa0
>>> [<0>] get_signal+0x1ce/0x5d0
>>> [<0>] do_signal+0x36/0x620
>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>> [<0>] do_syscall_64+0x16c/0x190
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>>
>>> Since commit 12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on
>>> NFS4ERR_OLD_STATEID"), the client retries the CLOSE when the stateid
>>> generation number on the client is lower than the one on the server.
>>>
>>> The original intention of that commit was to retry the operation
>>> when racing with an OPEN. However, in this case the stateid
>>> generation mismatch persists forever.
>>>
>>> Any suggestions?
>>
>> Can you include a network trace of the failure? Is it possible that
>> the server has crashed on reply to the close and that's why the task
>> is hung? What server are you testing against?
>>
>> I have seen trace where close would get ERR_OLD_STATEID and would
>> still retry with the same open state until it got a reply to the OPEN
>> which changed the state and when the client received reply to that,
>> it'll retry the CLOSE with the updated stateid.
> 
> I agree with Olga's assessment. The server is not allowed to randomly
> change the values of the seqid, and the client should be taking pains
> to replay any OPEN calls for which a reply is missed. The expectation
> is therefore that NFS4ERR_OLD_STATEID should always be a temporary
> state.
> 
The server bumped the seqid because of a new OPEN from another thread.
I suspect that the new OPEN task exited after receiving a signal,
without updating the stateid.
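
To make the suspected sequence concrete, here is a toy Python model. It
is only a sketch, not the kernel code: ToyServer, ToyClient, and
simulate are hypothetical names, the server only checks for an older
seqid, and the retry bound exists solely so the example terminates (the
hang in the report corresponds to retrying without a bound). It
contrasts the normal race, where the OPEN reply eventually updates the
client's seqid, with the case where that reply is never applied.

```python
OK = "NFS4_OK"
OLD_STATEID = "NFS4ERR_OLD_STATEID"


class ToyServer:
    """Tracks one open stateid's seqid (generation number)."""

    def __init__(self):
        self.seqid = 1

    def open(self):
        # Every OPEN on the same state bumps the seqid.
        self.seqid += 1
        return self.seqid

    def close(self, seqid):
        # A CLOSE carrying an older seqid is rejected.
        return OLD_STATEID if seqid < self.seqid else OK


class ToyClient:
    def __init__(self, seqid):
        self.seqid = seqid

    def close_with_retry(self, server, max_attempts, between_attempts):
        # Retry CLOSE while the server reports NFS4ERR_OLD_STATEID.
        status = OLD_STATEID
        for attempt in range(1, max_attempts + 1):
            status = server.close(self.seqid)
            if status != OLD_STATEID:
                return status, attempt
            between_attempts(self)
        return status, max_attempts


def simulate(open_reply_applied, max_attempts=5):
    server = ToyServer()
    client = ToyClient(server.seqid)  # both sides agree on seqid 1
    new_seqid = server.open()         # OPEN from another thread: server is now at 2

    def deliver_open_reply(c):
        # If the OPEN task was killed by a signal, its reply is never
        # applied and the client's seqid stays stale.
        if open_reply_applied:
            c.seqid = max(c.seqid, new_seqid)

    return client.close_with_retry(server, max_attempts, deliver_open_reply)


if __name__ == "__main__":
    print(simulate(open_reply_applied=True))
    print(simulate(open_reply_applied=False))
```

In this model the first case succeeds on the second CLOSE attempt, while
the second case keeps getting NFS4ERR_OLD_STATEID until the artificial
bound is reached.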

> If it is not, then the bugreport needs to explain why the server bumped
> the seqid without informing the client.
> 
