From: Olga Kornievskaia <aglo@umich.edu>
To: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	"bfields@fieldses.org" <bfields@fieldses.org>,
	"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"joseph.qi@linux.alibaba.com" <joseph.qi@linux.alibaba.com>
Subject: Re: [bug report] task hang while testing xfstests generic/323
Date: Fri, 15 Mar 2019 16:33:57 -0400
Message-ID: <CAN-5tyFHD5ZCqZ59GJzf_CAaJQZ7OXEhaxtYaSDAdXMLSA-xDA@mail.gmail.com>
In-Reply-To: <37b3d7db-0bdf-014a-adff-ea401ea92fc7@linux.alibaba.com>

On Fri, Mar 15, 2019 at 2:31 AM Jiufei Xue <jiufei.xue@linux.alibaba.com> wrote:
>
> Hi Olga,
>
> On 2019/3/11 11:13 PM, Olga Kornievskaia wrote:
> > Let me double check that. I have reproduced the "infinite loop" of
> > CLOSE on the upstream (I'm looking through the trace points from Friday).
>
> Did you try to capture the packets when you reproduced this issue on
> upstream? I still have packets dropped by the kernel after some
> adjustments made according to bfields' suggestion :(

Hi Jiufei,

Yes, I have network trace captures, but they are too big to post to the
mailing list. I have reproduced the problem on the latest upstream
origin/testing branch, at commit "SUNRPC: Take the transport send lock
before binding+connecting". As you noted before, the infinite loop is
due to the client "losing" an update to the seqid.

One packet carries a (recovery) OPEN with slot=0 seqid=Y. The
tracepoint (nfs4_open_file) logs status=ERESTARTSYS for it. The RPC
task is still sent and a reply comes back, but nobody is left waiting
to receive it... The OPEN that got a reply carries an updated stateid
seqid, which the client never records. When CLOSE is sent, it is sent
with the "old" stateid, and that puts the client in an infinite loop.
Btw, the CLOSE is sent on the interrupted slot, which should get
NFS4ERR_SEQ_FALSE_RETRY, which causes the client to terminate the
session. But it would still keep sending the CLOSE with the old stateid.
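
To make the sequence above concrete, here is a toy model of it. This is
only a sketch and not kernel code: the Server and Client classes and
their methods are made up for illustration, and I am assuming the server
answers the stale CLOSE with NFS4ERR_OLD_STATEID, which is not something
the traces above state. The retry loop is capped so that the script
terminates.

```python
from dataclasses import dataclass
from typing import Optional

NFS4_OK = 0
NFS4ERR_OLD_STATEID = 10024  # numeric value from RFC 5661


@dataclass
class Server:
    """Hypothetical server that tracks the seqid of one open stateid."""
    seqid: int = 1

    def open(self) -> int:
        # Every OPEN on the same state bumps the stateid seqid.
        self.seqid += 1
        return self.seqid

    def close(self, seqid: int) -> int:
        # Assumption: a CLOSE carrying a stale seqid is rejected.
        return NFS4_OK if seqid == self.seqid else NFS4ERR_OLD_STATEID


@dataclass
class Client:
    """Hypothetical client that caches the last stateid seqid it saw."""
    seqid: int = 1

    def open(self, server: Server, interrupted: bool) -> str:
        reply_seqid = server.open()  # the server processes the OPEN either way
        if interrupted:
            # The waiter was killed: the reply arrives but is never applied.
            return "ERESTARTSYS"
        self.seqid = reply_seqid
        return "OK"

    def close(self, server: Server, max_attempts: int) -> Optional[int]:
        # Retries reuse the cached stateid, so a lost update never heals.
        for attempt in range(1, max_attempts + 1):
            if server.close(self.seqid) == NFS4_OK:
                return attempt
        return None  # gave up: this stands in for the infinite loop


if __name__ == "__main__":
    server, client = Server(), Client()
    print("OPEN :", client.open(server, interrupted=True))
    print("seqid: server", server.seqid, "client", client.seqid)
    print("CLOSE:", client.close(server, max_attempts=5))
```

With interrupted=False the client picks up the new seqid and the first
CLOSE succeeds. With interrupted=True the client stays one seqid behind
and no number of retries helps.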

One thing I've noticed is that the TEST_STATEID op (as a part of
nfs41_test_and_free_expired_stateid()) for some reason always has a
signal set even before issuing an RPC task, so the task never
completes (ever).

I always thought that OPENs can't be interrupted, but I guess they can
be, since they call rpc_wait_for_completion_task() and that's a killable
wait. But I don't know how to find out what's sending a signal to the
process, so I'm rather stuck on where to go from here. I'm still trying
to figure out what's causing the signal, or alternatively how to recover
from it so that the client doesn't lose that seqid.

>
> Thanks,
> Jiufei
