From: Olga Kornievskaia <aglo@umich.edu>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Bruce Fields <bfields@redhat.com>,
	Timo Rothenpieler <timo@rothenpieler.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Dai Ngo <dai.ngo@oracle.com>
Subject: Re: Spurious instability with NFSoRDMA under moderate load
Date: Wed, 11 Aug 2021 16:51:41 -0400
Message-ID: <CAN-5tyGcPiZ83H1-NU67n1jPKrpkrk3E0xXV+d11rQJX6_MbKA@mail.gmail.com>
In-Reply-To: <D417C606-9E27-431E-B80E-EE927E62A316@oracle.com>

On Wed, Aug 11, 2021 at 4:01 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Aug 11, 2021, at 3:46 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Wed, Aug 11, 2021 at 2:52 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Aug 11, 2021, at 2:38 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> On Wed, Aug 11, 2021 at 1:30 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Aug 11, 2021, at 12:20 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> >>>>>
> >>>>> resulting dmesg and trace logs of both client and server are attached.
> >>>>>
> >>>>> Test procedure:
> >>>>>
> >>>>> - start tracing on client and server
> >>>>> - mount NFS on client
> >>>>> - immediately run 'xfs_io -fc "copy_range testfile" testfile.copy' (which succeeds)
> >>>>> - wait 10~15 minutes for the backchannel to time out (still running 5.12.19 with the fix for that reverted)
> >>>>> - run xfs_io command again, getting stuck now
> >>>>> - let it sit there stuck for a minute, then cancel it
> >>>>> - run the command again
> >>>>> - while it's still stuck, finished recording the logs and traces
> >>>>
> >>>> The server tries to send CB_OFFLOAD when the offloaded copy
> >>>> completes, but finds the backchannel transport is not connected.
> >>>>
> >>>> The server can't report the problem until the client sends a
> >>>> SEQUENCE operation, but there's really no other traffic going
> >>>> on, so it just waits.
> >>>>
> >>>> The client eventually sends a singleton SEQUENCE to renew its
> >>>> lease. The server replies with the SEQ4_STATUS_BACKCHANNEL_FAULT
> >>>> flag set at that point. Client's recovery is to destroy that
> >>>> session and create a new one. That appears to be successful.
> >>>>
> >>>> But the server doesn't send another CB_OFFLOAD to let the client
> >>>> know the copy is complete, so the client hangs.
> >>>>
> >>>> This seems to be peculiar to COPY_OFFLOAD, but I wonder if the
> >>>> other CB operations suffer from the same "failed to retransmit
> >>>> after the CB path is restored" issue. It might not matter for
> >>>> some of them, but for others like CB_RECALL, that could be
> >>>> important.
> >>>
> >>> Thank you for the analysis Chuck (btw I haven't seen any attachments
> >>> with Timo's posts so I'm assuming some offline communication must have
> >>> happened).
> >>>
> >>> I'm looking at the code and wouldn't the mentioned flags be set on the
> >>> CB_SEQUENCE operation?
> >>
> >> CB_SEQUENCE is sent from server to client, and that can't work if
> >> the callback channel is down.
> >>
> >> So the server waits for the client to send a SEQUENCE and it sets
> >> the SEQ4_STATUS_BACKCHANNEL_FAULT in its reply.
> >
> > yes scratch that, this is for when CB_SEQUENCE has it in its reply.
> >
> >>> nfsd4_cb_done() has code to mark the channel
> >>> and retry (or another way of saying this, this code should generically
> >>> handle retrying whatever operation it is be it CB_OFFLOAD or
> >>> CB_RECALL)?
> >>
> >> cb_done() marks the callback fault, but as far as I can tell the
> >> RPC is terminated at that point and there is no subsequent retry.
> >> The RPC_TASK flags on the CB_OFFLOAD operation cause that RPC to
> >> fail immediately if there's no connection.
> >>
> >> And in the BACKCHANNEL_FAULT case, the bc_xprt is destroyed as
> >> part of recovery. I think that would kill all pending RPC tasks.
> >>
> >>
> >>> Is that not working (not sure if this is a question or a
> >>> statement).... I would think that would be the place to handle this
> >>> problem.
> >>
> >> IMHO the OFFLOAD code needs to note that the CB_OFFLOAD RPC
> >> failed and then try the call again once the new backchannel is
> >> available.
> >
> > I still argue that this needs to be done generically not per operation
> > as CB_RECALL has the same problem.
>
> Probably not just CB_RECALL, but agreed, there doesn't seem to
> be any mechanism that can re-drive callback operations when the
> backchannel is replaced.
>
>
> > If a CB_RECALL call is never
> > answered, the RPC would fail with ETIMEDOUT.
>
> Well we have a mechanism already to deal with that case, which is
> delegation revocation. That's not optimal, but at least it won't
> leave the client hanging forever.
>
>
> > nfsd4_cb_recall_done() returns
> > 1 back to nfsd4_cb_done(), which then, based on the RPC task status,
> > would set the callback path down but will do nothing else.
> > nfsd4_cb_offload_done() just always returns 1.
> >
> > Now given that you say during the backchannel fault case, we'll have
> > to destroy that backchannel and create a new one.
>
> BACKCHANNEL_FAULT is the particular case that Timo's reproducer
> exercises. I haven't looked closely at the CB_PATH_DOWN and
> CB_PATH_DOWN_SESSION cases, which use BIND_CONN_TO_SESSION,
> to see if those also destroy the backchannel transport.
>
> However I suspect they are just as broken. There isn't much
> regression testing around these scenarios.
>
>
> > Are we losing all those RPCs that flagged it as faulty?
>
> I believe the RPC tasks are already gone at that point. I'm
> just saying that /hypothetically/ if the RPCs were not set up
> to exit immediately, that wouldn't matter because the transport
> and client are destroyed as part of restoring the backchannel.
>
> Perhaps instead of destroying the rpc_clnt, the server should
> retain it and simply recycle the underlying transport data
> structures.
>
>
> > At this point, nothing obvious
> > comes to mind how to engineer the recovery but it should be done.
>
> Well I think we need to:
>
> 1. Review the specs to see if there's any indication of how to
> recover from this situation

I guess one fix we can consider on the client is to make the wait for
CB_OFFLOAD time-based and, when the wait times out, use the
OFFLOAD_STATUS operation to check on the copy status.
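
To make the idea concrete, here is a rough userspace sketch in Python
(not kernel code) of a time-based wait with an OFFLOAD_STATUS fallback.
Everything named here is an assumption for illustration:
wait_for_copy(), the threading.Event standing in for "CB_OFFLOAD
arrived", and the offload_status callable standing in for one
OFFLOAD_STATUS round trip that returns (bytes_copied, complete).

```python
import threading


def wait_for_copy(cb_offload_event, offload_status, timeout_s=1.0, max_polls=5):
    """Wait for CB_OFFLOAD; on each timeout, ask the server via OFFLOAD_STATUS.

    cb_offload_event: Event set by the (hypothetical) callback handler
                      when CB_OFFLOAD is received.
    offload_status:   hypothetical callable modelling an OFFLOAD_STATUS
                      round trip; returns (bytes_copied, complete).
    """
    for _ in range(max_polls):
        # Normal path: the callback arrives before the timer expires.
        if cb_offload_event.wait(timeout_s):
            return "cb_offload"
        # The callback may have been lost with the old backchannel,
        # so poll the server for the copy state instead of hanging.
        _copied, complete = offload_status()
        if complete:
            return "offload_status"
    # Give up after a bounded number of polls; the caller decides what next.
    return "timed_out"
```

The bound on polls is only there so the sketch terminates; what the
real client should do after repeated timeouts is an open question.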

> 2. Perhaps construct a way to move CB RPCs to a workqueue so
> they can be retried indefinitely (or figure out when they should
> stop being retried: COPY_OFFLOAD_CANCEL comes to mind).
>
> 3. Keep an eye peeled for cases where a malicious or broken
> client could cause CB operations to tie up all the server's
> nfsd threads or other resources.
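
For points 2 and 3, a toy model in Python (again, not kernel code) of
what I understand is being proposed: callbacks that cannot be sent are
parked, re-driven when a new backchannel comes up, dropped if the
operation is cancelled, and capped per client so a broken client
cannot pin unbounded server resources. The class name, its methods and
the max_pending limit are all made up for illustration.

```python
from collections import deque


class CallbackQueue:
    """Hypothetical per-client queue of callbacks awaiting a backchannel."""

    def __init__(self, max_pending=4):
        self.max_pending = max_pending  # cap on parked callbacks (point 3)
        self.pending = deque()          # (op, key) pairs waiting for a channel
        self.connected = False
        self.sent = []                  # stands in for RPCs put on the wire

    def submit(self, op, key):
        """Send now if the backchannel is up, otherwise park the callback."""
        if self.connected:
            self.sent.append((op, key))
            return True
        if len(self.pending) >= self.max_pending:
            return False                # refuse rather than grow without bound
        self.pending.append((op, key))
        return True

    def cancel(self, key):
        """Stop retrying callbacks for one object (e.g. a cancelled copy)."""
        self.pending = deque(p for p in self.pending if p[1] != key)

    def backchannel_down(self):
        self.connected = False

    def backchannel_up(self):
        """Re-drive everything parked while the backchannel was unavailable."""
        self.connected = True
        while self.pending:
            self.sent.append(self.pending.popleft())
```

A CB_OFFLOAD submitted while the channel is down would then go out as
soon as backchannel_up() runs, unless cancel() removed it first.
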
>
> --
> Chuck Lever
>
>
>

