All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jeff.layton@primarydata.com>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Jeff Layton <jeff.layton@primarydata.com>,
	Christoph Hellwig <hch@infradead.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Bruce Fields <bfields@fieldses.org>
Subject: Re: [PATCH] nfsd: Ensure that NFSv4 always drops the connection when dropping a request
Date: Mon, 27 Oct 2014 15:00:37 -0400	[thread overview]
Message-ID: <20141027150037.65bd3080@tlielax.poochiereds.net> (raw)
In-Reply-To: <20141024130828.1304aa8f@tlielax.poochiereds.net>

On Fri, 24 Oct 2014 13:08:28 -0400
Jeff Layton <jlayton@primarydata.com> wrote:

> On Fri, 24 Oct 2014 18:59:47 +0300
> Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> > On Fri, Oct 24, 2014 at 5:57 PM, Jeff Layton
> > <jeff.layton@primarydata.com> wrote:
> > >> @@ -1228,6 +1231,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> > >>   dropit:
> > >>       svc_authorise(rqstp);   /* doesn't hurt to call this twice */
> > >>       dprintk("svc: svc_process dropit\n");
> > >
> > > I don't think this will fix it either. I turned the above dprintk into
> > > a normal printk and it never fired during the test. As best I can tell,
> > > svc_process_common is not returning 0 when this occurs.
> > 
> > OK. Is perhaps the "revisit canceled" triggering in svc_revisit()? I'm
> > having trouble understanding the call chain for that stuff, but it too
> > looks as if it can trigger some strange behaviour.
> > 
> 
> I don't think that's it either. I turned the dprintks in svc_revisit
> into a printks just to be sure, and they didn't fire either.
> 
> Basically, I don't think we ever do anything in svc_defer for v4.1
> requests, due to this at the top of it:
> 
>         if (rqstp->rq_arg.page_len || !rqstp->rq_usedeferral)
>                 return NULL; /* if more than a page, give up FIXME */
> 
> ...basically rq_usedeferral should be set in most cases for v4.1
> requests. It gets set when processing the compound and then unset
> afterward.
> 
> That said, I suppose you could end up deferring the request if it
> occurs before the pc_func gets called, but I haven't seen any evidence
> of that happening so far with this test.
> 
> I do concur with Christoph that I've only been able to reproduce this
> while running on the loopback interface. If I have server and client in
> different VMs, then this test runs just fine. Could this be related to
> the changes that Neil sent in recently to make loopback mounts work
> better?
> 
> One idea might be reasonable to backport 2aca5b869ace67 to something
> v3.17-ish and see whether it's still reproducible?
> 

Ok, I've made a bit more progress on this, mostly by adding a fair
number of tracepoints into the client and server request dispatching
code. I'll be posting patches to add those for eventually, but they
need a little cleanup first.

In any case, here's a typical request as it's supposed to work:

       mount.nfs-906   [002] ...1 22711.996969: xprt_transmit: xprt=0xffff8800ce961000 xid=0xa8a34513 status=0
            nfsd-678   [000] ...1 22711.997082: svc_recv: rq_xid=0xa8a34513 status=164
            nfsd-678   [000] ..s8 22711.997185: xprt_lookup_rqst: xprt=0xffff8800ce961000 xid=0xa8a34513 status=0
            nfsd-678   [000] ..s8 22711.997186: xprt_complete_rqst: xprt=0xffff8800ce961000 xid=0xa8a34513 status=140
            nfsd-678   [000] ...1 22711.997236: svc_send: rq_xid=0xa8a34513 dropme=0 status=144
            nfsd-678   [000] ...1 22711.997236: svc_process: rq_xid=0xa8a34513 dropme=0 status=144

...and here's what we see when things start hanging:

     kworker/2:2-107   [002] ...1 22741.696070: xprt_transmit: xprt=0xffff8800ce961000 xid=0xc3a84513 status=0
            nfsd-678   [002] .N.1 22741.696917: svc_recv: rq_xid=0xc3a84513 status=208
            nfsd-678   [002] ...1 22741.699890: svc_send: rq_xid=0xc3a84513 dropme=0 status=262252
            nfsd-678   [002] ...1 22741.699891: svc_process: rq_xid=0xc3a84513 dropme=0 status=262252

What's happening is that xprt_transmit is called to send the request
to the server. The server then gets it, and immediately processes it
(in less than a second) and sends the reply.

The problem is that at some point, we stop getting sk_data_ready
events on the socket, even when the server has clearly sent something
to the client. I'm unclear on why that is, but I suspect that something
is broken in the lower level socket code, possibly just in the loopback
code?

At this point, I think I'm pretty close to ready to call in the netdev
cavalry. I'll write up a synopsis of what we know so far, and send it
there (and cc it here) in the hopes that someone will have some insight.

-- 
Jeff Layton <jlayton@primarydata.com>

  reply	other threads:[~2014-10-27 19:00 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-20 17:36 xfstests generic/075 failure on recent Linus' tree Christoph Hellwig
2014-10-21 13:12 ` Boaz Harrosh
2014-10-22 11:46 ` Christoph Hellwig
2014-10-22 12:00   ` Trond Myklebust
2014-10-22 13:08     ` Christoph Hellwig
2014-10-24  8:14       ` Trond Myklebust
2014-10-24 10:08         ` [PATCH] nfsd: Ensure that NFSv4 always drops the connection when dropping a request Trond Myklebust
2014-10-24 11:26           ` Jeff Layton
2014-10-24 13:34             ` Jeff Layton
2014-10-24 14:30               ` Trond Myklebust
2014-10-24 14:57                 ` Jeff Layton
2014-10-24 15:59                   ` Trond Myklebust
2014-10-24 17:08                     ` Jeff Layton
2014-10-27 19:00                       ` Jeff Layton [this message]
2014-10-24 13:42           ` Christoph Hellwig
2015-01-16 13:39 ` xfstests generic/075 failure on recent Linus' tree Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141027150037.65bd3080@tlielax.poochiereds.net \
    --to=jeff.layton@primarydata.com \
    --cc=bfields@fieldses.org \
    --cc=hch@infradead.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.