* [PATCH, RFC] backchannel overflows
@ 2015-04-28 20:21 Christoph Hellwig
  2015-04-29 12:08 ` Kinglong Mee
  2015-04-29 14:55 ` Chuck Lever
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2015-04-28 20:21 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Currently the client will just crap out if a CB_NULL comes in at the
same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.

I don't really understand how the spec wants to treat CB_NULL in
relation to the available backchannel slots, but given that it's
not part of the sequences in CB_SEQUENCE it somehow needs to bypass them.

If we make sure to overallocate the rpc-level buffers so that we
have more than the available NFS-level slots, we should be safe from
this condition, which makes a 4.1 server doing heavy recalls under
load very unhappy by not returning an NFS-level reply to its layout
recalls.

I don't really like this patch much, so any idea for a better solution
would be highly welcome!

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..7afb3ef 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
  * so we limit their concurrency to 1 by setting up the maximum number
  * of slots for the backchannel.
  */
-#define NFS41_BC_MIN_CALLBACKS 1
+#define NFS41_BC_MIN_CALLBACKS 2
 #define NFS41_BC_MAX_CALLBACKS 1
 
 extern unsigned int nfs_callback_set_tcpport;


* Re: [PATCH, RFC] backchannel overflows
  2015-04-28 20:21 [PATCH, RFC] backchannel overflows Christoph Hellwig
@ 2015-04-29 12:08 ` Kinglong Mee
  2015-04-29 13:46   ` Christoph Hellwig
  2015-04-29 14:55 ` Chuck Lever
  1 sibling, 1 reply; 19+ messages in thread
From: Kinglong Mee @ 2015-04-29 12:08 UTC (permalink / raw)
  To: Christoph Hellwig, Trond Myklebust; +Cc: linux-nfs

On 4/29/2015 4:21 AM, Christoph Hellwig wrote:
> Currently the client will just crap out if a CB_NULL comes in at the
> same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.
> 
> I don't really understand how the spec wants to treat CB_NULL in
> relation to the available backchannel slots, but given that it's
> not part of the sequences in CB_SEQUENCE it somehow needs to bypass them.
> 
> If we make sure to overallocate the rpc-level buffers so that we
> have more than the available NFS-level slots, we should be safe from
> this condition, which makes a 4.1 server doing heavy recalls under
> load very unhappy by not returning an NFS-level reply to its layout
> recalls.
> 
> I don't really like this patch much, so any idea for a better solution
> would be highly welcome!
> 
> diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> index 84326e9..7afb3ef 100644
> --- a/fs/nfs/callback.h
> +++ b/fs/nfs/callback.h
> @@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
>   * so we limit their concurrency to 1 by setting up the maximum number
>   * of slots for the backchannel.
>   */
> -#define NFS41_BC_MIN_CALLBACKS 1
> +#define NFS41_BC_MIN_CALLBACKS 2
>  #define NFS41_BC_MAX_CALLBACKS 1

Are you sure that's okay without updating NFS41_BC_MAX_CALLBACKS?

thanks,
Kinglong Mee


* Re: [PATCH, RFC] backchannel overflows
  2015-04-29 12:08 ` Kinglong Mee
@ 2015-04-29 13:46   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:46 UTC (permalink / raw)
  To: Kinglong Mee; +Cc: Christoph Hellwig, Trond Myklebust, linux-nfs

On Wed, Apr 29, 2015 at 08:08:29PM +0800, Kinglong Mee wrote:
> > diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> > index 84326e9..7afb3ef 100644
> > --- a/fs/nfs/callback.h
> > +++ b/fs/nfs/callback.h
> > @@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
> >   * so we limit their concurrency to 1 by setting up the maximum number
> >   * of slots for the backchannel.
> >   */
> > -#define NFS41_BC_MIN_CALLBACKS 1
> > +#define NFS41_BC_MIN_CALLBACKS 2
> >  #define NFS41_BC_MAX_CALLBACKS 1
> 
> Are you sure that's okay without update NFS41_BC_MAX_CALLBACKS?

Yes, NFS41_BC_MIN_CALLBACKS is the number of slots we tell the sunrpc
layer to allocate, while NFS41_BC_MAX_CALLBACKS is used for nfs-internal
accounting.  So having them different is intentional for this RFC.  If
we have to merge this instead of a better fix we definitely should
fix up the naming, though.


* Re: [PATCH, RFC] backchannel overflows
  2015-04-28 20:21 [PATCH, RFC] backchannel overflows Christoph Hellwig
  2015-04-29 12:08 ` Kinglong Mee
@ 2015-04-29 14:55 ` Chuck Lever
  2015-04-29 14:58   ` Trond Myklebust
  2015-04-29 15:14   ` Christoph Hellwig
  1 sibling, 2 replies; 19+ messages in thread
From: Chuck Lever @ 2015-04-29 14:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Trond Myklebust, Linux NFS Mailing List


On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:

> Currently the client will just crap out if a CB_NULL comes in at the
> same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.

Under what circumstances does the server send a CB_NULL while a CB_COMPOUND
is in flight?


> I don't really understand how the spec wants to treat CB_NULL in
> relation to the available backchannel slots, but given that it's
> not part of the sequences in CB_SEQUENCE it somehow needs to bypass them.
> 
> If we make sure to overallocate the rpc-level buffers so that we
> have more than the available NFS-level slots, we should be safe from
> this condition, which makes a 4.1 server doing heavy recalls under
> load very unhappy by not returning an NFS-level reply to its layout
> recalls.
> 
> I don't really like this patch much, so any idea for a better solution
> would be highly welcome!
> 
> diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> index 84326e9..7afb3ef 100644
> --- a/fs/nfs/callback.h
> +++ b/fs/nfs/callback.h
> @@ -205,7 +205,7 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
>  * so we limit their concurrency to 1 by setting up the maximum number
>  * of slots for the backchannel.
>  */
> -#define NFS41_BC_MIN_CALLBACKS 1
> +#define NFS41_BC_MIN_CALLBACKS 2
> #define NFS41_BC_MAX_CALLBACKS 1
> 
> extern unsigned int nfs_callback_set_tcpport;

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: [PATCH, RFC] backchannel overflows
  2015-04-29 14:55 ` Chuck Lever
@ 2015-04-29 14:58   ` Trond Myklebust
  2015-04-29 15:14   ` Christoph Hellwig
  1 sibling, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2015-04-29 14:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Christoph Hellwig, Linux NFS Mailing List

On Wed, Apr 29, 2015 at 10:55 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
>> Currently the client will just crap out if a CB_NULL comes in at the
>> same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.
>
> Under what circumstances does the server send a CB_NULL while a CB_COMPOUND
> is in flight?
>

I agree with Chuck. Why does knfsd send a CB_NULL at all if the
intention is to send CB_COMPOUND?

Cheers
  Trond


* Re: [PATCH, RFC] backchannel overflows
  2015-04-29 14:55 ` Chuck Lever
  2015-04-29 14:58   ` Trond Myklebust
@ 2015-04-29 15:14   ` Christoph Hellwig
  2015-04-29 15:24     ` Trond Myklebust
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2015-04-29 15:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Christoph Hellwig, Trond Myklebust, Linux NFS Mailing List

On Wed, Apr 29, 2015 at 10:55:10AM -0400, Chuck Lever wrote:
> 
> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> > Currently the client will just crap out if a CB_NULL comes in at the
> > same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.
> 
> Under what circumstances does the server send a CB_NULL while a CB_COMPOUND
> is in flight?

When a client is under heavy loads from fsx or aio-stress, and we lose
the connection (nfsd4_conn_lost is called) while doing heavy recalls.

xfstests against a server offering pnfs layouts for which the client
can't reach the storage devices is an easy reproducer.


* Re: [PATCH, RFC] backchannel overflows
  2015-04-29 15:14   ` Christoph Hellwig
@ 2015-04-29 15:24     ` Trond Myklebust
  2015-04-29 17:34       ` J. Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2015-04-29 15:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chuck Lever, Linux NFS Mailing List

On Wed, Apr 29, 2015 at 11:14 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Apr 29, 2015 at 10:55:10AM -0400, Chuck Lever wrote:
>>
>> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>>
>> > Currently the client will just crap out if a CB_NULL comes in at the
>> > same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.
>>
>> Under what circumstances does the server send a CB_NULL while a CB_COMPOUND
>> is in flight?
>
> When a client is under heavy loads from fsx or aio-stress, and we lose
> the connection (nfsd4_conn_lost is called) while doing heavy recalls.
>
> xfstests against a server offering pnfs layouts for which the client
> can't reach the storage devices is an easy reproducer.

Why does it need to do this? If the client has sent the
BIND_CONN_TO_SESSION (which I believe that knfsd asks for), then the
server knows that this is a bi-directional connection.
The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
almost always be redundant, because the client initiates the
connection and it explicitly tells the server whether or not it is to
be used for the callback channel.

The CB_NULL should always be redundant.

Trond


* Re: [PATCH, RFC] backchannel overflows
  2015-04-29 15:24     ` Trond Myklebust
@ 2015-04-29 17:34       ` J. Bruce Fields
  2015-04-30  6:25         ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2015-04-29 17:34 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Christoph Hellwig, Chuck Lever, Linux NFS Mailing List

On Wed, Apr 29, 2015 at 11:24:02AM -0400, Trond Myklebust wrote:
> On Wed, Apr 29, 2015 at 11:14 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > On Wed, Apr 29, 2015 at 10:55:10AM -0400, Chuck Lever wrote:
> >>
> >> On Apr 28, 2015, at 4:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
> >>
> >> > Currently the client will just crap out if a CB_NULL comes in at the
> >> > same time as a slot controlled CB_COMPOUND that includes a CB_SEQUENCE.
> >>
> >> Under what circumstances does the server send a CB_NULL while a CB_COMPOUND
> >> is in flight?
> >
> > When a client is under heavy loads from fsx or aio-stress, and we lose
> > the connection (nfsd4_conn_lost is called) while doing heavy recalls.
> >
> > xfstests against a server offering pnfs layouts for which the client
> > can't reach the storage devices is an easy reproducer.
> 
> Why does it need to do this? If the client has sent the
> BIND_CONN_TO_SESSION (which I believe that knfsd asks for), then the
> server knows that this is a bi-directional connection.
> The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
> almost always be redundant, because the client initiates the
> connection and it explicitly tells the server whether or not it is to
> be used for the callback channel.
> 
> The CB_NULL should always be redundant.

I'd be fine with suppressing it.  I think I actually intended to but
screwed it up.  (Chuck or somebody convinced me the
NFSD4_CB_UP/UNKNOWN/DOWN logic is totally broken but I never got around
to fixing it.)

--b.


* Re: [PATCH, RFC] backchannel overflows
  2015-04-29 17:34       ` J. Bruce Fields
@ 2015-04-30  6:25         ` Christoph Hellwig
  2015-04-30 14:34           ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2015-04-30  6:25 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Trond Myklebust, Christoph Hellwig, Chuck Lever, Linux NFS Mailing List

On Wed, Apr 29, 2015 at 01:34:54PM -0400, J. Bruce Fields wrote:
> > Why does it need to do this? If the client has sent the
> > BIND_CONN_TO_SESSION (which I believe that knfsd asks for), then the
> > server knows that this is a bi-directional connection.
> > The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
> > almost always be redundant, because the client initiates the
> > connection and it explicitly tells the server whether or not it is to
> > be used for the callback channel.
> > 
> > The CB_NULL should always be redundant.
> 
> I'd be fine with suppressing it.  I think I actually intended to but
> screwed it up.  (Chuck or somebody convinced me the
> NFSD4_CB_UP/UNKNOWN/DOWN logic is totally broken but I never got around
> to fixing it.)

I've dived into removing CB_NULL and fixed various major breakage
in the nfsd callback path, for which I will send you an RFC series ASAP.

However, even with that I see the "Callback slot table overflowed"
message from the client under load.  I think the problem is the following:

Between sending the callback response in call_bc_transmit -> xprt_transmit
and actually releasing the request from rpc_exit_task -> xprt_release ->
xprt_free_bc_request there is a race window, and with an overloaded client
and a fast connection we can hit this one easily.

My patch to increase the number of buffers for the backchannel ensures
this doesn't happen in my setup, but of course I could envision a
theoretical setup where the client is so slow that multiple already
processed requests might not be returned yet.


* Re: [PATCH, RFC] backchannel overflows
  2015-04-30  6:25         ` Christoph Hellwig
@ 2015-04-30 14:34           ` Chuck Lever
  2015-04-30 14:37             ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2015-04-30 14:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Trond Myklebust, Linux NFS Mailing List


On Apr 30, 2015, at 2:25 AM, Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Apr 29, 2015 at 01:34:54PM -0400, J. Bruce Fields wrote:
>>> Why does it need to do this? If the client has sent the
>>> BIND_CONN_TO_SESSION (which I believe that knfsd asks for), then the
>>> server knows that this is a bi-directional connection.
>>> The difference between NFSv4 and NFSv4.1 is that the CB_NULL should
>>> almost always be redundant, because the client initiates the
>>> connection and it explicitly tells the server whether or not it is to
>>> be used for the callback channel.
>>> 
>>> The CB_NULL should always be redundant.
>> 
>> I'd be fine with suppressing it.  I think I actually intended to but
>> screwed it up.  (Chuck or somebody convinced me the
>> NFSD4_CB_UP/UNKNOWN/DOWN logic is totally broken but I never got around
>> to fixing it.)
> 
> I've dived into removing CB_NULL and fixed various major breakage
> in the nfsd callback path, for which I will send you an RFC series ASAP.
> 
> However, even with that I see the "Callback slot table overflowed"
> message from the client under load.  I think the problem is the following:
> 
> Between sending the callback response in call_bc_transmit -> xprt_transmit
> and actually releasing the request from rpc_exit_task -> xprt_release ->
> xprt_free_bc_request there is a race window, and with an overloaded client
> and a fast connection we can hit this one easily.
> 
> My patch to increase the number of buffers for the backchannel ensures
> this doesn't happen in my setup, but of course I could envision a
> theoretical setup where the client is so slow that multiple already
> processed requests might not be returned yet.

I’ve been discussing the possibility of adding more session slots on
the Linux NFS client with jlayton. We think it would be straightforward,
once the workqueue-based NFSD patches are in, to make the backchannel
service into a workqueue. Then it would be a simple matter to increase
the number of session slots.

We haven’t discussed what would be needed on the server side of this
equation, but sounds like it has some deeper problems if it is not
obeying the session slot table limits advertised by the client.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: [PATCH, RFC] backchannel overflows
  2015-04-30 14:34           ` Chuck Lever
@ 2015-04-30 14:37             ` Christoph Hellwig
       [not found]               ` <CAHQdGtRgEVXidNNYtYf4c3uS0vc6fbm-SZ5AxrY=awXYynmACw@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2015-04-30 14:37 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Christoph Hellwig, J. Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List

On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
> I've been discussing the possibility of adding more session slots on
> the Linux NFS client with jlayton. We think it would be straightforward,
> once the workqueue-based NFSD patches are in, to make the backchannel
> service into a workqueue. Then it would be a simple matter to increase
> the number of session slots.
> 
> We haven't discussed what would be needed on the server side of this
> equation, but sounds like it has some deeper problems if it is not
> obeying the session slot table limits advertised by the client.

No, the client isn't obeying its own slot limits.

The problem is that when the client responds to a callback it still
holds a reference on the rpc_rqst for a while.  If the server
sends the next callback fast enough to hit that race window, the
client incorrectly rejects it.  Note that we never even get
to the nfs code that checks the slot id in this case; it's the
low-level sunrpc code that is the problem.

Note that with my patches the server can recover from this client
problem by resetting the connection, but that's not optimal
behavior.

My current patch, which includes the explanation, is below.

---
commit 7e995048c9917766e0ae96889ef48733c2229b7e
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Apr 30 11:53:11 2015 +0200

    nfs: overallocate backchannel requests
    
    The RPC code releases the rpc_rqst much later than sending the reply
    to the server.  To avoid low-level errors that cause incoming callback
    requests to be rejected under load, allocate twice as many rpc_rqst
    structures as callback slots to allow for double buffering.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..7f79345 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -200,13 +200,18 @@ extern int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation,
 					    const nfs4_stateid *stateid);
 extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
 #endif /* CONFIG_NFS_V4 */
+
 /*
  * nfs41: Callbacks are expected to not cause substantial latency,
  * so we limit their concurrency to 1 by setting up the maximum number
  * of slots for the backchannel.
+ *
+ * Note that we allocate a second RPC request structure because the
+ * RPC code holds on to the buffer for a while after sending a reply
+ * to the server.
  */
-#define NFS41_BC_MIN_CALLBACKS 1
-#define NFS41_BC_MAX_CALLBACKS 1
+#define NFS41_BC_SLOTS 1
+#define NFS41_BC_REQUESTS (NFS41_BC_SLOTS * 2)
 
 extern unsigned int nfs_callback_set_tcpport;
 extern unsigned short nfs_callback_tcpport;
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 197806f..d733f04 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -320,7 +320,7 @@ validate_seqid(struct nfs4_slot_table *tbl, struct cb_sequenceargs * args)
 	dprintk("%s enter. slotid %u seqid %u\n",
 		__func__, args->csa_slotid, args->csa_sequenceid);
 
-	if (args->csa_slotid >= NFS41_BC_MAX_CALLBACKS)
+	if (args->csa_slotid >= NFS41_BC_SLOTS)
 		return htonl(NFS4ERR_BADSLOT);
 
 	slot = tbl->slots + args->csa_slotid;
@@ -465,8 +465,8 @@ __be32 nfs4_callback_sequence(struct cb_sequenceargs *args,
 	       sizeof(res->csr_sessionid));
 	res->csr_sequenceid = args->csa_sequenceid;
 	res->csr_slotid = args->csa_slotid;
-	res->csr_highestslotid = NFS41_BC_MAX_CALLBACKS - 1;
-	res->csr_target_highestslotid = NFS41_BC_MAX_CALLBACKS - 1;
+	res->csr_highestslotid = NFS41_BC_SLOTS - 1;
+	res->csr_target_highestslotid = NFS41_BC_SLOTS - 1;
 
 out:
 	cps->clp = clp; /* put in nfs4_callback_compound */
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index e42be52..e36717c 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -248,7 +248,7 @@ static int nfs4_init_callback(struct nfs_client *clp)
 	xprt = rcu_dereference_raw(clp->cl_rpcclient->cl_xprt);
 
 	if (nfs4_has_session(clp)) {
-		error = xprt_setup_backchannel(xprt, NFS41_BC_MIN_CALLBACKS);
+		error = xprt_setup_backchannel(xprt, NFS41_BC_REQUESTS);
 		if (error < 0)
 			return error;
 	}
diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c
index e23366e..3ad4da4 100644
--- a/fs/nfs/nfs4session.c
+++ b/fs/nfs/nfs4session.c
@@ -500,7 +500,7 @@ void nfs4_destroy_session(struct nfs4_session *session)
 	rcu_read_unlock();
 	dprintk("%s Destroy backchannel for xprt %p\n",
 		__func__, xprt);
-	xprt_destroy_backchannel(xprt, NFS41_BC_MIN_CALLBACKS);
+	xprt_destroy_backchannel(xprt, NFS41_BC_REQUESTS);
 	nfs4_destroy_session_slot_tables(session);
 	kfree(session);
 }


* Re: [PATCH, RFC] backchannel overflows
       [not found]               ` <CAHQdGtRgEVXidNNYtYf4c3uS0vc6fbm-SZ5AxrY=awXYynmACw@mail.gmail.com>
@ 2015-04-30 15:02                 ` Trond Myklebust
  2015-04-30 15:11                 ` Chuck Lever
  1 sibling, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2015-04-30 15:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chuck Lever, J. Bruce Fields, Linux NFS Mailing List

Apologies for the resend...

On Thu, Apr 30, 2015 at 11:01 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
>
>
> On Thu, Apr 30, 2015 at 10:37 AM, Christoph Hellwig <hch@infradead.org>
> wrote:
>>
>> On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
>> > I've been discussing the possibility of adding more session slots on
>> > the Linux NFS client with jlayton. We think it would be straightforward,
>> > once the workqueue-based NFSD patches are in, to make the backchannel
>> > service into a workqueue. Then it would be a simple matter to increase
>> > the number of session slots.
>> >
>> > We haven't discussed what would be needed on the server side of this
>> > equation, but sounds like it has some deeper problems if it is not
>> > obeying the session slot table limits advertised by the client.
>>
>> No, the client isn't obeying its own slot limits.
>> 
>> The problem is that when the client responds to a callback it still
>> holds a reference on the rpc_rqst for a while.  If the server
>> sends the next callback fast enough to hit that race window, the
>> client incorrectly rejects it.  Note that we never even get
>> to the nfs code that checks the slot id in this case; it's the
>> low-level sunrpc code that is the problem.
>
>
> We can add dynamic allocation of a new slot as part of the backchannel reply
> transmit workload. That way we close the race without opening the door to
> violations of session limits.
>
> Cheers
>   Trond


* Re: [PATCH, RFC] backchannel overflows
       [not found]               ` <CAHQdGtRgEVXidNNYtYf4c3uS0vc6fbm-SZ5AxrY=awXYynmACw@mail.gmail.com>
  2015-04-30 15:02                 ` Trond Myklebust
@ 2015-04-30 15:11                 ` Chuck Lever
  2015-04-30 17:41                   ` Chuck Lever
  1 sibling, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2015-04-30 15:11 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, J. Bruce Fields, Linux NFS Mailing List


On Apr 30, 2015, at 11:01 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

>> 
>> 
>> On Thu, Apr 30, 2015 at 10:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
>> > I've been discussing the possibility of adding more session slots on
>> > the Linux NFS client with jlayton. We think it would be straightforward,
>> > once the workqueue-based NFSD patches are in, to make the backchannel
>> > service into a workqueue. Then it would be a simple matter to increase
>> > the number of session slots.
>> >
>> > We haven't discussed what would be needed on the server side of this
>> > equation, but sounds like it has some deeper problems if it is not
>> > obeying the session slot table limits advertised by the client.
>> 
>> No, the client isn't obeying its own slot limits.
>> 
>> The problem is that when the client responds to a callback it still
>> holds a reference on the rpc_rqst for a while.  If the server
>> sends the next callback fast enough to hit that race window, the
>> client incorrectly rejects it.  Note that we never even get
>> to the nfs code that checks the slot id in this case; it's the
>> low-level sunrpc code that is the problem.
> 
> We can add dynamic allocation of a new slot as part of the backchannel reply transmit workload. That way we close the race without opening the door to violations of session limits.

I’ll have to think about how that would affect the RPC/RDMA backchannel.
Transport resources are allocated when the transport is created, and
can’t be dynamically added. (It certainly wouldn’t be a problem to
overprovision, as Christoph has done here.)

I was thinking maybe using a local copy of the rpc_rqst for sending
the backchannel reply, and freeing the rpc_rqst before sending, might
close the window.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: [PATCH, RFC] backchannel overflows
  2015-04-30 15:11                 ` Chuck Lever
@ 2015-04-30 17:41                   ` Chuck Lever
  2015-05-01 17:23                     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2015-04-30 17:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Linux NFS Mailing List, Trond Myklebust


On Apr 30, 2015, at 11:11 AM, Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Apr 30, 2015, at 11:01 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
>>> 
>>> 
>>> On Thu, Apr 30, 2015 at 10:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
>>> On Thu, Apr 30, 2015 at 10:34:02AM -0400, Chuck Lever wrote:
>>>> I've been discussing the possibility of adding more session slots on
>>>> the Linux NFS client with jlayton. We think it would be straightforward,
>>>> once the workqueue-based NFSD patches are in, to make the backchannel
>>>> service into a workqueue. Then it would be a simple matter to increase
>>>> the number of session slots.
>>>> 
>>>> We haven't discussed what would be needed on the server side of this
>>>> equation, but sounds like it has some deeper problems if it is not
>>>> obeying the session slot table limits advertised by the client.
>>> 
>>> No, the client isn't obeying its own slot limits.
>>> 
>>> The problem is that when the client responds to a callback it still
>>> holds a reference on the rpc_rqst for a while.  If the server
>>> sends the next callback fast enough to hit that race window, the
>>> client incorrectly rejects it.  Note that we never even get
>>> to the nfs code that checks the slot id in this case; it's the
>>> low-level sunrpc code that is the problem.
>> 
>> We can add dynamic allocation of a new slot as part of the backchannel reply transmit workload. That way we close the race without opening for violation of session limits.
> 
> I’ll have to think about how that would affect the RPC/RDMA backchannel.
> Transport resources are allocated when the transport is created, and
> can’t be dynamically added. (It certainly wouldn’t be a problem to
> overprovision, as Christoph has done here.)

We discussed this briefly during the Linux NFS town hall meeting.
I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
can use simple overprovisioning.

This way the upper layer (NFSv4.1 client) doesn’t have to be aware of
limitations in the RPC layer mechanism.

Trond may have an additional concern that I didn’t capture.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: [PATCH, RFC] backchannel overflows
  2015-04-30 17:41                   ` Chuck Lever
@ 2015-05-01 17:23                     ` Christoph Hellwig
  2015-05-01 17:28                       ` Trond Myklebust
  2015-05-01 17:31                       ` Chuck Lever
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2015-05-01 17:23 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, Linux NFS Mailing List, Trond Myklebust

On Thu, Apr 30, 2015 at 01:41:02PM -0400, Chuck Lever wrote:
> We discussed this briefly during the Linux NFS town hall meeting.
> I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
> can use simple overprovisioning.
> 
> This way the upper layer (NFSv4.1 client) doesn't have to be aware of
> limitations in the RPC layer mechanism.
> 
> Trond may have an additional concern that I didn't capture.

The other option would be to simply overallocate in the transport layer,
as that is the layer which causes the problem to start with.

That being said, what is the argument for doing any sort of static
allocation here?  I'm fine with doing fully dynamic allocation if that
works out fine, but a mixed static / dynamic allocation sounds like a
nightmare.


* Re: [PATCH, RFC] backchannel overflows
  2015-05-01 17:23                     ` Christoph Hellwig
@ 2015-05-01 17:28                       ` Trond Myklebust
  2015-05-01 17:37                         ` Christoph Hellwig
  2015-05-01 17:31                       ` Chuck Lever
  1 sibling, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2015-05-01 17:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chuck Lever, J. Bruce Fields, Linux NFS Mailing List

On Fri, May 1, 2015 at 1:23 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Apr 30, 2015 at 01:41:02PM -0400, Chuck Lever wrote:
>> We discussed this briefly during the Linux NFS town hall meeting.
>> I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
>> can use simple overprovisioning.
>>
>> This way the upper layer (NFSv4.1 client) doesn't have to be aware of
>> limitations in the RPC layer mechanism.
>>
>> Trond may have an additional concern that I didn't capture.
>
> The other option would be to simply overallocate in the transport layer,
> as that is the layer which causes the problem to start with.
>
> That being said, what is the argument for doing any sort of static
> allocation here?  I'm fine with doing fully dynamic allocation if that
> works out fine, but a mixed static / dynamic allocation sounds like a
> nightmare.

The concern is not so much static vs dynamic. The concern is limiting
incoming RPC calls to the number allowed by the NFSv4.1 session. Right
now, the static allocation enforces the limit of 1 slot that the
client offers to the server (albeit with the race) and so I want any
replacement to meet the same requirement.
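
The bookkeeping this describes can be sketched in a few lines. Names
here are invented for illustration and are not the kernel's actual API;
the point is only that incoming calls are refused once the advertised
slot count is reached:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical illustration of the slot-limit idea: the client
 * advertises max_slots backchannel slots at CREATE_SESSION time and
 * must refuse any call beyond that.  Not the kernel's real API. */
struct bc_session {
	unsigned int max_slots;	/* slots offered to the server */
	unsigned int in_use;	/* calls currently being processed */
};

/* Try to claim a slot for an incoming CB_COMPOUND; false means the
 * caller should reply with an NFS-level error such as NFS4ERR_DELAY. */
bool bc_claim_slot(struct bc_session *s)
{
	if (s->in_use >= s->max_slots)
		return false;
	s->in_use++;
	return true;
}

/* Called once the reply for a call has been sent. */
void bc_release_slot(struct bc_session *s)
{
	if (s->in_use)
		s->in_use--;
}
```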

Cheers
  Trond


* Re: [PATCH, RFC] backchannel overflows
  2015-05-01 17:23                     ` Christoph Hellwig
  2015-05-01 17:28                       ` Trond Myklebust
@ 2015-05-01 17:31                       ` Chuck Lever
  1 sibling, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2015-05-01 17:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: J. Bruce Fields, Linux NFS Mailing List, Trond Myklebust


On May 1, 2015, at 1:23 PM, Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Apr 30, 2015 at 01:41:02PM -0400, Chuck Lever wrote:
>> We discussed this briefly during the Linux NFS town hall meeting.
>> I agree using dynamic slot allocation for TCP is fine, and RPC/RDMA
>> can use simple overprovisioning.
>> 
>> This way the upper layer (NFSv4.1 client) doesn't have to be aware of
>> limitations in the RPC layer mechanism.
>> 
>> Trond may have an additional concern that I didn't capture.
> 
> The other option would be to simply overallocate in the transport layer,
> as that is the layer which causes the problem to start with.

That’s exactly what the RDMA backchannel will do.

> That being said, what is the argument for doing any sort of static
> allocation here?  I'm fine with doing fully dynamic allocation if that
> works out fine, but a mixed static / dynamic allocation sounds like a
> nightmare.

RDMA resources must be allocated and pre-registered up front. The
RDMA transport can’t support dynamic slot allocation without adding
a lot more complexity.

The RDMA transport will need to have separate backchannel setup and
destroy methods anyway. So doing dynamic for TCP and overprovision for
RDMA will be simple.
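
The overprovisioning side reduces to simple arithmetic: preregister
more receive buffers than the session advertises as slots, so that a
slot-less CB_NULL cannot consume the buffer a slot-controlled
CB_COMPOUND needs. The helper name and the size of the extra are
illustrative assumptions, not the kernel's code, though the original
RFC patch embodies the same 1 -> 2 change:

```c
#include <assert.h>

/* Hypothetical helper: how many receive buffers to preregister for
 * the backchannel.  NFS-level slots bound CB_COMPOUNDs that carry a
 * CB_SEQUENCE, but CB_NULL bypasses the slot table, so one extra
 * buffer per session leaves room for a concurrent CB_NULL. */
unsigned int bc_rcv_buffers(unsigned int session_slots)
{
	const unsigned int slotless_extra = 1;	/* room for CB_NULL */

	return session_slots + slotless_extra;
}
```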

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: [PATCH, RFC] backchannel overflows
  2015-05-01 17:28                       ` Trond Myklebust
@ 2015-05-01 17:37                         ` Christoph Hellwig
  2015-05-01 17:47                           ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2015-05-01 17:37 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Chuck Lever, J. Bruce Fields, Linux NFS Mailing List

On Fri, May 01, 2015 at 01:28:12PM -0400, Trond Myklebust wrote:
> The concern is not so much static vs dynamic. The concern is limiting
> incoming RPC calls to the number allowed by the NFSv4.1 session. Right
> now, the static allocation enforces the limit of 1 slot that the
> client offers to the server (albeit with the race) and so I want any
> replacement to meet the same requirement.

Either variant will allow accepting more than one backchannel
request at the RPC layer, that's the whole point.  With the simple
patch I posted it will accept a 2nd one, with dynamic allocation
the number would be potentially unbound.

But given that the NFS client already enforces the slot limit
properly in validate_seqid, and even returns the proper nfs error
for this case, I don't really see what enforcing it at a lower level
where we can't even report proper errors buys us.


* Re: [PATCH, RFC] backchannel overflows
  2015-05-01 17:37                         ` Christoph Hellwig
@ 2015-05-01 17:47                           ` Trond Myklebust
  0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2015-05-01 17:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chuck Lever, J. Bruce Fields, Linux NFS Mailing List

On Fri, May 1, 2015 at 1:37 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, May 01, 2015 at 01:28:12PM -0400, Trond Myklebust wrote:
>> The concern is not so much static vs dynamic. The concern is limiting
>> incoming RPC calls to the number allowed by the NFSv4.1 session. Right
>> now, the static allocation enforces the limit of 1 slot that the
>> client offers to the server (albeit with the race) and so I want any
>> replacement to meet the same requirement.
>
> Either variant will allow accepting more than one backchannel
> request at the RPC layer, that's the whole point.  With the simple
> patch I posted it will accept a 2nd one, with dynamic allocation
> the number would be potentially unbound.

If you do it wrong, yes. The point is if you do the allocation as part
of the send process, then you prevent unbounded growth.
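
A minimal sketch of that idea, with invented names rather than the
kernel's real API: a fixed pool of request buffers is consumed on
receive and only replenished when the reply is sent, so the number of
in-flight backchannel calls can never exceed the pool size:

```c
#include <assert.h>
#include <stddef.h>

#define BC_POOL_SIZE 2	/* illustrative bound on in-flight calls */

/* Hypothetical buffer pool: 'free' counts buffers still available. */
struct bc_pool {
	unsigned int free;
};

void bc_pool_init(struct bc_pool *p)
{
	p->free = BC_POOL_SIZE;
}

/* Called on receive: NULL means the call must be dropped or deferred,
 * which is what bounds the growth. */
void *bc_buf_get(struct bc_pool *p)
{
	static char storage[BC_POOL_SIZE][1024];

	if (p->free == 0)
		return NULL;
	return storage[--p->free];
}

/* Called from the send path after the reply goes out, recycling the
 * buffer back into the pool. */
void bc_buf_put(struct bc_pool *p, void *buf)
{
	(void)buf;
	if (p->free < BC_POOL_SIZE)
		p->free++;
}
```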

> But given that the NFS client already enforces the slot limit
> properly in validate_seqid, and even returns the proper nfs error
> for this case I don't really see what enforcing it at a lower level
> we we can't even report proper errors buys us.

It buys us better congestion control. We don't even need to process the request.

Cheers
  Trond


end of thread, other threads:[~2015-05-01 17:47 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-28 20:21 [PATCH, RFC] backchannel overflows Christoph Hellwig
2015-04-29 12:08 ` Kinglong Mee
2015-04-29 13:46   ` Christoph Hellwig
2015-04-29 14:55 ` Chuck Lever
2015-04-29 14:58   ` Trond Myklebust
2015-04-29 15:14   ` Christoph Hellwig
2015-04-29 15:24     ` Trond Myklebust
2015-04-29 17:34       ` J. Bruce Fields
2015-04-30  6:25         ` Christoph Hellwig
2015-04-30 14:34           ` Chuck Lever
2015-04-30 14:37             ` Christoph Hellwig
     [not found]               ` <CAHQdGtRgEVXidNNYtYf4c3uS0vc6fbm-SZ5AxrY=awXYynmACw@mail.gmail.com>
2015-04-30 15:02                 ` Trond Myklebust
2015-04-30 15:11                 ` Chuck Lever
2015-04-30 17:41                   ` Chuck Lever
2015-05-01 17:23                     ` Christoph Hellwig
2015-05-01 17:28                       ` Trond Myklebust
2015-05-01 17:37                         ` Christoph Hellwig
2015-05-01 17:47                           ` Trond Myklebust
2015-05-01 17:31                       ` Chuck Lever
