* [PATCH 000 of 11] knfsd: Introduction
@ 2006-08-24  6:36 NeilBrown
  2006-08-24  6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel

Following are 11 patches for knfsd against
 2.6.18-rc4-mm2 plus  nfsd-lockdep-annotation.patch
 (the first patch being a fix for that patch).

They are all appropriate for 2.6.19, not 2.6.18.

The batch in the middle (6-9) allows nfsd to handle IO requests
of up to 1 megabyte (rather than the current limit of 32K).
This only applies to TCP connections, and it has not
been heavily tested...
The maximum size allowed is configurable via the nfsd filesystem,
and defaults to something that I hope is reasonable.

This set also includes a few more scalability patches from Greg Banks.

NeilBrown


 [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix
 [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset
 [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist
 [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel
 [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket.
 [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one.
 [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom
 [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
 [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
 [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly
 [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket


* [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
@ 2006-08-24  6:36 ` NeilBrown
  2006-08-24  6:36 ` [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset NeilBrown
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


NFSv2 needs the I_MUTEX_PARENT annotation on the directory when creating
a file too.
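
For reference, fh_lock_nested() just passes the lockdep subclass
through to mutex_lock_nested() on the directory's i_mutex; roughly
(a sketch of the helper from the lockdep-annotation patch, with the
wcc/attribute bookkeeping left out):

static inline void fh_lock_nested(struct svc_fh *fhp, unsigned int subclass)
{
	struct inode *inode = fhp->fh_dentry->d_inode;

	/* I_MUTEX_PARENT tells lockdep this i_mutex is taken in its
	 * role as a parent directory, so taking a child inode's
	 * i_mutex later in the create path is not reported as
	 * recursive locking. */
	mutex_lock_nested(&inode->i_mutex, subclass);
	fhp->fh_locked = 1;
}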

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfsproc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c	2006-08-24 16:21:23.000000000 +1000
+++ ./fs/nfsd/nfsproc.c	2006-08-24 16:21:35.000000000 +1000
@@ -225,7 +225,7 @@ nfsd_proc_create(struct svc_rqst *rqstp,
 	nfserr = nfserr_exist;
 	if (isdotent(argp->name, argp->len))
 		goto done;
-	fh_lock(dirfhp);
+	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
 	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
 	if (IS_ERR(dchild)) {
 		nfserr = nfserrno(PTR_ERR(dchild));


* [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
  2006-08-24  6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
@ 2006-08-24  6:36 ` NeilBrown
  2006-08-24  6:36 ` [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist NeilBrown
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./net/sunrpc/svcsock.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-08-24 16:23:37.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-08-24 16:23:37.000000000 +1000
@@ -49,7 +49,7 @@
  *	svc_pool->sp_lock protects most of the fields of that pool.
  * 	svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
  *	when both need to be taken (rare), svc_serv->sv_lock is first.
- *	BKL protects svc_serv->sv_nrthread, svc_pool->sp_nrthread
+ *	BKL protects svc_serv->sv_nrthread.
  *	svc_sock->sk_defer_lock protects the svc_sock->sk_deferred list
  *	svc_sock->sk_flags.SK_BUSY prevents a svc_sock being enqueued multiply.
  *


* [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
  2006-08-24  6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
  2006-08-24  6:36 ` [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset NeilBrown
@ 2006-08-24  6:36 ` NeilBrown
  2006-08-24  6:36 ` [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel NeilBrown
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


The refcount that nfsd holds on lockd is based on the number
of open sockets.
So when we close a socket, we should decrement the ref (with lockd_down).

Currently when a socket is closed via writing to the portlist
file, that doesn't happen.

So: make sure we get an error return if the requested socket
is not found, and call lockd_down if it was found.
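
In other words, the pairing we want is one lockd reference per nfsd
socket; schematically (a sketch of the intended balance, not code
taken from this patch):

	/* when an nfsd socket is added (startup, or an fd written
	 * to nfsd/portlist): */
	err = lockd_up(proto);			/* one ref per socket */

	/* when a socket is closed via a "-name" write to
	 * nfsd/portlist, the case fixed here: */
	len = svc_sock_names(buf, nfsd_serv, toclose);
	if (len >= 0)				/* socket found and deleted */
		lockd_down();			/* drop the matching ref */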

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfsctl.c     |    2 ++
 ./net/sunrpc/svcsock.c |    2 ++
 2 files changed, 4 insertions(+)

diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c	2006-08-24 16:24:21.000000000 +1000
+++ ./fs/nfsd/nfsctl.c	2006-08-24 16:24:21.000000000 +1000
@@ -545,6 +545,8 @@ static ssize_t write_ports(struct file *
 		if (nfsd_serv)
 			len = svc_sock_names(buf, nfsd_serv, toclose);
 		unlock_kernel();
+		if (len >= 0)
+			lockd_down();
 		kfree(toclose);
 		return len;
 	}

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-08-24 16:23:37.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-08-24 16:24:21.000000000 +1000
@@ -493,6 +493,8 @@ svc_sock_names(char *buf, struct svc_ser
 	spin_unlock(&serv->sv_lock);
 	if (closesk)
 		svc_delete_socket(closesk);
+	else if (toclose)
+		return -ENOENT;
 	return len;
 }
 EXPORT_SYMBOL(svc_sock_names);


* [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (2 preceding siblings ...)
  2006-08-24  6:36 ` [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist NeilBrown
@ 2006-08-24  6:36 ` NeilBrown
  2006-08-24  6:36 ` [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket NeilBrown
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel





Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfsctl.c |    2 ++
 1 file changed, 2 insertions(+)

diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c	2006-08-24 16:24:21.000000000 +1000
+++ ./fs/nfsd/nfsctl.c	2006-08-24 16:24:47.000000000 +1000
@@ -532,7 +532,9 @@ static ssize_t write_ports(struct file *
 			/* Decrease the count, but don't shutdown the
 			 * the service
 			 */
+			lock_kernel();
 			nfsd_serv->sv_nrthreads--;
+			unlock_kernel();
 		}
 		return err;
 	}


* [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket.
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (3 preceding siblings ...)
  2006-08-24  6:36 ` [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel NeilBrown
@ 2006-08-24  6:36 ` NeilBrown
  2006-08-24  6:36 ` [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one NeilBrown
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


Arrgg.. 
We cannot 'lockd_up' before 'svc_addsock' as we don't know
the protocol yet....
So switch it around again and save the name of the
created socket so that it can be closed if lockd_up fails.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfsctl.c     |   12 ++++++------
 ./net/sunrpc/svcsock.c |    3 +++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c	2006-08-24 16:24:47.000000000 +1000
+++ ./fs/nfsd/nfsctl.c	2006-08-24 16:24:56.000000000 +1000
@@ -523,11 +523,11 @@ static ssize_t write_ports(struct file *
 		err = nfsd_create_serv();
 		if (!err) {
 			int proto = 0;
-			err = lockd_up(proto);
-			if (!err) {
-				err = svc_addsock(nfsd_serv, fd, buf, &proto);
-				if (err)
-					lockd_down();
+			err = svc_addsock(nfsd_serv, fd, buf, &proto);
+			if (err >= 0) {
+				err = lockd_up(proto);
+				if (err < 0)
+					svc_sock_names(buf+strlen(buf)+1, nfsd_serv, buf);
 			}
 			/* Decrease the count, but don't shutdown the
 			 * the service
@@ -536,7 +536,7 @@ static ssize_t write_ports(struct file *
 			nfsd_serv->sv_nrthreads--;
 			unlock_kernel();
 		}
-		return err;
+		return err < 0 ? err : 0;
 	}
 	if (buf[0] == '-') {
 		char *toclose = kstrdup(buf+1, GFP_KERNEL);

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-08-24 16:24:21.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-08-24 16:24:56.000000000 +1000
@@ -492,6 +492,9 @@ svc_sock_names(char *buf, struct svc_ser
 	}
 	spin_unlock(&serv->sv_lock);
 	if (closesk)
+		/* Should unregister with portmap, but you cannot
+		 * unregister just one protocol...
+		 */
 		svc_delete_socket(closesk);
 	else if (toclose)
 		return -ENOENT;


* [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one.
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (4 preceding siblings ...)
  2006-08-24  6:36 ` [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket NeilBrown
@ 2006-08-24  6:36 ` NeilBrown
  2006-08-24  6:37 ` [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom NeilBrown
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


We are planning to increase RPCSVC_MAXPAGES from about
8 to about 256.  This means we need to be a bit careful
about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays.
However there are never more than RPCSVC_MAXPAGES
pages in the two arrays together, so only one array is needed.

The two arrays are for the pages holding the request,
and the pages holding the reply.
Instead of two arrays, we can simply keep track of
where the first reply page is.

This patch also removes a number of small inline functions that
probably serve to obscure what is going on rather than clarify it,
and open-codes the needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0,
i.e. if the response 'xdr' structure has a non-empty tail it is
always in the same page as the head.
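
With a single array, rq_respages is just a pointer into rq_pages
(set by the receive path to the slot after the last argument page),
and handing out a reply page becomes a plain increment of rq_resused.
A hypothetical helper to illustrate (the patch open-codes this at
each call site):

static inline struct page *take_reply_page(struct svc_rqst *rqstp)
{
	/* rq_respages == rq_pages + <number of argument pages>;
	 * reply pages are consumed in order from there. */
	return rqstp->rq_respages[rqstp->rq_resused++];
}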

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfs2acl.c                 |    2 -
 ./fs/nfsd/nfs3acl.c                 |    2 -
 ./fs/nfsd/nfs3xdr.c                 |   23 ++++--------
 ./fs/nfsd/nfs4xdr.c                 |   27 ++++++--------
 ./fs/nfsd/nfsxdr.c                  |   13 ++----
 ./fs/nfsd/vfs.c                     |   16 +++++---
 ./include/linux/sunrpc/svc.h        |   69 +++++-------------------------------
 ./net/sunrpc/auth_gss/svcauth_gss.c |    4 --
 ./net/sunrpc/svc.c                  |   21 +++-------
 ./net/sunrpc/svcsock.c              |   40 ++++++++++----------
 10 files changed, 76 insertions(+), 141 deletions(-)

diff .prev/fs/nfsd/nfs2acl.c ./fs/nfsd/nfs2acl.c
--- .prev/fs/nfsd/nfs2acl.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs2acl.c	2006-08-24 16:25:13.000000000 +1000
@@ -241,7 +241,7 @@ static int nfsaclsvc_encode_getaclres(st
 
 	rqstp->rq_res.page_len = w;
 	while (w > 0) {
-		if (!svc_take_res_page(rqstp))
+		if (!rqstp->rq_respages[rqstp->rq_resused++])
 			return 0;
 		w -= PAGE_SIZE;
 	}

diff .prev/fs/nfsd/nfs3acl.c ./fs/nfsd/nfs3acl.c
--- .prev/fs/nfsd/nfs3acl.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs3acl.c	2006-08-24 16:25:13.000000000 +1000
@@ -185,7 +185,7 @@ static int nfs3svc_encode_getaclres(stru
 
 		rqstp->rq_res.page_len = w;
 		while (w > 0) {
-			if (!svc_take_res_page(rqstp))
+			if (!rqstp->rq_respages[rqstp->rq_resused++])
 				return 0;
 			w -= PAGE_SIZE;
 		}

diff .prev/fs/nfsd/nfs3xdr.c ./fs/nfsd/nfs3xdr.c
--- .prev/fs/nfsd/nfs3xdr.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs3xdr.c	2006-08-24 16:25:13.000000000 +1000
@@ -343,8 +343,7 @@ nfs3svc_decode_readargs(struct svc_rqst 
 	/* set up the kvec */
 	v=0;
 	while (len > 0) {
-		pn = rqstp->rq_resused;
-		svc_take_page(rqstp);
+		pn = rqstp->rq_resused++;
 		args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
 		args->vec[v].iov_len = len < PAGE_SIZE? len : PAGE_SIZE;
 		len -= args->vec[v].iov_len;
@@ -382,7 +381,7 @@ nfs3svc_decode_writeargs(struct svc_rqst
 	while (len > args->vec[v].iov_len) {
 		len -= args->vec[v].iov_len;
 		v++;
-		args->vec[v].iov_base = page_address(rqstp->rq_argpages[v]);
+		args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
 		args->vec[v].iov_len = PAGE_SIZE;
 	}
 	args->vec[v].iov_len = len;
@@ -446,11 +445,11 @@ nfs3svc_decode_symlinkargs(struct svc_rq
 	 * This page appears in the rq_res.pages list, but as pages_len is always
 	 * 0, it won't get in the way
 	 */
-	svc_take_page(rqstp);
 	len = ntohl(*p++);
 	if (len == 0 || len > NFS3_MAXPATHLEN || len >= PAGE_SIZE)
 		return 0;
-	args->tname = new = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+	args->tname = new =
+		page_address(rqstp->rq_respages[rqstp->rq_resused++]);
 	args->tlen = len;
 	/* first copy and check from the first page */
 	old = (char*)p;
@@ -522,8 +521,8 @@ nfs3svc_decode_readlinkargs(struct svc_r
 {
 	if (!(p = decode_fh(p, &args->fh)))
 		return 0;
-	svc_take_page(rqstp);
-	args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+	args->buffer =
+		page_address(rqstp->rq_respages[rqstp->rq_resused++]);
 
 	return xdr_argsize_check(rqstp, p);
 }
@@ -554,8 +553,8 @@ nfs3svc_decode_readdirargs(struct svc_rq
 	if (args->count > PAGE_SIZE)
 		args->count = PAGE_SIZE;
 
-	svc_take_page(rqstp);
-	args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+	args->buffer =
+		page_address(rqstp->rq_respages[rqstp->rq_resused++]);
 
 	return xdr_argsize_check(rqstp, p);
 }
@@ -578,8 +577,7 @@ nfs3svc_decode_readdirplusargs(struct sv
 	args->count = len;
 
 	while (len > 0) {
-		pn = rqstp->rq_resused;
-		svc_take_page(rqstp);
+		pn = rqstp->rq_resused++;
 		if (!args->buffer)
 			args->buffer = page_address(rqstp->rq_respages[pn]);
 		len -= PAGE_SIZE;
@@ -668,7 +666,6 @@ nfs3svc_encode_readlinkres(struct svc_rq
 		rqstp->rq_res.page_len = resp->len;
 		if (resp->len & 3) {
 			/* need to pad the tail */
-			rqstp->rq_restailpage = 0;
 			rqstp->rq_res.tail[0].iov_base = p;
 			*p = 0;
 			rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3);
@@ -693,7 +690,6 @@ nfs3svc_encode_readres(struct svc_rqst *
 		rqstp->rq_res.page_len = resp->count;
 		if (resp->count & 3) {
 			/* need to pad the tail */
-			rqstp->rq_restailpage = 0;
 			rqstp->rq_res.tail[0].iov_base = p;
 			*p = 0;
 			rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3);
@@ -768,7 +764,6 @@ nfs3svc_encode_readdirres(struct svc_rqs
 		rqstp->rq_res.page_len = (resp->count) << 2;
 
 		/* add the 'tail' to the end of the 'head' page - page 0. */
-		rqstp->rq_restailpage = 0;
 		rqstp->rq_res.tail[0].iov_base = p;
 		*p++ = 0;		/* no more entries */
 		*p++ = htonl(resp->common.err == nfserr_eof);

diff .prev/fs/nfsd/nfs4xdr.c ./fs/nfsd/nfs4xdr.c
--- .prev/fs/nfsd/nfs4xdr.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs4xdr.c	2006-08-24 16:25:13.000000000 +1000
@@ -2040,7 +2040,8 @@ nfsd4_encode_open_downgrade(struct nfsd4
 }
 
 static int
-nfsd4_encode_read(struct nfsd4_compoundres *resp, int nfserr, struct nfsd4_read *read)
+nfsd4_encode_read(struct nfsd4_compoundres *resp, int nfserr,
+		  struct nfsd4_read *read)
 {
 	u32 eof;
 	int v, pn;
@@ -2062,10 +2063,11 @@ nfsd4_encode_read(struct nfsd4_compoundr
 	len = maxcount;
 	v = 0;
 	while (len > 0) {
-		pn = resp->rqstp->rq_resused;
-		svc_take_page(resp->rqstp);
-		read->rd_iov[v].iov_base = page_address(resp->rqstp->rq_respages[pn]);
-		read->rd_iov[v].iov_len = len < PAGE_SIZE ? len : PAGE_SIZE;
+		pn = resp->rqstp->rq_resused++;
+		read->rd_iov[v].iov_base =
+			page_address(resp->rqstp->rq_respages[pn]);
+		read->rd_iov[v].iov_len =
+			len < PAGE_SIZE ? len : PAGE_SIZE;
 		v++;
 		len -= PAGE_SIZE;
 	}
@@ -2079,7 +2081,8 @@ nfsd4_encode_read(struct nfsd4_compoundr
 		nfserr = nfserr_inval;
 	if (nfserr)
 		return nfserr;
-	eof = (read->rd_offset + maxcount >= read->rd_fhp->fh_dentry->d_inode->i_size);
+	eof = (read->rd_offset + maxcount >=
+	       read->rd_fhp->fh_dentry->d_inode->i_size);
 
 	WRITE32(eof);
 	WRITE32(maxcount);
@@ -2089,7 +2092,6 @@ nfsd4_encode_read(struct nfsd4_compoundr
 	resp->xbuf->page_len = maxcount;
 
 	/* Use rest of head for padding and remaining ops: */
-	resp->rqstp->rq_restailpage = 0;
 	resp->xbuf->tail[0].iov_base = p;
 	resp->xbuf->tail[0].iov_len = 0;
 	if (maxcount&3) {
@@ -2114,8 +2116,7 @@ nfsd4_encode_readlink(struct nfsd4_compo
 	if (resp->xbuf->page_len)
 		return nfserr_resource;
 
-	svc_take_page(resp->rqstp);
-	page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
+	page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);
 
 	maxcount = PAGE_SIZE;
 	RESERVE_SPACE(4);
@@ -2139,7 +2140,6 @@ nfsd4_encode_readlink(struct nfsd4_compo
 	resp->xbuf->page_len = maxcount;
 
 	/* Use rest of head for padding and remaining ops: */
-	resp->rqstp->rq_restailpage = 0;
 	resp->xbuf->tail[0].iov_base = p;
 	resp->xbuf->tail[0].iov_len = 0;
 	if (maxcount&3) {
@@ -2190,8 +2190,7 @@ nfsd4_encode_readdir(struct nfsd4_compou
 		goto err_no_verf;
 	}
 
-	svc_take_page(resp->rqstp);
-	page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
+	page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);
 	readdir->common.err = 0;
 	readdir->buflen = maxcount;
 	readdir->buffer = page;
@@ -2216,10 +2215,10 @@ nfsd4_encode_readdir(struct nfsd4_compou
 	p = readdir->buffer;
 	*p++ = 0;	/* no more entries */
 	*p++ = htonl(readdir->common.err == nfserr_eof);
-	resp->xbuf->page_len = ((char*)p) - (char*)page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
+	resp->xbuf->page_len = ((char*)p) - (char*)page_address(
+		resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
 
 	/* Use rest of head for padding and remaining ops: */
-	resp->rqstp->rq_restailpage = 0;
 	resp->xbuf->tail[0].iov_base = tailbase;
 	resp->xbuf->tail[0].iov_len = 0;
 	resp->p = resp->xbuf->tail[0].iov_base;

diff .prev/fs/nfsd/nfsxdr.c ./fs/nfsd/nfsxdr.c
--- .prev/fs/nfsd/nfsxdr.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfsxdr.c	2006-08-24 16:25:13.000000000 +1000
@@ -262,8 +262,7 @@ nfssvc_decode_readargs(struct svc_rqst *
 	 */
 	v=0;
 	while (len > 0) {
-		pn=rqstp->rq_resused;
-		svc_take_page(rqstp);
+		pn = rqstp->rq_resused++;
 		args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
 		args->vec[v].iov_len = len < PAGE_SIZE?len:PAGE_SIZE;
 		len -= args->vec[v].iov_len;
@@ -295,7 +294,7 @@ nfssvc_decode_writeargs(struct svc_rqst 
 	while (len > args->vec[v].iov_len) {
 		len -= args->vec[v].iov_len;
 		v++;
-		args->vec[v].iov_base = page_address(rqstp->rq_argpages[v]);
+		args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
 		args->vec[v].iov_len = PAGE_SIZE;
 	}
 	args->vec[v].iov_len = len;
@@ -333,8 +332,7 @@ nfssvc_decode_readlinkargs(struct svc_rq
 {
 	if (!(p = decode_fh(p, &args->fh)))
 		return 0;
-	svc_take_page(rqstp);
-	args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+	args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused++]);
 
 	return xdr_argsize_check(rqstp, p);
 }
@@ -375,8 +373,7 @@ nfssvc_decode_readdirargs(struct svc_rqs
 	if (args->count > PAGE_SIZE)
 		args->count = PAGE_SIZE;
 
-	svc_take_page(rqstp);
-	args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+	args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused++]);
 
 	return xdr_argsize_check(rqstp, p);
 }
@@ -416,7 +413,6 @@ nfssvc_encode_readlinkres(struct svc_rqs
 	rqstp->rq_res.page_len = resp->len;
 	if (resp->len & 3) {
 		/* need to pad the tail */
-		rqstp->rq_restailpage = 0;
 		rqstp->rq_res.tail[0].iov_base = p;
 		*p = 0;
 		rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3);
@@ -436,7 +432,6 @@ nfssvc_encode_readres(struct svc_rqst *r
 	rqstp->rq_res.page_len = resp->count;
 	if (resp->count & 3) {
 		/* need to pad the tail */
-		rqstp->rq_restailpage = 0;
 		rqstp->rq_res.tail[0].iov_base = p;
 		*p = 0;
 		rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3);

diff .prev/fs/nfsd/vfs.c ./fs/nfsd/vfs.c
--- .prev/fs/nfsd/vfs.c	2006-08-24 16:20:33.000000000 +1000
+++ ./fs/nfsd/vfs.c	2006-08-24 16:25:13.000000000 +1000
@@ -791,22 +791,26 @@ nfsd_read_actor(read_descriptor_t *desc,
 {
 	unsigned long count = desc->count;
 	struct svc_rqst *rqstp = desc->arg.data;
+	struct page **pp = rqstp->rq_respages + rqstp->rq_resused;
 
 	if (size > count)
 		size = count;
 
 	if (rqstp->rq_res.page_len == 0) {
 		get_page(page);
-		rqstp->rq_respages[rqstp->rq_resused++] = page;
+		put_page(*pp);
+		*pp = page;
+		rqstp->rq_resused++;
 		rqstp->rq_res.page_base = offset;
 		rqstp->rq_res.page_len = size;
-	} else if (page != rqstp->rq_respages[rqstp->rq_resused-1]) {
+	} else if (page != pp[-1]) {
 		get_page(page);
-		rqstp->rq_respages[rqstp->rq_resused++] = page;
+		put_page(*pp);
+		*pp = page;
+		rqstp->rq_resused++;
 		rqstp->rq_res.page_len += size;
-	} else {
+	} else
 		rqstp->rq_res.page_len += size;
-	}
 
 	desc->count = count - size;
 	desc->written += size;
@@ -840,7 +844,7 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 		file->f_ra = ra->p_ra;
 
 	if (file->f_op->sendfile && rqstp->rq_sendfile_ok) {
-		svc_pushback_unused_pages(rqstp);
+		rqstp->rq_resused = 1;
 		err = file->f_op->sendfile(file, &offset, *count,
 						 nfsd_read_actor, rqstp);
 	} else {

diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h	2006-08-24 16:25:13.000000000 +1000
+++ ./include/linux/sunrpc/svc.h	2006-08-24 16:25:13.000000000 +1000
@@ -153,7 +153,6 @@ static inline void svc_putu32(struct kve
 /*
  * The context of a single thread, including the request currently being
  * processed.
- * NOTE: First two items must be prev/next.
  */
 struct svc_rqst {
 	struct list_head	rq_list;	/* idle list */
@@ -172,12 +171,9 @@ struct svc_rqst {
 
 	struct xdr_buf		rq_arg;
 	struct xdr_buf		rq_res;
-	struct page *		rq_argpages[RPCSVC_MAXPAGES];
-	struct page *		rq_respages[RPCSVC_MAXPAGES];
-	int			rq_restailpage;
-	short			rq_argused;	/* pages used for argument */
-	short			rq_arghi;	/* pages available in argument page list */
-	short			rq_resused;	/* pages used for result */
+	struct page *		rq_pages[RPCSVC_MAXPAGES];
+	struct page *		*rq_respages;	/* points into rq_pages */
+	int			rq_resused;	/* number of pages used for result */
 
 	u32			rq_xid;		/* transmission id */
 	u32			rq_prog;	/* program number */
@@ -238,63 +234,18 @@ xdr_ressize_check(struct svc_rqst *rqstp
 	return vec->iov_len <= PAGE_SIZE;
 }
 
-static inline struct page *
-svc_take_res_page(struct svc_rqst *rqstp)
+static inline void svc_free_res_pages(struct svc_rqst *rqstp)
 {
-	if (rqstp->rq_arghi <= rqstp->rq_argused)
-		return NULL;
-	rqstp->rq_arghi--;
-	rqstp->rq_respages[rqstp->rq_resused] =
-		rqstp->rq_argpages[rqstp->rq_arghi];
-	return rqstp->rq_respages[rqstp->rq_resused++];
-}
-
-static inline void svc_take_page(struct svc_rqst *rqstp)
-{
-	if (rqstp->rq_arghi <= rqstp->rq_argused) {
-		WARN_ON(1);
-		return;
-	}
-	rqstp->rq_arghi--;
-	rqstp->rq_respages[rqstp->rq_resused] =
-		rqstp->rq_argpages[rqstp->rq_arghi];
-	rqstp->rq_resused++;
-}
-
-static inline void svc_pushback_allpages(struct svc_rqst *rqstp)
-{
-        while (rqstp->rq_resused) {
-		if (rqstp->rq_respages[--rqstp->rq_resused] == NULL)
-			continue;
-		rqstp->rq_argpages[rqstp->rq_arghi++] =
-			rqstp->rq_respages[rqstp->rq_resused];
-		rqstp->rq_respages[rqstp->rq_resused] = NULL;
-	}
-}
-
-static inline void svc_pushback_unused_pages(struct svc_rqst *rqstp)
-{
-	while (rqstp->rq_resused &&
-	       rqstp->rq_res.pages != &rqstp->rq_respages[rqstp->rq_resused]) {
-
-		if (rqstp->rq_respages[--rqstp->rq_resused] != NULL) {
-			rqstp->rq_argpages[rqstp->rq_arghi++] =
-				rqstp->rq_respages[rqstp->rq_resused];
-			rqstp->rq_respages[rqstp->rq_resused] = NULL;
+	while (rqstp->rq_resused) {
+		struct page **pp = (rqstp->rq_respages +
+				    --rqstp->rq_resused);
+		if (*pp) {
+			put_page(*pp);
+			*pp = NULL;
 		}
 	}
 }
 
-static inline void svc_free_allpages(struct svc_rqst *rqstp)
-{
-        while (rqstp->rq_resused) {
-		if (rqstp->rq_respages[--rqstp->rq_resused] == NULL)
-			continue;
-		put_page(rqstp->rq_respages[rqstp->rq_resused]);
-		rqstp->rq_respages[rqstp->rq_resused] = NULL;
-	}
-}
-
 struct svc_deferred_req {
 	u32			prot;	/* protocol (UDP or TCP) */
 	struct sockaddr_in	addr;

diff .prev/net/sunrpc/auth_gss/svcauth_gss.c ./net/sunrpc/auth_gss/svcauth_gss.c
--- .prev/net/sunrpc/auth_gss/svcauth_gss.c	2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/auth_gss/svcauth_gss.c	2006-08-24 16:25:13.000000000 +1000
@@ -1191,7 +1191,6 @@ svcauth_gss_wrap_resp_integ(struct svc_r
 		resbuf->tail[0].iov_base = resbuf->head[0].iov_base
 						+ resbuf->head[0].iov_len;
 		resbuf->tail[0].iov_len = 0;
-		rqstp->rq_restailpage = 0;
 		resv = &resbuf->tail[0];
 	} else {
 		resv = &resbuf->tail[0];
@@ -1240,7 +1239,7 @@ svcauth_gss_wrap_resp_priv(struct svc_rq
 	inpages = resbuf->pages;
 	/* XXX: Would be better to write some xdr helper functions for
 	 * nfs{2,3,4}xdr.c that place the data right, instead of copying: */
-	if (resbuf->tail[0].iov_base && rqstp->rq_restailpage == 0) {
+	if (resbuf->tail[0].iov_base) {
 		BUG_ON(resbuf->tail[0].iov_base >= resbuf->head[0].iov_base
 							+ PAGE_SIZE);
 		BUG_ON(resbuf->tail[0].iov_base < resbuf->head[0].iov_base);
@@ -1258,7 +1257,6 @@ svcauth_gss_wrap_resp_priv(struct svc_rq
 		resbuf->tail[0].iov_base = resbuf->head[0].iov_base
 			+ resbuf->head[0].iov_len + RPC_MAX_AUTH_SIZE;
 		resbuf->tail[0].iov_len = 0;
-		rqstp->rq_restailpage = 0;
 	}
 	if (gss_wrap(gsd->rsci->mechctx, offset, resbuf, inpages))
 		return -ENOMEM;

diff .prev/net/sunrpc/svc.c ./net/sunrpc/svc.c
--- .prev/net/sunrpc/svc.c	2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/svc.c	2006-08-24 16:25:13.000000000 +1000
@@ -417,18 +417,15 @@ svc_init_buffer(struct svc_rqst *rqstp, 
 	if (size > RPCSVC_MAXPAYLOAD)
 		size = RPCSVC_MAXPAYLOAD;
 	pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
-	rqstp->rq_argused = 0;
-	rqstp->rq_resused = 0;
 	arghi = 0;
 	BUG_ON(pages > RPCSVC_MAXPAGES);
 	while (pages) {
 		struct page *p = alloc_page(GFP_KERNEL);
 		if (!p)
 			break;
-		rqstp->rq_argpages[arghi++] = p;
+		rqstp->rq_pages[arghi++] = p;
 		pages--;
 	}
-	rqstp->rq_arghi = arghi;
 	return ! pages;
 }
 
@@ -438,14 +435,10 @@ svc_init_buffer(struct svc_rqst *rqstp, 
 static void
 svc_release_buffer(struct svc_rqst *rqstp)
 {
-	while (rqstp->rq_arghi)
-		put_page(rqstp->rq_argpages[--rqstp->rq_arghi]);
-	while (rqstp->rq_resused) {
-		if (rqstp->rq_respages[--rqstp->rq_resused] == NULL)
-			continue;
-		put_page(rqstp->rq_respages[rqstp->rq_resused]);
-	}
-	rqstp->rq_argused = 0;
+	int i;
+	for (i=0; i<ARRAY_SIZE(rqstp->rq_pages); i++)
+		if (rqstp->rq_pages[i])
+			put_page(rqstp->rq_pages[i]);
 }
 
 /*
@@ -707,10 +700,10 @@ svc_process(struct svc_rqst *rqstp)
 	/* setup response xdr_buf.
 	 * Initially it has just one page 
 	 */
-	svc_take_page(rqstp); /* must succeed */
+	rqstp->rq_resused = 1;
 	resv->iov_base = page_address(rqstp->rq_respages[0]);
 	resv->iov_len = 0;
-	rqstp->rq_res.pages = rqstp->rq_respages+1;
+	rqstp->rq_res.pages = rqstp->rq_respages + 1;
 	rqstp->rq_res.len = 0;
 	rqstp->rq_res.page_base = 0;
 	rqstp->rq_res.page_len = 0;

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-08-24 16:24:56.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-08-24 16:25:13.000000000 +1000
@@ -313,7 +313,7 @@ svc_sock_release(struct svc_rqst *rqstp)
 
 	svc_release_skb(rqstp);
 
-	svc_free_allpages(rqstp);
+	svc_free_res_pages(rqstp);
 	rqstp->rq_res.page_len = 0;
 	rqstp->rq_res.page_base = 0;
 
@@ -412,7 +412,8 @@ svc_sendto(struct svc_rqst *rqstp, struc
 	/* send head */
 	if (slen == xdr->head[0].iov_len)
 		flags = 0;
-	len = kernel_sendpage(sock, rqstp->rq_respages[0], 0, xdr->head[0].iov_len, flags);
+	len = kernel_sendpage(sock, rqstp->rq_respages[0], 0,
+				  xdr->head[0].iov_len, flags);
 	if (len != xdr->head[0].iov_len)
 		goto out;
 	slen -= xdr->head[0].iov_len;
@@ -437,8 +438,9 @@ svc_sendto(struct svc_rqst *rqstp, struc
 	}
 	/* send tail */
 	if (xdr->tail[0].iov_len) {
-		result = kernel_sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage],
-					     ((unsigned long)xdr->tail[0].iov_base)& (PAGE_SIZE-1),
+		result = kernel_sendpage(sock, rqstp->rq_respages[0],
+					     ((unsigned long)xdr->tail[0].iov_base)
+					        & (PAGE_SIZE-1),
 					     xdr->tail[0].iov_len, 0);
 
 		if (result > 0)
@@ -708,9 +710,11 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
 	if (len <= rqstp->rq_arg.head[0].iov_len) {
 		rqstp->rq_arg.head[0].iov_len = len;
 		rqstp->rq_arg.page_len = 0;
+		rqstp->rq_respages = rqstp->rq_pages+1;
 	} else {
 		rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
-		rqstp->rq_argused += (rqstp->rq_arg.page_len + PAGE_SIZE - 1)/ PAGE_SIZE;
+		rqstp->rq_respages = rqstp->rq_pages + 1 +
+			(rqstp->rq_arg.page_len + PAGE_SIZE - 1)/ PAGE_SIZE;
 	}
 
 	if (serv->sv_stats)
@@ -1053,11 +1057,12 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 	vlen = PAGE_SIZE;
 	pnum = 1;
 	while (vlen < len) {
-		vec[pnum].iov_base = page_address(rqstp->rq_argpages[rqstp->rq_argused++]);
+		vec[pnum].iov_base = page_address(rqstp->rq_pages[pnum]);
 		vec[pnum].iov_len = PAGE_SIZE;
 		pnum++;
 		vlen += PAGE_SIZE;
 	}
+	rqstp->rq_respages = &rqstp->rq_pages[pnum];
 
 	/* Now receive data */
 	len = svc_recvfrom(rqstp, vec, pnum, len);
@@ -1209,7 +1214,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
 	struct svc_sock		*svsk =NULL;
 	struct svc_serv		*serv = rqstp->rq_server;
 	struct svc_pool		*pool = rqstp->rq_pool;
-	int			len;
+	int			len, i;
 	int 			pages;
 	struct xdr_buf		*arg;
 	DECLARE_WAITQUEUE(wait, current);
@@ -1226,27 +1231,22 @@ svc_recv(struct svc_rqst *rqstp, long ti
 			"svc_recv: service %p, wait queue active!\n",
 			 rqstp);
 
-	/* Initialize the buffers */
-	/* first reclaim pages that were moved to response list */
-	svc_pushback_allpages(rqstp);
 
 	/* now allocate needed pages.  If we get a failure, sleep briefly */
 	pages = 2 + (serv->sv_bufsz + PAGE_SIZE -1) / PAGE_SIZE;
-	while (rqstp->rq_arghi < pages) {
-		struct page *p = alloc_page(GFP_KERNEL);
-		if (!p) {
-			schedule_timeout_uninterruptible(msecs_to_jiffies(500));
-			continue;
+	for (i=0; i < pages ; i++)
+		while (rqstp->rq_pages[i] == NULL) {
+			struct page *p = alloc_page(GFP_KERNEL);
+			if (!p)
+				schedule_timeout_uninterruptible(msecs_to_jiffies(500));
+			rqstp->rq_pages[i] = p;
 		}
-		rqstp->rq_argpages[rqstp->rq_arghi++] = p;
-	}
 
 	/* Make arg->head point to first page and arg->pages point to rest */
 	arg = &rqstp->rq_arg;
-	arg->head[0].iov_base = page_address(rqstp->rq_argpages[0]);
+	arg->head[0].iov_base = page_address(rqstp->rq_pages[0]);
 	arg->head[0].iov_len = PAGE_SIZE;
-	rqstp->rq_argused = 1;
-	arg->pages = rqstp->rq_argpages + 1;
+	arg->pages = rqstp->rq_pages + 1;
 	arg->page_base = 0;
 	/* save at least one page for response */
 	arg->page_len = (pages-2)*PAGE_SIZE;


* [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (5 preceding siblings ...)
  2006-08-24  6:36 ` [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one NeilBrown
@ 2006-08-24  6:37 ` NeilBrown
  2006-08-24  6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


.. by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we
can no longer allocate an array of this size on the stack.
So we allocate it in 'struct svc_rqst'.

However svc_rqst contains (indirectly) an array of the same
type and size (actually several, but they are in a union).
So rather than waste space, we move those arrays out of the
separately allocated union and into svc_rqst, sharing them with
the kvec array moved out of svc_tcp_recvfrom (the various arrays
are used at different times, so there is no conflict).
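
For scale: struct kvec is a pointer plus a size_t, so with
RPCSVC_MAXPAGES at 256 an on-stack array would be 256 * 16 = 4096
bytes on a 64-bit build (256 * 8 = 2048 on 32-bit), far too much for
a 4K or 8K kernel stack.  After this patch svc_tcp_recvfrom simply
borrows the per-thread array, along the lines of:

	/* before: struct kvec vec[RPCSVC_MAXPAGES]; on the stack */
	struct kvec *vec = rqstp->rq_vec;	/* lives in the kmalloc'ed svc_rqst */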


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfs3proc.c         |    4 ++--
 ./fs/nfsd/nfs3xdr.c          |   22 +++++++++++-----------
 ./fs/nfsd/nfs4proc.c         |    2 +-
 ./fs/nfsd/nfs4xdr.c          |   26 +++++++++++++-------------
 ./fs/nfsd/nfsproc.c          |    4 ++--
 ./fs/nfsd/nfsxdr.c           |   22 +++++++++++-----------
 ./include/linux/nfsd/xdr.h   |    2 --
 ./include/linux/nfsd/xdr3.h  |    2 --
 ./include/linux/nfsd/xdr4.h  |    2 --
 ./include/linux/sunrpc/svc.h |    2 ++
 ./net/sunrpc/svcsock.c       |    3 ++-
 11 files changed, 44 insertions(+), 47 deletions(-)

diff .prev/fs/nfsd/nfs3proc.c ./fs/nfsd/nfs3proc.c
--- .prev/fs/nfsd/nfs3proc.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs3proc.c	2006-08-24 16:25:41.000000000 +1000
@@ -180,7 +180,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp, 
 	fh_copy(&resp->fh, &argp->fh);
 	nfserr = nfsd_read(rqstp, &resp->fh, NULL,
 				  argp->offset,
-			   	  argp->vec, argp->vlen,
+			   	  rqstp->rq_vec, argp->vlen,
 				  &resp->count);
 	if (nfserr == 0) {
 		struct inode	*inode = resp->fh.fh_dentry->d_inode;
@@ -210,7 +210,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp,
 	resp->committed = argp->stable;
 	nfserr = nfsd_write(rqstp, &resp->fh, NULL,
 				   argp->offset,
-				   argp->vec, argp->vlen,
+				   rqstp->rq_vec, argp->vlen,
 				   argp->len,
 				   &resp->committed);
 	resp->count = argp->count;

diff .prev/fs/nfsd/nfs3xdr.c ./fs/nfsd/nfs3xdr.c
--- .prev/fs/nfsd/nfs3xdr.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs3xdr.c	2006-08-24 16:25:41.000000000 +1000
@@ -344,9 +344,9 @@ nfs3svc_decode_readargs(struct svc_rqst 
 	v=0;
 	while (len > 0) {
 		pn = rqstp->rq_resused++;
-		args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
-		args->vec[v].iov_len = len < PAGE_SIZE? len : PAGE_SIZE;
-		len -= args->vec[v].iov_len;
+		rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
+		rqstp->rq_vec[v].iov_len = len < PAGE_SIZE? len : PAGE_SIZE;
+		len -= rqstp->rq_vec[v].iov_len;
 		v++;
 	}
 	args->vlen = v;
@@ -372,22 +372,22 @@ nfs3svc_decode_writeargs(struct svc_rqst
 	    rqstp->rq_arg.len - hdr < len)
 		return 0;
 
-	args->vec[0].iov_base = (void*)p;
-	args->vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
+	rqstp->rq_vec[0].iov_base = (void*)p;
+	rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
 
 	if (len > NFSSVC_MAXBLKSIZE)
 		len = NFSSVC_MAXBLKSIZE;
 	v=  0;
-	while (len > args->vec[v].iov_len) {
-		len -= args->vec[v].iov_len;
+	while (len > rqstp->rq_vec[v].iov_len) {
+		len -= rqstp->rq_vec[v].iov_len;
 		v++;
-		args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
-		args->vec[v].iov_len = PAGE_SIZE;
+		rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
+		rqstp->rq_vec[v].iov_len = PAGE_SIZE;
 	}
-	args->vec[v].iov_len = len;
+	rqstp->rq_vec[v].iov_len = len;
 	args->vlen = v+1;
 
-	return args->count == args->len && args->vec[0].iov_len > 0;
+	return args->count == args->len && rqstp->rq_vec[0].iov_len > 0;
 }
 
 int

diff .prev/fs/nfsd/nfs4proc.c ./fs/nfsd/nfs4proc.c
--- .prev/fs/nfsd/nfs4proc.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs4proc.c	2006-08-24 16:25:41.000000000 +1000
@@ -646,7 +646,7 @@ nfsd4_write(struct svc_rqst *rqstp, stru
 	*p++ = nfssvc_boot.tv_usec;
 
 	status =  nfsd_write(rqstp, current_fh, filp, write->wr_offset,
-			write->wr_vec, write->wr_vlen, write->wr_buflen,
+			rqstp->rq_vec, write->wr_vlen, write->wr_buflen,
 			&write->wr_how_written);
 	if (filp)
 		fput(filp);

diff .prev/fs/nfsd/nfs4xdr.c ./fs/nfsd/nfs4xdr.c
--- .prev/fs/nfsd/nfs4xdr.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs4xdr.c	2006-08-24 16:25:41.000000000 +1000
@@ -927,26 +927,26 @@ nfsd4_decode_write(struct nfsd4_compound
 		printk(KERN_NOTICE "xdr error! (%s:%d)\n", __FILE__, __LINE__); 
 		goto xdr_error;
 	}
-	write->wr_vec[0].iov_base = p;
-	write->wr_vec[0].iov_len = avail;
+	argp->rqstp->rq_vec[0].iov_base = p;
+	argp->rqstp->rq_vec[0].iov_len = avail;
 	v = 0;
 	len = write->wr_buflen;
-	while (len > write->wr_vec[v].iov_len) {
-		len -= write->wr_vec[v].iov_len;
+	while (len > argp->rqstp->rq_vec[v].iov_len) {
+		len -= argp->rqstp->rq_vec[v].iov_len;
 		v++;
-		write->wr_vec[v].iov_base = page_address(argp->pagelist[0]);
+		argp->rqstp->rq_vec[v].iov_base = page_address(argp->pagelist[0]);
 		argp->pagelist++;
 		if (argp->pagelen >= PAGE_SIZE) {
-			write->wr_vec[v].iov_len = PAGE_SIZE;
+			argp->rqstp->rq_vec[v].iov_len = PAGE_SIZE;
 			argp->pagelen -= PAGE_SIZE;
 		} else {
-			write->wr_vec[v].iov_len = argp->pagelen;
+			argp->rqstp->rq_vec[v].iov_len = argp->pagelen;
 			argp->pagelen -= len;
 		}
 	}
-	argp->end = (u32*) (write->wr_vec[v].iov_base + write->wr_vec[v].iov_len);
-	argp->p = (u32*)  (write->wr_vec[v].iov_base + (XDR_QUADLEN(len) << 2));
-	write->wr_vec[v].iov_len = len;
+	argp->end = (u32*) (argp->rqstp->rq_vec[v].iov_base + argp->rqstp->rq_vec[v].iov_len);
+	argp->p = (u32*)  (argp->rqstp->rq_vec[v].iov_base + (XDR_QUADLEN(len) << 2));
+	argp->rqstp->rq_vec[v].iov_len = len;
 	write->wr_vlen = v+1;
 
 	DECODE_TAIL;
@@ -2064,9 +2064,9 @@ nfsd4_encode_read(struct nfsd4_compoundr
 	v = 0;
 	while (len > 0) {
 		pn = resp->rqstp->rq_resused++;
-		read->rd_iov[v].iov_base =
+		resp->rqstp->rq_vec[v].iov_base =
 			page_address(resp->rqstp->rq_respages[pn]);
-		read->rd_iov[v].iov_len =
+		resp->rqstp->rq_vec[v].iov_len =
 			len < PAGE_SIZE ? len : PAGE_SIZE;
 		v++;
 		len -= PAGE_SIZE;
@@ -2074,7 +2074,7 @@ nfsd4_encode_read(struct nfsd4_compoundr
 	read->rd_vlen = v;
 
 	nfserr = nfsd_read(read->rd_rqstp, read->rd_fhp, read->rd_filp,
-			read->rd_offset, read->rd_iov, read->rd_vlen,
+			read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
 			&maxcount);
 
 	if (nfserr == nfserr_symlink)

diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c	2006-08-24 16:21:35.000000000 +1000
+++ ./fs/nfsd/nfsproc.c	2006-08-24 16:25:41.000000000 +1000
@@ -159,7 +159,7 @@ nfsd_proc_read(struct svc_rqst *rqstp, s
 	resp->count = argp->count;
 	nfserr = nfsd_read(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
 				  argp->offset,
-			   	  argp->vec, argp->vlen,
+			   	  rqstp->rq_vec, argp->vlen,
 				  &resp->count);
 
 	if (nfserr) return nfserr;
@@ -185,7 +185,7 @@ nfsd_proc_write(struct svc_rqst *rqstp, 
 
 	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
 				   argp->offset,
-				   argp->vec, argp->vlen,
+				   rqstp->rq_vec, argp->vlen,
 				   argp->len,
 				   &stable);
 	return nfsd_return_attrs(nfserr, resp);

diff .prev/fs/nfsd/nfsxdr.c ./fs/nfsd/nfsxdr.c
--- .prev/fs/nfsd/nfsxdr.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfsxdr.c	2006-08-24 16:25:41.000000000 +1000
@@ -263,9 +263,9 @@ nfssvc_decode_readargs(struct svc_rqst *
 	v=0;
 	while (len > 0) {
 		pn = rqstp->rq_resused++;
-		args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
-		args->vec[v].iov_len = len < PAGE_SIZE?len:PAGE_SIZE;
-		len -= args->vec[v].iov_len;
+		rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
+		rqstp->rq_vec[v].iov_len = len < PAGE_SIZE?len:PAGE_SIZE;
+		len -= rqstp->rq_vec[v].iov_len;
 		v++;
 	}
 	args->vlen = v;
@@ -285,21 +285,21 @@ nfssvc_decode_writeargs(struct svc_rqst 
 	args->offset = ntohl(*p++);	/* offset */
 	p++;				/* totalcount */
 	len = args->len = ntohl(*p++);
-	args->vec[0].iov_base = (void*)p;
-	args->vec[0].iov_len = rqstp->rq_arg.head[0].iov_len -
+	rqstp->rq_vec[0].iov_base = (void*)p;
+	rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len -
 				(((void*)p) - rqstp->rq_arg.head[0].iov_base);
 	if (len > NFSSVC_MAXBLKSIZE)
 		len = NFSSVC_MAXBLKSIZE;
 	v = 0;
-	while (len > args->vec[v].iov_len) {
-		len -= args->vec[v].iov_len;
+	while (len > rqstp->rq_vec[v].iov_len) {
+		len -= rqstp->rq_vec[v].iov_len;
 		v++;
-		args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
-		args->vec[v].iov_len = PAGE_SIZE;
+		rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
+		rqstp->rq_vec[v].iov_len = PAGE_SIZE;
 	}
-	args->vec[v].iov_len = len;
+	rqstp->rq_vec[v].iov_len = len;
 	args->vlen = v+1;
-	return args->vec[0].iov_len > 0;
+	return rqstp->rq_vec[0].iov_len > 0;
 }
 
 int

diff .prev/include/linux/nfsd/xdr.h ./include/linux/nfsd/xdr.h
--- .prev/include/linux/nfsd/xdr.h	2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/nfsd/xdr.h	2006-08-24 16:25:41.000000000 +1000
@@ -30,7 +30,6 @@ struct nfsd_readargs {
 	struct svc_fh		fh;
 	__u32			offset;
 	__u32			count;
-	struct kvec		vec[RPCSVC_MAXPAGES];
 	int			vlen;
 };
 
@@ -38,7 +37,6 @@ struct nfsd_writeargs {
 	svc_fh			fh;
 	__u32			offset;
 	int			len;
-	struct kvec		vec[RPCSVC_MAXPAGES];
 	int			vlen;
 };
 

diff .prev/include/linux/nfsd/xdr3.h ./include/linux/nfsd/xdr3.h
--- .prev/include/linux/nfsd/xdr3.h	2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/nfsd/xdr3.h	2006-08-24 16:25:41.000000000 +1000
@@ -33,7 +33,6 @@ struct nfsd3_readargs {
 	struct svc_fh		fh;
 	__u64			offset;
 	__u32			count;
-	struct kvec		vec[RPCSVC_MAXPAGES];
 	int			vlen;
 };
 
@@ -43,7 +42,6 @@ struct nfsd3_writeargs {
 	__u32			count;
 	int			stable;
 	__u32			len;
-	struct kvec		vec[RPCSVC_MAXPAGES];
 	int			vlen;
 };
 

diff .prev/include/linux/nfsd/xdr4.h ./include/linux/nfsd/xdr4.h
--- .prev/include/linux/nfsd/xdr4.h	2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/nfsd/xdr4.h	2006-08-24 16:25:41.000000000 +1000
@@ -241,7 +241,6 @@ struct nfsd4_read {
 	stateid_t	rd_stateid;         /* request */
 	u64		rd_offset;          /* request */
 	u32		rd_length;          /* request */
-	struct kvec	rd_iov[RPCSVC_MAXPAGES];
 	int		rd_vlen;
 	struct file     *rd_filp;
 	
@@ -326,7 +325,6 @@ struct nfsd4_write {
 	u64		wr_offset;          /* request */
 	u32		wr_stable_how;      /* request */
 	u32		wr_buflen;          /* request */
-	struct kvec	wr_vec[RPCSVC_MAXPAGES]; /* request */
 	int		wr_vlen;
 
 	u32		wr_bytes_written;   /* response */

diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h	2006-08-24 16:25:13.000000000 +1000
+++ ./include/linux/sunrpc/svc.h	2006-08-24 16:25:41.000000000 +1000
@@ -175,6 +175,8 @@ struct svc_rqst {
 	struct page *		*rq_respages;	/* points into rq_pages */
 	int			rq_resused;	/* number of pages used for result */
 
+	struct kvec		rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
+
 	u32			rq_xid;		/* transmission id */
 	u32			rq_prog;	/* program number */
 	u32			rq_vers;	/* program version */

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-08-24 16:25:41.000000000 +1000
@@ -955,7 +955,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 	struct svc_sock	*svsk = rqstp->rq_sock;
 	struct svc_serv	*serv = svsk->sk_server;
 	int		len;
-	struct kvec vec[RPCSVC_MAXPAGES];
+	struct kvec *vec;
 	int pnum, vlen;
 
 	dprintk("svc: tcp_recv %p data %d conn %d close %d\n",
@@ -1053,6 +1053,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 	len = svsk->sk_reclen;
 	set_bit(SK_DATA, &svsk->sk_flags);
 
+	vec = rqstp->rq_vec;
 	vec[0] = rqstp->rq_arg.head[0];
 	vlen = PAGE_SIZE;
 	pnum = 1;


* [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (6 preceding siblings ...)
  2006-08-24  6:37 ` [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom NeilBrown
@ 2006-08-24  6:37 ` NeilBrown
  2006-09-25 15:43   ` [NFS] " J. Bruce Fields
  2006-08-24  6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


The limit over UDP remains at 32K.  Also, make some of
the apparently arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE
by a function of the rqstp.  This allows it to be different
for different protocols (UDP/TCP) and also allows it
to depend on the server's declared sv_bufsz.

Note that we don't actually increase sv_bufsz for nfs yet.
That comes next.
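
Roughly, the new helper picks the per-transport ceiling and clamps
it to the buffer size the server was created with; a sketch (the
exact test for UDP may differ from the real svc_max_payload()):

u32 svc_max_payload(const struct svc_rqst *rqstp)
{
	u32 max = RPCSVC_MAXPAYLOAD_TCP;

	/* UDP keeps the old 32K limit; TCP may go up to 1MB */
	if (rqstp->rq_prot == IPPROTO_UDP)
		max = RPCSVC_MAXPAYLOAD_UDP;
	/* never promise more than the buffers we actually allocate */
	if (rqstp->rq_server->sv_bufsz < max)
		max = rqstp->rq_server->sv_bufsz;
	return max;
}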

From: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfs3proc.c              |   14 +++++++------
 ./fs/nfsd/nfs3xdr.c               |   13 +++++++-----
 ./fs/nfsd/nfs4xdr.c               |    6 ++---
 ./fs/nfsd/nfsproc.c               |    6 ++---
 ./fs/nfsd/nfsxdr.c                |   10 ++++-----
 ./include/linux/nfsd/const.h      |   15 +++++++++++++-
 ./include/linux/sunrpc/auth.h     |    3 --
 ./include/linux/sunrpc/msg_prot.h |   40 ++++++++++++++++++++++++++++++++++++++
 ./include/linux/sunrpc/svc.h      |   25 +++++++++++++++++++++--
 ./include/linux/sunrpc/xprt.h     |    8 -------
 ./net/sunrpc/svc.c                |   15 ++++++++++++++
 11 files changed, 120 insertions(+), 35 deletions(-)

diff .prev/fs/nfsd/nfs3proc.c ./fs/nfsd/nfs3proc.c
--- .prev/fs/nfsd/nfs3proc.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs3proc.c	2006-08-24 16:25:56.000000000 +1000
@@ -160,6 +160,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp, 
 				        struct nfsd3_readres  *resp)
 {
 	int	nfserr;
+	u32	max_blocksize = svc_max_payload(rqstp);
 
 	dprintk("nfsd: READ(3) %s %lu bytes at %lu\n",
 				SVCFH_fmt(&argp->fh),
@@ -172,8 +173,8 @@ nfsd3_proc_read(struct svc_rqst *rqstp, 
 	 */
 
 	resp->count = argp->count;
-	if (NFSSVC_MAXBLKSIZE < resp->count)
-		resp->count = NFSSVC_MAXBLKSIZE;
+	if (max_blocksize < resp->count)
+		resp->count = max_blocksize;
 
 	svc_reserve(rqstp, ((1 + NFS3_POST_OP_ATTR_WORDS + 3)<<2) + resp->count +4);
 
@@ -538,15 +539,16 @@ nfsd3_proc_fsinfo(struct svc_rqst * rqst
 					   struct nfsd3_fsinfores *resp)
 {
 	int	nfserr;
+	u32	max_blocksize = svc_max_payload(rqstp);
 
 	dprintk("nfsd: FSINFO(3)   %s\n",
 				SVCFH_fmt(&argp->fh));
 
-	resp->f_rtmax  = NFSSVC_MAXBLKSIZE;
-	resp->f_rtpref = NFSSVC_MAXBLKSIZE;
+	resp->f_rtmax  = max_blocksize;
+	resp->f_rtpref = max_blocksize;
 	resp->f_rtmult = PAGE_SIZE;
-	resp->f_wtmax  = NFSSVC_MAXBLKSIZE;
-	resp->f_wtpref = NFSSVC_MAXBLKSIZE;
+	resp->f_wtmax  = max_blocksize;
+	resp->f_wtpref = max_blocksize;
 	resp->f_wtmult = PAGE_SIZE;
 	resp->f_dtpref = PAGE_SIZE;
 	resp->f_maxfilesize = ~(u32) 0;

diff .prev/fs/nfsd/nfs3xdr.c ./fs/nfsd/nfs3xdr.c
--- .prev/fs/nfsd/nfs3xdr.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs3xdr.c	2006-08-24 16:25:56.000000000 +1000
@@ -330,6 +330,7 @@ nfs3svc_decode_readargs(struct svc_rqst 
 {
 	unsigned int len;
 	int v,pn;
+	u32 max_blocksize = svc_max_payload(rqstp);
 
 	if (!(p = decode_fh(p, &args->fh))
 	 || !(p = xdr_decode_hyper(p, &args->offset)))
@@ -337,8 +338,8 @@ nfs3svc_decode_readargs(struct svc_rqst 
 
 	len = args->count = ntohl(*p++);
 
-	if (len > NFSSVC_MAXBLKSIZE)
-		len = NFSSVC_MAXBLKSIZE;
+	if (len > max_blocksize)
+		len = max_blocksize;
 
 	/* set up the kvec */
 	v=0;
@@ -358,6 +359,7 @@ nfs3svc_decode_writeargs(struct svc_rqst
 					struct nfsd3_writeargs *args)
 {
 	unsigned int len, v, hdr;
+	u32 max_blocksize = svc_max_payload(rqstp);
 
 	if (!(p = decode_fh(p, &args->fh))
 	 || !(p = xdr_decode_hyper(p, &args->offset)))
@@ -375,8 +377,8 @@ nfs3svc_decode_writeargs(struct svc_rqst
 	rqstp->rq_vec[0].iov_base = (void*)p;
 	rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
 
-	if (len > NFSSVC_MAXBLKSIZE)
-		len = NFSSVC_MAXBLKSIZE;
+	if (len > max_blocksize)
+		len = max_blocksize;
 	v=  0;
 	while (len > rqstp->rq_vec[v].iov_len) {
 		len -= rqstp->rq_vec[v].iov_len;
@@ -564,6 +566,7 @@ nfs3svc_decode_readdirplusargs(struct sv
 					struct nfsd3_readdirargs *args)
 {
 	int len, pn;
+	u32 max_blocksize = svc_max_payload(rqstp);
 
 	if (!(p = decode_fh(p, &args->fh)))
 		return 0;
@@ -572,7 +575,7 @@ nfs3svc_decode_readdirplusargs(struct sv
 	args->dircount = ntohl(*p++);
 	args->count    = ntohl(*p++);
 
-	len = (args->count > NFSSVC_MAXBLKSIZE) ? NFSSVC_MAXBLKSIZE :
+	len = (args->count > max_blocksize) ? max_blocksize :
 						  args->count;
 	args->count = len;
 

diff .prev/fs/nfsd/nfs4xdr.c ./fs/nfsd/nfs4xdr.c
--- .prev/fs/nfsd/nfs4xdr.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs4xdr.c	2006-08-24 16:25:56.000000000 +1000
@@ -1537,12 +1537,12 @@ out_acl:
 	if (bmval0 & FATTR4_WORD0_MAXREAD) {
 		if ((buflen -= 8) < 0)
 			goto out_resource;
-		WRITE64((u64) NFSSVC_MAXBLKSIZE);
+		WRITE64((u64) svc_max_payload(rqstp));
 	}
 	if (bmval0 & FATTR4_WORD0_MAXWRITE) {
 		if ((buflen -= 8) < 0)
 			goto out_resource;
-		WRITE64((u64) NFSSVC_MAXBLKSIZE);
+		WRITE64((u64) svc_max_payload(rqstp));
 	}
 	if (bmval1 & FATTR4_WORD1_MODE) {
 		if ((buflen -= 4) < 0)
@@ -2056,7 +2056,7 @@ nfsd4_encode_read(struct nfsd4_compoundr
 
 	RESERVE_SPACE(8); /* eof flag and byte count */
 
-	maxcount = NFSSVC_MAXBLKSIZE;
+	maxcount = svc_max_payload(resp->rqstp);
 	if (maxcount > read->rd_length)
 		maxcount = read->rd_length;
 

diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfsproc.c	2006-08-24 16:25:56.000000000 +1000
@@ -146,13 +146,13 @@ nfsd_proc_read(struct svc_rqst *rqstp, s
 	 * status, 17 words for fattr, and 1 word for the byte count.
 	 */
 
-	if (NFSSVC_MAXBLKSIZE < argp->count) {
+	if (NFSSVC_MAXBLKSIZE_V2 < argp->count) {
 		printk(KERN_NOTICE
 			"oversized read request from %u.%u.%u.%u:%d (%d bytes)\n",
 				NIPQUAD(rqstp->rq_addr.sin_addr.s_addr),
 				ntohs(rqstp->rq_addr.sin_port),
 				argp->count);
-		argp->count = NFSSVC_MAXBLKSIZE;
+		argp->count = NFSSVC_MAXBLKSIZE_V2;
 	}
 	svc_reserve(rqstp, (19<<2) + argp->count + 4);
 
@@ -553,7 +553,7 @@ static struct svc_procedure		nfsd_proced
   PROC(none,	 void,		void,		none,		RC_NOCACHE, ST),
   PROC(lookup,	 diropargs,	diropres,	fhandle,	RC_NOCACHE, ST+FH+AT),
   PROC(readlink, readlinkargs,	readlinkres,	none,		RC_NOCACHE, ST+1+NFS_MAXPATHLEN/4),
-  PROC(read,	 readargs,	readres,	fhandle,	RC_NOCACHE, ST+AT+1+NFSSVC_MAXBLKSIZE/4),
+  PROC(read,	 readargs,	readres,	fhandle,	RC_NOCACHE, ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4),
   PROC(none,	 void,		void,		none,		RC_NOCACHE, ST),
   PROC(write,	 writeargs,	attrstat,	fhandle,	RC_REPLBUFF, ST+AT),
   PROC(create,	 createargs,	diropres,	fhandle,	RC_REPLBUFF, ST+FH+AT),

diff .prev/fs/nfsd/nfsxdr.c ./fs/nfsd/nfsxdr.c
--- .prev/fs/nfsd/nfsxdr.c	2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfsxdr.c	2006-08-24 16:25:56.000000000 +1000
@@ -254,8 +254,8 @@ nfssvc_decode_readargs(struct svc_rqst *
 	len = args->count     = ntohl(*p++);
 	p++; /* totalcount - unused */
 
-	if (len > NFSSVC_MAXBLKSIZE)
-		len = NFSSVC_MAXBLKSIZE;
+	if (len > NFSSVC_MAXBLKSIZE_V2)
+		len = NFSSVC_MAXBLKSIZE_V2;
 
 	/* set up somewhere to store response.
 	 * We take pages, put them on reslist and include in iovec
@@ -288,8 +288,8 @@ nfssvc_decode_writeargs(struct svc_rqst 
 	rqstp->rq_vec[0].iov_base = (void*)p;
 	rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len -
 				(((void*)p) - rqstp->rq_arg.head[0].iov_base);
-	if (len > NFSSVC_MAXBLKSIZE)
-		len = NFSSVC_MAXBLKSIZE;
+	if (len > NFSSVC_MAXBLKSIZE_V2)
+		len = NFSSVC_MAXBLKSIZE_V2;
 	v = 0;
 	while (len > rqstp->rq_vec[v].iov_len) {
 		len -= rqstp->rq_vec[v].iov_len;
@@ -458,7 +458,7 @@ nfssvc_encode_statfsres(struct svc_rqst 
 {
 	struct kstatfs	*stat = &resp->stats;
 
-	*p++ = htonl(NFSSVC_MAXBLKSIZE);	/* max transfer size */
+	*p++ = htonl(NFSSVC_MAXBLKSIZE_V2);	/* max transfer size */
 	*p++ = htonl(stat->f_bsize);
 	*p++ = htonl(stat->f_blocks);
 	*p++ = htonl(stat->f_bfree);

diff .prev/include/linux/nfsd/const.h ./include/linux/nfsd/const.h
--- .prev/include/linux/nfsd/const.h	2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/nfsd/const.h	2006-08-24 16:25:56.000000000 +1000
@@ -13,6 +13,7 @@
 #include <linux/nfs2.h>
 #include <linux/nfs3.h>
 #include <linux/nfs4.h>
+#include <linux/sunrpc/msg_prot.h>
 
 /*
  * Maximum protocol version supported by knfsd
@@ -23,6 +24,8 @@
  * Maximum blocksize supported by daemon currently at 32K
  */
 #define NFSSVC_MAXBLKSIZE	(32*1024)
+/* NFSv2 is limited by the protocol specification, see RFC 1094 */
+#define NFSSVC_MAXBLKSIZE_V2	(8*1024)
 
 #ifdef __KERNEL__
 
@@ -30,7 +33,17 @@
 # define NFS_SUPER_MAGIC	0x6969
 #endif
 
-#define NFSD_BUFSIZE		(1024 + NFSSVC_MAXBLKSIZE)
+/*
+ * Largest number of bytes we need to allocate for an NFS
+ * call or reply.  Used to control buffer sizes.  We use
+ * the length of v3 WRITE, READDIR and READDIR replies
+ * which are an RPC header, up to 26 XDR units of reply
+ * data, and some page data.
+ *
+ * Note that accuracy here doesn't matter too much as the
+ * size is rounded up to a page size when allocating space.
+ */
+#define NFSD_BUFSIZE		((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)
 
 #ifdef CONFIG_NFSD_V4
 # define NFSSVC_XDRSIZE		NFS4_SVC_XDRSIZE

diff .prev/include/linux/sunrpc/auth.h ./include/linux/sunrpc/auth.h
--- .prev/include/linux/sunrpc/auth.h	2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/sunrpc/auth.h	2006-08-24 16:25:56.000000000 +1000
@@ -20,9 +20,6 @@
 /* size of the nodename buffer */
 #define UNX_MAXNODENAME	32
 
-/* Maximum size (in bytes) of an rpc credential or verifier */
-#define RPC_MAX_AUTH_SIZE (400)
-
 /* Work around the lack of a VFS credential */
 struct auth_cred {
 	uid_t	uid;

diff .prev/include/linux/sunrpc/msg_prot.h ./include/linux/sunrpc/msg_prot.h
--- .prev/include/linux/sunrpc/msg_prot.h	2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/sunrpc/msg_prot.h	2006-08-24 16:25:56.000000000 +1000
@@ -11,6 +11,9 @@
 
 #define RPC_VERSION 2
 
+/* size of an XDR encoding unit in bytes, i.e. 32bit */
+#define XDR_UNIT	(4)
+
 /* spec defines authentication flavor as an unsigned 32 bit integer */
 typedef u32	rpc_authflavor_t;
 
@@ -34,6 +37,9 @@ enum rpc_auth_flavors {
 	RPC_AUTH_GSS_SPKMP = 390011,
 };
 
+/* Maximum size (in bytes) of an rpc credential or verifier */
+#define RPC_MAX_AUTH_SIZE (400)
+
 enum rpc_msg_type {
 	RPC_CALL = 0,
 	RPC_REPLY = 1
@@ -101,5 +107,39 @@ typedef u32	rpc_fraghdr;
 #define	RPC_FRAGMENT_SIZE_MASK		(~RPC_LAST_STREAM_FRAGMENT)
 #define	RPC_MAX_FRAGMENT_SIZE		((1U << 31) - 1)
 
+/*
+ * RPC call and reply header size as number of 32bit words (verifier
+ * size computed separately, see below)
+ */
+#define RPC_CALLHDRSIZE		(6)
+#define RPC_REPHDRSIZE		(4)
+
+
+/*
+ * Maximum RPC header size, including authentication,
+ * as number of 32bit words (see RFCs 1831, 1832).
+ *
+ *	xid			    1 xdr unit = 4 bytes
+ *	mtype			    1
+ *	rpc_version		    1
+ *	program			    1
+ *	prog_version		    1
+ *	procedure		    1
+ *	cred {
+ *	    flavor		    1
+ *	    length		    1
+ *	    body<RPC_MAX_AUTH_SIZE> 100 xdr units = 400 bytes
+ *	}
+ *	verf {
+ *	    flavor		    1
+ *	    length		    1
+ *	    body<RPC_MAX_AUTH_SIZE> 100 xdr units = 400 bytes
+ *	}
+ *	TOTAL			    210 xdr units = 840 bytes
+ */
+#define RPC_MAX_HEADER_WITH_AUTH \
+	(RPC_CALLHDRSIZE + 2*(2+RPC_MAX_AUTH_SIZE/4))
+
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_SUNRPC_MSGPROT_H_ */

diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h	2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/sunrpc/svc.h	2006-08-24 16:25:56.000000000 +1000
@@ -13,6 +13,7 @@
 #include <linux/in.h>
 #include <linux/sunrpc/types.h>
 #include <linux/sunrpc/xdr.h>
+#include <linux/sunrpc/auth.h>
 #include <linux/sunrpc/svcauth.h>
 #include <linux/wait.h>
 #include <linux/mm.h>
@@ -95,8 +96,28 @@ static inline void svc_get(struct svc_se
  * Maximum payload size supported by a kernel RPC server.
  * This is use to determine the max number of pages nfsd is
  * willing to return in a single READ operation.
- */
-#define RPCSVC_MAXPAYLOAD	(64*1024u)
+ *
+ * These happen to all be powers of 2, which is not strictly
+ * necessary but helps enforce the real limitation, which is
+ * that they should be multiples of PAGE_CACHE_SIZE.
+ *
+ * For UDP transports, a block plus NFS,RPC, and UDP headers
+ * has to fit into the IP datagram limit of 64K.  The largest
+ * feasible number for all known page sizes is probably 48K,
+ * but we choose 32K here.  This is the same as the historical
+ * Linux limit; someone who cares more about NFS/UDP performance
+ * can test a larger number.
+ *
+ * For TCP transports we have more freedom.  A size of 1MB is
+ * chosen to match the client limit.  Other OSes are known to
+ * have larger limits, but those numbers are probably beyond
+ * the point of diminishing returns.
+ */
+#define RPCSVC_MAXPAYLOAD	(1*1024*1024u)
+#define RPCSVC_MAXPAYLOAD_TCP	RPCSVC_MAXPAYLOAD
+#define RPCSVC_MAXPAYLOAD_UDP	(32*1024u)
+
+extern u32 svc_max_payload(const struct svc_rqst *rqstp);
 
 /*
  * RPC Requsts and replies are stored in one or more pages.

diff .prev/include/linux/sunrpc/xprt.h ./include/linux/sunrpc/xprt.h
--- .prev/include/linux/sunrpc/xprt.h	2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/sunrpc/xprt.h	2006-08-24 16:25:56.000000000 +1000
@@ -14,6 +14,7 @@
 #include <linux/in.h>
 #include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/xdr.h>
+#include <linux/sunrpc/msg_prot.h>
 
 extern unsigned int xprt_udp_slot_table_entries;
 extern unsigned int xprt_tcp_slot_table_entries;
@@ -23,13 +24,6 @@ extern unsigned int xprt_tcp_slot_table_
 #define RPC_MAX_SLOT_TABLE	(128U)
 
 /*
- * RPC call and reply header size as number of 32bit words (verifier
- * size computed separately)
- */
-#define RPC_CALLHDRSIZE		6
-#define RPC_REPHDRSIZE		4
-
-/*
  * Parameters for choosing a free port
  */
 extern unsigned int xprt_min_resvport;

diff .prev/net/sunrpc/svc.c ./net/sunrpc/svc.c
--- .prev/net/sunrpc/svc.c	2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/svc.c	2006-08-24 16:25:56.000000000 +1000
@@ -919,3 +919,18 @@ err_bad:
 	svc_putu32(resv, rpc_stat);
 	goto sendit;
 }
+
+/*
+ * Return (transport-specific) limit on the rpc payload.
+ */
+u32 svc_max_payload(const struct svc_rqst *rqstp)
+{
+	int max = RPCSVC_MAXPAYLOAD_TCP;
+
+	if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
+		max = RPCSVC_MAXPAYLOAD_UDP;
+	if (rqstp->rq_server->sv_bufsz < max)
+		max = rqstp->rq_server->sv_bufsz;
+	return max;
+}
+EXPORT_SYMBOL_GPL(svc_max_payload);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (7 preceding siblings ...)
  2006-08-24  6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
@ 2006-08-24  6:37 ` NeilBrown
  2006-09-25 21:24   ` [NFS] " J. Bruce Fields
  2006-08-24  6:37 ` [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly NeilBrown
  2006-08-24  6:37 ` [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket NeilBrown
  10 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


The maximum possible is the maximum RPC payload.
The default depends on the amount of total memory.

The value can be set, within reason, as long as
no nfsd threads are currently running.
The value can also be read, allowing the default
to be determined after nfsd has started.
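For example, writing a byte count such as 65536 to the new
'max_block_size' file in the nfsd filesystem sets the limit (values
are clamped to the allowed range and rounded down to a multiple of
1024); reading the file returns the value currently in effect.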

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfsctl.c           |   33 +++++++++++++++++++++++++++++++++
 ./fs/nfsd/nfssvc.c           |   19 ++++++++++++++++++-
 ./include/linux/nfsd/const.h |    4 ++--
 ./include/linux/nfsd/nfsd.h  |    1 +
 4 files changed, 54 insertions(+), 3 deletions(-)

diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c	2006-08-24 16:24:56.000000000 +1000
+++ ./fs/nfsd/nfsctl.c	2006-08-24 16:26:10.000000000 +1000
@@ -57,6 +57,7 @@ enum {
 	NFSD_Pool_Threads,
 	NFSD_Versions,
 	NFSD_Ports,
+	NFSD_MaxBlkSize,
 	/*
 	 * The below MUST come last.  Otherwise we leave a hole in nfsd_files[]
 	 * with !CONFIG_NFSD_V4 and simple_fill_super() goes oops
@@ -82,6 +83,7 @@ static ssize_t write_threads(struct file
 static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
 static ssize_t write_versions(struct file *file, char *buf, size_t size);
 static ssize_t write_ports(struct file *file, char *buf, size_t size);
+static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
 #ifdef CONFIG_NFSD_V4
 static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
 static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
@@ -100,6 +102,7 @@ static ssize_t (*write_op[])(struct file
 	[NFSD_Pool_Threads] = write_pool_threads,
 	[NFSD_Versions] = write_versions,
 	[NFSD_Ports] = write_ports,
+	[NFSD_MaxBlkSize] = write_maxblksize,
 #ifdef CONFIG_NFSD_V4
 	[NFSD_Leasetime] = write_leasetime,
 	[NFSD_RecoveryDir] = write_recoverydir,
@@ -555,6 +558,35 @@ static ssize_t write_ports(struct file *
 	return -EINVAL;
 }
 
+int nfsd_max_blksize;
+
+static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
+{
+	char *mesg = buf;
+	if (size > 0) {
+		int bsize;
+		int rv = get_int(&mesg, &bsize);
+		if (rv)
+			return rv;
+		/* force bsize into allowed range and
+		 * required alignment.
+		 */
+		if (bsize < 1024)
+			bsize = 1024;
+		if (bsize > NFSSVC_MAXBLKSIZE)
+			bsize = NFSSVC_MAXBLKSIZE;
+		bsize &= ~(1024-1);
+		lock_kernel();
+		if (nfsd_serv && nfsd_serv->sv_nrthreads) {
+			unlock_kernel();
+			return -EBUSY;
+		}
+		nfsd_max_blksize = bsize;
+		unlock_kernel();
+	}
+	return sprintf(buf, "%d\n", nfsd_max_blksize);
+}
+
 #ifdef CONFIG_NFSD_V4
 extern time_t nfs4_leasetime(void);
 
@@ -620,6 +652,7 @@ static int nfsd_fill_super(struct super_
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
+		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
 #ifdef CONFIG_NFSD_V4
 		[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_RecoveryDir] = {"nfsv4recoverydir", &transaction_ops, S_IWUSR|S_IRUSR},

diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
--- .prev/fs/nfsd/nfssvc.c	2006-08-24 16:26:10.000000000 +1000
+++ ./fs/nfsd/nfssvc.c	2006-08-24 16:26:10.000000000 +1000
@@ -198,9 +198,26 @@ int nfsd_create_serv(void)
 		unlock_kernel();
 		return 0;
 	}
+	if (nfsd_max_blksize == 0) {
+		/* choose a suitable default */
+		struct sysinfo i;
+		si_meminfo(&i);
+		/* Aim for 1/4096 of memory per thread
+		 * This gives 1MB on 4Gig machines
+		 * But only uses 32K on 128M machines.
+		 * Bottom out at 8K on 32M and smaller.
+		 * Of course, this is only a default.
+		 */
+		nfsd_max_blksize = NFSSVC_MAXBLKSIZE;
+		i.totalram >>= 12;
+		while (nfsd_max_blksize > i.totalram &&
+		       nfsd_max_blksize >= 8*1024*2)
+			nfsd_max_blksize /= 2;
+	}
 
 	atomic_set(&nfsd_busy, 0);
-	nfsd_serv = svc_create_pooled(&nfsd_program, NFSD_BUFSIZE,
+	nfsd_serv = svc_create_pooled(&nfsd_program,
+				      NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE + nfsd_max_blksize,
 				      nfsd_last_thread,
 				      nfsd, SIG_NOCLEAN, THIS_MODULE);
 	if (nfsd_serv == NULL)

diff .prev/include/linux/nfsd/const.h ./include/linux/nfsd/const.h
--- .prev/include/linux/nfsd/const.h	2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/nfsd/const.h	2006-08-24 16:26:10.000000000 +1000
@@ -21,9 +21,9 @@
 #define NFSSVC_MAXVERS		3
 
 /*
- * Maximum blocksize supported by daemon currently at 32K
+ * Maximum blocksizes supported by daemon under various circumstances.
  */
-#define NFSSVC_MAXBLKSIZE	(32*1024)
+#define NFSSVC_MAXBLKSIZE	RPCSVC_MAXPAYLOAD
 /* NFSv2 is limited by the protocol specification, see RFC 1094 */
 #define NFSSVC_MAXBLKSIZE_V2	(8*1024)
 

diff .prev/include/linux/nfsd/nfsd.h ./include/linux/nfsd/nfsd.h
--- .prev/include/linux/nfsd/nfsd.h	2006-08-24 16:26:10.000000000 +1000
+++ ./include/linux/nfsd/nfsd.h	2006-08-24 16:26:10.000000000 +1000
@@ -145,6 +145,7 @@ int nfsd_vers(int vers, enum vers_op cha
 void nfsd_reset_versions(void);
 int nfsd_create_serv(void);
 
+extern int nfsd_max_blksize;
 
 /* 
  * NFSv4 State

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (8 preceding siblings ...)
  2006-08-24  6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
@ 2006-08-24  6:37 ` NeilBrown
  2006-08-24  6:37 ` [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket NeilBrown
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


From: Greg Banks <gnb@melbourne.sgi.com>

knfsd: make the nfsd read-ahead params cache more SMP-friendly by
changing the single global list and lock into a fixed 16-bucket
hashtable with per-bucket locks.  This reduces spinlock contention
in nfsd_read() on read-heavy workloads on multiprocessor servers.
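Each (dev, ino) pair is hashed with jhash_2words() to pick a bucket,
so readers of unrelated files mostly contend on different locks.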

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K
streaming reads at full line rate.  The server had 128 nfsd threads,
which sizes the RA cache at 256 entries, of which only a handful
were used.  Flat profiling shows nfsd_read(), including the inlined
nfsd_get_raparms(), taking 10.4% of each CPU.  This patch drops the
contribution from nfsd() to 1.71% for each CPU.


Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/vfs.c |   60 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff .prev/fs/nfsd/vfs.c ./fs/nfsd/vfs.c
--- .prev/fs/nfsd/vfs.c	2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/vfs.c	2006-08-24 16:27:01.000000000 +1000
@@ -54,6 +54,7 @@
 #include <linux/nfsd_idmap.h>
 #include <linux/security.h>
 #endif /* CONFIG_NFSD_V4 */
+#include <linux/jhash.h>
 
 #include <asm/uaccess.h>
 
@@ -81,10 +82,19 @@ struct raparms {
 	dev_t			p_dev;
 	int			p_set;
 	struct file_ra_state	p_ra;
+	unsigned int		p_hindex;
 };
 
+struct raparm_hbucket {
+	struct raparms		*pb_head;
+	spinlock_t		pb_lock;
+} ____cacheline_aligned_in_smp;
+
 static struct raparms *		raparml;
-static struct raparms *		raparm_cache;
+#define RAPARM_HASH_BITS	4
+#define RAPARM_HASH_SIZE	(1<<RAPARM_HASH_BITS)
+#define RAPARM_HASH_MASK	(RAPARM_HASH_SIZE-1)
+static struct raparm_hbucket	raparm_hash[RAPARM_HASH_SIZE];
 
 /* 
  * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
@@ -743,16 +753,20 @@ nfsd_sync_dir(struct dentry *dp)
  * Obtain the readahead parameters for the file
  * specified by (dev, ino).
  */
-static DEFINE_SPINLOCK(ra_lock);
 
 static inline struct raparms *
 nfsd_get_raparms(dev_t dev, ino_t ino)
 {
 	struct raparms	*ra, **rap, **frap = NULL;
 	int depth = 0;
+	unsigned int hash;
+	struct raparm_hbucket *rab;
+
+	hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
+	rab = &raparm_hash[hash];
 
-	spin_lock(&ra_lock);
-	for (rap = &raparm_cache; (ra = *rap); rap = &ra->p_next) {
+	spin_lock(&rab->pb_lock);
+	for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
 		if (ra->p_ino == ino && ra->p_dev == dev)
 			goto found;
 		depth++;
@@ -761,7 +775,7 @@ nfsd_get_raparms(dev_t dev, ino_t ino)
 	}
 	depth = nfsdstats.ra_size*11/10;
 	if (!frap) {	
-		spin_unlock(&ra_lock);
+		spin_unlock(&rab->pb_lock);
 		return NULL;
 	}
 	rap = frap;
@@ -769,15 +783,16 @@ nfsd_get_raparms(dev_t dev, ino_t ino)
 	ra->p_dev = dev;
 	ra->p_ino = ino;
 	ra->p_set = 0;
+	ra->p_hindex = hash;
 found:
-	if (rap != &raparm_cache) {
+	if (rap != &rab->pb_head) {
 		*rap = ra->p_next;
-		ra->p_next   = raparm_cache;
-		raparm_cache = ra;
+		ra->p_next   = rab->pb_head;
+		rab->pb_head = ra;
 	}
 	ra->p_count++;
 	nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
-	spin_unlock(&ra_lock);
+	spin_unlock(&rab->pb_lock);
 	return ra;
 }
 
@@ -856,11 +871,12 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 
 	/* Write back readahead params */
 	if (ra) {
-		spin_lock(&ra_lock);
+		struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
+		spin_lock(&rab->pb_lock);
 		ra->p_ra = file->f_ra;
 		ra->p_set = 1;
 		ra->p_count--;
-		spin_unlock(&ra_lock);
+		spin_unlock(&rab->pb_lock);
 	}
 
 	if (err >= 0) {
@@ -1836,11 +1852,11 @@ nfsd_permission(struct svc_export *exp, 
 void
 nfsd_racache_shutdown(void)
 {
-	if (!raparm_cache)
+	if (!raparml)
 		return;
 	dprintk("nfsd: freeing readahead buffers.\n");
 	kfree(raparml);
-	raparm_cache = raparml = NULL;
+	raparml = NULL;
 }
 /*
  * Initialize readahead param cache
@@ -1849,19 +1865,31 @@ int
 nfsd_racache_init(int cache_size)
 {
 	int	i;
+	int	j = 0;
+	int	nperbucket;
 
-	if (raparm_cache)
+
+	if (raparml)
 		return 0;
+	if (cache_size < 2*RAPARM_HASH_SIZE)
+		cache_size = 2*RAPARM_HASH_SIZE;
 	raparml = kmalloc(sizeof(struct raparms) * cache_size, GFP_KERNEL);
 
 	if (raparml != NULL) {
 		dprintk("nfsd: allocating %d readahead buffers.\n",
 			cache_size);
+		for (i = 0 ; i < RAPARM_HASH_SIZE ; i++) {
+			raparm_hash[i].pb_head = NULL;
+			spin_lock_init(&raparm_hash[i].pb_lock);
+		}
+		nperbucket = cache_size >> RAPARM_HASH_BITS;
 		memset(raparml, 0, sizeof(struct raparms) * cache_size);
 		for (i = 0; i < cache_size - 1; i++) {
-			raparml[i].p_next = raparml + i + 1;
+			if (i % nperbucket == 0)
+				raparm_hash[j++].pb_head = raparml + i;
+			if (i % nperbucket < nperbucket-1)
+				raparml[i].p_next = raparml + i + 1;
 		}
-		raparm_cache = raparml;
 	} else {
 		printk(KERN_WARNING
 		       "nfsd: Could not allocate memory read-ahead cache.\n");

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket
  2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
                   ` (9 preceding siblings ...)
  2006-08-24  6:37 ` [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly NeilBrown
@ 2006-08-24  6:37 ` NeilBrown
  10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24  6:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nfs, linux-kernel


From: Greg Banks <gnb@melbourne.sgi.com>

knfsd: speed up high call-rate workloads by caching the struct ip_map
for the peer on the connected struct svc_sock instead of looking it
up in the ip_map cache hashtable on every call.  This helps workloads
using AUTH_SYS authentication over TCP.
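The reference is taken on the first lookup and parked in the socket's
sk_info_authunix field; later calls on the same connection reuse it
after a cache_valid() check, and it is dropped when the socket is
finally deleted.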

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server.  That tree is small
enough to fill in the server's RAM so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server.

Profiling showed strcmp(), called from ip_map_match(), was taking 4.8%
of each CPU, and ip_map_lookup() was taking 2.9%.  This patch drops
both contributions into the profile noise.

Note that the above result overstates the value of this patch
for most workloads.  The synthetic clients are all using separate
IP addresses, so there are 64 entries in the ip_map cache hash.
Because the kernel being measured contained the bug fixed in
commit 1f1e030bf75774b6a283518e1534d598e14147d4 and was running
on a 64bit little-endian machine, probably all of those 64 entries
were on a single chain, thus increasing the cost of ip_map_lookup().

With a modern kernel you would need more clients to see the same
amount of performance improvement.  This patch has helped to scale
knfsd to handle a deployment with 2000 NFS clients.


Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./include/linux/sunrpc/cache.h   |   11 +++++++++
 ./include/linux/sunrpc/svcauth.h |    1 
 ./include/linux/sunrpc/svcsock.h |    3 ++
 ./net/sunrpc/svcauth_unix.c      |   47 ++++++++++++++++++++++++++++++++++++---
 ./net/sunrpc/svcsock.c           |    2 +
 5 files changed, 61 insertions(+), 3 deletions(-)

diff .prev/include/linux/sunrpc/cache.h ./include/linux/sunrpc/cache.h
--- .prev/include/linux/sunrpc/cache.h	2006-08-24 16:27:18.000000000 +1000
+++ ./include/linux/sunrpc/cache.h	2006-08-24 16:27:18.000000000 +1000
@@ -163,6 +163,17 @@ static inline void cache_put(struct cach
 	kref_put(&h->ref, cd->cache_put);
 }
 
+static inline int cache_valid(struct cache_head *h)
+{
+	/* If an item has been unhashed pending removal when
+	 * the refcount drops to 0, the expiry_time will be
+	 * set to 0.  We don't want to consider such items
+	 * valid in this context even though CACHE_VALID is
+	 * set.
+	 */
+	return (h->expiry_time != 0 && test_bit(CACHE_VALID, &h->flags));
+}
+
 extern int cache_check(struct cache_detail *detail,
 		       struct cache_head *h, struct cache_req *rqstp);
 extern void cache_flush(void);

diff .prev/include/linux/sunrpc/svcauth.h ./include/linux/sunrpc/svcauth.h
--- .prev/include/linux/sunrpc/svcauth.h	2006-08-24 16:27:18.000000000 +1000
+++ ./include/linux/sunrpc/svcauth.h	2006-08-24 16:27:18.000000000 +1000
@@ -126,6 +126,7 @@ extern struct auth_domain *auth_domain_f
 extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
 extern int auth_unix_forget_old(struct auth_domain *dom);
 extern void svcauth_unix_purge(void);
+extern void svcauth_unix_info_release(void *);
 
 static inline unsigned long hash_str(char *name, int bits)
 {

diff .prev/include/linux/sunrpc/svcsock.h ./include/linux/sunrpc/svcsock.h
--- .prev/include/linux/sunrpc/svcsock.h	2006-08-24 16:27:18.000000000 +1000
+++ ./include/linux/sunrpc/svcsock.h	2006-08-24 16:27:18.000000000 +1000
@@ -54,6 +54,9 @@ struct svc_sock {
 	int			sk_reclen;	/* length of record */
 	int			sk_tcplen;	/* current read length */
 	time_t			sk_lastrecv;	/* time of last received request */
+
+	/* cache of various info for TCP sockets */
+	void			*sk_info_authunix;
 };
 
 /*

diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c
--- .prev/net/sunrpc/svcauth_unix.c	2006-08-24 16:27:18.000000000 +1000
+++ ./net/sunrpc/svcauth_unix.c	2006-08-24 16:27:18.000000000 +1000
@@ -9,6 +9,7 @@
 #include <linux/seq_file.h>
 #include <linux/hash.h>
 #include <linux/string.h>
+#include <net/sock.h>
 
 #define RPCDBG_FACILITY	RPCDBG_AUTH
 
@@ -375,6 +376,44 @@ void svcauth_unix_purge(void)
 	cache_purge(&ip_map_cache);
 }
 
+static inline struct ip_map *
+ip_map_cached_get(struct svc_rqst *rqstp)
+{
+	struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix;
+	if (ipm != NULL) {
+		if (!cache_valid(&ipm->h)) {
+			/*
+			 * The entry has been invalidated since it was
+			 * remembered, e.g. by a second mount from the
+			 * same IP address.
+			 */
+			rqstp->rq_sock->sk_info_authunix = NULL;
+			cache_put(&ipm->h, &ip_map_cache);
+			return NULL;
+		}
+		cache_get(&ipm->h);
+	}
+	return ipm;
+}
+
+static inline void
+ip_map_cached_put(struct svc_rqst *rqstp, struct ip_map *ipm)
+{
+	struct svc_sock *svsk = rqstp->rq_sock;
+
+	if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL)
+		svsk->sk_info_authunix = ipm;	/* newly cached, keep the reference */
+	else
+		cache_put(&ipm->h, &ip_map_cache);
+}
+
+void
+svcauth_unix_info_release(void *info)
+{
+	struct ip_map *ipm = info;
+	cache_put(&ipm->h, &ip_map_cache);
+}
+
 static int
 svcauth_unix_set_client(struct svc_rqst *rqstp)
 {
@@ -384,8 +423,10 @@ svcauth_unix_set_client(struct svc_rqst 
 	if (rqstp->rq_proc == 0)
 		return SVC_OK;
 
-	ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
-			    rqstp->rq_addr.sin_addr);
+	ipm = ip_map_cached_get(rqstp);
+	if (ipm == NULL)
+		ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
+				    rqstp->rq_addr.sin_addr);
 
 	if (ipm == NULL)
 		return SVC_DENIED;
@@ -400,7 +441,7 @@ svcauth_unix_set_client(struct svc_rqst 
 		case 0:
 			rqstp->rq_client = &ipm->m_client->h;
 			kref_get(&rqstp->rq_client->ref);
-			cache_put(&ipm->h, &ip_map_cache);
+			ip_map_cached_put(rqstp, ipm);
 			break;
 	}
 	return SVC_OK;

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-08-24 16:25:41.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-08-24 16:27:18.000000000 +1000
@@ -1612,6 +1612,8 @@ svc_delete_socket(struct svc_sock *svsk)
 			sockfd_put(svsk->sk_sock);
 		else
 			sock_release(svsk->sk_sock);
+		if (svsk->sk_info_authunix != NULL)
+			svcauth_unix_info_release(svsk->sk_info_authunix);
 		kfree(svsk);
 	} else {
 		spin_unlock_bh(&serv->sv_lock);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-08-24  6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
@ 2006-09-25 15:43   ` J. Bruce Fields
  2006-09-28  3:41     ` Neil Brown
  2006-10-03  1:36     ` Neil Brown
  0 siblings, 2 replies; 25+ messages in thread
From: J. Bruce Fields @ 2006-09-25 15:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andrew Morton, nfs, linux-kernel, Greg Banks

On Thu, Aug 24, 2006 at 04:37:11PM +1000, NeilBrown wrote:
> The limit over UDP remains at 32K.  Also, make some of
> the apparently arbitrary sizing constants clearer.
> 
> The biggest change here involves replacing NFSSVC_MAXBLKSIZE
> by a function of the rqstp.  This allows it to be different
> for different protocols (udp/tcp) and also allows it
> to depend on the servers declared sv_bufsiz.
> 
> Note that we don't actually increase sv_bufsz for nfs yet.
> That comes next.

This patch has some problems.  (Apologies for being so slow to look at
them!)

We're reporting svc_max_payload(rqstp) as the server's maximum
read/write block size:

> @@ -538,15 +539,16 @@ nfsd3_proc_fsinfo(struct svc_rqst * rqst
>  					   struct nfsd3_fsinfores *resp)
>  {
>  	int	nfserr;
> +	u32	max_blocksize = svc_max_payload(rqstp);
>  
>  	dprintk("nfsd: FSINFO(3)   %s\n",
>  				SVCFH_fmt(&argp->fh));
>  
> -	resp->f_rtmax  = NFSSVC_MAXBLKSIZE;
> -	resp->f_rtpref = NFSSVC_MAXBLKSIZE;
> +	resp->f_rtmax  = max_blocksize;
> +	resp->f_rtpref = max_blocksize;
>  	resp->f_rtmult = PAGE_SIZE;
> -	resp->f_wtmax  = NFSSVC_MAXBLKSIZE;
> -	resp->f_wtpref = NFSSVC_MAXBLKSIZE;
> +	resp->f_wtmax  = max_blocksize;
> +	resp->f_wtpref = max_blocksize;
>  	resp->f_wtmult = PAGE_SIZE;
>  	resp->f_dtpref = PAGE_SIZE;
>  	resp->f_maxfilesize = ~(u32) 0;

But svc_max_payload() usually returns sv_bufsz in the TCP case:

> +u32 svc_max_payload(const struct svc_rqst *rqstp)
> +{
> +	int max = RPCSVC_MAXPAYLOAD_TCP;
> +
> +	if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
> +		max = RPCSVC_MAXPAYLOAD_UDP;
> +	if (rqstp->rq_server->sv_bufsz < max)
> +		max = rqstp->rq_server->sv_bufsz;
> +	return max;
> +}


That's the *total* size of the buffer for holding requests and replies.

If a client actually tries to send a write of that size, the entire
request will of course exceed sv_bufsz, so we'll drop it.  (We've seen
this happen with the Solaris v4 client.)

> -#define NFSD_BUFSIZE		(1024 + NFSSVC_MAXBLKSIZE)
> +/*
> + * Largest number of bytes we need to allocate for an NFS
> + * call or reply.  Used to control buffer sizes.  We use
> + * the length of v3 WRITE, READDIR and READDIR replies
> + * which are an RPC header, up to 26 XDR units of reply
> + * data, and some page data.
> + *
> + * Note that accuracy here doesn't matter too much as the
> + * size is rounded up to a page size when allocating space.
> + */

Is the rounding up *always* going to increase the size?  And if not,
then why doesn't accuracy matter?

> +#define NFSD_BUFSIZE		((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)

I think this results in 80 fewer bytes than before.
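If I'm adding the constants up right, RPC_MAX_HEADER_WITH_AUTH is
6 + 2*(2 + 400/4) = 210 xdr units, so the new value is
(210 + 26)*4 + 32768 = 33712 bytes, where the old one was
1024 + 32768 = 33792.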

No doubt we have lots of wiggle room here, but I'd rather we didn't
decrease that size without seeing a careful analysis.

--b.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
  2006-08-24  6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
@ 2006-09-25 21:24   ` J. Bruce Fields
  2006-09-28  4:22     ` Neil Brown
  0 siblings, 1 reply; 25+ messages in thread
From: J. Bruce Fields @ 2006-09-25 21:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andrew Morton, nfs, linux-kernel

On Thu, Aug 24, 2006 at 04:37:16PM +1000, NeilBrown wrote:
> diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
> --- .prev/fs/nfsd/nfssvc.c	2006-08-24 16:26:10.000000000 +1000
> +++ ./fs/nfsd/nfssvc.c	2006-08-24 16:26:10.000000000 +1000
> @@ -198,9 +198,26 @@ int nfsd_create_serv(void)
>  		unlock_kernel();
>  		return 0;
>  	}
> +	if (nfsd_max_blksize == 0) {
> +		/* choose a suitable default */
> +		struct sysinfo i;
> +		si_meminfo(&i);
> +		/* Aim for 1/4096 of memory per thread
> +		 * This gives 1MB on 4Gig machines
> +		 * But only uses 32K on 128M machines.
> +		 * Bottom out at 8K on 32M and smaller.
> +		 * Of course, this is only a default.
> +		 */
> +		nfsd_max_blksize = NFSSVC_MAXBLKSIZE;
> +		i.totalram >>= 12;
> +		while (nfsd_max_blksize > i.totalram &&
> +		       nfsd_max_blksize >= 8*1024*2)
> +			nfsd_max_blksize /= 2;
> +	}

It looks to me like totalram is actually measured in pages.  So in
practice this gives almost everyone 8k here.  So that 12 should be
something like 12 - PAGE_CACHE_SHIFT?

--b.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-09-25 15:43   ` [NFS] " J. Bruce Fields
@ 2006-09-28  3:41     ` Neil Brown
  2006-09-28  3:46       ` Andrew Morton
  2006-10-03  1:36     ` Neil Brown
  1 sibling, 1 reply; 25+ messages in thread
From: Neil Brown @ 2006-09-28  3:41 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Andrew Morton, nfs, linux-kernel, Greg Banks

On Monday September 25, bfields@fieldses.org wrote:
> On Thu, Aug 24, 2006 at 04:37:11PM +1000, NeilBrown wrote:
> > The limit over UDP remains at 32K.  Also, make some of
> > the apparently arbitrary sizing constants clearer.
> > 
> > The biggest change here involves replacing NFSSVC_MAXBLKSIZE
> > by a function of the rqstp.  This allows it to be different
> > for different protocols (udp/tcp) and also allows it
> > to depend on the servers declared sv_bufsiz.
> > 
> > Note that we don't actually increase sv_bufsz for nfs yet.
> > That comes next.
> 
> This patch has some problems.  (Apologies for being so slow to look at
> them!)

Problems. Yes.  It makes my brain hurt for one!  We have various
things called a 'size', with some being rounded-up versions of others
and ... ARG.

> 
> We're reporting svc_max_payload(rqstp) as the server's maximum
> read/write block size:
> 
> > @@ -538,15 +539,16 @@ nfsd3_proc_fsinfo(struct svc_rqst * rqst
> >  					   struct nfsd3_fsinfores *resp)
> >  {
> >  	int	nfserr;
> > +	u32	max_blocksize = svc_max_payload(rqstp);
...
> 
> But svc_max_payload() usually returns sv_bufsz in the TCP case:
> 
...
> 
> That's the *total* size of the buffer for holding requests and replies.

Yes... for consistency with nfsd_create_serv, this should probably
be
   max_blocksize = svc_max_payload(rqstp) - (NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE);

as (NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE) has been determined to be the
maximum overhead in a read reply / write request.
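(With the constants above that overhead works out to
(RPC_MAX_HEADER_WITH_AUTH + 26)*XDR_UNIT = (210 + 26)*4 = 944 bytes.)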

> > -#define NFSD_BUFSIZE		(1024 + NFSSVC_MAXBLKSIZE)
> > +/*
> > + * Largest number of bytes we need to allocate for an NFS
> > + * call or reply.  Used to control buffer sizes.  We use
> > + * the length of v3 WRITE, READDIR and READDIR replies
> > + * which are an RPC header, up to 26 XDR units of reply
> > + * data, and some page data.
> > + *
> > + * Note that accuracy here doesn't matter too much as the
> > + * size is rounded up to a page size when allocating space.
> > + */
> 
> Is the rounding up *always* going to increase the size?  And if not,
> then why doesn't accuracy matter?
> 
> > +#define NFSD_BUFSIZE		((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)

Well the code in svc_init_buffer says:
	pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
So it doesn't just round up, but adds one page.  It might look like it
is adding 2 pages, but one of those is for the message in the other
direction.
It is really one page for the request, one page for the reply, and
N pages for the data.  So why do we add all that padding to
NFSD_BUFSIZE?
I'm not sure.  I think there is a good reason, but as I said - it
makes my brain hurt.
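(For scale, if I have the sums right: with 4k pages and a 32k block
size, sv_bufsz is 944 + 32768 = 33712, which rounds up to 9 pages,
so each thread ends up with 2 + 9 = 11 pages.)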

And the above comment only mentions v3.  v4 could presumably have lots
more overhead.  A 'write' could be in a compound with lots of other
stuff, and if we say we can handle a 32k write, might the client send
a 40K message with 8k of UNLINK requests???

> 
> No doubt we have lots of wiggle room here, but I'd rather we didn't
> decrease that size without seeing a careful analysis.

Yes. careful analysis.  That sounds like a good idea.
I'll race you ... but I hope you win :-)

NeilBrown

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-09-28  3:41     ` Neil Brown
@ 2006-09-28  3:46       ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2006-09-28  3:46 UTC (permalink / raw)
  To: Neil Brown; +Cc: J. Bruce Fields, nfs, linux-kernel, Greg Banks

On Thu, 28 Sep 2006 13:41:27 +1000
Neil Brown <neilb@suse.de> wrote:

> 	pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;

That's (the newly-added) DIV_ROUND_UP().
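i.e. that line could be written as

	pages = 2 + DIV_ROUND_UP(size, PAGE_SIZE);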

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
  2006-09-25 21:24   ` [NFS] " J. Bruce Fields
@ 2006-09-28  4:22     ` Neil Brown
  2006-09-28 17:09       ` Hugh Dickins
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Brown @ 2006-09-28  4:22 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Andrew Morton, nfs, linux-kernel

On Monday September 25, bfields@fieldses.org wrote:
> 
> It looks to me like totalram is actually measured in pages.  So in
> practice this gives almost everyone 8k here.  So that 12 should be
> something like 12 - PAGE_CACHE_SHIFT?

Uhm.... yes.  Thanks.
But are the pages that totalram is measure in, normal pages, of
page_cache pages?  And is there a difference?
Should we use PAGE_CACHE_SHIFT, or PAGE_SHIFT?
And why do we have both if they are numerically identical?

I'll submit a patch which uses
        12 - PAGE_SHIFT
in a little while.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
  2006-09-28  4:22     ` Neil Brown
@ 2006-09-28 17:09       ` Hugh Dickins
  2006-09-29  1:59         ` Neil Brown
  0 siblings, 1 reply; 25+ messages in thread
From: Hugh Dickins @ 2006-09-28 17:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: J. Bruce Fields, Andrew Morton, nfs, linux-kernel

On Thu, 28 Sep 2006, Neil Brown wrote:
> But are the pages that totalram is measured in normal pages, or
> page_cache pages?  And is there a difference?

There's never yet been a difference, outside of some patches by bcrl.
But totalram_pages comes "before" any idea of page cache, so it's in
normal pages.

> Should we use PAGE_CACHE_SHIFT, or PAGE_SHIFT?

PAGE_SHIFT.

> And why do we have both if they are numerically identical?

Very irritating: the time I've wasted on "correcting" code for the
"difference" between them!  Yet there's still plenty wrong and I've
largely given up on it.

Probably never will be a difference: but the idea was that the page
cache might use >0-order pages (unclear what happens to swap cache).

I wish they'd waited for a working implementation before introducing
the distinction; but never quite felt like deleting all trace of it.

> 
> I'll submit a patch which uses
>         12 - PAGE_SHIFT
> in a little while.

I haven't seen your context; but "12 - PAGE_SHIFT" sounds like a
bad idea on all those architectures with PAGE_SHIFT 13 or more;
you'll be on much safer ground working with "PAGE_SHIFT - 12".

Hugh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
  2006-09-28 17:09       ` Hugh Dickins
@ 2006-09-29  1:59         ` Neil Brown
  0 siblings, 0 replies; 25+ messages in thread
From: Neil Brown @ 2006-09-29  1:59 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: J. Bruce Fields, Andrew Morton, nfs, linux-kernel

On Thursday September 28, hugh@veritas.com wrote:
> > 
> > I'll submit a patch which uses
> >         12 - PAGE_SHIFT
> > in a little while.
> 
> I haven't seen your context; but "12 - PAGE_SHIFT" sounds like a
> bad idea on all those architectures with PAGE_SHIFT 13 or more;
> you'll be on much safer ground working with "PAGE_SHIFT - 12".

Ahhh yes... of course.  Thanks.

		totalram <<= PAGE_SHIFT - 12;

Is what I want, to convert a number of pages into 1/4096 of the number
of bytes.
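(With 4k pages the shift is zero, so the count is unchanged; with
16k pages PAGE_SHIFT - 12 is 2, and each page counts as four
4096-byte units.)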

Thanks :-)

NeilBrown

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-09-25 15:43   ` [NFS] " J. Bruce Fields
  2006-09-28  3:41     ` Neil Brown
@ 2006-10-03  1:36     ` Neil Brown
  2006-10-03  1:59       ` Greg Banks
  2006-10-03  2:13       ` J. Bruce Fields
  1 sibling, 2 replies; 25+ messages in thread
From: Neil Brown @ 2006-10-03  1:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: NeilBrown, nfs, linux-kernel, Greg Banks

On Monday September 25, bfields@fieldses.org wrote:
> 
> We're reporting svc_max_payload(rqstp) as the server's maximum
> read/write block size:

Yes.  So I'm going to change the number returned by
svc_max_payload(rqstp) to mean the maximum read/write block size.
i.e. when a service is created, the number passed isn't the maximum
packet size, but is the maximum payload size.
The assumption is that all of the request that is not payload will fit
into one page, and all of the reply that is not payload will also fit
into one page (though a different page).

It means that RPC services that have lots of non-payload data combined
with payload data won't work, but making sunrpc code completely
general when there are only two users is just too painful.

The only real problem is that NFSv4 can have arbitrarily large
non-payload data, and arbitrarily many payloads.  But I guess any
client that trying to send two full-sized payloads in the one request
is asking for trouble (I don't suppose the RPC spells this out at
all?).

> 
> > -#define NFSD_BUFSIZE		(1024 + NFSSVC_MAXBLKSIZE)
> > +/*
> > + * Largest number of bytes we need to allocate for an NFS
> > + * call or reply.  Used to control buffer sizes.  We use
> > + * the length of v3 WRITE, READDIR and READDIR replies
> > + * which are an RPC header, up to 26 XDR units of reply
> > + * data, and some page data.
> > + *
> > + * Note that accuracy here doesn't matter too much as the
> > + * size is rounded up to a page size when allocating space.
> > + */
> 
> Is the rounding up *always* going to increase the size?  And if not,
> then why doesn't accuracy matter?
> 
> > +#define NFSD_BUFSIZE		((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)
> 
> I think this results in 80 less bytes less than before, I think.
> 
> No doubt we have lots of wiggle room here, but I'd rather we didn't
> decrease that size without seeing a careful analysis.

The above change makes this loss in bytes irrelevant.  NFSD_BUFSIZE
will now only be used once - near the end of nfs4proc.c - and there,
if it is wrong, you just get a warning.

And the fact that the code change to effect this is so tiny seems to
imply that most of the code was already assuming that sv_bufsz was
really the payload size rather than the packet size.

So this is my proposed 'fix' for
	knfsd-prepare-knfsd-for-support-of-rsize-wsize-of-up-to-1mb-over-tcp.patch.

NeilBrown

------------
Make sv_bufsz really be the payload size for rpc requests.

svc.c already allocates 2 extra pages for the request and the reply,
so it is perfectly consistent to assume that the size passed to
svc_create_pooled is the size of the payload.  This means that
the number returned by svc_max_payload - and thus returned to the client
as the maximum IO size - is exactly the chosen max block size.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfsd/nfssvc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
--- .prev/fs/nfsd/nfssvc.c	2006-09-29 11:57:27.000000000 +1000
+++ ./fs/nfsd/nfssvc.c	2006-10-03 11:23:11.000000000 +1000
@@ -217,7 +217,7 @@ int nfsd_create_serv(void)
 
 	atomic_set(&nfsd_busy, 0);
 	nfsd_serv = svc_create_pooled(&nfsd_program,
-				      NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE + nfsd_max_blksize,
+				      nfsd_max_blksize,
 				      nfsd_last_thread,
 				      nfsd, SIG_NOCLEAN, THIS_MODULE);
 	if (nfsd_serv == NULL)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-10-03  1:36     ` Neil Brown
@ 2006-10-03  1:59       ` Greg Banks
  2006-10-03  2:13       ` J. Bruce Fields
  1 sibling, 0 replies; 25+ messages in thread
From: Greg Banks @ 2006-10-03  1:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: J. Bruce Fields, nfs, linux-kernel

On Tue, Oct 03, 2006 at 11:36:32AM +1000, Neil Brown wrote:
> On Monday September 25, bfields@fieldses.org wrote:
> > 
> > We're reporting svc_max_payload(rqstp) as the server's maximum
> > read/write block size:
> 
> Yes.  So I'm going to change the number returned by
> svc_max_payload(rqstp) to mean the maximum read/write block size.
> i.e. when a service is created, the number passed isn't the maximum
> packet size, but is the maximum payload size.

I'm confused.  Last time I looked at the code that was
exactly what the semantics were?

> The assumption is that all of the request that is not payload will fit
> into one page, and all of the reply that is not payload will also fit
> into one page (though a different page).

This is a pretty good assumption for v3.

> It means that RPC services that have lots of non-payload data combined
> with payload data won't work, but making sunrpc code completely
> general when there are only two users is just too painful.
> 
> The only real problem is that NFSv4 can have arbitrarily large
> non-payload data, and arbitrarily many payloads.  But I guess any
> client that trying to send two full-sized payloads in the one request
> is asking for trouble (I don't suppose the RPC spells this out at
> all?).

Bruce and I briefly discussed this when I dropped into CITI the other
week.  The conclusion was that this is a non-issue in the short term
because all the clients do a single READ or WRITE per call.  In the
long term I hope to rewrite some parts of that code to do away with
one of the memcpy()s in the WRITE path, and handling multiple WRITEs
for v4 would be a natural extension of that.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-10-03  1:36     ` Neil Brown
  2006-10-03  1:59       ` Greg Banks
@ 2006-10-03  2:13       ` J. Bruce Fields
  2006-10-03  5:41         ` Neil Brown
  1 sibling, 1 reply; 25+ messages in thread
From: J. Bruce Fields @ 2006-10-03  2:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs, linux-kernel, Greg Banks

On Tue, Oct 03, 2006 at 11:36:32AM +1000, Neil Brown wrote:
> The only real problem is that NFSv4 can have arbitrarily large
> non-payload data, and arbitrarily many payloads.  But I guess any
> client that trying to send two full-sized payloads in the one request
> is asking for trouble (I don't suppose the RPC spells this out at
> all?).

The RFC?  Well, it does have a "RESOURCE" error that the server can
return for overly complicated compounds.  It doesn't give much guidance
on when exactly that could happen, but if there's ever a clear case for
returning NFS4ERR_RESOURCE, I think it must be the case of a client
trying to circumvent the maximum read/write size by using multiple read
or write operations in a single compound.

(We have some other odd restrictions on the sorts of compounds we can
accept, which I'd like to relax.  But that's a problem for another day.)

> And the fact that the code change to effect this is so tiny seems to
> imply that most of the code was already assuming that sv_bufsz was
> really the payload size rather than the packet size.

There's also the check at the end of svc_tcp_recvfrom():

	if (svsk->sk_reclen > serv->sv_bufsz) {
		printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
		       (unsigned long) svsk->sk_reclen);
		goto err_delete;
	}

--b.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-10-03  2:13       ` J. Bruce Fields
@ 2006-10-03  5:41         ` Neil Brown
  2006-10-03  8:02           ` Greg Banks
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Brown @ 2006-10-03  5:41 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, linux-kernel, Greg Banks

On Monday October 2, bfields@fieldses.org wrote:
> On Tue, Oct 03, 2006 at 11:36:32AM +1000, Neil Brown wrote:
> > The only real problem is that NFSv4 can have arbitrarily large
> > non-payload data, and arbitrarily many payloads.  But I guess any
> > client that tries to send two full-sized payloads in the one request
> > is asking for trouble (I don't suppose the RPC spells this out at
> > all?).
> 
> The RFC?

RFC, RPC, only a few pixel different :-)

  Well, it does have a "RESOURCE" error that the server can
> return for overly complicated compounds.  It doesn't give much guidance
> on when exactly that could happen, but if there's ever a clear case for
> returning NFS4ERR_RESOURCE, I think it must be the case of a client
> trying to circumvent the maximum read/write size by using multiple read
> or write operations in a single compound.

It would be nice if the RFC specified some minimum that the client
could be sure of not getting NFS4ERR_RESOURCE for, but maybe that
isn't really necessary.

> 
> There's also the check at the end of svc_tcp_recvfrom():
> 
> 	if (svsk->sk_reclen > serv->sv_bufsz) {
> 		printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
> 		       (unsigned long) svsk->sk_reclen);
> 		goto err_delete;
> 	}

Groan... and a whole lot more besides.  Thanks.
I had tried to avoid storing separate 'payload max' and 'message max'
numbers, but it doesn't seem like that is going to work, so it is time
to just give in and make it explicit.

Comments on the below?

Thanks,
NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./include/linux/sunrpc/svc.h |    3 ++-
 ./net/sunrpc/svc.c           |   17 ++++++++++-------
 ./net/sunrpc/svcsock.c       |   28 ++++++++++++++--------------
 3 files changed, 26 insertions(+), 22 deletions(-)

diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h	2006-09-29 11:44:31.000000000 +1000
+++ ./include/linux/sunrpc/svc.h	2006-10-03 15:26:38.000000000 +1000
@@ -57,7 +57,8 @@ struct svc_serv {
 	struct svc_stat *	sv_stats;	/* RPC statistics */
 	spinlock_t		sv_lock;
 	unsigned int		sv_nrthreads;	/* # of server threads */
-	unsigned int		sv_bufsz;	/* datagram buffer size */
+	unsigned int		sv_max_payload;	/* datagram payload size */
+	unsigned int		sv_max_mesg;	/* bufsz + 1 page for overheads */
 	unsigned int		sv_xdrsize;	/* XDR buffer size */
 
 	struct list_head	sv_permsocks;	/* all permanent sockets */

diff .prev/net/sunrpc/svc.c ./net/sunrpc/svc.c
--- .prev/net/sunrpc/svc.c	2006-09-29 11:44:31.000000000 +1000
+++ ./net/sunrpc/svc.c	2006-10-03 15:39:27.000000000 +1000
@@ -282,7 +282,8 @@ __svc_create(struct svc_program *prog, u
 	serv->sv_program   = prog;
 	serv->sv_nrthreads = 1;
 	serv->sv_stats     = prog->pg_stats;
-	serv->sv_bufsz	   = bufsize? bufsize : 4096;
+	serv->sv_max_payload = bufsize? bufsize : 4096;
+	serv->sv_max_mesg  = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
 	serv->sv_shutdown  = shutdown;
 	xdrsize = 0;
 	while (prog) {
@@ -414,9 +415,11 @@ svc_init_buffer(struct svc_rqst *rqstp, 
 	int pages;
 	int arghi;
 	
-	if (size > RPCSVC_MAXPAYLOAD)
-		size = RPCSVC_MAXPAYLOAD;
-	pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
+	if (size > RPCSVC_MAXPAYLOAD + PAGE_SIZE)
+		size = RPCSVC_MAXPAYLOAD + PAGE_SIZE;
+	pages = size + PAGE_SIZE; /* extra page as we hold both request and reply.
+				   * We assume one is at most one page
+				   */
 	arghi = 0;
 	BUG_ON(pages > RPCSVC_MAXPAGES);
 	while (pages) {
@@ -463,7 +466,7 @@ __svc_create_thread(svc_thread_fn func, 
 
 	if (!(rqstp->rq_argp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
 	 || !(rqstp->rq_resp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
-	 || !svc_init_buffer(rqstp, serv->sv_bufsz))
+	 || !svc_init_buffer(rqstp, serv->sv_max_mesg))
 		goto out_thread;
 
 	serv->sv_nrthreads++;
@@ -938,8 +941,8 @@ u32 svc_max_payload(const struct svc_rqs
 
 	if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
 		max = RPCSVC_MAXPAYLOAD_UDP;
-	if (rqstp->rq_server->sv_bufsz < max)
-		max = rqstp->rq_server->sv_bufsz;
+	if (rqstp->rq_server->sv_max_payload < max)
+		max = rqstp->rq_server->sv_max_payload;
 	return max;
 }
 EXPORT_SYMBOL_GPL(svc_max_payload);

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2006-09-29 11:44:33.000000000 +1000
+++ ./net/sunrpc/svcsock.c	2006-10-03 15:35:08.000000000 +1000
@@ -192,13 +192,13 @@ svc_sock_enqueue(struct svc_sock *svsk)
 	svsk->sk_pool = pool;
 
 	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
-	if (((atomic_read(&svsk->sk_reserved) + serv->sv_bufsz)*2
+	if (((atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg)*2
 	     > svc_sock_wspace(svsk))
 	    && !test_bit(SK_CLOSE, &svsk->sk_flags)
 	    && !test_bit(SK_CONN, &svsk->sk_flags)) {
 		/* Don't enqueue while not enough space for reply */
 		dprintk("svc: socket %p  no space, %d*2 > %ld, not enqueued\n",
-			svsk->sk_sk, atomic_read(&svsk->sk_reserved)+serv->sv_bufsz,
+			svsk->sk_sk, atomic_read(&svsk->sk_reserved)+serv->sv_max_mesg,
 			svc_sock_wspace(svsk));
 		svsk->sk_pool = NULL;
 		clear_bit(SK_BUSY, &svsk->sk_flags);
@@ -220,7 +220,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
 				rqstp, rqstp->rq_sock);
 		rqstp->rq_sock = svsk;
 		atomic_inc(&svsk->sk_inuse);
-		rqstp->rq_reserved = serv->sv_bufsz;
+		rqstp->rq_reserved = serv->sv_max_mesg;
 		atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
 		BUG_ON(svsk->sk_pool != pool);
 		wake_up(&rqstp->rq_wait);
@@ -639,8 +639,8 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
 	     * which will access the socket.
 	     */
 	    svc_sock_setbufsize(svsk->sk_sock,
-				(serv->sv_nrthreads+3) * serv->sv_bufsz,
-				(serv->sv_nrthreads+3) * serv->sv_bufsz);
+				(serv->sv_nrthreads+3) * serv->sv_max_mesg,
+				(serv->sv_nrthreads+3) * serv->sv_max_mesg);
 
 	if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
 		svc_sock_received(svsk);
@@ -749,8 +749,8 @@ svc_udp_init(struct svc_sock *svsk)
 	 * svc_udp_recvfrom will re-adjust if necessary
 	 */
 	svc_sock_setbufsize(svsk->sk_sock,
-			    3 * svsk->sk_server->sv_bufsz,
-			    3 * svsk->sk_server->sv_bufsz);
+			    3 * svsk->sk_server->sv_max_mesg,
+			    3 * svsk->sk_server->sv_max_mesg);
 
 	set_bit(SK_DATA, &svsk->sk_flags); /* might have come in before data_ready set up */
 	set_bit(SK_CHNGBUF, &svsk->sk_flags);
@@ -993,8 +993,8 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		 * as soon a a complete request arrives.
 		 */
 		svc_sock_setbufsize(svsk->sk_sock,
-				    (serv->sv_nrthreads+3) * serv->sv_bufsz,
-				    3 * serv->sv_bufsz);
+				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
+				    3 * serv->sv_max_mesg);
 
 	clear_bit(SK_DATA, &svsk->sk_flags);
 
@@ -1032,7 +1032,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		}
 		svsk->sk_reclen &= 0x7fffffff;
 		dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen);
-		if (svsk->sk_reclen > serv->sv_bufsz) {
+		if (svsk->sk_reclen > serv->sv_max_mesg) {
 			printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
 			       (unsigned long) svsk->sk_reclen);
 			goto err_delete;
@@ -1171,8 +1171,8 @@ svc_tcp_init(struct svc_sock *svsk)
 		 * svc_tcp_recvfrom will re-adjust if necessary
 		 */
 		svc_sock_setbufsize(svsk->sk_sock,
-				    3 * svsk->sk_server->sv_bufsz,
-				    3 * svsk->sk_server->sv_bufsz);
+				    3 * svsk->sk_server->sv_max_mesg,
+				    3 * svsk->sk_server->sv_max_mesg);
 
 		set_bit(SK_CHNGBUF, &svsk->sk_flags);
 		set_bit(SK_DATA, &svsk->sk_flags);
@@ -1234,7 +1234,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
 
 
 	/* now allocate needed pages.  If we get a failure, sleep briefly */
-	pages = 2 + (serv->sv_bufsz + PAGE_SIZE -1) / PAGE_SIZE;
+	pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
 	for (i=0; i < pages ; i++)
 		while (rqstp->rq_pages[i] == NULL) {
 			struct page *p = alloc_page(GFP_KERNEL);
@@ -1263,7 +1263,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
 	if ((svsk = svc_sock_dequeue(pool)) != NULL) {
 		rqstp->rq_sock = svsk;
 		atomic_inc(&svsk->sk_inuse);
-		rqstp->rq_reserved = serv->sv_bufsz;	
+		rqstp->rq_reserved = serv->sv_max_mesg;
 		atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
 	} else {
 		/* No data pending. Go to sleep */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-10-03  5:41         ` Neil Brown
@ 2006-10-03  8:02           ` Greg Banks
  2006-10-05  7:07             ` Neil Brown
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Banks @ 2006-10-03  8:02 UTC (permalink / raw)
  To: Neil Brown; +Cc: J. Bruce Fields, nfs, linux-kernel

On Tue, Oct 03, 2006 at 03:41:43PM +1000, Neil Brown wrote:
> Comments on the below?

Looks ok, except...

> @@ -57,7 +57,8 @@ struct svc_serv {
>  	struct svc_stat *	sv_stats;	/* RPC statistics */
>  	spinlock_t		sv_lock;
>  	unsigned int		sv_nrthreads;	/* # of server threads */
> -	unsigned int		sv_bufsz;	/* datagram buffer size */
> +	unsigned int		sv_max_payload;	/* datagram payload size */
> +	unsigned int		sv_max_mesg;	/* bufsz + 1 page for overheads */

Presumably the comment should read "max_payload + 1 page..." ?

> @@ -414,9 +415,11 @@ svc_init_buffer(struct svc_rqst *rqstp, 
>  	int pages;
>  	int arghi;
>  	
> -	if (size > RPCSVC_MAXPAYLOAD)
> -		size = RPCSVC_MAXPAYLOAD;
> -	pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
> +	if (size > RPCSVC_MAXPAYLOAD + PAGE_SIZE)
> +		size = RPCSVC_MAXPAYLOAD + PAGE_SIZE;
> +	pages = size + PAGE_SIZE; /* extra page as we hold both request and reply.
> +				   * We assume one is at most one page
> +				   */

Isn't there a divide by PAGE_SIZE missing here?  Looks
like we'll be allocating a *lot* of pages ;-)

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
  2006-10-03  8:02           ` Greg Banks
@ 2006-10-05  7:07             ` Neil Brown
  0 siblings, 0 replies; 25+ messages in thread
From: Neil Brown @ 2006-10-05  7:07 UTC (permalink / raw)
  To: Greg Banks; +Cc: J. Bruce Fields, nfs, linux-kernel

On Tuesday October 3, gnb@sgi.com wrote:
> On Tue, Oct 03, 2006 at 03:41:43PM +1000, Neil Brown wrote:
> > Comments on the below?
> 
> Looks ok, except...
> 
> > @@ -57,7 +57,8 @@ struct svc_serv {
> >  	struct svc_stat *	sv_stats;	/* RPC statistics */
> >  	spinlock_t		sv_lock;
> >  	unsigned int		sv_nrthreads;	/* # of server threads */
> > -	unsigned int		sv_bufsz;	/* datagram buffer size */
> > +	unsigned int		sv_max_payload;	/* datagram payload size */
> > +	unsigned int		sv_max_mesg;	/* bufsz + 1 page for overheads */
> 
> Presumably the comment should read "max_payload + 1 page..." ?
> 

Yes....

> > @@ -414,9 +415,11 @@ svc_init_buffer(struct svc_rqst *rqstp, 
> >  	int pages;
> >  	int arghi;
> >  	
> > -	if (size > RPCSVC_MAXPAYLOAD)
> > -		size = RPCSVC_MAXPAYLOAD;
> > -	pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
> > +	if (size > RPCSVC_MAXPAYLOAD + PAGE_SIZE)
> > +		size = RPCSVC_MAXPAYLOAD + PAGE_SIZE;
> > +	pages = size + PAGE_SIZE; /* extra page as we hold both request and reply.
> > +				   * We assume one is at most one page
> > +				   */
> 
> Isn't there a divide by PAGE_SIZE missing here?  Looks
> like we'll be allocating a *lot* of pages ;-)

Better safe than sorry?  But yes, we would be very sorry if we tried
to allocate that many pages.
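(With a 1MB buffer that would be over a million pages - around 4GB -
per thread, though presumably the BUG_ON(pages > RPCSVC_MAXPAGES)
would fire first.)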

Thanks for the review.

NeilBrown

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2006-10-05  7:07 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-24  6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
2006-08-24  6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
2006-08-24  6:36 ` [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset NeilBrown
2006-08-24  6:36 ` [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist NeilBrown
2006-08-24  6:36 ` [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel NeilBrown
2006-08-24  6:36 ` [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket NeilBrown
2006-08-24  6:36 ` [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one NeilBrown
2006-08-24  6:37 ` [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom NeilBrown
2006-08-24  6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
2006-09-25 15:43   ` [NFS] " J. Bruce Fields
2006-09-28  3:41     ` Neil Brown
2006-09-28  3:46       ` Andrew Morton
2006-10-03  1:36     ` Neil Brown
2006-10-03  1:59       ` Greg Banks
2006-10-03  2:13       ` J. Bruce Fields
2006-10-03  5:41         ` Neil Brown
2006-10-03  8:02           ` Greg Banks
2006-10-05  7:07             ` Neil Brown
2006-08-24  6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
2006-09-25 21:24   ` [NFS] " J. Bruce Fields
2006-09-28  4:22     ` Neil Brown
2006-09-28 17:09       ` Hugh Dickins
2006-09-29  1:59         ` Neil Brown
2006-08-24  6:37 ` [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly NeilBrown
2006-08-24  6:37 ` [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket NeilBrown
