* [PATCH 000 of 11] knfsd: Introduction
@ 2006-08-24 6:36 NeilBrown
2006-08-24 6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
` (10 more replies)
0 siblings, 11 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
Following are 11 patches for knfsd against
2.6.18-rc4-mm2 plus nfsd-lockdep-annotation.patch
(the first patch being a fix for that patch).
They are all appropriate for 2.6.19, not 2.6.18.
The batch in the middle (6-9) allows nfsd to handle IO requests
of up to 1 megabyte (rather than the current limit of 32K).
This only applies to TCP connections, and has not
been heavily tested...
The max size to allow is configurable via the nfsd filesystem,
and defaults to something that I hope is reasonable.
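For anyone wanting to try this, the limit is tuned through the nfsd
filesystem before the server threads are started. The control ended up
being exposed as a file called max_block_size; treat that name as an
assumption if you are on a different tree:

```shell
# Mount the nfsd filesystem if it is not already mounted
mount -t nfsd nfsd /proc/fs/nfsd

# Read the current maximum payload size (bytes)
cat /proc/fs/nfsd/max_block_size

# Request a 1MB limit before any nfsd threads are started;
# nfsd clamps the value to what the transport can support
echo 1048576 > /proc/fs/nfsd/max_block_size
```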
This set also includes a few more scalability patches from Greg Banks.
NeilBrown
[PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix
[PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset
[PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist
[PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel
[PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket.
[PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one.
[PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom
[PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
[PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
[PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly
[PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket
* [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
@ 2006-08-24 6:36 ` NeilBrown
2006-08-24 6:36 ` [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset NeilBrown
` (9 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
NFSv2 needs the I_MUTEX_PARENT annotation on the directory mutex
when creating a file too.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfsproc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c 2006-08-24 16:21:23.000000000 +1000
+++ ./fs/nfsd/nfsproc.c 2006-08-24 16:21:35.000000000 +1000
@@ -225,7 +225,7 @@ nfsd_proc_create(struct svc_rqst *rqstp,
nfserr = nfserr_exist;
if (isdotent(argp->name, argp->len))
goto done;
- fh_lock(dirfhp);
+ fh_lock_nested(dirfhp, I_MUTEX_PARENT);
dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
if (IS_ERR(dchild)) {
nfserr = nfserrno(PTR_ERR(dchild));
* [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
2006-08-24 6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
@ 2006-08-24 6:36 ` NeilBrown
2006-08-24 6:36 ` [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist NeilBrown
` (8 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./net/sunrpc/svcsock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-08-24 16:23:37.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-08-24 16:23:37.000000000 +1000
@@ -49,7 +49,7 @@
* svc_pool->sp_lock protects most of the fields of that pool.
* svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
* when both need to be taken (rare), svc_serv->sv_lock is first.
- * BKL protects svc_serv->sv_nrthread, svc_pool->sp_nrthread
+ * BKL protects svc_serv->sv_nrthread.
* svc_sock->sk_defer_lock protects the svc_sock->sk_deferred list
* svc_sock->sk_flags.SK_BUSY prevents a svc_sock being enqueued multiply.
*
* [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
2006-08-24 6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
2006-08-24 6:36 ` [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset NeilBrown
@ 2006-08-24 6:36 ` NeilBrown
2006-08-24 6:36 ` [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel NeilBrown
` (7 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
The refcount that nfsd holds on lockd is based on the number
of open sockets.
So when we close a socket, we should decrement the ref (with lockd_down).
Currently when a socket is closed via writing to the portlist
file, that doesn't happen.
So: make sure we get an error return if the socket that was requested
is not found, and call lockd_down if it was found.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfsctl.c | 2 ++
./net/sunrpc/svcsock.c | 2 ++
2 files changed, 4 insertions(+)
diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c 2006-08-24 16:24:21.000000000 +1000
+++ ./fs/nfsd/nfsctl.c 2006-08-24 16:24:21.000000000 +1000
@@ -545,6 +545,8 @@ static ssize_t write_ports(struct file *
if (nfsd_serv)
len = svc_sock_names(buf, nfsd_serv, toclose);
unlock_kernel();
+ if (len >= 0)
+ lockd_down();
kfree(toclose);
return len;
}
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-08-24 16:23:37.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-08-24 16:24:21.000000000 +1000
@@ -493,6 +493,8 @@ svc_sock_names(char *buf, struct svc_ser
spin_unlock(&serv->sv_lock);
if (closesk)
svc_delete_socket(closesk);
+ else if (toclose)
+ return -ENOENT;
return len;
}
EXPORT_SYMBOL(svc_sock_names);
* [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (2 preceding siblings ...)
2006-08-24 6:36 ` [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist NeilBrown
@ 2006-08-24 6:36 ` NeilBrown
2006-08-24 6:36 ` [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket NeilBrown
` (6 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfsctl.c | 2 ++
1 file changed, 2 insertions(+)
diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c 2006-08-24 16:24:21.000000000 +1000
+++ ./fs/nfsd/nfsctl.c 2006-08-24 16:24:47.000000000 +1000
@@ -532,7 +532,9 @@ static ssize_t write_ports(struct file *
/* Decrease the count, but don't shutdown the
* the service
*/
+ lock_kernel();
nfsd_serv->sv_nrthreads--;
+ unlock_kernel();
}
return err;
}
* [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket.
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (3 preceding siblings ...)
2006-08-24 6:36 ` [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel NeilBrown
@ 2006-08-24 6:36 ` NeilBrown
2006-08-24 6:36 ` [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one NeilBrown
` (5 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
Arrgh..
We cannot 'lockd_up' before 'svc_addsock' as we don't know
the protocol yet...
So switch it around again and save the name of the
created socket so that it can be closed if lockd_up fails.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfsctl.c | 12 ++++++------
./net/sunrpc/svcsock.c | 3 +++
2 files changed, 9 insertions(+), 6 deletions(-)
diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c 2006-08-24 16:24:47.000000000 +1000
+++ ./fs/nfsd/nfsctl.c 2006-08-24 16:24:56.000000000 +1000
@@ -523,11 +523,11 @@ static ssize_t write_ports(struct file *
err = nfsd_create_serv();
if (!err) {
int proto = 0;
- err = lockd_up(proto);
- if (!err) {
- err = svc_addsock(nfsd_serv, fd, buf, &proto);
- if (err)
- lockd_down();
+ err = svc_addsock(nfsd_serv, fd, buf, &proto);
+ if (err >= 0) {
+ err = lockd_up(proto);
+ if (err < 0)
+ svc_sock_names(buf+strlen(buf)+1, nfsd_serv, buf);
}
/* Decrease the count, but don't shutdown the
* the service
@@ -536,7 +536,7 @@ static ssize_t write_ports(struct file *
nfsd_serv->sv_nrthreads--;
unlock_kernel();
}
- return err;
+ return err < 0 ? err : 0;
}
if (buf[0] == '-') {
char *toclose = kstrdup(buf+1, GFP_KERNEL);
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-08-24 16:24:21.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-08-24 16:24:56.000000000 +1000
@@ -492,6 +492,9 @@ svc_sock_names(char *buf, struct svc_ser
}
spin_unlock(&serv->sv_lock);
if (closesk)
+ /* Should unregister with portmap, but you cannot
+ * unregister just one protocol...
+ */
svc_delete_socket(closesk);
else if (toclose)
return -ENOENT;
* [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one.
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (4 preceding siblings ...)
2006-08-24 6:36 ` [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket NeilBrown
@ 2006-08-24 6:36 ` NeilBrown
2006-08-24 6:37 ` [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom NeilBrown
` (4 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
We are planning to increase RPCSVC_MAXPAGES from about
8 to about 256. This means we need to be a bit careful
about arrays of size RPCSVC_MAXPAGES.
struct svc_rqst contains two such arrays.
However, there are never more than RPCSVC_MAXPAGES
pages in the two arrays together, so only one array is needed.
The two arrays are for the pages holding the request,
and the pages holding the reply.
Instead of two arrays, we can simply keep an index into
where the first reply page is.
This patch also removes a number of small inline functions that
probably serve to obscure what is going on rather than clarify it,
and open-codes the needed functionality.
Also remove the 'rq_restailpage' variable, as it is *always* 0;
i.e. if the response 'xdr' structure has a non-empty tail, it is
always in the same page as the head.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfs2acl.c | 2 -
./fs/nfsd/nfs3acl.c | 2 -
./fs/nfsd/nfs3xdr.c | 23 ++++--------
./fs/nfsd/nfs4xdr.c | 27 ++++++--------
./fs/nfsd/nfsxdr.c | 13 ++----
./fs/nfsd/vfs.c | 16 +++++---
./include/linux/sunrpc/svc.h | 69 +++++-------------------------------
./net/sunrpc/auth_gss/svcauth_gss.c | 4 --
./net/sunrpc/svc.c | 21 +++-------
./net/sunrpc/svcsock.c | 40 ++++++++++----------
10 files changed, 76 insertions(+), 141 deletions(-)
diff .prev/fs/nfsd/nfs2acl.c ./fs/nfsd/nfs2acl.c
--- .prev/fs/nfsd/nfs2acl.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs2acl.c 2006-08-24 16:25:13.000000000 +1000
@@ -241,7 +241,7 @@ static int nfsaclsvc_encode_getaclres(st
rqstp->rq_res.page_len = w;
while (w > 0) {
- if (!svc_take_res_page(rqstp))
+ if (!rqstp->rq_respages[rqstp->rq_resused++])
return 0;
w -= PAGE_SIZE;
}
diff .prev/fs/nfsd/nfs3acl.c ./fs/nfsd/nfs3acl.c
--- .prev/fs/nfsd/nfs3acl.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs3acl.c 2006-08-24 16:25:13.000000000 +1000
@@ -185,7 +185,7 @@ static int nfs3svc_encode_getaclres(stru
rqstp->rq_res.page_len = w;
while (w > 0) {
- if (!svc_take_res_page(rqstp))
+ if (!rqstp->rq_respages[rqstp->rq_resused++])
return 0;
w -= PAGE_SIZE;
}
diff .prev/fs/nfsd/nfs3xdr.c ./fs/nfsd/nfs3xdr.c
--- .prev/fs/nfsd/nfs3xdr.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs3xdr.c 2006-08-24 16:25:13.000000000 +1000
@@ -343,8 +343,7 @@ nfs3svc_decode_readargs(struct svc_rqst
/* set up the kvec */
v=0;
while (len > 0) {
- pn = rqstp->rq_resused;
- svc_take_page(rqstp);
+ pn = rqstp->rq_resused++;
args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
args->vec[v].iov_len = len < PAGE_SIZE? len : PAGE_SIZE;
len -= args->vec[v].iov_len;
@@ -382,7 +381,7 @@ nfs3svc_decode_writeargs(struct svc_rqst
while (len > args->vec[v].iov_len) {
len -= args->vec[v].iov_len;
v++;
- args->vec[v].iov_base = page_address(rqstp->rq_argpages[v]);
+ args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
args->vec[v].iov_len = PAGE_SIZE;
}
args->vec[v].iov_len = len;
@@ -446,11 +445,11 @@ nfs3svc_decode_symlinkargs(struct svc_rq
* This page appears in the rq_res.pages list, but as pages_len is always
* 0, it won't get in the way
*/
- svc_take_page(rqstp);
len = ntohl(*p++);
if (len == 0 || len > NFS3_MAXPATHLEN || len >= PAGE_SIZE)
return 0;
- args->tname = new = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+ args->tname = new =
+ page_address(rqstp->rq_respages[rqstp->rq_resused++]);
args->tlen = len;
/* first copy and check from the first page */
old = (char*)p;
@@ -522,8 +521,8 @@ nfs3svc_decode_readlinkargs(struct svc_r
{
if (!(p = decode_fh(p, &args->fh)))
return 0;
- svc_take_page(rqstp);
- args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+ args->buffer =
+ page_address(rqstp->rq_respages[rqstp->rq_resused++]);
return xdr_argsize_check(rqstp, p);
}
@@ -554,8 +553,8 @@ nfs3svc_decode_readdirargs(struct svc_rq
if (args->count > PAGE_SIZE)
args->count = PAGE_SIZE;
- svc_take_page(rqstp);
- args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+ args->buffer =
+ page_address(rqstp->rq_respages[rqstp->rq_resused++]);
return xdr_argsize_check(rqstp, p);
}
@@ -578,8 +577,7 @@ nfs3svc_decode_readdirplusargs(struct sv
args->count = len;
while (len > 0) {
- pn = rqstp->rq_resused;
- svc_take_page(rqstp);
+ pn = rqstp->rq_resused++;
if (!args->buffer)
args->buffer = page_address(rqstp->rq_respages[pn]);
len -= PAGE_SIZE;
@@ -668,7 +666,6 @@ nfs3svc_encode_readlinkres(struct svc_rq
rqstp->rq_res.page_len = resp->len;
if (resp->len & 3) {
/* need to pad the tail */
- rqstp->rq_restailpage = 0;
rqstp->rq_res.tail[0].iov_base = p;
*p = 0;
rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3);
@@ -693,7 +690,6 @@ nfs3svc_encode_readres(struct svc_rqst *
rqstp->rq_res.page_len = resp->count;
if (resp->count & 3) {
/* need to pad the tail */
- rqstp->rq_restailpage = 0;
rqstp->rq_res.tail[0].iov_base = p;
*p = 0;
rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3);
@@ -768,7 +764,6 @@ nfs3svc_encode_readdirres(struct svc_rqs
rqstp->rq_res.page_len = (resp->count) << 2;
/* add the 'tail' to the end of the 'head' page - page 0. */
- rqstp->rq_restailpage = 0;
rqstp->rq_res.tail[0].iov_base = p;
*p++ = 0; /* no more entries */
*p++ = htonl(resp->common.err == nfserr_eof);
diff .prev/fs/nfsd/nfs4xdr.c ./fs/nfsd/nfs4xdr.c
--- .prev/fs/nfsd/nfs4xdr.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs4xdr.c 2006-08-24 16:25:13.000000000 +1000
@@ -2040,7 +2040,8 @@ nfsd4_encode_open_downgrade(struct nfsd4
}
static int
-nfsd4_encode_read(struct nfsd4_compoundres *resp, int nfserr, struct nfsd4_read *read)
+nfsd4_encode_read(struct nfsd4_compoundres *resp, int nfserr,
+ struct nfsd4_read *read)
{
u32 eof;
int v, pn;
@@ -2062,10 +2063,11 @@ nfsd4_encode_read(struct nfsd4_compoundr
len = maxcount;
v = 0;
while (len > 0) {
- pn = resp->rqstp->rq_resused;
- svc_take_page(resp->rqstp);
- read->rd_iov[v].iov_base = page_address(resp->rqstp->rq_respages[pn]);
- read->rd_iov[v].iov_len = len < PAGE_SIZE ? len : PAGE_SIZE;
+ pn = resp->rqstp->rq_resused++;
+ read->rd_iov[v].iov_base =
+ page_address(resp->rqstp->rq_respages[pn]);
+ read->rd_iov[v].iov_len =
+ len < PAGE_SIZE ? len : PAGE_SIZE;
v++;
len -= PAGE_SIZE;
}
@@ -2079,7 +2081,8 @@ nfsd4_encode_read(struct nfsd4_compoundr
nfserr = nfserr_inval;
if (nfserr)
return nfserr;
- eof = (read->rd_offset + maxcount >= read->rd_fhp->fh_dentry->d_inode->i_size);
+ eof = (read->rd_offset + maxcount >=
+ read->rd_fhp->fh_dentry->d_inode->i_size);
WRITE32(eof);
WRITE32(maxcount);
@@ -2089,7 +2092,6 @@ nfsd4_encode_read(struct nfsd4_compoundr
resp->xbuf->page_len = maxcount;
/* Use rest of head for padding and remaining ops: */
- resp->rqstp->rq_restailpage = 0;
resp->xbuf->tail[0].iov_base = p;
resp->xbuf->tail[0].iov_len = 0;
if (maxcount&3) {
@@ -2114,8 +2116,7 @@ nfsd4_encode_readlink(struct nfsd4_compo
if (resp->xbuf->page_len)
return nfserr_resource;
- svc_take_page(resp->rqstp);
- page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
+ page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);
maxcount = PAGE_SIZE;
RESERVE_SPACE(4);
@@ -2139,7 +2140,6 @@ nfsd4_encode_readlink(struct nfsd4_compo
resp->xbuf->page_len = maxcount;
/* Use rest of head for padding and remaining ops: */
- resp->rqstp->rq_restailpage = 0;
resp->xbuf->tail[0].iov_base = p;
resp->xbuf->tail[0].iov_len = 0;
if (maxcount&3) {
@@ -2190,8 +2190,7 @@ nfsd4_encode_readdir(struct nfsd4_compou
goto err_no_verf;
}
- svc_take_page(resp->rqstp);
- page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
+ page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);
readdir->common.err = 0;
readdir->buflen = maxcount;
readdir->buffer = page;
@@ -2216,10 +2215,10 @@ nfsd4_encode_readdir(struct nfsd4_compou
p = readdir->buffer;
*p++ = 0; /* no more entries */
*p++ = htonl(readdir->common.err == nfserr_eof);
- resp->xbuf->page_len = ((char*)p) - (char*)page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
+ resp->xbuf->page_len = ((char*)p) - (char*)page_address(
+ resp->rqstp->rq_respages[resp->rqstp->rq_resused-1]);
/* Use rest of head for padding and remaining ops: */
- resp->rqstp->rq_restailpage = 0;
resp->xbuf->tail[0].iov_base = tailbase;
resp->xbuf->tail[0].iov_len = 0;
resp->p = resp->xbuf->tail[0].iov_base;
diff .prev/fs/nfsd/nfsxdr.c ./fs/nfsd/nfsxdr.c
--- .prev/fs/nfsd/nfsxdr.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfsxdr.c 2006-08-24 16:25:13.000000000 +1000
@@ -262,8 +262,7 @@ nfssvc_decode_readargs(struct svc_rqst *
*/
v=0;
while (len > 0) {
- pn=rqstp->rq_resused;
- svc_take_page(rqstp);
+ pn = rqstp->rq_resused++;
args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
args->vec[v].iov_len = len < PAGE_SIZE?len:PAGE_SIZE;
len -= args->vec[v].iov_len;
@@ -295,7 +294,7 @@ nfssvc_decode_writeargs(struct svc_rqst
while (len > args->vec[v].iov_len) {
len -= args->vec[v].iov_len;
v++;
- args->vec[v].iov_base = page_address(rqstp->rq_argpages[v]);
+ args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
args->vec[v].iov_len = PAGE_SIZE;
}
args->vec[v].iov_len = len;
@@ -333,8 +332,7 @@ nfssvc_decode_readlinkargs(struct svc_rq
{
if (!(p = decode_fh(p, &args->fh)))
return 0;
- svc_take_page(rqstp);
- args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+ args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused++]);
return xdr_argsize_check(rqstp, p);
}
@@ -375,8 +373,7 @@ nfssvc_decode_readdirargs(struct svc_rqs
if (args->count > PAGE_SIZE)
args->count = PAGE_SIZE;
- svc_take_page(rqstp);
- args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused-1]);
+ args->buffer = page_address(rqstp->rq_respages[rqstp->rq_resused++]);
return xdr_argsize_check(rqstp, p);
}
@@ -416,7 +413,6 @@ nfssvc_encode_readlinkres(struct svc_rqs
rqstp->rq_res.page_len = resp->len;
if (resp->len & 3) {
/* need to pad the tail */
- rqstp->rq_restailpage = 0;
rqstp->rq_res.tail[0].iov_base = p;
*p = 0;
rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3);
@@ -436,7 +432,6 @@ nfssvc_encode_readres(struct svc_rqst *r
rqstp->rq_res.page_len = resp->count;
if (resp->count & 3) {
/* need to pad the tail */
- rqstp->rq_restailpage = 0;
rqstp->rq_res.tail[0].iov_base = p;
*p = 0;
rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3);
diff .prev/fs/nfsd/vfs.c ./fs/nfsd/vfs.c
--- .prev/fs/nfsd/vfs.c 2006-08-24 16:20:33.000000000 +1000
+++ ./fs/nfsd/vfs.c 2006-08-24 16:25:13.000000000 +1000
@@ -791,22 +791,26 @@ nfsd_read_actor(read_descriptor_t *desc,
{
unsigned long count = desc->count;
struct svc_rqst *rqstp = desc->arg.data;
+ struct page **pp = rqstp->rq_respages + rqstp->rq_resused;
if (size > count)
size = count;
if (rqstp->rq_res.page_len == 0) {
get_page(page);
- rqstp->rq_respages[rqstp->rq_resused++] = page;
+ put_page(*pp);
+ *pp = page;
+ rqstp->rq_resused++;
rqstp->rq_res.page_base = offset;
rqstp->rq_res.page_len = size;
- } else if (page != rqstp->rq_respages[rqstp->rq_resused-1]) {
+ } else if (page != pp[-1]) {
get_page(page);
- rqstp->rq_respages[rqstp->rq_resused++] = page;
+ put_page(*pp);
+ *pp = page;
+ rqstp->rq_resused++;
rqstp->rq_res.page_len += size;
- } else {
+ } else
rqstp->rq_res.page_len += size;
- }
desc->count = count - size;
desc->written += size;
@@ -840,7 +844,7 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
file->f_ra = ra->p_ra;
if (file->f_op->sendfile && rqstp->rq_sendfile_ok) {
- svc_pushback_unused_pages(rqstp);
+ rqstp->rq_resused = 1;
err = file->f_op->sendfile(file, &offset, *count,
nfsd_read_actor, rqstp);
} else {
diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h 2006-08-24 16:25:13.000000000 +1000
+++ ./include/linux/sunrpc/svc.h 2006-08-24 16:25:13.000000000 +1000
@@ -153,7 +153,6 @@ static inline void svc_putu32(struct kve
/*
* The context of a single thread, including the request currently being
* processed.
- * NOTE: First two items must be prev/next.
*/
struct svc_rqst {
struct list_head rq_list; /* idle list */
@@ -172,12 +171,9 @@ struct svc_rqst {
struct xdr_buf rq_arg;
struct xdr_buf rq_res;
- struct page * rq_argpages[RPCSVC_MAXPAGES];
- struct page * rq_respages[RPCSVC_MAXPAGES];
- int rq_restailpage;
- short rq_argused; /* pages used for argument */
- short rq_arghi; /* pages available in argument page list */
- short rq_resused; /* pages used for result */
+ struct page * rq_pages[RPCSVC_MAXPAGES];
+ struct page * *rq_respages; /* points into rq_pages */
+ int rq_resused; /* number of pages used for result */
u32 rq_xid; /* transmission id */
u32 rq_prog; /* program number */
@@ -238,63 +234,18 @@ xdr_ressize_check(struct svc_rqst *rqstp
return vec->iov_len <= PAGE_SIZE;
}
-static inline struct page *
-svc_take_res_page(struct svc_rqst *rqstp)
+static inline void svc_free_res_pages(struct svc_rqst *rqstp)
{
- if (rqstp->rq_arghi <= rqstp->rq_argused)
- return NULL;
- rqstp->rq_arghi--;
- rqstp->rq_respages[rqstp->rq_resused] =
- rqstp->rq_argpages[rqstp->rq_arghi];
- return rqstp->rq_respages[rqstp->rq_resused++];
-}
-
-static inline void svc_take_page(struct svc_rqst *rqstp)
-{
- if (rqstp->rq_arghi <= rqstp->rq_argused) {
- WARN_ON(1);
- return;
- }
- rqstp->rq_arghi--;
- rqstp->rq_respages[rqstp->rq_resused] =
- rqstp->rq_argpages[rqstp->rq_arghi];
- rqstp->rq_resused++;
-}
-
-static inline void svc_pushback_allpages(struct svc_rqst *rqstp)
-{
- while (rqstp->rq_resused) {
- if (rqstp->rq_respages[--rqstp->rq_resused] == NULL)
- continue;
- rqstp->rq_argpages[rqstp->rq_arghi++] =
- rqstp->rq_respages[rqstp->rq_resused];
- rqstp->rq_respages[rqstp->rq_resused] = NULL;
- }
-}
-
-static inline void svc_pushback_unused_pages(struct svc_rqst *rqstp)
-{
- while (rqstp->rq_resused &&
- rqstp->rq_res.pages != &rqstp->rq_respages[rqstp->rq_resused]) {
-
- if (rqstp->rq_respages[--rqstp->rq_resused] != NULL) {
- rqstp->rq_argpages[rqstp->rq_arghi++] =
- rqstp->rq_respages[rqstp->rq_resused];
- rqstp->rq_respages[rqstp->rq_resused] = NULL;
+ while (rqstp->rq_resused) {
+ struct page **pp = (rqstp->rq_respages +
+ --rqstp->rq_resused);
+ if (*pp) {
+ put_page(*pp);
+ *pp = NULL;
}
}
}
-static inline void svc_free_allpages(struct svc_rqst *rqstp)
-{
- while (rqstp->rq_resused) {
- if (rqstp->rq_respages[--rqstp->rq_resused] == NULL)
- continue;
- put_page(rqstp->rq_respages[rqstp->rq_resused]);
- rqstp->rq_respages[rqstp->rq_resused] = NULL;
- }
-}
-
struct svc_deferred_req {
u32 prot; /* protocol (UDP or TCP) */
struct sockaddr_in addr;
diff .prev/net/sunrpc/auth_gss/svcauth_gss.c ./net/sunrpc/auth_gss/svcauth_gss.c
--- .prev/net/sunrpc/auth_gss/svcauth_gss.c 2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/auth_gss/svcauth_gss.c 2006-08-24 16:25:13.000000000 +1000
@@ -1191,7 +1191,6 @@ svcauth_gss_wrap_resp_integ(struct svc_r
resbuf->tail[0].iov_base = resbuf->head[0].iov_base
+ resbuf->head[0].iov_len;
resbuf->tail[0].iov_len = 0;
- rqstp->rq_restailpage = 0;
resv = &resbuf->tail[0];
} else {
resv = &resbuf->tail[0];
@@ -1240,7 +1239,7 @@ svcauth_gss_wrap_resp_priv(struct svc_rq
inpages = resbuf->pages;
/* XXX: Would be better to write some xdr helper functions for
* nfs{2,3,4}xdr.c that place the data right, instead of copying: */
- if (resbuf->tail[0].iov_base && rqstp->rq_restailpage == 0) {
+ if (resbuf->tail[0].iov_base) {
BUG_ON(resbuf->tail[0].iov_base >= resbuf->head[0].iov_base
+ PAGE_SIZE);
BUG_ON(resbuf->tail[0].iov_base < resbuf->head[0].iov_base);
@@ -1258,7 +1257,6 @@ svcauth_gss_wrap_resp_priv(struct svc_rq
resbuf->tail[0].iov_base = resbuf->head[0].iov_base
+ resbuf->head[0].iov_len + RPC_MAX_AUTH_SIZE;
resbuf->tail[0].iov_len = 0;
- rqstp->rq_restailpage = 0;
}
if (gss_wrap(gsd->rsci->mechctx, offset, resbuf, inpages))
return -ENOMEM;
diff .prev/net/sunrpc/svc.c ./net/sunrpc/svc.c
--- .prev/net/sunrpc/svc.c 2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/svc.c 2006-08-24 16:25:13.000000000 +1000
@@ -417,18 +417,15 @@ svc_init_buffer(struct svc_rqst *rqstp,
if (size > RPCSVC_MAXPAYLOAD)
size = RPCSVC_MAXPAYLOAD;
pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
- rqstp->rq_argused = 0;
- rqstp->rq_resused = 0;
arghi = 0;
BUG_ON(pages > RPCSVC_MAXPAGES);
while (pages) {
struct page *p = alloc_page(GFP_KERNEL);
if (!p)
break;
- rqstp->rq_argpages[arghi++] = p;
+ rqstp->rq_pages[arghi++] = p;
pages--;
}
- rqstp->rq_arghi = arghi;
return ! pages;
}
@@ -438,14 +435,10 @@ svc_init_buffer(struct svc_rqst *rqstp,
static void
svc_release_buffer(struct svc_rqst *rqstp)
{
- while (rqstp->rq_arghi)
- put_page(rqstp->rq_argpages[--rqstp->rq_arghi]);
- while (rqstp->rq_resused) {
- if (rqstp->rq_respages[--rqstp->rq_resused] == NULL)
- continue;
- put_page(rqstp->rq_respages[rqstp->rq_resused]);
- }
- rqstp->rq_argused = 0;
+ int i;
+ for (i=0; i<ARRAY_SIZE(rqstp->rq_pages); i++)
+ if (rqstp->rq_pages[i])
+ put_page(rqstp->rq_pages[i]);
}
/*
@@ -707,10 +700,10 @@ svc_process(struct svc_rqst *rqstp)
/* setup response xdr_buf.
* Initially it has just one page
*/
- svc_take_page(rqstp); /* must succeed */
+ rqstp->rq_resused = 1;
resv->iov_base = page_address(rqstp->rq_respages[0]);
resv->iov_len = 0;
- rqstp->rq_res.pages = rqstp->rq_respages+1;
+ rqstp->rq_res.pages = rqstp->rq_respages + 1;
rqstp->rq_res.len = 0;
rqstp->rq_res.page_base = 0;
rqstp->rq_res.page_len = 0;
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-08-24 16:24:56.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-08-24 16:25:13.000000000 +1000
@@ -313,7 +313,7 @@ svc_sock_release(struct svc_rqst *rqstp)
svc_release_skb(rqstp);
- svc_free_allpages(rqstp);
+ svc_free_res_pages(rqstp);
rqstp->rq_res.page_len = 0;
rqstp->rq_res.page_base = 0;
@@ -412,7 +412,8 @@ svc_sendto(struct svc_rqst *rqstp, struc
/* send head */
if (slen == xdr->head[0].iov_len)
flags = 0;
- len = kernel_sendpage(sock, rqstp->rq_respages[0], 0, xdr->head[0].iov_len, flags);
+ len = kernel_sendpage(sock, rqstp->rq_respages[0], 0,
+ xdr->head[0].iov_len, flags);
if (len != xdr->head[0].iov_len)
goto out;
slen -= xdr->head[0].iov_len;
@@ -437,8 +438,9 @@ svc_sendto(struct svc_rqst *rqstp, struc
}
/* send tail */
if (xdr->tail[0].iov_len) {
- result = kernel_sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage],
- ((unsigned long)xdr->tail[0].iov_base)& (PAGE_SIZE-1),
+ result = kernel_sendpage(sock, rqstp->rq_respages[0],
+ ((unsigned long)xdr->tail[0].iov_base)
+ & (PAGE_SIZE-1),
xdr->tail[0].iov_len, 0);
if (result > 0)
@@ -708,9 +710,11 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
if (len <= rqstp->rq_arg.head[0].iov_len) {
rqstp->rq_arg.head[0].iov_len = len;
rqstp->rq_arg.page_len = 0;
+ rqstp->rq_respages = rqstp->rq_pages+1;
} else {
rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
- rqstp->rq_argused += (rqstp->rq_arg.page_len + PAGE_SIZE - 1)/ PAGE_SIZE;
+ rqstp->rq_respages = rqstp->rq_pages + 1 +
+ (rqstp->rq_arg.page_len + PAGE_SIZE - 1)/ PAGE_SIZE;
}
if (serv->sv_stats)
@@ -1053,11 +1057,12 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
vlen = PAGE_SIZE;
pnum = 1;
while (vlen < len) {
- vec[pnum].iov_base = page_address(rqstp->rq_argpages[rqstp->rq_argused++]);
+ vec[pnum].iov_base = page_address(rqstp->rq_pages[pnum]);
vec[pnum].iov_len = PAGE_SIZE;
pnum++;
vlen += PAGE_SIZE;
}
+ rqstp->rq_respages = &rqstp->rq_pages[pnum];
/* Now receive data */
len = svc_recvfrom(rqstp, vec, pnum, len);
@@ -1209,7 +1214,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
struct svc_sock *svsk =NULL;
struct svc_serv *serv = rqstp->rq_server;
struct svc_pool *pool = rqstp->rq_pool;
- int len;
+ int len, i;
int pages;
struct xdr_buf *arg;
DECLARE_WAITQUEUE(wait, current);
@@ -1226,27 +1231,22 @@ svc_recv(struct svc_rqst *rqstp, long ti
"svc_recv: service %p, wait queue active!\n",
rqstp);
- /* Initialize the buffers */
- /* first reclaim pages that were moved to response list */
- svc_pushback_allpages(rqstp);
/* now allocate needed pages. If we get a failure, sleep briefly */
pages = 2 + (serv->sv_bufsz + PAGE_SIZE -1) / PAGE_SIZE;
- while (rqstp->rq_arghi < pages) {
- struct page *p = alloc_page(GFP_KERNEL);
- if (!p) {
- schedule_timeout_uninterruptible(msecs_to_jiffies(500));
- continue;
+ for (i=0; i < pages ; i++)
+ while (rqstp->rq_pages[i] == NULL) {
+ struct page *p = alloc_page(GFP_KERNEL);
+ if (!p)
+ schedule_timeout_uninterruptible(msecs_to_jiffies(500));
+ rqstp->rq_pages[i] = p;
}
- rqstp->rq_argpages[rqstp->rq_arghi++] = p;
- }
/* Make arg->head point to first page and arg->pages point to rest */
arg = &rqstp->rq_arg;
- arg->head[0].iov_base = page_address(rqstp->rq_argpages[0]);
+ arg->head[0].iov_base = page_address(rqstp->rq_pages[0]);
arg->head[0].iov_len = PAGE_SIZE;
- rqstp->rq_argused = 1;
- arg->pages = rqstp->rq_argpages + 1;
+ arg->pages = rqstp->rq_pages + 1;
arg->page_base = 0;
/* save at least one page for response */
arg->page_len = (pages-2)*PAGE_SIZE;
* [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (5 preceding siblings ...)
2006-08-24 6:36 ` [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one NeilBrown
@ 2006-08-24 6:37 ` NeilBrown
2006-08-24 6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
` (3 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
.. by allocating the array of 'kvec' in 'struct svc_rqst'.
As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we
can no longer allocate an array of this size on the stack.
So we allocate it in 'struct svc_rqst'.
However, svc_rqst contains (indirectly) an array of the same
type and size (actually several, but they are in a union).
So rather than waste space, we move those arrays out of the
separately allocated union and into svc_rqst to share with
the kvec moved out of svc_tcp_recvfrom (various arrays are used
at different times, so there is no conflict).
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfs3proc.c | 4 ++--
./fs/nfsd/nfs3xdr.c | 22 +++++++++++-----------
./fs/nfsd/nfs4proc.c | 2 +-
./fs/nfsd/nfs4xdr.c | 26 +++++++++++++-------------
./fs/nfsd/nfsproc.c | 4 ++--
./fs/nfsd/nfsxdr.c | 22 +++++++++++-----------
./include/linux/nfsd/xdr.h | 2 --
./include/linux/nfsd/xdr3.h | 2 --
./include/linux/nfsd/xdr4.h | 2 --
./include/linux/sunrpc/svc.h | 2 ++
./net/sunrpc/svcsock.c | 3 ++-
11 files changed, 44 insertions(+), 47 deletions(-)
diff .prev/fs/nfsd/nfs3proc.c ./fs/nfsd/nfs3proc.c
--- .prev/fs/nfsd/nfs3proc.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs3proc.c 2006-08-24 16:25:41.000000000 +1000
@@ -180,7 +180,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp,
fh_copy(&resp->fh, &argp->fh);
nfserr = nfsd_read(rqstp, &resp->fh, NULL,
argp->offset,
- argp->vec, argp->vlen,
+ rqstp->rq_vec, argp->vlen,
&resp->count);
if (nfserr == 0) {
struct inode *inode = resp->fh.fh_dentry->d_inode;
@@ -210,7 +210,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp,
resp->committed = argp->stable;
nfserr = nfsd_write(rqstp, &resp->fh, NULL,
argp->offset,
- argp->vec, argp->vlen,
+ rqstp->rq_vec, argp->vlen,
argp->len,
&resp->committed);
resp->count = argp->count;
diff .prev/fs/nfsd/nfs3xdr.c ./fs/nfsd/nfs3xdr.c
--- .prev/fs/nfsd/nfs3xdr.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs3xdr.c 2006-08-24 16:25:41.000000000 +1000
@@ -344,9 +344,9 @@ nfs3svc_decode_readargs(struct svc_rqst
v=0;
while (len > 0) {
pn = rqstp->rq_resused++;
- args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
- args->vec[v].iov_len = len < PAGE_SIZE? len : PAGE_SIZE;
- len -= args->vec[v].iov_len;
+ rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
+ rqstp->rq_vec[v].iov_len = len < PAGE_SIZE? len : PAGE_SIZE;
+ len -= rqstp->rq_vec[v].iov_len;
v++;
}
args->vlen = v;
@@ -372,22 +372,22 @@ nfs3svc_decode_writeargs(struct svc_rqst
rqstp->rq_arg.len - hdr < len)
return 0;
- args->vec[0].iov_base = (void*)p;
- args->vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
+ rqstp->rq_vec[0].iov_base = (void*)p;
+ rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
if (len > NFSSVC_MAXBLKSIZE)
len = NFSSVC_MAXBLKSIZE;
v= 0;
- while (len > args->vec[v].iov_len) {
- len -= args->vec[v].iov_len;
+ while (len > rqstp->rq_vec[v].iov_len) {
+ len -= rqstp->rq_vec[v].iov_len;
v++;
- args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
- args->vec[v].iov_len = PAGE_SIZE;
+ rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
+ rqstp->rq_vec[v].iov_len = PAGE_SIZE;
}
- args->vec[v].iov_len = len;
+ rqstp->rq_vec[v].iov_len = len;
args->vlen = v+1;
- return args->count == args->len && args->vec[0].iov_len > 0;
+ return args->count == args->len && rqstp->rq_vec[0].iov_len > 0;
}
int
diff .prev/fs/nfsd/nfs4proc.c ./fs/nfsd/nfs4proc.c
--- .prev/fs/nfsd/nfs4proc.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs4proc.c 2006-08-24 16:25:41.000000000 +1000
@@ -646,7 +646,7 @@ nfsd4_write(struct svc_rqst *rqstp, stru
*p++ = nfssvc_boot.tv_usec;
status = nfsd_write(rqstp, current_fh, filp, write->wr_offset,
- write->wr_vec, write->wr_vlen, write->wr_buflen,
+ rqstp->rq_vec, write->wr_vlen, write->wr_buflen,
&write->wr_how_written);
if (filp)
fput(filp);
diff .prev/fs/nfsd/nfs4xdr.c ./fs/nfsd/nfs4xdr.c
--- .prev/fs/nfsd/nfs4xdr.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfs4xdr.c 2006-08-24 16:25:41.000000000 +1000
@@ -927,26 +927,26 @@ nfsd4_decode_write(struct nfsd4_compound
printk(KERN_NOTICE "xdr error! (%s:%d)\n", __FILE__, __LINE__);
goto xdr_error;
}
- write->wr_vec[0].iov_base = p;
- write->wr_vec[0].iov_len = avail;
+ argp->rqstp->rq_vec[0].iov_base = p;
+ argp->rqstp->rq_vec[0].iov_len = avail;
v = 0;
len = write->wr_buflen;
- while (len > write->wr_vec[v].iov_len) {
- len -= write->wr_vec[v].iov_len;
+ while (len > argp->rqstp->rq_vec[v].iov_len) {
+ len -= argp->rqstp->rq_vec[v].iov_len;
v++;
- write->wr_vec[v].iov_base = page_address(argp->pagelist[0]);
+ argp->rqstp->rq_vec[v].iov_base = page_address(argp->pagelist[0]);
argp->pagelist++;
if (argp->pagelen >= PAGE_SIZE) {
- write->wr_vec[v].iov_len = PAGE_SIZE;
+ argp->rqstp->rq_vec[v].iov_len = PAGE_SIZE;
argp->pagelen -= PAGE_SIZE;
} else {
- write->wr_vec[v].iov_len = argp->pagelen;
+ argp->rqstp->rq_vec[v].iov_len = argp->pagelen;
argp->pagelen -= len;
}
}
- argp->end = (u32*) (write->wr_vec[v].iov_base + write->wr_vec[v].iov_len);
- argp->p = (u32*) (write->wr_vec[v].iov_base + (XDR_QUADLEN(len) << 2));
- write->wr_vec[v].iov_len = len;
+ argp->end = (u32*) (argp->rqstp->rq_vec[v].iov_base + argp->rqstp->rq_vec[v].iov_len);
+ argp->p = (u32*) (argp->rqstp->rq_vec[v].iov_base + (XDR_QUADLEN(len) << 2));
+ argp->rqstp->rq_vec[v].iov_len = len;
write->wr_vlen = v+1;
DECODE_TAIL;
@@ -2064,9 +2064,9 @@ nfsd4_encode_read(struct nfsd4_compoundr
v = 0;
while (len > 0) {
pn = resp->rqstp->rq_resused++;
- read->rd_iov[v].iov_base =
+ resp->rqstp->rq_vec[v].iov_base =
page_address(resp->rqstp->rq_respages[pn]);
- read->rd_iov[v].iov_len =
+ resp->rqstp->rq_vec[v].iov_len =
len < PAGE_SIZE ? len : PAGE_SIZE;
v++;
len -= PAGE_SIZE;
@@ -2074,7 +2074,7 @@ nfsd4_encode_read(struct nfsd4_compoundr
read->rd_vlen = v;
nfserr = nfsd_read(read->rd_rqstp, read->rd_fhp, read->rd_filp,
- read->rd_offset, read->rd_iov, read->rd_vlen,
+ read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
&maxcount);
if (nfserr == nfserr_symlink)
diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c 2006-08-24 16:21:35.000000000 +1000
+++ ./fs/nfsd/nfsproc.c 2006-08-24 16:25:41.000000000 +1000
@@ -159,7 +159,7 @@ nfsd_proc_read(struct svc_rqst *rqstp, s
resp->count = argp->count;
nfserr = nfsd_read(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
argp->offset,
- argp->vec, argp->vlen,
+ rqstp->rq_vec, argp->vlen,
&resp->count);
if (nfserr) return nfserr;
@@ -185,7 +185,7 @@ nfsd_proc_write(struct svc_rqst *rqstp,
nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
argp->offset,
- argp->vec, argp->vlen,
+ rqstp->rq_vec, argp->vlen,
argp->len,
&stable);
return nfsd_return_attrs(nfserr, resp);
diff .prev/fs/nfsd/nfsxdr.c ./fs/nfsd/nfsxdr.c
--- .prev/fs/nfsd/nfsxdr.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/nfsxdr.c 2006-08-24 16:25:41.000000000 +1000
@@ -263,9 +263,9 @@ nfssvc_decode_readargs(struct svc_rqst *
v=0;
while (len > 0) {
pn = rqstp->rq_resused++;
- args->vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
- args->vec[v].iov_len = len < PAGE_SIZE?len:PAGE_SIZE;
- len -= args->vec[v].iov_len;
+ rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_respages[pn]);
+ rqstp->rq_vec[v].iov_len = len < PAGE_SIZE?len:PAGE_SIZE;
+ len -= rqstp->rq_vec[v].iov_len;
v++;
}
args->vlen = v;
@@ -285,21 +285,21 @@ nfssvc_decode_writeargs(struct svc_rqst
args->offset = ntohl(*p++); /* offset */
p++; /* totalcount */
len = args->len = ntohl(*p++);
- args->vec[0].iov_base = (void*)p;
- args->vec[0].iov_len = rqstp->rq_arg.head[0].iov_len -
+ rqstp->rq_vec[0].iov_base = (void*)p;
+ rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len -
(((void*)p) - rqstp->rq_arg.head[0].iov_base);
if (len > NFSSVC_MAXBLKSIZE)
len = NFSSVC_MAXBLKSIZE;
v = 0;
- while (len > args->vec[v].iov_len) {
- len -= args->vec[v].iov_len;
+ while (len > rqstp->rq_vec[v].iov_len) {
+ len -= rqstp->rq_vec[v].iov_len;
v++;
- args->vec[v].iov_base = page_address(rqstp->rq_pages[v]);
- args->vec[v].iov_len = PAGE_SIZE;
+ rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
+ rqstp->rq_vec[v].iov_len = PAGE_SIZE;
}
- args->vec[v].iov_len = len;
+ rqstp->rq_vec[v].iov_len = len;
args->vlen = v+1;
- return args->vec[0].iov_len > 0;
+ return rqstp->rq_vec[0].iov_len > 0;
}
int
diff .prev/include/linux/nfsd/xdr.h ./include/linux/nfsd/xdr.h
--- .prev/include/linux/nfsd/xdr.h 2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/nfsd/xdr.h 2006-08-24 16:25:41.000000000 +1000
@@ -30,7 +30,6 @@ struct nfsd_readargs {
struct svc_fh fh;
__u32 offset;
__u32 count;
- struct kvec vec[RPCSVC_MAXPAGES];
int vlen;
};
@@ -38,7 +37,6 @@ struct nfsd_writeargs {
svc_fh fh;
__u32 offset;
int len;
- struct kvec vec[RPCSVC_MAXPAGES];
int vlen;
};
diff .prev/include/linux/nfsd/xdr3.h ./include/linux/nfsd/xdr3.h
--- .prev/include/linux/nfsd/xdr3.h 2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/nfsd/xdr3.h 2006-08-24 16:25:41.000000000 +1000
@@ -33,7 +33,6 @@ struct nfsd3_readargs {
struct svc_fh fh;
__u64 offset;
__u32 count;
- struct kvec vec[RPCSVC_MAXPAGES];
int vlen;
};
@@ -43,7 +42,6 @@ struct nfsd3_writeargs {
__u32 count;
int stable;
__u32 len;
- struct kvec vec[RPCSVC_MAXPAGES];
int vlen;
};
diff .prev/include/linux/nfsd/xdr4.h ./include/linux/nfsd/xdr4.h
--- .prev/include/linux/nfsd/xdr4.h 2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/nfsd/xdr4.h 2006-08-24 16:25:41.000000000 +1000
@@ -241,7 +241,6 @@ struct nfsd4_read {
stateid_t rd_stateid; /* request */
u64 rd_offset; /* request */
u32 rd_length; /* request */
- struct kvec rd_iov[RPCSVC_MAXPAGES];
int rd_vlen;
struct file *rd_filp;
@@ -326,7 +325,6 @@ struct nfsd4_write {
u64 wr_offset; /* request */
u32 wr_stable_how; /* request */
u32 wr_buflen; /* request */
- struct kvec wr_vec[RPCSVC_MAXPAGES]; /* request */
int wr_vlen;
u32 wr_bytes_written; /* response */
diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h 2006-08-24 16:25:13.000000000 +1000
+++ ./include/linux/sunrpc/svc.h 2006-08-24 16:25:41.000000000 +1000
@@ -175,6 +175,8 @@ struct svc_rqst {
struct page * *rq_respages; /* points into rq_pages */
int rq_resused; /* number of pages used for result */
+ struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
+
u32 rq_xid; /* transmission id */
u32 rq_prog; /* program number */
u32 rq_vers; /* program version */
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-08-24 16:25:41.000000000 +1000
@@ -955,7 +955,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
struct svc_sock *svsk = rqstp->rq_sock;
struct svc_serv *serv = svsk->sk_server;
int len;
- struct kvec vec[RPCSVC_MAXPAGES];
+ struct kvec *vec;
int pnum, vlen;
dprintk("svc: tcp_recv %p data %d conn %d close %d\n",
@@ -1053,6 +1053,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
len = svsk->sk_reclen;
set_bit(SK_DATA, &svsk->sk_flags);
+ vec = rqstp->rq_vec;
vec[0] = rqstp->rq_arg.head[0];
vlen = PAGE_SIZE;
pnum = 1;
* [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (6 preceding siblings ...)
2006-08-24 6:37 ` [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom NeilBrown
@ 2006-08-24 6:37 ` NeilBrown
2006-09-25 15:43 ` [NFS] " J. Bruce Fields
2006-08-24 6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
` (2 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
The limit over UDP remains at 32K. Also, make some of
the apparently arbitrary sizing constants clearer.
The biggest change here involves replacing NFSSVC_MAXBLKSIZE
with a function of the rqstp. This allows it to be different
for different protocols (udp/tcp) and also allows it
to depend on the server's declared sv_bufsz.
Note that we don't actually increase sv_bufsz for nfs yet.
That comes next.
From: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfs3proc.c | 14 +++++++------
./fs/nfsd/nfs3xdr.c | 13 +++++++-----
./fs/nfsd/nfs4xdr.c | 6 ++---
./fs/nfsd/nfsproc.c | 6 ++---
./fs/nfsd/nfsxdr.c | 10 ++++-----
./include/linux/nfsd/const.h | 15 +++++++++++++-
./include/linux/sunrpc/auth.h | 3 --
./include/linux/sunrpc/msg_prot.h | 40 ++++++++++++++++++++++++++++++++++++++
./include/linux/sunrpc/svc.h | 25 +++++++++++++++++++++--
./include/linux/sunrpc/xprt.h | 8 -------
./net/sunrpc/svc.c | 15 ++++++++++++++
11 files changed, 120 insertions(+), 35 deletions(-)
diff .prev/fs/nfsd/nfs3proc.c ./fs/nfsd/nfs3proc.c
--- .prev/fs/nfsd/nfs3proc.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs3proc.c 2006-08-24 16:25:56.000000000 +1000
@@ -160,6 +160,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp,
struct nfsd3_readres *resp)
{
int nfserr;
+ u32 max_blocksize = svc_max_payload(rqstp);
dprintk("nfsd: READ(3) %s %lu bytes at %lu\n",
SVCFH_fmt(&argp->fh),
@@ -172,8 +173,8 @@ nfsd3_proc_read(struct svc_rqst *rqstp,
*/
resp->count = argp->count;
- if (NFSSVC_MAXBLKSIZE < resp->count)
- resp->count = NFSSVC_MAXBLKSIZE;
+ if (max_blocksize < resp->count)
+ resp->count = max_blocksize;
svc_reserve(rqstp, ((1 + NFS3_POST_OP_ATTR_WORDS + 3)<<2) + resp->count +4);
@@ -538,15 +539,16 @@ nfsd3_proc_fsinfo(struct svc_rqst * rqst
struct nfsd3_fsinfores *resp)
{
int nfserr;
+ u32 max_blocksize = svc_max_payload(rqstp);
dprintk("nfsd: FSINFO(3) %s\n",
SVCFH_fmt(&argp->fh));
- resp->f_rtmax = NFSSVC_MAXBLKSIZE;
- resp->f_rtpref = NFSSVC_MAXBLKSIZE;
+ resp->f_rtmax = max_blocksize;
+ resp->f_rtpref = max_blocksize;
resp->f_rtmult = PAGE_SIZE;
- resp->f_wtmax = NFSSVC_MAXBLKSIZE;
- resp->f_wtpref = NFSSVC_MAXBLKSIZE;
+ resp->f_wtmax = max_blocksize;
+ resp->f_wtpref = max_blocksize;
resp->f_wtmult = PAGE_SIZE;
resp->f_dtpref = PAGE_SIZE;
resp->f_maxfilesize = ~(u32) 0;
diff .prev/fs/nfsd/nfs3xdr.c ./fs/nfsd/nfs3xdr.c
--- .prev/fs/nfsd/nfs3xdr.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs3xdr.c 2006-08-24 16:25:56.000000000 +1000
@@ -330,6 +330,7 @@ nfs3svc_decode_readargs(struct svc_rqst
{
unsigned int len;
int v,pn;
+ u32 max_blocksize = svc_max_payload(rqstp);
if (!(p = decode_fh(p, &args->fh))
|| !(p = xdr_decode_hyper(p, &args->offset)))
@@ -337,8 +338,8 @@ nfs3svc_decode_readargs(struct svc_rqst
len = args->count = ntohl(*p++);
- if (len > NFSSVC_MAXBLKSIZE)
- len = NFSSVC_MAXBLKSIZE;
+ if (len > max_blocksize)
+ len = max_blocksize;
/* set up the kvec */
v=0;
@@ -358,6 +359,7 @@ nfs3svc_decode_writeargs(struct svc_rqst
struct nfsd3_writeargs *args)
{
unsigned int len, v, hdr;
+ u32 max_blocksize = svc_max_payload(rqstp);
if (!(p = decode_fh(p, &args->fh))
|| !(p = xdr_decode_hyper(p, &args->offset)))
@@ -375,8 +377,8 @@ nfs3svc_decode_writeargs(struct svc_rqst
rqstp->rq_vec[0].iov_base = (void*)p;
rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
- if (len > NFSSVC_MAXBLKSIZE)
- len = NFSSVC_MAXBLKSIZE;
+ if (len > max_blocksize)
+ len = max_blocksize;
v= 0;
while (len > rqstp->rq_vec[v].iov_len) {
len -= rqstp->rq_vec[v].iov_len;
@@ -564,6 +566,7 @@ nfs3svc_decode_readdirplusargs(struct sv
struct nfsd3_readdirargs *args)
{
int len, pn;
+ u32 max_blocksize = svc_max_payload(rqstp);
if (!(p = decode_fh(p, &args->fh)))
return 0;
@@ -572,7 +575,7 @@ nfs3svc_decode_readdirplusargs(struct sv
args->dircount = ntohl(*p++);
args->count = ntohl(*p++);
- len = (args->count > NFSSVC_MAXBLKSIZE) ? NFSSVC_MAXBLKSIZE :
+ len = (args->count > max_blocksize) ? max_blocksize :
args->count;
args->count = len;
diff .prev/fs/nfsd/nfs4xdr.c ./fs/nfsd/nfs4xdr.c
--- .prev/fs/nfsd/nfs4xdr.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfs4xdr.c 2006-08-24 16:25:56.000000000 +1000
@@ -1537,12 +1537,12 @@ out_acl:
if (bmval0 & FATTR4_WORD0_MAXREAD) {
if ((buflen -= 8) < 0)
goto out_resource;
- WRITE64((u64) NFSSVC_MAXBLKSIZE);
+ WRITE64((u64) svc_max_payload(rqstp));
}
if (bmval0 & FATTR4_WORD0_MAXWRITE) {
if ((buflen -= 8) < 0)
goto out_resource;
- WRITE64((u64) NFSSVC_MAXBLKSIZE);
+ WRITE64((u64) svc_max_payload(rqstp));
}
if (bmval1 & FATTR4_WORD1_MODE) {
if ((buflen -= 4) < 0)
@@ -2056,7 +2056,7 @@ nfsd4_encode_read(struct nfsd4_compoundr
RESERVE_SPACE(8); /* eof flag and byte count */
- maxcount = NFSSVC_MAXBLKSIZE;
+ maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
maxcount = read->rd_length;
diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfsproc.c 2006-08-24 16:25:56.000000000 +1000
@@ -146,13 +146,13 @@ nfsd_proc_read(struct svc_rqst *rqstp, s
* status, 17 words for fattr, and 1 word for the byte count.
*/
- if (NFSSVC_MAXBLKSIZE < argp->count) {
+ if (NFSSVC_MAXBLKSIZE_V2 < argp->count) {
printk(KERN_NOTICE
"oversized read request from %u.%u.%u.%u:%d (%d bytes)\n",
NIPQUAD(rqstp->rq_addr.sin_addr.s_addr),
ntohs(rqstp->rq_addr.sin_port),
argp->count);
- argp->count = NFSSVC_MAXBLKSIZE;
+ argp->count = NFSSVC_MAXBLKSIZE_V2;
}
svc_reserve(rqstp, (19<<2) + argp->count + 4);
@@ -553,7 +553,7 @@ static struct svc_procedure nfsd_proced
PROC(none, void, void, none, RC_NOCACHE, ST),
PROC(lookup, diropargs, diropres, fhandle, RC_NOCACHE, ST+FH+AT),
PROC(readlink, readlinkargs, readlinkres, none, RC_NOCACHE, ST+1+NFS_MAXPATHLEN/4),
- PROC(read, readargs, readres, fhandle, RC_NOCACHE, ST+AT+1+NFSSVC_MAXBLKSIZE/4),
+ PROC(read, readargs, readres, fhandle, RC_NOCACHE, ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4),
PROC(none, void, void, none, RC_NOCACHE, ST),
PROC(write, writeargs, attrstat, fhandle, RC_REPLBUFF, ST+AT),
PROC(create, createargs, diropres, fhandle, RC_REPLBUFF, ST+FH+AT),
diff .prev/fs/nfsd/nfsxdr.c ./fs/nfsd/nfsxdr.c
--- .prev/fs/nfsd/nfsxdr.c 2006-08-24 16:25:41.000000000 +1000
+++ ./fs/nfsd/nfsxdr.c 2006-08-24 16:25:56.000000000 +1000
@@ -254,8 +254,8 @@ nfssvc_decode_readargs(struct svc_rqst *
len = args->count = ntohl(*p++);
p++; /* totalcount - unused */
- if (len > NFSSVC_MAXBLKSIZE)
- len = NFSSVC_MAXBLKSIZE;
+ if (len > NFSSVC_MAXBLKSIZE_V2)
+ len = NFSSVC_MAXBLKSIZE_V2;
/* set up somewhere to store response.
* We take pages, put them on reslist and include in iovec
@@ -288,8 +288,8 @@ nfssvc_decode_writeargs(struct svc_rqst
rqstp->rq_vec[0].iov_base = (void*)p;
rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len -
(((void*)p) - rqstp->rq_arg.head[0].iov_base);
- if (len > NFSSVC_MAXBLKSIZE)
- len = NFSSVC_MAXBLKSIZE;
+ if (len > NFSSVC_MAXBLKSIZE_V2)
+ len = NFSSVC_MAXBLKSIZE_V2;
v = 0;
while (len > rqstp->rq_vec[v].iov_len) {
len -= rqstp->rq_vec[v].iov_len;
@@ -458,7 +458,7 @@ nfssvc_encode_statfsres(struct svc_rqst
{
struct kstatfs *stat = &resp->stats;
- *p++ = htonl(NFSSVC_MAXBLKSIZE); /* max transfer size */
+ *p++ = htonl(NFSSVC_MAXBLKSIZE_V2); /* max transfer size */
*p++ = htonl(stat->f_bsize);
*p++ = htonl(stat->f_blocks);
*p++ = htonl(stat->f_bfree);
diff .prev/include/linux/nfsd/const.h ./include/linux/nfsd/const.h
--- .prev/include/linux/nfsd/const.h 2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/nfsd/const.h 2006-08-24 16:25:56.000000000 +1000
@@ -13,6 +13,7 @@
#include <linux/nfs2.h>
#include <linux/nfs3.h>
#include <linux/nfs4.h>
+#include <linux/sunrpc/msg_prot.h>
/*
* Maximum protocol version supported by knfsd
@@ -23,6 +24,8 @@
* Maximum blocksize supported by daemon currently at 32K
*/
#define NFSSVC_MAXBLKSIZE (32*1024)
+/* NFSv2 is limited by the protocol specification, see RFC 1094 */
+#define NFSSVC_MAXBLKSIZE_V2 (8*1024)
#ifdef __KERNEL__
@@ -30,7 +33,17 @@
# define NFS_SUPER_MAGIC 0x6969
#endif
-#define NFSD_BUFSIZE (1024 + NFSSVC_MAXBLKSIZE)
+/*
+ * Largest number of bytes we need to allocate for an NFS
+ * call or reply. Used to control buffer sizes. We use
+ * the length of v3 WRITE, READDIR and READDIR replies
+ * which are an RPC header, up to 26 XDR units of reply
+ * data, and some page data.
+ *
+ * Note that accuracy here doesn't matter too much as the
+ * size is rounded up to a page size when allocating space.
+ */
+#define NFSD_BUFSIZE ((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)
#ifdef CONFIG_NFSD_V4
# define NFSSVC_XDRSIZE NFS4_SVC_XDRSIZE
diff .prev/include/linux/sunrpc/auth.h ./include/linux/sunrpc/auth.h
--- .prev/include/linux/sunrpc/auth.h 2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/sunrpc/auth.h 2006-08-24 16:25:56.000000000 +1000
@@ -20,9 +20,6 @@
/* size of the nodename buffer */
#define UNX_MAXNODENAME 32
-/* Maximum size (in bytes) of an rpc credential or verifier */
-#define RPC_MAX_AUTH_SIZE (400)
-
/* Work around the lack of a VFS credential */
struct auth_cred {
uid_t uid;
diff .prev/include/linux/sunrpc/msg_prot.h ./include/linux/sunrpc/msg_prot.h
--- .prev/include/linux/sunrpc/msg_prot.h 2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/sunrpc/msg_prot.h 2006-08-24 16:25:56.000000000 +1000
@@ -11,6 +11,9 @@
#define RPC_VERSION 2
+/* size of an XDR encoding unit in bytes, i.e. 32bit */
+#define XDR_UNIT (4)
+
/* spec defines authentication flavor as an unsigned 32 bit integer */
typedef u32 rpc_authflavor_t;
@@ -34,6 +37,9 @@ enum rpc_auth_flavors {
RPC_AUTH_GSS_SPKMP = 390011,
};
+/* Maximum size (in bytes) of an rpc credential or verifier */
+#define RPC_MAX_AUTH_SIZE (400)
+
enum rpc_msg_type {
RPC_CALL = 0,
RPC_REPLY = 1
@@ -101,5 +107,39 @@ typedef u32 rpc_fraghdr;
#define RPC_FRAGMENT_SIZE_MASK (~RPC_LAST_STREAM_FRAGMENT)
#define RPC_MAX_FRAGMENT_SIZE ((1U << 31) - 1)
+/*
+ * RPC call and reply header size as number of 32bit words (verifier
+ * size computed separately, see below)
+ */
+#define RPC_CALLHDRSIZE (6)
+#define RPC_REPHDRSIZE (4)
+
+
+/*
+ * Maximum RPC header size, including authentication,
+ * as number of 32bit words (see RFCs 1831, 1832).
+ *
+ * xid 1 xdr unit = 4 bytes
+ * mtype 1
+ * rpc_version 1
+ * program 1
+ * prog_version 1
+ * procedure 1
+ * cred {
+ * flavor 1
+ * length 1
+ * body<RPC_MAX_AUTH_SIZE> 100 xdr units = 400 bytes
+ * }
+ * verf {
+ * flavor 1
+ * length 1
+ * body<RPC_MAX_AUTH_SIZE> 100 xdr units = 400 bytes
+ * }
+ * TOTAL 210 xdr units = 840 bytes
+ */
+#define RPC_MAX_HEADER_WITH_AUTH \
+ (RPC_CALLHDRSIZE + 2*(2+RPC_MAX_AUTH_SIZE/4))
+
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SUNRPC_MSGPROT_H_ */
diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h 2006-08-24 16:25:41.000000000 +1000
+++ ./include/linux/sunrpc/svc.h 2006-08-24 16:25:56.000000000 +1000
@@ -13,6 +13,7 @@
#include <linux/in.h>
#include <linux/sunrpc/types.h>
#include <linux/sunrpc/xdr.h>
+#include <linux/sunrpc/auth.h>
#include <linux/sunrpc/svcauth.h>
#include <linux/wait.h>
#include <linux/mm.h>
@@ -95,8 +96,28 @@ static inline void svc_get(struct svc_se
* Maximum payload size supported by a kernel RPC server.
* This is use to determine the max number of pages nfsd is
* willing to return in a single READ operation.
- */
-#define RPCSVC_MAXPAYLOAD (64*1024u)
+ *
+ * These happen to all be powers of 2, which is not strictly
+ * necessary but helps enforce the real limitation, which is
+ * that they should be multiples of PAGE_CACHE_SIZE.
+ *
+ * For UDP transports, a block plus NFS,RPC, and UDP headers
+ * has to fit into the IP datagram limit of 64K. The largest
+ * feasible number for all known page sizes is probably 48K,
+ * but we choose 32K here. This is the same as the historical
+ * Linux limit; someone who cares more about NFS/UDP performance
+ * can test a larger number.
+ *
+ * For TCP transports we have more freedom. A size of 1MB is
+ * chosen to match the client limit. Other OSes are known to
+ * have larger limits, but those numbers are probably beyond
+ * the point of diminishing returns.
+ */
+#define RPCSVC_MAXPAYLOAD (1*1024*1024u)
+#define RPCSVC_MAXPAYLOAD_TCP RPCSVC_MAXPAYLOAD
+#define RPCSVC_MAXPAYLOAD_UDP (32*1024u)
+
+extern u32 svc_max_payload(const struct svc_rqst *rqstp);
/*
* RPC Requsts and replies are stored in one or more pages.
diff .prev/include/linux/sunrpc/xprt.h ./include/linux/sunrpc/xprt.h
--- .prev/include/linux/sunrpc/xprt.h 2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/sunrpc/xprt.h 2006-08-24 16:25:56.000000000 +1000
@@ -14,6 +14,7 @@
#include <linux/in.h>
#include <linux/sunrpc/sched.h>
#include <linux/sunrpc/xdr.h>
+#include <linux/sunrpc/msg_prot.h>
extern unsigned int xprt_udp_slot_table_entries;
extern unsigned int xprt_tcp_slot_table_entries;
@@ -23,13 +24,6 @@ extern unsigned int xprt_tcp_slot_table_
#define RPC_MAX_SLOT_TABLE (128U)
/*
- * RPC call and reply header size as number of 32bit words (verifier
- * size computed separately)
- */
-#define RPC_CALLHDRSIZE 6
-#define RPC_REPHDRSIZE 4
-
-/*
* Parameters for choosing a free port
*/
extern unsigned int xprt_min_resvport;
diff .prev/net/sunrpc/svc.c ./net/sunrpc/svc.c
--- .prev/net/sunrpc/svc.c 2006-08-24 16:25:13.000000000 +1000
+++ ./net/sunrpc/svc.c 2006-08-24 16:25:56.000000000 +1000
@@ -919,3 +919,18 @@ err_bad:
svc_putu32(resv, rpc_stat);
goto sendit;
}
+
+/*
+ * Return (transport-specific) limit on the rpc payload.
+ */
+u32 svc_max_payload(const struct svc_rqst *rqstp)
+{
+ int max = RPCSVC_MAXPAYLOAD_TCP;
+
+ if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
+ max = RPCSVC_MAXPAYLOAD_UDP;
+ if (rqstp->rq_server->sv_bufsz < max)
+ max = rqstp->rq_server->sv_bufsz;
+ return max;
+}
+EXPORT_SYMBOL_GPL(svc_max_payload);
* [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (7 preceding siblings ...)
2006-08-24 6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
@ 2006-08-24 6:37 ` NeilBrown
2006-09-25 21:24 ` [NFS] " J. Bruce Fields
2006-08-24 6:37 ` [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly NeilBrown
2006-08-24 6:37 ` [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket NeilBrown
10 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
The maximum possible value is the maximum RPC payload.
The default depends on the amount of total memory.
The value can be set, within reason, as long as
no nfsd threads are currently running.
The value can also be read, allowing the default
to be determined after nfsd has started.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfsctl.c | 33 +++++++++++++++++++++++++++++++++
./fs/nfsd/nfssvc.c | 19 ++++++++++++++++++-
./include/linux/nfsd/const.h | 4 ++--
./include/linux/nfsd/nfsd.h | 1 +
4 files changed, 54 insertions(+), 3 deletions(-)
diff .prev/fs/nfsd/nfsctl.c ./fs/nfsd/nfsctl.c
--- .prev/fs/nfsd/nfsctl.c 2006-08-24 16:24:56.000000000 +1000
+++ ./fs/nfsd/nfsctl.c 2006-08-24 16:26:10.000000000 +1000
@@ -57,6 +57,7 @@ enum {
NFSD_Pool_Threads,
NFSD_Versions,
NFSD_Ports,
+ NFSD_MaxBlkSize,
/*
* The below MUST come last. Otherwise we leave a hole in nfsd_files[]
* with !CONFIG_NFSD_V4 and simple_fill_super() goes oops
@@ -82,6 +83,7 @@ static ssize_t write_threads(struct file
static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
static ssize_t write_versions(struct file *file, char *buf, size_t size);
static ssize_t write_ports(struct file *file, char *buf, size_t size);
+static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
#ifdef CONFIG_NFSD_V4
static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
@@ -100,6 +102,7 @@ static ssize_t (*write_op[])(struct file
[NFSD_Pool_Threads] = write_pool_threads,
[NFSD_Versions] = write_versions,
[NFSD_Ports] = write_ports,
+ [NFSD_MaxBlkSize] = write_maxblksize,
#ifdef CONFIG_NFSD_V4
[NFSD_Leasetime] = write_leasetime,
[NFSD_RecoveryDir] = write_recoverydir,
@@ -555,6 +558,35 @@ static ssize_t write_ports(struct file *
return -EINVAL;
}
+int nfsd_max_blksize;
+
+static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
+{
+ char *mesg = buf;
+ if (size > 0) {
+ int bsize;
+ int rv = get_int(&mesg, &bsize);
+ if (rv)
+ return rv;
+ /* force bsize into allowed range and
+ * required alignment.
+ */
+ if (bsize < 1024)
+ bsize = 1024;
+ if (bsize > NFSSVC_MAXBLKSIZE)
+ bsize = NFSSVC_MAXBLKSIZE;
+ bsize &= ~(1024-1);
+ lock_kernel();
+ if (nfsd_serv && nfsd_serv->sv_nrthreads) {
+ unlock_kernel();
+ return -EBUSY;
+ }
+ nfsd_max_blksize = bsize;
+ unlock_kernel();
+ }
+ return sprintf(buf, "%d\n", nfsd_max_blksize);
+}
+
#ifdef CONFIG_NFSD_V4
extern time_t nfs4_leasetime(void);
@@ -620,6 +652,7 @@ static int nfsd_fill_super(struct super_
[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
+ [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
#ifdef CONFIG_NFSD_V4
[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_RecoveryDir] = {"nfsv4recoverydir", &transaction_ops, S_IWUSR|S_IRUSR},
diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
--- .prev/fs/nfsd/nfssvc.c 2006-08-24 16:26:10.000000000 +1000
+++ ./fs/nfsd/nfssvc.c 2006-08-24 16:26:10.000000000 +1000
@@ -198,9 +198,26 @@ int nfsd_create_serv(void)
unlock_kernel();
return 0;
}
+ if (nfsd_max_blksize == 0) {
+ /* choose a suitable default */
+ struct sysinfo i;
+ si_meminfo(&i);
+ /* Aim for 1/4096 of memory per thread
+ * This gives 1MB on 4Gig machines
+ * But only uses 32K on 128M machines.
+ * Bottom out at 8K on 32M and smaller.
+ * Of course, this is only a default.
+ */
+ nfsd_max_blksize = NFSSVC_MAXBLKSIZE;
+ i.totalram >>= 12;
+ while (nfsd_max_blksize > i.totalram &&
+ nfsd_max_blksize >= 8*1024*2)
+ nfsd_max_blksize /= 2;
+ }
atomic_set(&nfsd_busy, 0);
- nfsd_serv = svc_create_pooled(&nfsd_program, NFSD_BUFSIZE,
+ nfsd_serv = svc_create_pooled(&nfsd_program,
+ NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE + nfsd_max_blksize,
nfsd_last_thread,
nfsd, SIG_NOCLEAN, THIS_MODULE);
if (nfsd_serv == NULL)
diff .prev/include/linux/nfsd/const.h ./include/linux/nfsd/const.h
--- .prev/include/linux/nfsd/const.h 2006-08-24 16:25:56.000000000 +1000
+++ ./include/linux/nfsd/const.h 2006-08-24 16:26:10.000000000 +1000
@@ -21,9 +21,9 @@
#define NFSSVC_MAXVERS 3
/*
- * Maximum blocksize supported by daemon currently at 32K
+ * Maximum blocksizes supported by daemon under various circumstances.
*/
-#define NFSSVC_MAXBLKSIZE (32*1024)
+#define NFSSVC_MAXBLKSIZE RPCSVC_MAXPAYLOAD
/* NFSv2 is limited by the protocol specification, see RFC 1094 */
#define NFSSVC_MAXBLKSIZE_V2 (8*1024)
diff .prev/include/linux/nfsd/nfsd.h ./include/linux/nfsd/nfsd.h
--- .prev/include/linux/nfsd/nfsd.h 2006-08-24 16:26:10.000000000 +1000
+++ ./include/linux/nfsd/nfsd.h 2006-08-24 16:26:10.000000000 +1000
@@ -145,6 +145,7 @@ int nfsd_vers(int vers, enum vers_op cha
void nfsd_reset_versions(void);
int nfsd_create_serv(void);
+extern int nfsd_max_blksize;
/*
* NFSv4 State
* [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (8 preceding siblings ...)
2006-08-24 6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
@ 2006-08-24 6:37 ` NeilBrown
2006-08-24 6:37 ` [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket NeilBrown
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
From: Greg Banks <gnb@melbourne.sgi.com>
knfsd: make the nfsd read-ahead params cache more SMP-friendly by
changing the single global list and lock into a fixed 16-bucket
hashtable with per-bucket locks. This reduces spinlock contention
in nfsd_read() on read-heavy workloads on multiprocessor servers.
Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K
streaming reads at full line rate. The server had 128 nfsd threads,
which sizes the RA cache at 256 entries, of which only a handful
were used. Flat profiling shows nfsd_read(), including the inlined
nfsd_get_raparms(), taking 10.4% of each CPU. This patch drops the
contribution from nfsd_read() to 1.71% for each CPU.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/vfs.c | 60 +++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 44 insertions(+), 16 deletions(-)
diff .prev/fs/nfsd/vfs.c ./fs/nfsd/vfs.c
--- .prev/fs/nfsd/vfs.c 2006-08-24 16:25:13.000000000 +1000
+++ ./fs/nfsd/vfs.c 2006-08-24 16:27:01.000000000 +1000
@@ -54,6 +54,7 @@
#include <linux/nfsd_idmap.h>
#include <linux/security.h>
#endif /* CONFIG_NFSD_V4 */
+#include <linux/jhash.h>
#include <asm/uaccess.h>
@@ -81,10 +82,19 @@ struct raparms {
dev_t p_dev;
int p_set;
struct file_ra_state p_ra;
+ unsigned int p_hindex;
};
+struct raparm_hbucket {
+ struct raparms *pb_head;
+ spinlock_t pb_lock;
+} ____cacheline_aligned_in_smp;
+
static struct raparms * raparml;
-static struct raparms * raparm_cache;
+#define RAPARM_HASH_BITS 4
+#define RAPARM_HASH_SIZE (1<<RAPARM_HASH_BITS)
+#define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1)
+static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE];
/*
* Called from nfsd_lookup and encode_dirent. Check if we have crossed
@@ -743,16 +753,20 @@ nfsd_sync_dir(struct dentry *dp)
* Obtain the readahead parameters for the file
* specified by (dev, ino).
*/
-static DEFINE_SPINLOCK(ra_lock);
static inline struct raparms *
nfsd_get_raparms(dev_t dev, ino_t ino)
{
struct raparms *ra, **rap, **frap = NULL;
int depth = 0;
+ unsigned int hash;
+ struct raparm_hbucket *rab;
+
+ hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
+ rab = &raparm_hash[hash];
- spin_lock(&ra_lock);
- for (rap = &raparm_cache; (ra = *rap); rap = &ra->p_next) {
+ spin_lock(&rab->pb_lock);
+ for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
if (ra->p_ino == ino && ra->p_dev == dev)
goto found;
depth++;
@@ -761,7 +775,7 @@ nfsd_get_raparms(dev_t dev, ino_t ino)
}
depth = nfsdstats.ra_size*11/10;
if (!frap) {
- spin_unlock(&ra_lock);
+ spin_unlock(&rab->pb_lock);
return NULL;
}
rap = frap;
@@ -769,15 +783,16 @@ nfsd_get_raparms(dev_t dev, ino_t ino)
ra->p_dev = dev;
ra->p_ino = ino;
ra->p_set = 0;
+ ra->p_hindex = hash;
found:
- if (rap != &raparm_cache) {
+ if (rap != &rab->pb_head) {
*rap = ra->p_next;
- ra->p_next = raparm_cache;
- raparm_cache = ra;
+ ra->p_next = rab->pb_head;
+ rab->pb_head = ra;
}
ra->p_count++;
nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
- spin_unlock(&ra_lock);
+ spin_unlock(&rab->pb_lock);
return ra;
}
@@ -856,11 +871,12 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
/* Write back readahead params */
if (ra) {
- spin_lock(&ra_lock);
+ struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
+ spin_lock(&rab->pb_lock);
ra->p_ra = file->f_ra;
ra->p_set = 1;
ra->p_count--;
- spin_unlock(&ra_lock);
+ spin_unlock(&rab->pb_lock);
}
if (err >= 0) {
@@ -1836,11 +1852,11 @@ nfsd_permission(struct svc_export *exp,
void
nfsd_racache_shutdown(void)
{
- if (!raparm_cache)
+ if (!raparml)
return;
dprintk("nfsd: freeing readahead buffers.\n");
kfree(raparml);
- raparm_cache = raparml = NULL;
+ raparml = NULL;
}
/*
* Initialize readahead param cache
@@ -1849,19 +1865,31 @@ int
nfsd_racache_init(int cache_size)
{
int i;
+ int j = 0;
+ int nperbucket;
- if (raparm_cache)
+
+ if (raparml)
return 0;
+ if (cache_size < 2*RAPARM_HASH_SIZE)
+ cache_size = 2*RAPARM_HASH_SIZE;
raparml = kmalloc(sizeof(struct raparms) * cache_size, GFP_KERNEL);
if (raparml != NULL) {
dprintk("nfsd: allocating %d readahead buffers.\n",
cache_size);
+ for (i = 0 ; i < RAPARM_HASH_SIZE ; i++) {
+ raparm_hash[i].pb_head = NULL;
+ spin_lock_init(&raparm_hash[i].pb_lock);
+ }
+ nperbucket = cache_size >> RAPARM_HASH_BITS;
memset(raparml, 0, sizeof(struct raparms) * cache_size);
for (i = 0; i < cache_size - 1; i++) {
- raparml[i].p_next = raparml + i + 1;
+ if (i % nperbucket == 0)
+ raparm_hash[j++].pb_head = raparml + i;
+ if (i % nperbucket < nperbucket-1)
+ raparml[i].p_next = raparml + i + 1;
}
- raparm_cache = raparml;
} else {
printk(KERN_WARNING
"nfsd: Could not allocate memory read-ahead cache.\n");
* [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
` (9 preceding siblings ...)
2006-08-24 6:37 ` [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly NeilBrown
@ 2006-08-24 6:37 ` NeilBrown
10 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2006-08-24 6:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: nfs, linux-kernel
From: Greg Banks <gnb@melbourne.sgi.com>
knfsd: speed up high call-rate workloads by caching the struct ip_map
for the peer on the connected struct svc_sock instead of looking it
up in the ip_map cache hashtable on every call. This helps workloads
using AUTH_SYS authentication over TCP.
Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server. That tree is small
enough to fit entirely in the server's RAM, so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server.
Profiling showed strcmp(), called from ip_map_match(), was taking 4.8%
of each CPU, and ip_map_lookup() was taking 2.9%. This patch drops
both contributions into the profile noise.
Note that the above result overstates the value of this patch
for most workloads. The synthetic clients are all using separate
IP addresses, so there are 64 entries in the ip_map cache hash.
Because the kernel measured contained the bug fixed in
commit 1f1e030bf75774b6a283518e1534d598e14147d4
and was running on a 64-bit little-endian machine, probably all of
those 64 entries were on a single chain, thus increasing the cost
of ip_map_lookup().
With a modern kernel you would need more clients to see the same
amount of performance improvement. This patch has helped to scale
knfsd to handle a deployment with 2000 NFS clients.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./include/linux/sunrpc/cache.h | 11 +++++++++
./include/linux/sunrpc/svcauth.h | 1
./include/linux/sunrpc/svcsock.h | 3 ++
./net/sunrpc/svcauth_unix.c | 47 ++++++++++++++++++++++++++++++++++++---
./net/sunrpc/svcsock.c | 2 +
5 files changed, 61 insertions(+), 3 deletions(-)
diff .prev/include/linux/sunrpc/cache.h ./include/linux/sunrpc/cache.h
--- .prev/include/linux/sunrpc/cache.h 2006-08-24 16:27:18.000000000 +1000
+++ ./include/linux/sunrpc/cache.h 2006-08-24 16:27:18.000000000 +1000
@@ -163,6 +163,17 @@ static inline void cache_put(struct cach
kref_put(&h->ref, cd->cache_put);
}
+static inline int cache_valid(struct cache_head *h)
+{
+ /* If an item has been unhashed pending removal when
+ * the refcount drops to 0, the expiry_time will be
+ * set to 0. We don't want to consider such items
+ * valid in this context even though CACHE_VALID is
+ * set.
+ */
+ return (h->expiry_time != 0 && test_bit(CACHE_VALID, &h->flags));
+}
+
extern int cache_check(struct cache_detail *detail,
struct cache_head *h, struct cache_req *rqstp);
extern void cache_flush(void);
diff .prev/include/linux/sunrpc/svcauth.h ./include/linux/sunrpc/svcauth.h
--- .prev/include/linux/sunrpc/svcauth.h 2006-08-24 16:27:18.000000000 +1000
+++ ./include/linux/sunrpc/svcauth.h 2006-08-24 16:27:18.000000000 +1000
@@ -126,6 +126,7 @@ extern struct auth_domain *auth_domain_f
extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
extern int auth_unix_forget_old(struct auth_domain *dom);
extern void svcauth_unix_purge(void);
+extern void svcauth_unix_info_release(void *);
static inline unsigned long hash_str(char *name, int bits)
{
diff .prev/include/linux/sunrpc/svcsock.h ./include/linux/sunrpc/svcsock.h
--- .prev/include/linux/sunrpc/svcsock.h 2006-08-24 16:27:18.000000000 +1000
+++ ./include/linux/sunrpc/svcsock.h 2006-08-24 16:27:18.000000000 +1000
@@ -54,6 +54,9 @@ struct svc_sock {
int sk_reclen; /* length of record */
int sk_tcplen; /* current read length */
time_t sk_lastrecv; /* time of last received request */
+
+ /* cache of various info for TCP sockets */
+ void *sk_info_authunix;
};
/*
diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c
--- .prev/net/sunrpc/svcauth_unix.c 2006-08-24 16:27:18.000000000 +1000
+++ ./net/sunrpc/svcauth_unix.c 2006-08-24 16:27:18.000000000 +1000
@@ -9,6 +9,7 @@
#include <linux/seq_file.h>
#include <linux/hash.h>
#include <linux/string.h>
+#include <net/sock.h>
#define RPCDBG_FACILITY RPCDBG_AUTH
@@ -375,6 +376,44 @@ void svcauth_unix_purge(void)
cache_purge(&ip_map_cache);
}
+static inline struct ip_map *
+ip_map_cached_get(struct svc_rqst *rqstp)
+{
+ struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix;
+ if (ipm != NULL) {
+ if (!cache_valid(&ipm->h)) {
+ /*
+ * The entry has been invalidated since it was
+ * remembered, e.g. by a second mount from the
+ * same IP address.
+ */
+ rqstp->rq_sock->sk_info_authunix = NULL;
+ cache_put(&ipm->h, &ip_map_cache);
+ return NULL;
+ }
+ cache_get(&ipm->h);
+ }
+ return ipm;
+}
+
+static inline void
+ip_map_cached_put(struct svc_rqst *rqstp, struct ip_map *ipm)
+{
+ struct svc_sock *svsk = rqstp->rq_sock;
+
+ if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL)
+ svsk->sk_info_authunix = ipm; /* newly cached, keep the reference */
+ else
+ cache_put(&ipm->h, &ip_map_cache);
+}
+
+void
+svcauth_unix_info_release(void *info)
+{
+ struct ip_map *ipm = info;
+ cache_put(&ipm->h, &ip_map_cache);
+}
+
static int
svcauth_unix_set_client(struct svc_rqst *rqstp)
{
@@ -384,8 +423,10 @@ svcauth_unix_set_client(struct svc_rqst
if (rqstp->rq_proc == 0)
return SVC_OK;
- ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
- rqstp->rq_addr.sin_addr);
+ ipm = ip_map_cached_get(rqstp);
+ if (ipm == NULL)
+ ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
+ rqstp->rq_addr.sin_addr);
if (ipm == NULL)
return SVC_DENIED;
@@ -400,7 +441,7 @@ svcauth_unix_set_client(struct svc_rqst
case 0:
rqstp->rq_client = &ipm->m_client->h;
kref_get(&rqstp->rq_client->ref);
- cache_put(&ipm->h, &ip_map_cache);
+ ip_map_cached_put(rqstp, ipm);
break;
}
return SVC_OK;
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-08-24 16:25:41.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-08-24 16:27:18.000000000 +1000
@@ -1612,6 +1612,8 @@ svc_delete_socket(struct svc_sock *svsk)
sockfd_put(svsk->sk_sock);
else
sock_release(svsk->sk_sock);
+ if (svsk->sk_info_authunix != NULL)
+ svcauth_unix_info_release(svsk->sk_info_authunix);
kfree(svsk);
} else {
spin_unlock_bh(&serv->sv_lock);
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-08-24 6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
@ 2006-09-25 15:43 ` J. Bruce Fields
2006-09-28 3:41 ` Neil Brown
2006-10-03 1:36 ` Neil Brown
0 siblings, 2 replies; 25+ messages in thread
From: J. Bruce Fields @ 2006-09-25 15:43 UTC (permalink / raw)
To: NeilBrown; +Cc: Andrew Morton, nfs, linux-kernel, Greg Banks
On Thu, Aug 24, 2006 at 04:37:11PM +1000, NeilBrown wrote:
> The limit over UDP remains at 32K. Also, make some of
> the apparently arbitrary sizing constants clearer.
>
> The biggest change here involves replacing NFSSVC_MAXBLKSIZE
> by a function of the rqstp. This allows it to be different
> for different protocols (udp/tcp) and also allows it
> to depend on the servers declared sv_bufsiz.
>
> Note that we don't actually increase sv_bufsz for nfs yet.
> That comes next.
This patch has some problems. (Apologies for being so slow to look at
them!)
We're reporting svc_max_payload(rqstp) as the server's maximum
read/write block size:
> @@ -538,15 +539,16 @@ nfsd3_proc_fsinfo(struct svc_rqst * rqst
> struct nfsd3_fsinfores *resp)
> {
> int nfserr;
> + u32 max_blocksize = svc_max_payload(rqstp);
>
> dprintk("nfsd: FSINFO(3) %s\n",
> SVCFH_fmt(&argp->fh));
>
> - resp->f_rtmax = NFSSVC_MAXBLKSIZE;
> - resp->f_rtpref = NFSSVC_MAXBLKSIZE;
> + resp->f_rtmax = max_blocksize;
> + resp->f_rtpref = max_blocksize;
> resp->f_rtmult = PAGE_SIZE;
> - resp->f_wtmax = NFSSVC_MAXBLKSIZE;
> - resp->f_wtpref = NFSSVC_MAXBLKSIZE;
> + resp->f_wtmax = max_blocksize;
> + resp->f_wtpref = max_blocksize;
> resp->f_wtmult = PAGE_SIZE;
> resp->f_dtpref = PAGE_SIZE;
> resp->f_maxfilesize = ~(u32) 0;
But svc_max_payload() usually returns sv_bufsz in the TCP case:
> +u32 svc_max_payload(const struct svc_rqst *rqstp)
> +{
> + int max = RPCSVC_MAXPAYLOAD_TCP;
> +
> + if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
> + max = RPCSVC_MAXPAYLOAD_UDP;
> + if (rqstp->rq_server->sv_bufsz < max)
> + max = rqstp->rq_server->sv_bufsz;
> + return max;
> +}
That's the *total* size of the buffer for holding requests and replies.
If a client actually tries to send a write of that size, the entire
request will of course exceed sv_bufsz, so we'll drop it. (We've seen
this happen with the Solaris v4 client.)
> -#define NFSD_BUFSIZE (1024 + NFSSVC_MAXBLKSIZE)
> +/*
> + * Largest number of bytes we need to allocate for an NFS
> + * call or reply. Used to control buffer sizes. We use
> + * the length of v3 WRITE, READDIR and READDIR replies
> + * which are an RPC header, up to 26 XDR units of reply
> + * data, and some page data.
> + *
> + * Note that accuracy here doesn't matter too much as the
> + * size is rounded up to a page size when allocating space.
> + */
Is the rounding up *always* going to increase the size? And if not,
then why doesn't accuracy matter?
> +#define NFSD_BUFSIZE ((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)
I think this results in 80 fewer bytes than before.
No doubt we have lots of wiggle room here, but I'd rather we didn't
decrease that size without seeing a careful analysis.
--b.
* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
2006-08-24 6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
@ 2006-09-25 21:24 ` J. Bruce Fields
2006-09-28 4:22 ` Neil Brown
0 siblings, 1 reply; 25+ messages in thread
From: J. Bruce Fields @ 2006-09-25 21:24 UTC (permalink / raw)
To: NeilBrown; +Cc: Andrew Morton, nfs, linux-kernel
On Thu, Aug 24, 2006 at 04:37:16PM +1000, NeilBrown wrote:
> diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
> --- .prev/fs/nfsd/nfssvc.c 2006-08-24 16:26:10.000000000 +1000
> +++ ./fs/nfsd/nfssvc.c 2006-08-24 16:26:10.000000000 +1000
> @@ -198,9 +198,26 @@ int nfsd_create_serv(void)
> unlock_kernel();
> return 0;
> }
> + if (nfsd_max_blksize == 0) {
> + /* choose a suitable default */
> + struct sysinfo i;
> + si_meminfo(&i);
> + /* Aim for 1/4096 of memory per thread
> + * This gives 1MB on 4Gig machines
> + * But only uses 32K on 128M machines.
> + * Bottom out at 8K on 32M and smaller.
> + * Of course, this is only a default.
> + */
> + nfsd_max_blksize = NFSSVC_MAXBLKSIZE;
> + i.totalram >>= 12;
> + while (nfsd_max_blksize > i.totalram &&
> + nfsd_max_blksize >= 8*1024*2)
> + nfsd_max_blksize /= 2;
> + }
It looks to me like totalram is actually measured in pages. So in
practice this gives almost everyone 8k here. So that 12 should be
something like 12 - PAGE_CACHE_SHIFT?
--b.
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-09-25 15:43 ` [NFS] " J. Bruce Fields
@ 2006-09-28 3:41 ` Neil Brown
2006-09-28 3:46 ` Andrew Morton
2006-10-03 1:36 ` Neil Brown
1 sibling, 1 reply; 25+ messages in thread
From: Neil Brown @ 2006-09-28 3:41 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Andrew Morton, nfs, linux-kernel, Greg Banks
On Monday September 25, bfields@fieldses.org wrote:
> On Thu, Aug 24, 2006 at 04:37:11PM +1000, NeilBrown wrote:
> > The limit over UDP remains at 32K. Also, make some of
> > the apparently arbitrary sizing constants clearer.
> >
> > The biggest change here involves replacing NFSSVC_MAXBLKSIZE
> > by a function of the rqstp. This allows it to be different
> > for different protocols (udp/tcp) and also allows it
> > to depend on the servers declared sv_bufsiz.
> >
> > Note that we don't actually increase sv_bufsz for nfs yet.
> > That comes next.
>
> This patch has some problems. (Apologies for being so slow to look at
> them!)
Problems. Yes. It makes my brain hurt for one! We have various
things called a 'size', with some being rounded-up versions of others,
and ... ARG.
>
> We're reporting svc_max_payload(rqstp) as the server's maximum
> read/write block size:
>
> > @@ -538,15 +539,16 @@ nfsd3_proc_fsinfo(struct svc_rqst * rqst
> > struct nfsd3_fsinfores *resp)
> > {
> > int nfserr;
> > + u32 max_blocksize = svc_max_payload(rqstp);
...
>
> But svc_max_payload() usually returns sv_bufsz in the TCP case:
>
...
>
> That's the *total* size of the buffer for holding requests and replies.
Yes... for consistency with nfsd_create_serv, this should probably
be
max_blocksize = svc_max_payload(rqstp) - (NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE);
as (NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE) has been determined to be the
maximum overhead in a read reply / write request.
> > -#define NFSD_BUFSIZE (1024 + NFSSVC_MAXBLKSIZE)
> > +/*
> > + * Largest number of bytes we need to allocate for an NFS
> > + * call or reply. Used to control buffer sizes. We use
> > + * the length of v3 WRITE, READDIR and READDIR replies
> > + * which are an RPC header, up to 26 XDR units of reply
> > + * data, and some page data.
> > + *
> > + * Note that accuracy here doesn't matter too much as the
> > + * size is rounded up to a page size when allocating space.
> > + */
>
> Is the rounding up *always* going to increase the size? And if not,
> then why doesn't accuracy matter?
>
> > +#define NFSD_BUFSIZE ((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)
Well the code in svc_init_buffer says:
pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
So it doesn't just round up, but adds one page. It might look like it
is adding 2 pages, but one of those is for the message in the other
direction.
It is really one page for the request, one page for the reply, and
N pages for the data. So why do we add all that padding to
NFSD_BUFSIZE?
I'm not sure. I think there is a good reason, but as I said - it
makes my brain hurt.
And the above comment only mentions v3. v4 could presumably have lots
more overhead. A 'write' could be in compound with lots of other
stuff, and if we say we can handle a 32k write, might the client send
a 40K message with 8k of UNLINK requests???
>
> No doubt we have lots of wiggle room here, but I'd rather we didn't
> decrease that size without seeing a careful analysis.
Yes. Careful analysis. That sounds like a good idea.
I'll race you ... but I hope you win :-)
NeilBrown
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-09-28 3:41 ` Neil Brown
@ 2006-09-28 3:46 ` Andrew Morton
0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2006-09-28 3:46 UTC (permalink / raw)
To: Neil Brown; +Cc: J. Bruce Fields, nfs, linux-kernel, Greg Banks
On Thu, 28 Sep 2006 13:41:27 +1000
Neil Brown <neilb@suse.de> wrote:
> pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
That's (the newly-added) DIV_ROUND_UP().
* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
2006-09-25 21:24 ` [NFS] " J. Bruce Fields
@ 2006-09-28 4:22 ` Neil Brown
2006-09-28 17:09 ` Hugh Dickins
0 siblings, 1 reply; 25+ messages in thread
From: Neil Brown @ 2006-09-28 4:22 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Andrew Morton, nfs, linux-kernel
On Monday September 25, bfields@fieldses.org wrote:
>
> It looks to me like totalram is actually measured in pages. So in
> practice this gives almost everyone 8k here. So that 12 should be
> something like 12 - PAGE_CACHE_SHIFT?
Uhm.... yes. Thanks.
But are the pages that totalram is measured in normal pages, or
page_cache pages? And is there a difference?
Should we use PAGE_CACHE_SHIFT, or PAGE_SHIFT?
And why do we have both if they are numerically identical?
I'll submit a patch which uses
12 - PAGE_SHIFT
in a little while.
Thanks,
NeilBrown
* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
2006-09-28 4:22 ` Neil Brown
@ 2006-09-28 17:09 ` Hugh Dickins
2006-09-29 1:59 ` Neil Brown
0 siblings, 1 reply; 25+ messages in thread
From: Hugh Dickins @ 2006-09-28 17:09 UTC (permalink / raw)
To: Neil Brown; +Cc: J. Bruce Fields, Andrew Morton, nfs, linux-kernel
On Thu, 28 Sep 2006, Neil Brown wrote:
> But are the pages that totalram is measured in normal pages, or
> page_cache pages? And is there a difference?
There's never yet been a difference, outside of some patches by bcrl.
But totalram_pages comes "before" any idea of page cache, so it's in
normal pages.
> Should we use PAGE_CACHE_SHIFT, or PAGE_SHIFT?
PAGE_SHIFT.
> And why do we have both if they are numerically identical?
Very irritating: the time I've wasted on "correcting" code for the
"difference" between them! Yet there's still plenty wrong and I've
largely given up on it.
Probably never will be a difference: but the idea was that the page
cache might use >0-order pages (unclear what happens to swap cache).
I wish they'd waited for a working implementation before introducing
the distinction; but never quite felt like deleting all trace of it.
>
> I'll submit a patch which uses
> 12 - PAGE_SHIFT
> in a little while.
I haven't seen your context; but "12 - PAGE_SHIFT" sounds like a
bad idea on all those architectures with PAGE_SHIFT 13 or more;
you'll be on much safer ground working with "PAGE_SHIFT - 12".
Hugh
* Re: [NFS] [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured.
2006-09-28 17:09 ` Hugh Dickins
@ 2006-09-29 1:59 ` Neil Brown
0 siblings, 0 replies; 25+ messages in thread
From: Neil Brown @ 2006-09-29 1:59 UTC (permalink / raw)
To: Hugh Dickins; +Cc: J. Bruce Fields, Andrew Morton, nfs, linux-kernel
On Thursday September 28, hugh@veritas.com wrote:
> >
> > I'll submit a patch which uses
> > 12 - PAGE_SHIFT
> > in a little while.
>
> I haven't seen your context; but "12 - PAGE_SHIFT" sounds like a
> bad idea on all those architectures with PAGE_SHIFT 13 or more;
> you'll be on much safer ground working with "PAGE_SHIFT - 12".
Ahhh yes... of course. Thanks.
totalram <<= PAGE_SHIFT - 12;
is what I want, to convert a number of pages to 1/4096 of the number
of bytes.
Thanks :-)
NeilBrown
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-09-25 15:43 ` [NFS] " J. Bruce Fields
2006-09-28 3:41 ` Neil Brown
@ 2006-10-03 1:36 ` Neil Brown
2006-10-03 1:59 ` Greg Banks
2006-10-03 2:13 ` J. Bruce Fields
1 sibling, 2 replies; 25+ messages in thread
From: Neil Brown @ 2006-10-03 1:36 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: NeilBrown, nfs, linux-kernel, Greg Banks
On Monday September 25, bfields@fieldses.org wrote:
>
> We're reporting svc_max_payload(rqstp) as the server's maximum
> read/write block size:
Yes. So I'm going to change the number returned by
svc_max_payload(rqstp) to mean the maximum read/write block size.
i.e. when a service is created, the number passed isn't the maximum
packet size, but is the maximum payload size.
The assumption is that all of the request that is not payload will fit
into one page, and all of the reply that is not payload will also fit
into one page (though a different page).
It means that RPC services that have lots of non-payload data combined
with payload data won't work, but making sunrpc code completely
general when there are only two users is just too painful.
The only real problem is that NFSv4 can have arbitrarily large
non-payload data, and arbitrarily many payloads. But I guess any
client that tries to send two full-sized payloads in the one request
is asking for trouble (I don't suppose the RPC spells this out at
all?).
>
> > -#define NFSD_BUFSIZE (1024 + NFSSVC_MAXBLKSIZE)
> > +/*
> > + * Largest number of bytes we need to allocate for an NFS
> > + * call or reply. Used to control buffer sizes. We use
> > + * the length of v3 WRITE, READDIR and READDIR replies
> > + * which are an RPC header, up to 26 XDR units of reply
> > + * data, and some page data.
> > + *
> > + * Note that accuracy here doesn't matter too much as the
> > + * size is rounded up to a page size when allocating space.
> > + */
>
> Is the rounding up *always* going to increase the size? And if not,
> then why doesn't accuracy matter?
>
> > +#define NFSD_BUFSIZE ((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT + NFSSVC_MAXBLKSIZE)
>
> I think this results in 80 less bytes less than before, I think.
>
> No doubt we have lots of wiggle room here, but I'd rather we didn't
> decrease that size without seeing a careful analysis.
The above change makes this loss in bytes irrelevant. NFSD_BUFSIZE
will now only be used once, near the end of nfs4proc.c, and there, if
it is wrong, you just get a warning.
And the fact that the code change to effect this is so tiny seems to
imply that most of the code was already assuming that sv_bufsz was
really the payload size rather than the packet size.
So this is my proposed 'fix' for
knfsd-prepare-knfsd-for-support-of-rsize-wsize-of-up-to-1mb-over-tcp.patch.
NeilBrown
------------
Make sv_bufsz really be the payload size for RPC requests.
svc.c already allocated 2 extra pages for the request and the reply,
so it is perfectly consistent to assume that the size passed to
svc_create_pooled is the size of the payload. This means that
the number returned by svc_max_payload - and thus returned to the client
as the maximum I/O size - is exactly the chosen max block size.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/nfsd/nfssvc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
--- .prev/fs/nfsd/nfssvc.c 2006-09-29 11:57:27.000000000 +1000
+++ ./fs/nfsd/nfssvc.c 2006-10-03 11:23:11.000000000 +1000
@@ -217,7 +217,7 @@ int nfsd_create_serv(void)
atomic_set(&nfsd_busy, 0);
nfsd_serv = svc_create_pooled(&nfsd_program,
- NFSD_BUFSIZE - NFSSVC_MAXBLKSIZE + nfsd_max_blksize,
+ nfsd_max_blksize,
nfsd_last_thread,
nfsd, SIG_NOCLEAN, THIS_MODULE);
if (nfsd_serv == NULL)
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-10-03 1:36 ` Neil Brown
@ 2006-10-03 1:59 ` Greg Banks
2006-10-03 2:13 ` J. Bruce Fields
1 sibling, 0 replies; 25+ messages in thread
From: Greg Banks @ 2006-10-03 1:59 UTC (permalink / raw)
To: Neil Brown; +Cc: J. Bruce Fields, nfs, linux-kernel
On Tue, Oct 03, 2006 at 11:36:32AM +1000, Neil Brown wrote:
> On Monday September 25, bfields@fieldses.org wrote:
> >
> > We're reporting svc_max_payload(rqstp) as the server's maximum
> > read/write block size:
>
> Yes. So I'm going to change the number returned by
> svc_max_payload(rqstp) to mean the maximum read/write block size.
> i.e. when a service is created, the number passed isn't the maximum
> packet size, but is the maximum payload size.
I'm confused. Last time I looked at the code that was
exactly what the semantics were?
> The assumption is that all of the request that is not payload will fit
> into one page, and all of the reply that is not payload will also fit
> into one page (though a different page).
This is a pretty good assumption for v3.
> It means that RPC services that have lots of non-payload data combined
> with payload data won't work, but making sunrpc code completely
> general when there are only two users is just too painful.
>
> The only real problem is that NFSv4 can have arbitrarily large
> non-payload data, and arbitrarily many payloads. But I guess any
> client that trying to send two full-sized payloads in the one request
> is asking for trouble (I don't suppose the RPC spells this out at
> all?).
Bruce and I briefly discussed this when I dropped into CITI the other
week. The conclusion was that this is a non-issue in the short term
because all the clients do a single READ or WRITE per call. In the
long term I hope to rewrite some parts of that code to do away with
one of the memcpy()s in the WRITE path, and handling multiple WRITEs
for v4 would be a natural extension of that.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-10-03 1:36 ` Neil Brown
2006-10-03 1:59 ` Greg Banks
@ 2006-10-03 2:13 ` J. Bruce Fields
2006-10-03 5:41 ` Neil Brown
1 sibling, 1 reply; 25+ messages in thread
From: J. Bruce Fields @ 2006-10-03 2:13 UTC (permalink / raw)
To: Neil Brown; +Cc: nfs, linux-kernel, Greg Banks
On Tue, Oct 03, 2006 at 11:36:32AM +1000, Neil Brown wrote:
> The only real problem is that NFSv4 can have arbitrarily large
> non-payload data, and arbitrarily many payloads. But I guess any
> client that trying to send two full-sized payloads in the one request
> is asking for trouble (I don't suppose the RPC spells this out at
> all?).
The RFC? Well, it does have a "RESOURCE" error that the server can
return for overly complicated compounds. It doesn't give much guidance
on when exactly that could happen, but if there's ever a clear case for
returning NFS4ERR_RESOURCE, I think it must be the case of a client
trying to circumvent the maximum read/write size by using multiple read
or write operations in a single compound.
(We have some other odd restrictions on the sorts of compounds we can
accept, which I'd like to relax. But that's a problem for another day.)
> And the fact that the code change to effect this is so tiny seems to
> imply that most of the code was already assuming that sv_bufsz was
> really the payload size rather than the packet size.
There's also the check at the end of svc_tcp_recvfrom():
if (svsk->sk_reclen > serv->sv_bufsz) {
printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
(unsigned long) svsk->sk_reclen);
goto err_delete;
}
--b.
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-10-03 2:13 ` J. Bruce Fields
@ 2006-10-03 5:41 ` Neil Brown
2006-10-03 8:02 ` Greg Banks
0 siblings, 1 reply; 25+ messages in thread
From: Neil Brown @ 2006-10-03 5:41 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: nfs, linux-kernel, Greg Banks
On Monday October 2, bfields@fieldses.org wrote:
> On Tue, Oct 03, 2006 at 11:36:32AM +1000, Neil Brown wrote:
> > The only real problem is that NFSv4 can have arbitrarily large
> > non-payload data, and arbitrarily many payloads. But I guess any
> > client that trying to send two full-sized payloads in the one request
> > is asking for trouble (I don't suppose the RPC spells this out at
> > all?).
>
> The RFC?
RFC, RPC, only a few pixels different :-)
> Well, it does have a "RESOURCE" error that the server can
> return for overly complicated compounds. It doesn't give much guidance
> on when exactly that could happen, but if there's ever a clear case for
> returning NFS4ERR_RESOURCE, I think it must be the case of a client
> trying to circumvent the maximum read/write size by using multiple read
> or write operations in a single compound.
It would be nice if the RFC specified some minimum that the client
could be sure of not getting NFS4ERR_RESOURCE for, but maybe that
isn't really necessary.
>
> There's also the check at the end of svc_tcp_recvfrom():
>
> 	if (svsk->sk_reclen > serv->sv_bufsz) {
> 		printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
> 		       (unsigned long) svsk->sk_reclen);
> 		goto err_delete;
> 	}
Groan... and a whole lot more besides. Thanks.
I had tried to avoid storing separate 'payload max' and 'message max'
numbers, but it doesn't seem like that is going to work, so it is time
to just give in and make it explicit.
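[Editorial aside: a quick check of the sv_max_mesg arithmetic used in the patch, assuming 4K pages — the message buffer is the max payload plus one page of RPC/XDR overhead, rounded up to a whole page:]

```c
#include <assert.h>

#define PG 4096UL
/* same roundup as the kernel macro: round x up to a multiple of y */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* sv_max_mesg as computed in __svc_create below */
static unsigned long sv_max_mesg_for(unsigned long max_payload)
{
	return roundup(max_payload + PG, PG);
}
```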
Comments on the below?
Thanks,
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./include/linux/sunrpc/svc.h | 3 ++-
./net/sunrpc/svc.c | 17 ++++++++++-------
./net/sunrpc/svcsock.c | 28 ++++++++++++++--------------
3 files changed, 26 insertions(+), 22 deletions(-)
diff .prev/include/linux/sunrpc/svc.h ./include/linux/sunrpc/svc.h
--- .prev/include/linux/sunrpc/svc.h 2006-09-29 11:44:31.000000000 +1000
+++ ./include/linux/sunrpc/svc.h 2006-10-03 15:26:38.000000000 +1000
@@ -57,7 +57,8 @@ struct svc_serv {
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
unsigned int sv_nrthreads; /* # of server threads */
- unsigned int sv_bufsz; /* datagram buffer size */
+ unsigned int sv_max_payload; /* datagram payload size */
+ unsigned int sv_max_mesg; /* bufsz + 1 page for overheads */
unsigned int sv_xdrsize; /* XDR buffer size */
struct list_head sv_permsocks; /* all permanent sockets */
diff .prev/net/sunrpc/svc.c ./net/sunrpc/svc.c
--- .prev/net/sunrpc/svc.c 2006-09-29 11:44:31.000000000 +1000
+++ ./net/sunrpc/svc.c 2006-10-03 15:39:27.000000000 +1000
@@ -282,7 +282,8 @@ __svc_create(struct svc_program *prog, u
serv->sv_program = prog;
serv->sv_nrthreads = 1;
serv->sv_stats = prog->pg_stats;
- serv->sv_bufsz = bufsize? bufsize : 4096;
+ serv->sv_max_payload = bufsize? bufsize : 4096;
+ serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
serv->sv_shutdown = shutdown;
xdrsize = 0;
while (prog) {
@@ -414,9 +415,11 @@ svc_init_buffer(struct svc_rqst *rqstp,
int pages;
int arghi;
- if (size > RPCSVC_MAXPAYLOAD)
- size = RPCSVC_MAXPAYLOAD;
- pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
+ if (size > RPCSVC_MAXPAYLOAD + PAGE_SIZE)
+ size = RPCSVC_MAXPAYLOAD + PAGE_SIZE;
+ pages = size + PAGE_SIZE; /* extra page as we hold both request and reply.
+ * We assume one is at most one page
+ */
arghi = 0;
BUG_ON(pages > RPCSVC_MAXPAGES);
while (pages) {
@@ -463,7 +466,7 @@ __svc_create_thread(svc_thread_fn func,
if (!(rqstp->rq_argp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
|| !(rqstp->rq_resp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
- || !svc_init_buffer(rqstp, serv->sv_bufsz))
+ || !svc_init_buffer(rqstp, serv->sv_max_mesg))
goto out_thread;
serv->sv_nrthreads++;
@@ -938,8 +941,8 @@ u32 svc_max_payload(const struct svc_rqs
if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
max = RPCSVC_MAXPAYLOAD_UDP;
- if (rqstp->rq_server->sv_bufsz < max)
- max = rqstp->rq_server->sv_bufsz;
+ if (rqstp->rq_server->sv_max_payload < max)
+ max = rqstp->rq_server->sv_max_payload;
return max;
}
EXPORT_SYMBOL_GPL(svc_max_payload);
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2006-09-29 11:44:33.000000000 +1000
+++ ./net/sunrpc/svcsock.c 2006-10-03 15:35:08.000000000 +1000
@@ -192,13 +192,13 @@ svc_sock_enqueue(struct svc_sock *svsk)
svsk->sk_pool = pool;
set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
- if (((atomic_read(&svsk->sk_reserved) + serv->sv_bufsz)*2
+ if (((atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg)*2
> svc_sock_wspace(svsk))
&& !test_bit(SK_CLOSE, &svsk->sk_flags)
&& !test_bit(SK_CONN, &svsk->sk_flags)) {
/* Don't enqueue while not enough space for reply */
dprintk("svc: socket %p no space, %d*2 > %ld, not enqueued\n",
- svsk->sk_sk, atomic_read(&svsk->sk_reserved)+serv->sv_bufsz,
+ svsk->sk_sk, atomic_read(&svsk->sk_reserved)+serv->sv_max_mesg,
svc_sock_wspace(svsk));
svsk->sk_pool = NULL;
clear_bit(SK_BUSY, &svsk->sk_flags);
@@ -220,7 +220,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
rqstp, rqstp->rq_sock);
rqstp->rq_sock = svsk;
atomic_inc(&svsk->sk_inuse);
- rqstp->rq_reserved = serv->sv_bufsz;
+ rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
BUG_ON(svsk->sk_pool != pool);
wake_up(&rqstp->rq_wait);
@@ -639,8 +639,8 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
* which will access the socket.
*/
svc_sock_setbufsize(svsk->sk_sock,
- (serv->sv_nrthreads+3) * serv->sv_bufsz,
- (serv->sv_nrthreads+3) * serv->sv_bufsz);
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg,
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg);
if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
svc_sock_received(svsk);
@@ -749,8 +749,8 @@ svc_udp_init(struct svc_sock *svsk)
* svc_udp_recvfrom will re-adjust if necessary
*/
svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_server->sv_bufsz,
- 3 * svsk->sk_server->sv_bufsz);
+ 3 * svsk->sk_server->sv_max_mesg,
+ 3 * svsk->sk_server->sv_max_mesg);
set_bit(SK_DATA, &svsk->sk_flags); /* might have come in before data_ready set up */
set_bit(SK_CHNGBUF, &svsk->sk_flags);
@@ -993,8 +993,8 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
* as soon a a complete request arrives.
*/
svc_sock_setbufsize(svsk->sk_sock,
- (serv->sv_nrthreads+3) * serv->sv_bufsz,
- 3 * serv->sv_bufsz);
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg,
+ 3 * serv->sv_max_mesg);
clear_bit(SK_DATA, &svsk->sk_flags);
@@ -1032,7 +1032,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
}
svsk->sk_reclen &= 0x7fffffff;
dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen);
- if (svsk->sk_reclen > serv->sv_bufsz) {
+ if (svsk->sk_reclen > serv->sv_max_mesg) {
printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
(unsigned long) svsk->sk_reclen);
goto err_delete;
@@ -1171,8 +1171,8 @@ svc_tcp_init(struct svc_sock *svsk)
* svc_tcp_recvfrom will re-adjust if necessary
*/
svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_server->sv_bufsz,
- 3 * svsk->sk_server->sv_bufsz);
+ 3 * svsk->sk_server->sv_max_mesg,
+ 3 * svsk->sk_server->sv_max_mesg);
set_bit(SK_CHNGBUF, &svsk->sk_flags);
set_bit(SK_DATA, &svsk->sk_flags);
@@ -1234,7 +1234,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
/* now allocate needed pages. If we get a failure, sleep briefly */
- pages = 2 + (serv->sv_bufsz + PAGE_SIZE -1) / PAGE_SIZE;
+ pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
for (i=0; i < pages ; i++)
while (rqstp->rq_pages[i] == NULL) {
struct page *p = alloc_page(GFP_KERNEL);
@@ -1263,7 +1263,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
if ((svsk = svc_sock_dequeue(pool)) != NULL) {
rqstp->rq_sock = svsk;
atomic_inc(&svsk->sk_inuse);
- rqstp->rq_reserved = serv->sv_bufsz;
+ rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
} else {
/* No data pending. Go to sleep */
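[Editorial aside: the sk_reclen handling in the svc_tcp_recvfrom hunk above is standard RPC-over-TCP record marking (RFC 1831): a 4-byte header whose top bit flags the last fragment and whose low 31 bits carry the fragment length. A minimal sketch of the two pieces, with names invented here:]

```c
#include <assert.h>
#include <stdint.h>

/* top bit of the record marker = last-fragment flag */
#define RPC_LAST_FRAG 0x80000000u

/* low 31 bits = fragment length, i.e. the sk_reclen &= 0x7fffffff above */
static uint32_t frag_len(uint32_t marker)
{
	return marker & 0x7fffffffu;
}

static int frag_is_last(uint32_t marker)
{
	return (marker & RPC_LAST_FRAG) != 0;
}
```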
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-10-03 5:41 ` Neil Brown
@ 2006-10-03 8:02 ` Greg Banks
2006-10-05 7:07 ` Neil Brown
0 siblings, 1 reply; 25+ messages in thread
From: Greg Banks @ 2006-10-03 8:02 UTC (permalink / raw)
To: Neil Brown; +Cc: J. Bruce Fields, nfs, linux-kernel
On Tue, Oct 03, 2006 at 03:41:43PM +1000, Neil Brown wrote:
> Comments on the below?
Looks ok, except...
> @@ -57,7 +57,8 @@ struct svc_serv {
> struct svc_stat * sv_stats; /* RPC statistics */
> spinlock_t sv_lock;
> unsigned int sv_nrthreads; /* # of server threads */
> - unsigned int sv_bufsz; /* datagram buffer size */
> + unsigned int sv_max_payload; /* datagram payload size */
> + unsigned int sv_max_mesg; /* bufsz + 1 page for overheads */
Presumably the comment should read "max_payload + 1 page..." ?
> @@ -414,9 +415,11 @@ svc_init_buffer(struct svc_rqst *rqstp,
> int pages;
> int arghi;
>
> - if (size > RPCSVC_MAXPAYLOAD)
> - size = RPCSVC_MAXPAYLOAD;
> - pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
> + if (size > RPCSVC_MAXPAYLOAD + PAGE_SIZE)
> + size = RPCSVC_MAXPAYLOAD + PAGE_SIZE;
> + pages = size + PAGE_SIZE; /* extra page as we hold both request and reply.
> + * We assume one is at most one page
> + */
Isn't there a divide by PAGE_SIZE missing here? Looks
like we'll be allocating a *lot* of pages ;-)
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
* Re: [NFS] [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.
2006-10-03 8:02 ` Greg Banks
@ 2006-10-05 7:07 ` Neil Brown
0 siblings, 0 replies; 25+ messages in thread
From: Neil Brown @ 2006-10-05 7:07 UTC (permalink / raw)
To: Greg Banks; +Cc: J. Bruce Fields, nfs, linux-kernel
On Tuesday October 3, gnb@sgi.com wrote:
> On Tue, Oct 03, 2006 at 03:41:43PM +1000, Neil Brown wrote:
> > Comments on the below?
>
> Looks ok, except...
>
> > @@ -57,7 +57,8 @@ struct svc_serv {
> > struct svc_stat * sv_stats; /* RPC statistics */
> > spinlock_t sv_lock;
> > unsigned int sv_nrthreads; /* # of server threads */
> > - unsigned int sv_bufsz; /* datagram buffer size */
> > + unsigned int sv_max_payload; /* datagram payload size */
> > + unsigned int sv_max_mesg; /* bufsz + 1 page for overheads */
>
> Presumably the comment should read "max_payload + 1 page..." ?
>
Yes....
> > @@ -414,9 +415,11 @@ svc_init_buffer(struct svc_rqst *rqstp,
> > int pages;
> > int arghi;
> >
> > - if (size > RPCSVC_MAXPAYLOAD)
> > - size = RPCSVC_MAXPAYLOAD;
> > - pages = 2 + (size+ PAGE_SIZE -1) / PAGE_SIZE;
> > + if (size > RPCSVC_MAXPAYLOAD + PAGE_SIZE)
> > + size = RPCSVC_MAXPAYLOAD + PAGE_SIZE;
> > + pages = size + PAGE_SIZE; /* extra page as we hold both request and reply.
> > + * We assume one is at most one page
> > + */
>
> Isn't there a divide by PAGE_SIZE missing here? Looks
> like we'll be allocating a *lot* of pages ;-)
Better safe than sorry? But yes, we would be very sorry if we tried
to allocate that many pages.
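[Editorial aside: the fix Greg is pointing at presumably divides by PAGE_SIZE so the result is a page count rather than a byte count — a sketch of the intended computation (not the exact code that was merged), matching the `(sv_max_mesg + PAGE_SIZE) / PAGE_SIZE` form already used in svc_recv:]

```c
#include <assert.h>

#define PG 4096

/* bytes -> pages, plus one extra page because the buffer holds both
 * request and reply (the smaller of the two assumed to fit in a page) */
static int pages_needed(int size)
{
	return (size + PG) / PG;
}
```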
Thanks for the review.
NeilBrown
end of thread, other threads:[~2006-10-05 7:07 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-24 6:36 [PATCH 000 of 11] knfsd: Introduction NeilBrown
2006-08-24 6:36 ` [PATCH 001 of 11] knfsd: nfsd: lockdep annotation fix NeilBrown
2006-08-24 6:36 ` [PATCH 002 of 11] knfsd: Fix a botched comment from the last patchset NeilBrown
2006-08-24 6:36 ` [PATCH 003 of 11] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist NeilBrown
2006-08-24 6:36 ` [PATCH 004 of 11] knfsd: Protect update to sn_nrthreads with lock_kernel NeilBrown
2006-08-24 6:36 ` [PATCH 005 of 11] knfsd: Fixed handling of lockd fail when adding nfsd socket NeilBrown
2006-08-24 6:36 ` [PATCH 006 of 11] knfsd: Replace two page lists in struct svc_rqst with one NeilBrown
2006-08-24 6:37 ` [PATCH 007 of 11] knfsd: Avoid excess stack usage in svc_tcp_recvfrom NeilBrown
2006-08-24 6:37 ` [PATCH 008 of 11] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP NeilBrown
2006-09-25 15:43 ` [NFS] " J. Bruce Fields
2006-09-28 3:41 ` Neil Brown
2006-09-28 3:46 ` Andrew Morton
2006-10-03 1:36 ` Neil Brown
2006-10-03 1:59 ` Greg Banks
2006-10-03 2:13 ` J. Bruce Fields
2006-10-03 5:41 ` Neil Brown
2006-10-03 8:02 ` Greg Banks
2006-10-05 7:07 ` Neil Brown
2006-08-24 6:37 ` [PATCH 009 of 11] knfsd: Allow max size of NFSd payload to be configured NeilBrown
2006-09-25 21:24 ` [NFS] " J. Bruce Fields
2006-09-28 4:22 ` Neil Brown
2006-09-28 17:09 ` Hugh Dickins
2006-09-29 1:59 ` Neil Brown
2006-08-24 6:37 ` [PATCH 010 of 11] knfsd: make nfsd readahead params cache SMP-friendly NeilBrown
2006-08-24 6:37 ` [PATCH 011 of 11] knfsd: knfsd: cache ipmap per TCP socket NeilBrown