* [PATCH v2 0/3] nfsd: better stats gathering for DRC
@ 2013-02-15 17:23 Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 1/3] nfsd: break out comparator into separate function Jeff Layton
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 17:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

This is a respin of the series I posted yesterday. The main differences are:

1/ I dropped the first patch that fixes the comments, since Bruce said
   that he was applying it.

2/ I've changed the csum_misses counter to only increment when the
   checksum comparison fails (instead of when either the checksum or the
   length comparison failed). I also cleaned up the comparator function
   a bit.

3/ I added a new patch to track memory utilization by the DRC, figuring
   that might be handy if someone runs into memory problems later.

Further comments and suggestions welcome. I'd like to see this in 3.9
if possible.

Jeff Layton (3):
  nfsd: break out comparator into separate function
  nfsd: track memory utilization in the DRC
  nfsd: add new reply_cache_stats file in nfsdfs

 fs/nfsd/cache.h    |  1 +
 fs/nfsd/nfscache.c | 77 +++++++++++++++++++++++++++++++++++++++++++++---------
 fs/nfsd/nfsctl.c   |  9 +++++++
 3 files changed, 74 insertions(+), 13 deletions(-)

-- 
1.7.11.7



* [PATCH v2 1/3] nfsd: break out comparator into separate function
  2013-02-15 17:23 [PATCH v2 0/3] nfsd: better stats gathering for DRC Jeff Layton
@ 2013-02-15 17:23 ` Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 2/3] nfsd: track memory utilization in the DRC Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 3/3] nfsd: add new reply_cache_stats file in nfsdfs Jeff Layton
  2 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 17:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Break out the code that compares an rqstp and checksum against a reply
cache entry into a separate function. While we're at it, track the
efficacy of the checksum over the NFS data by counting the cases where
we would have incorrectly matched a DRC entry had we not compared the
checksum.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 2f9c2d2..c00c5bc 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -25,6 +25,7 @@ static struct list_head 	lru_head;
 static struct kmem_cache	*drc_slab;
 static unsigned int		num_drc_entries;
 static unsigned int		max_drc_entries;
+static unsigned int		csum_misses;
 
 /*
  * Calculate the hash index from an XID.
@@ -272,6 +273,26 @@ nfsd_cache_csum(struct svc_rqst *rqstp)
 	return csum;
 }
 
+static bool
+nfsd_cache_match(struct svc_rqst *rqstp, __wsum csum, struct svc_cacherep *rp)
+{
+	/* Check RPC header info first */
+	if (rqstp->rq_xid != rp->c_xid || rqstp->rq_proc != rp->c_proc ||
+	    rqstp->rq_prot != rp->c_prot || rqstp->rq_vers != rp->c_vers ||
+	    rqstp->rq_arg.len != rp->c_len ||
+	    !rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) ||
+	    rpc_get_port(svc_addr(rqstp)) != rpc_get_port((struct sockaddr *)&rp->c_addr))
+		return false;
+
+	/* compare checksum of NFS data */
+	if (csum != rp->c_csum) {
+		++csum_misses;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * Search the request hash for an entry that matches the given rqstp.
  * Must be called with cache_lock held. Returns the found entry or
@@ -283,18 +304,10 @@ nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
 	struct svc_cacherep	*rp;
 	struct hlist_node	*hn;
 	struct hlist_head 	*rh;
-	__be32			xid = rqstp->rq_xid;
-	u32			proto =  rqstp->rq_prot,
-				vers = rqstp->rq_vers,
-				proc = rqstp->rq_proc;
 
-	rh = &cache_hash[request_hash(xid)];
+	rh = &cache_hash[request_hash(rqstp->rq_xid)];
 	hlist_for_each_entry(rp, hn, rh, c_hash) {
-		if (xid == rp->c_xid && proc == rp->c_proc &&
-		    proto == rp->c_prot && vers == rp->c_vers &&
-		    rqstp->rq_arg.len == rp->c_len && csum == rp->c_csum &&
-		    rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
-		    rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
+		if (nfsd_cache_match(rqstp, csum, rp))
 			return rp;
 	}
 	return NULL;
-- 
1.7.11.7



* [PATCH v2 2/3] nfsd: track memory utilization in the DRC
  2013-02-15 17:23 [PATCH v2 0/3] nfsd: better stats gathering for DRC Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 1/3] nfsd: break out comparator into separate function Jeff Layton
@ 2013-02-15 17:23 ` Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 3/3] nfsd: add new reply_cache_stats file in nfsdfs Jeff Layton
  2 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 17:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index c00c5bc..ad1d069 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -25,6 +25,7 @@ static struct list_head 	lru_head;
 static struct kmem_cache	*drc_slab;
 static unsigned int		num_drc_entries;
 static unsigned int		max_drc_entries;
+static unsigned int		drc_mem_usage;
 static unsigned int		csum_misses;
 
 /*
@@ -101,11 +102,14 @@ nfsd_reply_cache_alloc(void)
 static void
 nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
 {
-	if (rp->c_type == RC_REPLBUFF)
+	if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
+		drc_mem_usage -= ksize(rp->c_replvec.iov_base);
 		kfree(rp->c_replvec.iov_base);
+	}
 	hlist_del(&rp->c_hash);
 	list_del(&rp->c_lru);
 	--num_drc_entries;
+	drc_mem_usage -= ksize(rp);
 	kmem_cache_free(drc_slab, rp);
 }
 
@@ -367,6 +371,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
 		return RC_DOIT;
 	}
 	spin_lock(&cache_lock);
+	drc_mem_usage += ksize(rp);
 	++num_drc_entries;
 
 	/*
@@ -407,6 +412,7 @@ setup_entry:
 
 	/* release any buffer */
 	if (rp->c_type == RC_REPLBUFF) {
+		drc_mem_usage -= ksize(rp->c_replvec.iov_base);
 		kfree(rp->c_replvec.iov_base);
 		rp->c_replvec.iov_base = NULL;
 	}
@@ -475,6 +481,7 @@ nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp)
 	struct svc_cacherep *rp = rqstp->rq_cacherep;
 	struct kvec	*resv = &rqstp->rq_res.head[0], *cachv;
 	int		len;
+	size_t		replbufsize = 0;
 
 	if (!rp)
 		return;
@@ -501,6 +508,7 @@ nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp)
 			nfsd_reply_cache_free(rp);
 			return;
 		}
+		replbufsize = ksize(cachv->iov_base);
 		cachv->iov_len = len << 2;
 		memcpy(cachv->iov_base, statp, len << 2);
 		break;
@@ -509,6 +517,7 @@ nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp)
 		return;
 	}
 	spin_lock(&cache_lock);
+	drc_mem_usage += replbufsize;
 	lru_put_end(rp);
 	rp->c_secure = rqstp->rq_secure;
 	rp->c_type = cachetype;
-- 
1.7.11.7



* [PATCH v2 3/3] nfsd: add new reply_cache_stats file in nfsdfs
  2013-02-15 17:23 [PATCH v2 0/3] nfsd: better stats gathering for DRC Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 1/3] nfsd: break out comparator into separate function Jeff Layton
  2013-02-15 17:23 ` [PATCH v2 2/3] nfsd: track memory utilization in the DRC Jeff Layton
@ 2013-02-15 17:23 ` Jeff Layton
  2013-02-15 17:29   ` Chuck Lever
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 17:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

For presenting statistics relating to the duplicate reply cache.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/cache.h    |  1 +
 fs/nfsd/nfscache.c | 37 +++++++++++++++++++++++++++++++++----
 fs/nfsd/nfsctl.c   |  9 +++++++++
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 87fd141..d5c5b3e 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -82,6 +82,7 @@ int	nfsd_reply_cache_init(void);
 void	nfsd_reply_cache_shutdown(void);
 int	nfsd_cache_lookup(struct svc_rqst *);
 void	nfsd_cache_update(struct svc_rqst *, int, __be32 *);
+int	nfsd_reply_cache_stats_open(struct inode *, struct file *);
 
 #ifdef CONFIG_NFSD_V4
 void	nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp);
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index ad1d069..b155afa 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -23,12 +23,17 @@
 static struct hlist_head *	cache_hash;
 static struct list_head 	lru_head;
 static struct kmem_cache	*drc_slab;
-static unsigned int		num_drc_entries;
-static unsigned int		max_drc_entries;
-static unsigned int		drc_mem_usage;
-static unsigned int		csum_misses;
 
 /*
+ * Stats on the duplicate reply cache code. All of these and the "rc" fields
+ * in nfsdstats are protected by the cache_lock
+ */
+static unsigned int		num_drc_entries; /* number of entries */
+static unsigned int		max_drc_entries; /* max number of entries */
+static unsigned int		drc_mem_usage;   /* memory used by cache */
+static unsigned int		csum_misses;     /* cache misses due only to
+						    csum comparison failures */
+/*
  * Calculate the hash index from an XID.
  */
 static inline u32 request_hash(u32 xid)
@@ -545,3 +550,27 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
 	vec->iov_len += data->iov_len;
 	return 1;
 }
+
+/*
+ * Note that fields may be added, removed or reordered in the future. Programs
+ * scraping this file for info should test the labels to ensure they're
+ * getting the correct field.
+ */
+static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
+{
+	spin_lock(&cache_lock);
+	seq_printf(m, "max entries:     %u\n", max_drc_entries);
+	seq_printf(m, "num entries:     %u\n", num_drc_entries);
+	seq_printf(m, "mem usage:       %u\n", drc_mem_usage);
+	seq_printf(m, "cache hits:      %u\n", nfsdstats.rchits);
+	seq_printf(m, "cache misses:    %u\n", nfsdstats.rcmisses);
+	seq_printf(m, "not cached:      %u\n", nfsdstats.rcnocache);
+	seq_printf(m, "checksum misses: %u\n", csum_misses);
+	spin_unlock(&cache_lock);
+	return 0;
+}
+
+int nfsd_reply_cache_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, nfsd_reply_cache_stats_show, NULL);
+}
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 29c3f0d..5dea5af 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -35,6 +35,7 @@ enum {
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Pool_Stats,
+	NFSD_Reply_Cache_Stats,
 	NFSD_Versions,
 	NFSD_Ports,
 	NFSD_MaxBlkSize,
@@ -194,6 +195,13 @@ static const struct file_operations pool_stats_operations = {
 	.owner		= THIS_MODULE,
 };
 
+static struct file_operations reply_cache_stats_operations = {
+	.open		= nfsd_reply_cache_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 /*----------------------------------------------------------------------------*/
 /*
  * payload - write methods
@@ -1024,6 +1032,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
+		[NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO},
 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
 		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
-- 
1.7.11.7



* Re: [PATCH v2 3/3] nfsd: add new reply_cache_stats file in nfsdfs
  2013-02-15 17:23 ` [PATCH v2 3/3] nfsd: add new reply_cache_stats file in nfsdfs Jeff Layton
@ 2013-02-15 17:29   ` Chuck Lever
  2013-02-15 18:34     ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2013-02-15 17:29 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, linux-nfs


On Feb 15, 2013, at 12:23 PM, Jeff Layton <jlayton@redhat.com> wrote:

> For presenting statistics relating to the duplicate reply cache.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> fs/nfsd/cache.h    |  1 +
> fs/nfsd/nfscache.c | 37 +++++++++++++++++++++++++++++++++----
> fs/nfsd/nfsctl.c   |  9 +++++++++
> 3 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 87fd141..d5c5b3e 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -82,6 +82,7 @@ int	nfsd_reply_cache_init(void);
> void	nfsd_reply_cache_shutdown(void);
> int	nfsd_cache_lookup(struct svc_rqst *);
> void	nfsd_cache_update(struct svc_rqst *, int, __be32 *);
> +int	nfsd_reply_cache_stats_open(struct inode *, struct file *);
> 
> #ifdef CONFIG_NFSD_V4
> void	nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp);
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index ad1d069..b155afa 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -23,12 +23,17 @@
> static struct hlist_head *	cache_hash;
> static struct list_head 	lru_head;
> static struct kmem_cache	*drc_slab;
> -static unsigned int		num_drc_entries;
> -static unsigned int		max_drc_entries;
> -static unsigned int		drc_mem_usage;
> -static unsigned int		csum_misses;
> 
> /*
> + * Stats on the duplicate reply cache code. All of these and the "rc" fields
> + * in nfsdstats are protected by the cache_lock
> + */
> +static unsigned int		num_drc_entries; /* number of entries */
> +static unsigned int		max_drc_entries; /* max number of entries */
> +static unsigned int		drc_mem_usage;   /* memory used by cache */
> +static unsigned int		csum_misses;     /* cache misses due only to
> +						    csum comparison failures */
> +/*
>  * Calculate the hash index from an XID.
>  */
> static inline u32 request_hash(u32 xid)
> @@ -545,3 +550,27 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
> 	vec->iov_len += data->iov_len;
> 	return 1;
> }
> +
> +/*
> + * Note that fields may be added, removed or reordered in the future. Programs
> + * scraping this file for info should test the labels to ensure they're
> + * getting the correct field.
> + */
> +static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
> +{
> +	spin_lock(&cache_lock);
> +	seq_printf(m, "max entries:     %u\n", max_drc_entries);
> +	seq_printf(m, "num entries:     %u\n", num_drc_entries);
> +	seq_printf(m, "mem usage:       %u\n", drc_mem_usage);
> +	seq_printf(m, "cache hits:      %u\n", nfsdstats.rchits);
> +	seq_printf(m, "cache misses:    %u\n", nfsdstats.rcmisses);
> +	seq_printf(m, "not cached:      %u\n", nfsdstats.rcnocache);
> +	seq_printf(m, "checksum misses: %u\n", csum_misses);
> +	spin_unlock(&cache_lock);
> +	return 0;
> +}

Would it be hard to add a statistic that quantified the distribution of entries in the cache hash table?  Something like "maximum length of cache hash chain" ?


> +
> +int nfsd_reply_cache_stats_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, nfsd_reply_cache_stats_show, NULL);
> +}
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 29c3f0d..5dea5af 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -35,6 +35,7 @@ enum {
> 	NFSD_Threads,
> 	NFSD_Pool_Threads,
> 	NFSD_Pool_Stats,
> +	NFSD_Reply_Cache_Stats,
> 	NFSD_Versions,
> 	NFSD_Ports,
> 	NFSD_MaxBlkSize,
> @@ -194,6 +195,13 @@ static const struct file_operations pool_stats_operations = {
> 	.owner		= THIS_MODULE,
> };
> 
> +static struct file_operations reply_cache_stats_operations = {
> +	.open		= nfsd_reply_cache_stats_open,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= single_release,
> +};
> +
> /*----------------------------------------------------------------------------*/
> /*
>  * payload - write methods
> @@ -1024,6 +1032,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
> 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
> 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> 		[NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
> +		[NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO},
> 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
> 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
> 		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> -- 
> 1.7.11.7
> 

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [PATCH v2 3/3] nfsd: add new reply_cache_stats file in nfsdfs
  2013-02-15 17:29   ` Chuck Lever
@ 2013-02-15 18:34     ` Jeff Layton
  2013-02-15 20:04       ` [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 18:34 UTC (permalink / raw)
  To: Chuck Lever; +Cc: bfields, linux-nfs

On Fri, 15 Feb 2013 12:29:08 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Feb 15, 2013, at 12:23 PM, Jeff Layton <jlayton@redhat.com> wrote:
> 
> > For presenting statistics relating to the duplicate reply cache.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> > fs/nfsd/cache.h    |  1 +
> > fs/nfsd/nfscache.c | 37 +++++++++++++++++++++++++++++++++----
> > fs/nfsd/nfsctl.c   |  9 +++++++++
> > 3 files changed, 43 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> > index 87fd141..d5c5b3e 100644
> > --- a/fs/nfsd/cache.h
> > +++ b/fs/nfsd/cache.h
> > @@ -82,6 +82,7 @@ int	nfsd_reply_cache_init(void);
> > void	nfsd_reply_cache_shutdown(void);
> > int	nfsd_cache_lookup(struct svc_rqst *);
> > void	nfsd_cache_update(struct svc_rqst *, int, __be32 *);
> > +int	nfsd_reply_cache_stats_open(struct inode *, struct file *);
> > 
> > #ifdef CONFIG_NFSD_V4
> > void	nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp);
> > diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> > index ad1d069..b155afa 100644
> > --- a/fs/nfsd/nfscache.c
> > +++ b/fs/nfsd/nfscache.c
> > @@ -23,12 +23,17 @@
> > static struct hlist_head *	cache_hash;
> > static struct list_head 	lru_head;
> > static struct kmem_cache	*drc_slab;
> > -static unsigned int		num_drc_entries;
> > -static unsigned int		max_drc_entries;
> > -static unsigned int		drc_mem_usage;
> > -static unsigned int		csum_misses;
> > 
> > /*
> > + * Stats on the duplicate reply cache code. All of these and the "rc" fields
> > + * in nfsdstats are protected by the cache_lock
> > + */
> > +static unsigned int		num_drc_entries; /* number of entries */
> > +static unsigned int		max_drc_entries; /* max number of entries */
> > +static unsigned int		drc_mem_usage;   /* memory used by cache */
> > +static unsigned int		csum_misses;     /* cache misses due only to
> > +						    csum comparison failures */
> > +/*
> >  * Calculate the hash index from an XID.
> >  */
> > static inline u32 request_hash(u32 xid)
> > @@ -545,3 +550,27 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
> > 	vec->iov_len += data->iov_len;
> > 	return 1;
> > }
> > +
> > +/*
> > + * Note that fields may be added, removed or reordered in the future. Programs
> > + * scraping this file for info should test the labels to ensure they're
> > + * getting the correct field.
> > + */
> > +static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
> > +{
> > +	spin_lock(&cache_lock);
> > +	seq_printf(m, "max entries:     %u\n", max_drc_entries);
> > +	seq_printf(m, "num entries:     %u\n", num_drc_entries);
> > +	seq_printf(m, "mem usage:       %u\n", drc_mem_usage);
> > +	seq_printf(m, "cache hits:      %u\n", nfsdstats.rchits);
> > +	seq_printf(m, "cache misses:    %u\n", nfsdstats.rcmisses);
> > +	seq_printf(m, "not cached:      %u\n", nfsdstats.rcnocache);
> > +	seq_printf(m, "checksum misses: %u\n", csum_misses);
> > +	spin_unlock(&cache_lock);
> > +	return 0;
> > +}
> 
> Would it be hard to add a statistic that quantified the distribution of entries in the cache hash table?  Something like "maximum length of cache hash chain" ?
> 

Hard? Not very, but it wouldn't be free...

I guess we'd need to keep an array of HASHSIZE (64) uints to track the
length of each hash bucket. Also, we currently don't need to know which
hash bucket an entry sits in when freeing it, so we'd have to recompute
that when freeing the entry (which is fairly trivial).

Also, what sort of info do you want here? Longest hash chain that the
cache has ever had, or longest hash chain currently in the cache? It
may be better to just present the length of each chain instead and let
userland sort out the max.
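
For illustration, a rough sketch of the counter idea (untested; the
chain_len array and the helper names here are made up):

static unsigned int	chain_len[HASHSIZE];	/* length of each chain */

static void
hash_add_entry(struct svc_cacherep *rp)
{
	u32 hash = request_hash(rp->c_xid);

	hlist_add_head(&rp->c_hash, &cache_hash[hash]);
	++chain_len[hash];
}

static void
hash_del_entry(struct svc_cacherep *rp)
{
	/* recompute the bucket, since the entry doesn't record it */
	--chain_len[request_hash(rp->c_xid)];
	hlist_del(&rp->c_hash);
}

Either way, the stats code could then just read 64 counters under the
cache_lock instead of walking every entry.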

I'd probably suggest that we do this in a later patch on top of this
series, once we determine what info we're looking for here. It should
be doable for 3.9 though...

-- 
Jeff Layton <jlayton@redhat.com>


* [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-15 18:34     ` Jeff Layton
@ 2013-02-15 20:04       ` Jeff Layton
  2013-02-15 21:14         ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 20:04 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, chuck.lever

So we can get a feel for how effective the hashing function is.

As Chuck Lever pointed out to me, it's generally acceptable to do
"expensive" stuff when reading the stats since that's a relatively
rare activity.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index a5ac9ab..172e211 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -551,6 +551,25 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
 	return 1;
 }
 
+/* Get stats on the hashtable itself */
+static unsigned int
+nfsd_repcache_max_chain_len(void)
+{
+	int i;
+	struct hlist_node *pos;
+	unsigned int max = 0;
+
+	for (i = 0; i < HASHSIZE; ++i) {
+		unsigned int cur = 0;
+
+		hlist_for_each(pos, &cache_hash[i])
+			++cur;
+		max = max(cur, max);
+	}
+
+	return max;
+}
+
 /*
  * Note that fields may be added, removed or reordered in the future. Programs
  * scraping this file for info should test the labels to ensure they're
@@ -566,6 +585,7 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "cache misses:    %u\n", nfsdstats.rcmisses);
 	seq_printf(m, "not cached:      %u\n", nfsdstats.rcnocache);
 	seq_printf(m, "checksum misses: %u\n", csum_misses);
+	seq_printf(m, "max chain len:   %u\n", nfsd_repcache_max_chain_len());
 	spin_unlock(&cache_lock);
 	return 0;
 }
-- 
1.7.11.7



* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-15 20:04       ` [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats Jeff Layton
@ 2013-02-15 21:14         ` Chuck Lever
  2013-02-15 22:20           ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2013-02-15 21:14 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, linux-nfs


On Feb 15, 2013, at 3:04 PM, Jeff Layton <jlayton@redhat.com> wrote:

> So we can get a feel for how effective the hashing function is.
> 
> As Chuck Lever pointed out to me, it's generally acceptable to do
> "expensive" stuff when reading the stats since that's a relatively
> rare activity.

A good measure of the efficacy of a hash function is the ratio of the maximum chain length to the optimal chain length (which can be computed by dividing the total number of cache entries by the number of hash chains).
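
(A purely illustrative example: 4096 entries spread over 64 chains gives an optimal chain length of 64; if the longest chain were 192, the ratio would be 3, suggesting the hash function is distributing entries poorly.)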

If we plan to stick with a hash table for this cache, there should be some indication when the hash function falls over.  This will matter because the DRC can now grow much larger, which is turning out to be the real fundamental change with this work.

A philosophical question though is "How can we know when the DRC is large enough?"


> Cc: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> fs/nfsd/nfscache.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
> 
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index a5ac9ab..172e211 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -551,6 +551,25 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
> 	return 1;
> }
> 
> +/* Get stats on the hashtable itself */
> +static unsigned int
> +nfsd_repcache_max_chain_len(void)
> +{
> +	int i;
> +	struct hlist_node *pos;
> +	unsigned int max = 0;
> +
> +	for (i = 0; i < HASHSIZE; ++i) {
> +		unsigned int cur = 0;
> +
> +		hlist_for_each(pos, &cache_hash[i])
> +			++cur;
> +		max = max(cur, max);
> +	}
> +
> +	return max;
> +}
> +
> /*
>  * Note that fields may be added, removed or reordered in the future. Programs
>  * scraping this file for info should test the labels to ensure they're
> @@ -566,6 +585,7 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
> 	seq_printf(m, "cache misses:    %u\n", nfsdstats.rcmisses);
> 	seq_printf(m, "not cached:      %u\n", nfsdstats.rcnocache);
> 	seq_printf(m, "checksum misses: %u\n", csum_misses);
> +	seq_printf(m, "max chain len:   %u\n", nfsd_repcache_max_chain_len());
> 	spin_unlock(&cache_lock);
> 	return 0;
> }
> -- 
> 1.7.11.7
> 

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-15 21:14         ` Chuck Lever
@ 2013-02-15 22:20           ` Jeff Layton
  2013-02-16 13:39             ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-15 22:20 UTC (permalink / raw)
  To: Chuck Lever; +Cc: bfields, linux-nfs

On Fri, 15 Feb 2013 16:14:56 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Feb 15, 2013, at 3:04 PM, Jeff Layton <jlayton@redhat.com> wrote:
> 
> > So we can get a feel for how effective the hashing function is.
> > 
> > As Chuck Lever pointed out to me, it's generally acceptable to do
> > "expensive" stuff when reading the stats since that's a relatively
> > rare activity.
> 
> A good measure of the efficacy of a hash function is the ratio of the maximum chain length to the optimal chain length (which can be computed by dividing the total number of cache entries by the number of hash chains).
> 

Right, the number of chains is always 64 for now (maybe we should print
that out in the stats too), so you can compute that from the values
provided here.

> If we plan to stick with a hash table for this cache, there should be some indication when the hash function falls over.  This will matter because the DRC can now grow much larger, which is turning out to be the real fundamental change with this work.
> 

That's the kicker. With the patch below, computing the max chain length
on the fly is somewhat expensive since you have to walk every entry.
It's certainly possible though (even likely) that the real max length
will occur at some point when we're not looking at this file. So how to
best gauge that?

Maybe we should just punt and move it all to an rbtree or something. A
self-balancing structure is nice and simple to deal with, even if the
insertion penalty is a bit higher...
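
As a rough, untested sketch of what that might look like (the c_node
field and the XID-only ordering are assumptions; XID ties would still
need the full nfsd_cache_match() comparison, and insertion would use
rb_link_node()/rb_insert_color()):

#include <linux/rbtree.h>

static struct rb_root	cache_tree = RB_ROOT;

static struct svc_cacherep *
cache_tree_search(__be32 xid)
{
	struct rb_node *n = cache_tree.rb_node;

	while (n) {
		struct svc_cacherep *rp = rb_entry(n, struct svc_cacherep,
						   c_node);

		if ((__force u32)xid < (__force u32)rp->c_xid)
			n = n->rb_left;
		else if ((__force u32)xid > (__force u32)rp->c_xid)
			n = n->rb_right;
		else
			return rp;
	}
	return NULL;
}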

> A philosophical question though is "How can we know when the DRC is large enough?"
> 

An excellent question, and not an easy one to answer. Clearly 1024
entries was not enough. We now cap the size as a function of the
available low memory, which I think is a reasonable way to keep it from
ballooning so large that the box falls over. We also have a shrinker
and periodic cache cleaner to prune off entries that have expired.

Of course one thing I haven't really considered enough is the
performance implications of walking the potentially much longer hash
chains here.

If that is a problem, then one way to counter that without moving to a
different structure altogether might be to alter the hash function
based on the max size of the cache. IOW, grow the number of hash buckets
as the max cache size grows?
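
Hand-waving, something like this at nfsd_reply_cache_init() time (the
hashsize variable and the entries-per-chain target are invented):

#include <linux/hash.h>
#include <linux/log2.h>

static unsigned int	hashsize;	/* would replace the fixed HASHSIZE */

	/* aim for ~4 entries per chain at the cap, with a floor of 64 */
	hashsize = max_t(unsigned int,
			 roundup_pow_of_two(max_drc_entries / 4), 64);
	cache_hash = kcalloc(hashsize, sizeof(struct hlist_head), GFP_KERNEL);

...with request_hash() then becoming something like:

static inline u32 request_hash(u32 xid)
{
	return hash_32(xid, ilog2(hashsize));
}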

-- 
Jeff Layton <jlayton@redhat.com>


* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-15 22:20           ` Jeff Layton
@ 2013-02-16 13:39             ` J. Bruce Fields
  2013-02-16 17:18               ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2013-02-16 13:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, linux-nfs

On Fri, Feb 15, 2013 at 05:20:58PM -0500, Jeff Layton wrote:
> An excellent question, and not an easy one to answer. Clearly 1024
> entries was not enough. We now cap the size as a function of the
> available low memory, which I think is a reasonable way to keep it from
> ballooning so large that the box falls over. We also have a shrinker
> and periodic cache cleaner to prune off entries that have expired.
> 
> Of course one thing I haven't really considered enough is the
> performance implications of walking the potentially much longer hash
> chains here.
> 
> If that is a problem, then one way to counter that without moving to a
> different structure altogether might be to alter the hash function
> based on the max size of the cache. IOW, grow the number of hash buckets
> as the max cache size grows?

Another reason to organize the cache per client address?

Two levels of hash tables might be good enough: one global hash table
for the client address, one per-client for the rest.

With a per-client maximum number of entries, sizing the hash tables
should be easier.
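
Sketching the shape of that (all names and sizes here are invented):

struct drc_client {
	struct hlist_node	c_node;		/* global table, by address */
	struct sockaddr_in6	c_addr;
	unsigned int		c_num_entries;	/* for a per-client cap */
	struct hlist_head	c_hash[32];	/* per-client table, by XID */
};

static struct hlist_head	client_hash[64];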

If we wanted to be fancy, in theory the address lookup could probably
be lockless in the typical case.

--b.


* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-16 13:39             ` J. Bruce Fields
@ 2013-02-16 17:18               ` Chuck Lever
  2013-02-17 16:00                 ` J. Bruce Fields
  2013-02-18 14:39                 ` Jeff Layton
  0 siblings, 2 replies; 17+ messages in thread
From: Chuck Lever @ 2013-02-16 17:18 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs


On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:

> On Fri, Feb 15, 2013 at 05:20:58PM -0500, Jeff Layton wrote:
>> An excellent question, and not an easy one to answer. Clearly 1024
>> entries was not enough. We now cap the size as a function of the
>> available low memory, which I think is a reasonable way to keep it from
>> ballooning so large that the box falls over. We also have a shrinker
>> and periodic cache cleaner to prune off entries that have expired.
>> 
>> Of course one thing I haven't really considered enough is the
>> performance implications of walking the potentially much longer hash
>> chains here.
>> 
>> If that is a problem, then one way to counter that without moving to a
>> different structure altogether might be to alter the hash function
>> based on the max size of the cache. IOW, grow the number of hash buckets
>> as the max cache size grows?

The trouble with a hash table is that once you've allocated it, it's a heavy lift to increase the table size.  That sort of logic adds complexity and additional locking, and is often difficult to test.

> Another reason to organize the cache per client address?


In theory, an active single client could evict all entries for other clients, but do we know this happens in practice?

> With a per-client maximum number of entries, sizing the hash tables
> should be easier.


When a server has only one client, should that client be allowed to maximize the use of a server's resources (eg, use all of the DRC resource the server has available)?  How about when a server has one active client and multiple quiescent clients?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-16 17:18               ` Chuck Lever
@ 2013-02-17 16:00                 ` J. Bruce Fields
  2013-02-17 19:58                   ` Chuck Lever
  2013-02-18 14:21                   ` Jeff Layton
  2013-02-18 14:39                 ` Jeff Layton
  1 sibling, 2 replies; 17+ messages in thread
From: J. Bruce Fields @ 2013-02-17 16:00 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Jeff Layton, linux-nfs

On Sat, Feb 16, 2013 at 12:18:18PM -0500, Chuck Lever wrote:
> 
> On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > With a per-client maximum number of entries, sizing the hash tables
> > should be easier.
> When a server has only one client, should that client be allowed to
> maximize the use of a server's resources (eg, use all of the DRC
> resource the server has available)?

I've been assuming there's rapidly diminishing returns to caching a lot
of replies to a single client.  But that might not be true--I guess a
busy UDP client with a long retry timeout might benefit from a large
cache?

> How about when a server has one active client and multiple quiescent
> clients?

I think there's a chance in that case that one of the quiescent clients
is experiencing a temporary network problem, in which case we may want
to preserve a few entries for them even if the active client's activity
would normally evict them.

--b.


* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-17 16:00                 ` J. Bruce Fields
@ 2013-02-17 19:58                   ` Chuck Lever
  2013-02-18 14:21                   ` Jeff Layton
  1 sibling, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2013-02-17 19:58 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs


On Feb 17, 2013, at 11:00 AM, J. Bruce Fields <bfields@fieldses.org> wrote:

> On Sat, Feb 16, 2013 at 12:18:18PM -0500, Chuck Lever wrote:
>> 
>> On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> With a per-client maximum number of entries, sizing the hash tables
>>> should be easier.
>> When a server has only one client, should that client be allowed to
>> maximize the use of a server's resources (eg, use all of the DRC
>> resource the server has available)?
> 
> I've been assuming there's rapidly diminishing returns to caching a lot
> of replies to a single client.  But that might not be true--I guess a
> busy UDP client with a long retry timeout might benefit from a large
> cache?

I was thinking that the possible value of holding a reply diminishes the _longer_ you hold it.  The number of replies held then depends on how busy each client is and, of course, on how much memory is available to hold cached replies.

Jeff's metrics may help us sort this out.

> 
>> How about when a server has one active client and multiple quiescent
>> clients?
> 
> I think there's a chance in that case that one of the quiescent clients
> is experiencing a temporary network problem, in which case we may want
> to preserve a few entries for them even if the active client's activity
> would normally evict them.

OK, sure.  So, we might express that as a lower bound on the number of replies cached per client?  Or would we express it as an upper bound on the age of each cached reply?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-17 16:00                 ` J. Bruce Fields
  2013-02-17 19:58                   ` Chuck Lever
@ 2013-02-18 14:21                   ` Jeff Layton
  2013-02-18 14:30                     ` J. Bruce Fields
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-18 14:21 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs

On Sun, 17 Feb 2013 11:00:56 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Sat, Feb 16, 2013 at 12:18:18PM -0500, Chuck Lever wrote:
> > 
> > On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > With a per-client maximum number of entries, sizing the hash tables
> > > should be easier.
> > When a server has only one client, should that client be allowed to
> > maximize the use of a server's resources (eg, use all of the DRC
> > resource the server has available)?
> 
> I've been assuming there's rapidly diminishing returns to caching a lot
> of replies to a single client.  But that might not be true--I guess a
> busy UDP client with a long retry timeout might benefit from a large
> cache?
> 

Yes, or one with a massively parallel workload and poor slot-table
implementation that just sprays tons of requests at the server?

> > How about when a server has one active client and multiple quiescent
> > clients?
> 
> I think there's a chance in that case that one of the quiescent clients
> is experiencing a temporary network problem, in which case we may want
> to preserve a few entries for them even if the active client's activity
> would normally evict them.
> 

Cache eviction policy is really orthogonal to how we organize it for
efficient lookups. Currently, cache entries sit in a hash table for
lookups, and on an LRU list for eviction.
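
Concretely (eliding the entry's other fields), each entry is linked
into both structures at once:

struct svc_cacherep {
	struct hlist_node	c_hash;	/* hash chain linkage: lookups */
	struct list_head	c_lru;	/* LRU linkage: eviction order */
	/* ... xid, addr, csum, cached reply, etc. ... */
};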

We can certainly change either or both. I think the trick here is to
nail down what semantics you want for the cache and then look at how
best to organize it to achieve that.

OTOH, maybe we should see whether this is really a problem first before
we go and try to fix anything.

-- 
Jeff Layton <jlayton@redhat.com>


* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-18 14:21                   ` Jeff Layton
@ 2013-02-18 14:30                     ` J. Bruce Fields
  0 siblings, 0 replies; 17+ messages in thread
From: J. Bruce Fields @ 2013-02-18 14:30 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, linux-nfs

On Mon, Feb 18, 2013 at 09:21:34AM -0500, Jeff Layton wrote:
> On Sun, 17 Feb 2013 11:00:56 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Sat, Feb 16, 2013 at 12:18:18PM -0500, Chuck Lever wrote:
> > > 
> > > On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > With a per-client maximum number of entries, sizing the hash tables
> > > > should be easier.
> > > When a server has only one client, should that client be allowed to
> > > maximize the use of a server's resources (eg, use all of the DRC
> > > resource the server has available)?
> > 
> > I've been assuming there's rapidly diminishing returns to caching a lot
> > of replies to a single client.  But that might not be true--I guess a
> > busy UDP client with a long retry timeout might benefit from a large
> > cache?
> > 
> 
> Yes, or one with a massively parallel workload and poor slot-table
> implementation that just sprays tons of requests at the server?

I believe we pin cache entries while the request is in progress, and
timestamp and insert into the LRU when we send the reply--so the
parallelism doesn't matter so much, in the sense that we're not going
to, for example, get a big slow write operation evicted by a bunch of
quick concurrent getattrs.

What matters is: from the time we send a reply, to the time the client
retries, how many more requests does the client send?
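
(That's visible in the patches above: nfsd_cache_update() calls
lru_put_end(rp) only once the reply is being stored, so the LRU
timestamping happens at reply time rather than at request time.)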

> > > How about when a server has one active client and multiple quiescent
> > > clients?
> > 
> > I think there's a chance in that case that one of the quiescent clients
> > is experiencing a temporary network problem, in which case we may want
> > to preserve a few entries for them even if the active client's activity
> > would normally evict them.
> > 
> 
> Cache eviction policy is really orthogonal to how we organize it for
> efficient lookups. Currently, cache entries sit in a hash table for
> lookups, and on a LRU list for eviction.
> 
> We can certainly change either or both. I think the trick here is to
> nail down what semantics you want for the cache and then look at how
> best to organize it to achieve that.
> 
> OTOH, maybe we should see whether this is really a problem first before
> we go and try to fix anything.

Yeah.

--b.


* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-16 17:18               ` Chuck Lever
  2013-02-17 16:00                 ` J. Bruce Fields
@ 2013-02-18 14:39                 ` Jeff Layton
  2013-02-18 16:18                   ` Chuck Lever
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-18 14:39 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, linux-nfs

On Sat, 16 Feb 2013 12:18:18 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > On Fri, Feb 15, 2013 at 05:20:58PM -0500, Jeff Layton wrote:
> >> An excellent question, and not an easy one to answer. Clearly 1024
> >> entries was not enough. We now cap the size as a function of the
> >> available low memory, which I think is a reasonable way to keep it from
> >> ballooning so large that the box falls over. We also have a shrinker
> >> and periodic cache cleaner to prune off entries that have expired.
> >> 
> >> Of course one thing I haven't really considered enough is the
> >> performance implications of walking the potentially much longer hash
> >> chains here.
> >> 
> >> If that is a problem, then one way to counter that without moving to a
> >> different structure altogether might be to alter the hash function
> >> based on the max size of the cache. IOW, grow the number of hash buckets
> >> as the max cache size grows?
> 
> The trouble with a hash table is that once you've allocated it, it's a heavy lift to increase the table size.  That sort of logic adds complexity and additional locking, and is often difficult to test.
> 

I wasn't suggesting that we resize/rebalance the table on the fly. We
determine the max allowable number of cache entries when the server
starts. We could also determine the number of buckets at the same time,
and alter the hashing function to take that number into account.

Of course, more buckets may not help if the hash function just sucks.

> > Another reason to organize the cache per client address?
> 
> 
> In theory, an active single client could evict all entries for other clients, but do we know this happens in practice?
> 

I'm pretty sure that's what's been happening to our QA group. They have
some shared NFS servers set up in the test lab for client testing. When
things get busy, the DRC just plain doesn't appear to work. It's
hard to know for sure though since the problem only crops up very
rarely.

My hope is that the massive increase in the size of the DRC will
prevent that from occurring now.

> > With a per-client maximum number of entries, sizing the hash tables
> > should be easier.
> 
> 
> When a server has only one client, should that client be allowed to maximize the use of a server's resources (eg, use all of the DRC resource the server has available)?  How about when a server has one active client and multiple quiescent clients?
> 

Again, we have two "problems" to solve, and we need to take care not to
conflate them too much.

1) how do we best organize the cache for efficient lookups?

2) what cache eviction policy should we use?

Organizing the cache based on client address might help both of those,
but we'll need to determine whether it's worth the extra complexity.
-- 
Jeff Layton <jlayton@redhat.com>


* Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
  2013-02-18 14:39                 ` Jeff Layton
@ 2013-02-18 16:18                   ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2013-02-18 16:18 UTC (permalink / raw)
  To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs


On Feb 18, 2013, at 9:39 AM, Jeff Layton <jlayton@redhat.com> wrote:

> On Sat, 16 Feb 2013 12:18:18 -0500
> Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> 
>>> On Fri, Feb 15, 2013 at 05:20:58PM -0500, Jeff Layton wrote:
>>>> An excellent question, and not an easy one to answer. Clearly 1024
>>>> entries was not enough. We now cap the size as a function of the
>>>> available low memory, which I think is a reasonable way to keep it from
>>>> ballooning so large that the box falls over. We also have a shrinker
>>>> and periodic cache cleaner to prune off entries that have expired.
>>>> 
>>>> Of course one thing I haven't really considered enough is the
>>>> performance implications of walking the potentially much longer hash
>>>> chains here.
>>>> 
>>>> If that is a problem, then one way to counter that without moving to a
>>>> different structure altogether might be to alter the hash function
>>>> based on the max size of the cache. IOW, grow the number of hash buckets
>>>> as the max cache size grows?
>> 
>> The trouble with a hash table is that once you've allocated it, it's a heavy lift to increase the table size.  That sort of logic adds complexity and additional locking, and is often difficult to test.
>> 
> 
> I wasn't suggesting that we resize/rebalance the table on the fly. We
> determine the max allowable number of cache entries when the server
> starts. We could also determine the number of buckets at the same time,
> and alter the hashing function to take that number into account.
> 
> Of course, more buckets may not help if the hash function just sucks.

My point was that using a data structure like an rbtree, as you suggested, has the advantage of not requiring any extra logic (whether done statically or dynamically).  An rbtree implementation is much more likely to exercise exactly the same code paths no matter how many entries are in the cache.  It will scale deterministically as DRC capacity continues to increase over time.  And it is not dependent on efficient hash-ability of client XIDs.


> 
>>> Another reason to organize the cache per client address?
>> 
>> 
>> In theory, an active single client could evict all entries for other clients, but do we know this happens in practice?
>> 
> 
> I'm pretty sure that's what's been happening to our QA group. They have
> some shared NFS servers set up in the test lab for client testing. When
> things get busy, the DRC just plain doesn't appear to work. It's
> hard to know for sure though since the problem only crops up very
> rarely.
> 
> My hope is that the massive increase in the size of the DRC should
> prevent that from occurring now.

Well, right.  If the DRC is too small already, any client can cause entries to be evicted prematurely, even its own entries.  Now we have an opportunity to ask whether a larger cache makes it much less likely that a single obnoxious client will affect its neighbors.  In other words, maybe per-client sensitivity doesn't buy us much if the DRC can grow larger.


> 
>>> With a per-client maximum number of entries, sizing the hash tables
>>> should be easier.
>> 
>> 
>> When a server has only one client, should that client be allowed to maximize the use of a server's resources (eg, use all of the DRC resource the server has available)?  How about when a server has one active client and multiple quiescent clients?
>> 
> 
> Again, we have two "problems" to solve, and we need to take care not to
> conflate them too much.
> 
> 1) how do we best organize the cache for efficient lookups?
> 
> 2) what cache eviction policy should we use?

These are related, I feel.  Because memory capacity is increasing faster than memory bandwidth, I think lookup time efficiency is relatively more important than cache space efficiency.  Also, above we were arguing that a larger DRC is more effective.

What's more, 2) is a much harder problem to solve well.  And if lookups are efficient anyway, then 2) should matter much less, as it will only be needed to deal with memory pressure on the server, and not to keep lookup time efficient.

I was proposing to deal with per-client caching via 2) (choosing which entry to evict based on how many entries there are for each client).  But 2) looks less critical than 1), so perhaps per-client caching may not provide much benefit.


> Organizing the cache based on client address might help both of those,
> but we'll need to determine whether it's worth the extra complexity.

Agree.  Data, data, data.


-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





