* [RFC PATCH 0/1] nfsd: Improve NFS server performance
@ 2008-12-30 10:42 Krishna Kumar
       [not found] ` <20081230104245.9409.30030.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  2009-02-04 23:19 ` [RFC PATCH 0/1] nfsd: Improve NFS server performance J. Bruce Fields
  0 siblings, 2 replies; 11+ messages in thread
From: Krishna Kumar @ 2008-12-30 10:42 UTC (permalink / raw)
  To: linux-nfs; +Cc: krkumar2, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Patch summary:
--------------
Change the readahead caching on the server to a file handle caching model.
Since file handles are unique, this patch removes all dependencies on the
kernel readahead parameters/implementation and instead caches files based
on file handles. With this change the server no longer has to open and close
a file multiple times while the client reads it, and lookups become faster.
Readahead is also taken care of automatically, since the file is not closed
while it is being read (quickly) by the client.


Read algo change:
------------------
The new nfsd_read() is changed to:
	if file {
		Old code
	} else {
		Check if this FH is cached
		if fh && fh has cached file pointer:
			Get file pointer
			Update fields in fhp from cache
			call fh_verify
		else:
			Nothing in the cache, call nfsd_open as usual

		nfsd_vfs_read

		if fh {
			If this is a new fh entry:
				Save cached values
			Drop our reference to fh
		} else
			Close file
	}
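
To make the flow concrete, below is a minimal user-space C model of the
handle-keyed cache (the names fh_slot, fh_hash and cached_read are
hypothetical; the actual kernel implementation, with per-bucket spinlocks,
reference counting, expiry and svc_export handling, is in patch 1/1). A read
reuses the still-open file whenever the handle key matches and only falls
back to open() on a miss:

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CACHE_BUCKETS 256			/* cf. FHPARM_HASH_SIZE */

/* One cache slot: a 32-bit handle key plus the fd kept open for reuse */
struct fh_slot {
	uint32_t key;				/* stands in for fh_auth[3] */
	int      fd;				/* stands in for the cached struct file * */
	int      in_use;
};

static struct fh_slot cache[CACHE_BUCKETS];

/* Hash the handle key into a bucket (the patch uses jhash_1word()) */
static unsigned int fh_hash(uint32_t key)
{
	return (key * 2654435761u) % CACHE_BUCKETS;
}

/*
 * Model of the changed read path: reuse a cached open file if the handle
 * matches, otherwise open the file and remember it for the next read.
 */
static ssize_t cached_read(uint32_t key, const char *path,
			   void *buf, size_t len, off_t off)
{
	struct fh_slot *slot = &cache[fh_hash(key)];
	int fd;

	if (slot->in_use && slot->key == key) {
		fd = slot->fd;			/* cache hit: no open() needed */
	} else {
		fd = open(path, O_RDONLY);	/* cache miss: open as usual */
		if (fd < 0)
			return -1;
		if (slot->in_use)
			close(slot->fd);	/* evict the previous entry */
		slot->key = key;
		slot->fd = fd;
		slot->in_use = 1;		/* keep the file open for reuse */
	}
	return pread(fd, buf, len, off);
}

int main(int argc, char **argv)
{
	char buf[4096];
	const char *path = argc > 1 ? argv[1] : "/etc/passwd";

	/* Two reads with the same handle key: only the first one opens the file */
	ssize_t n = cached_read(42, path, buf, sizeof(buf), 0);

	cached_read(42, path, buf, sizeof(buf), 0);	/* second read: cache hit */
	printf("read %zd bytes\n", n);
	return 0;
}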


Performance:
-------------
This patch was tested with the client running 1, 4, 8, 16, ..., 256 test processes,
each doing reads of different files. Each test includes different I/O sizes.
Many individual tests (16% of the test cases) got throughput improvements in the
9 to 15% range. The full results are provided at the end of this post.

Please review. Any comments or improvement ideas are greatly appreciated.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---

		(#Test Processes on Client == #NFSD's on Server)
--------------------------------------------------------------
#Test Processes	I/O Size (bytes)	Org BW KB/s	New BW KB/s	%
--------------------------------------------------------------
4	256		48151.09	50328.70	4.52
4	4096		47700.05	49760.34	4.31
4	8192		47553.34	48509.00	2.00
4	16384		48764.87	51208.54	5.01
4	32768		49306.11	50141.59	1.69
4	65536		48681.46	49491.32	1.66
4	131072		48378.02	49971.95	3.29

8	256		38906.95	42444.95	9.09
8	4096		38141.46	42154.24	10.52
8	8192		37058.55	41241.78	11.28
8	16384		37446.56	40573.70	8.35
8	32768		36655.91	42159.85	15.01
8	65536		38776.11	40619.20	4.75
8	131072		38187.85	41119.04	7.67

16	256		36274.49	36143.00	-0.36
16	4096		34320.56	37664.35	9.74
16	8192		35489.65	34555.43	-2.63
16	16384		35647.32	36289.72	1.80
16	32768		37037.31	36874.33	-0.44
16	65536		36388.14	36991.56	1.65
16	131072		35729.34	37588.85	5.20

32	256		30838.89	32811.47	6.39
32	4096		31291.93	33439.83	6.86
32	8192		29885.57	33337.10	11.54
32	16384		30020.23	31795.97	5.91
32	32768		32805.03	33860.68	3.21
32	65536		31275.12	32997.34	5.50
32	131072		33391.85	34209.86	2.44

64	256		26729.46	28077.13	5.04
64	4096		25705.01	27339.37	6.35
64	8192		27757.06	27488.04	-0.96
64	16384		22927.44	23938.79	4.41
64	32768		26956.16	27848.52	3.31
64	65536		27419.59	29228.76	6.59
64	131072		27623.29	27651.99	0.10

128	256		22463.63	22437.45	-0.11
128	4096		22039.69	22554.03	2.33
128	8192		22218.42	24010.64	8.06
128	16384		15295.59	16745.28	9.47
128	32768		23319.54	23450.46	0.56
128	65536		22942.03	24169.26	5.34
128	131072		23845.27	23894.14	0.20

256	256		15659.17	16266.38	3.87
256	4096		15614.72	16362.25	4.78
256	8192		16950.24	17092.50	0.83
256	16384		9253.25		10274.28	11.03
256	32768		17872.89	17792.93	-0.44
256	65536		18459.78	18641.68	0.98
256	131072		19408.01	20538.80	5.82
--------------------------------------------------------------


* [RFC PATCH 1/1]: nfsd: By changing RA caching to file handle caching
       [not found] ` <20081230104245.9409.30030.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2008-12-30 10:42   ` Krishna Kumar
  0 siblings, 0 replies; 11+ messages in thread
From: Krishna Kumar @ 2008-12-30 10:42 UTC (permalink / raw)
  To: linux-nfs; +Cc: krkumar2, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Implement the FH caching. List of changes:

	1. Rename RA to FH, parm to cache, and remove all users of readahead.
	2. Add fields in the fhparms to cache the file, svc_export, expiry time
	   and expiry list. Modify some other fields (e.g., p_count is now atomic).
	3. Implement a daemon to clean up cached FHs.
	4. Add four helper functions:
		fh_cache_get: Hold a reference to dentry and svc_export.
		fh_cache_put: Drop a reference to file, dentry and svc_export.
		fh_get_cached_values: Returns file and svc_export.
		fh_cache_upd: Updates file and svc_export. Adds the entry to the
			list for the daemon to clean up.
	5. get_raparms is slightly rewritten.
	6. nfsd_read rewritten to use the cache.
	7. A file remove from the client results in the server checking the
	   cache and dropping the reference immediately (a remove on the server
	   itself still retains the reference for some time).
	8. init and shutdown are slightly modified.
	9. ra_size, ra_depth, nfsd_racache_init and nfsd_racache_shutdown still
	   retain the "ra" prefix for now.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 fs/nfsd/vfs.c |  449 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 358 insertions(+), 91 deletions(-)

diff -ruNp linux-2.6.28.org/fs/nfsd/vfs.c linux-2.6.28.new/fs/nfsd/vfs.c
--- linux-2.6.28.org/fs/nfsd/vfs.c	2008-12-30 09:52:43.000000000 +0530
+++ linux-2.6.28.new/fs/nfsd/vfs.c	2008-12-30 12:09:57.000000000 +0530
@@ -55,38 +55,53 @@
 #include <linux/security.h>
 #endif /* CONFIG_NFSD_V4 */
 #include <linux/jhash.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
+/* Number of jiffies to cache the file before releasing */
+#define NFSD_CACHE_JIFFIES		100
 
 /*
- * This is a cache of readahead params that help us choose the proper
- * readahead strategy. Initially, we set all readahead parameters to 0
- * and let the VFS handle things.
+ * This is a cache of file handles to quicken file lookup. This also helps
+ * to prevent multiple open/close of a file when the client reads it.
+ *
  * If you increase the number of cached files very much, you'll need to
  * add a hash table here.
  */
-struct raparms {
-	struct raparms		*p_next;
-	unsigned int		p_count;
-	ino_t			p_ino;
-	dev_t			p_dev;
-	int			p_set;
-	struct file_ra_state	p_ra;
+struct fhcache {
+	struct fhcache		*p_next;
+
+	/* Hashed on this parameter */
+	__u32			p_auth;
+
+	/* Cached information */
+	struct file		*p_filp;
+	struct svc_export	*p_exp;
+
+	/* Refcount for overwrite */
+	atomic_t		p_count;
+
+	/* When this entry expires */
+	unsigned long		p_expires;
+
+	/* List of entries linked to 'nfsd_daemon_list' */
+	struct list_head	p_list;
+
 	unsigned int		p_hindex;
 };
 
-struct raparm_hbucket {
-	struct raparms		*pb_head;
+struct fhcache_hbucket {
+	struct fhcache		*pb_head;
 	spinlock_t		pb_lock;
 } ____cacheline_aligned_in_smp;
 
-#define RAPARM_HASH_BITS	4
-#define RAPARM_HASH_SIZE	(1<<RAPARM_HASH_BITS)
-#define RAPARM_HASH_MASK	(RAPARM_HASH_SIZE-1)
-static struct raparm_hbucket	raparm_hash[RAPARM_HASH_SIZE];
+#define FHPARM_HASH_BITS	8
+#define FHPARM_HASH_SIZE	(1<<FHPARM_HASH_BITS)
+#define FHPARM_HASH_MASK	(FHPARM_HASH_SIZE-1)
+static struct fhcache_hbucket	fhcache_hash[FHPARM_HASH_SIZE];
 
 /* 
  * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
@@ -784,51 +799,235 @@ nfsd_sync_dir(struct dentry *dp)
 	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
 }
 
+/* Daemon to handle expired fh cache entries */
+static struct task_struct	*k_nfsd_task;
+
+/* Synchronization for daemon with enqueuer's */
+static spinlock_t		k_nfsd_lock;
+
+/* List of FH cache entries that has to be cleaned up when they expire */
+static struct list_head		nfsd_daemon_list;
+
 /*
- * Obtain the readahead parameters for the file
- * specified by (dev, ino).
+ * Returns cached values of 'file' and svc_export; resets these entries
+ * to NULL.
  */
+static inline void fh_get_cached_values(struct fhcache *fh, struct file **filep,
+					struct svc_export **expp)
+{
+	*filep = fh->p_filp;
+	*expp = fh->p_exp;
+
+	fh->p_filp = NULL;
+	fh->p_exp = NULL;
+}
+
+/*
+ * Hold a reference to dentry and svc_export (file already has an extra
+ * reference count as it is not closed normally.
+ */
+static inline void fh_cache_get(struct file *file, struct svc_export *exp)
+{
+	dget(file->f_path.dentry);
+	cache_get(&exp->h);
+}
+
+/* Drop a reference to file, dentry and svc_export */
+static inline void fh_cache_put(struct file *file, struct svc_export *exp)
+{
+	cache_put(&exp->h, &svc_export_cache);
+	dput(file->f_path.dentry);
+	fput(file);
+}
+
+/*
+ * Holds a reference to 'file' and svc_export, and caches both. Add fh entry
+ * to list for daemon to cleanup later.
+ */
+static inline void fh_cache_upd(struct fhcache *fh, struct file *file,
+				struct svc_export *exp)
+{
+	struct fhcache_hbucket *fhb = &fhcache_hash[fh->p_hindex];
+
+	fh_cache_get(file, exp);
+
+	spin_lock(&fhb->pb_lock);
+	fh->p_filp = file;
+	fh->p_exp = exp;
+
+	/*
+	 * Once we add the entry to the list, we'd rather it expire
+	 * prematurely rather than updating it on every read.
+	 */
+	if (likely(list_empty(&fh->p_list))) {
+		fh->p_expires = jiffies + NFSD_CACHE_JIFFIES;
+		spin_lock(&k_nfsd_lock);
+		list_add_tail(&fh->p_list, &nfsd_daemon_list);
+		spin_unlock(&k_nfsd_lock);
+	}
+	spin_unlock(&fhb->pb_lock);
+}
 
-static inline struct raparms *
-nfsd_get_raparms(dev_t dev, ino_t ino)
+/* Daemon cache cleanup handler */
+void daemon_free_entries(void)
 {
-	struct raparms	*ra, **rap, **frap = NULL;
-	int depth = 0;
-	unsigned int hash;
-	struct raparm_hbucket *rab;
+	unsigned long now = jiffies;
+
+	spin_lock(&k_nfsd_lock);
+	while (!list_empty(&nfsd_daemon_list)) {
+		struct fhcache *fh = list_entry(nfsd_daemon_list.next,
+                                                struct fhcache, p_list);
+		struct fhcache_hbucket *fhb;
 
-	hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
-	rab = &raparm_hash[hash];
+		if (time_after(fh->p_expires, now) || now != jiffies) {
+			/*
+			 * This (and all subsequent entries) have not expired;
+			 * or we have spent too long in this loop.
+			 */
+			break;
+		}
+
+		fhb = &fhcache_hash[fh->p_hindex];
+
+		/*
+		 * Make sure we do not deadlock with updaters - we can free
+		 * entry next time in case of a race.
+		 */
+		if (!spin_trylock(&fhb->pb_lock)) {
+			/*
+			 * Entry is being used, no need to free this, try later
+			 */
+			break;
+		}
+
+		if (unlikely(!fh->p_filp)) {
+			/* 
+			 * Handle race with get_fhcache where it overwrites
+			 * the fh. We remove this entry - it will be added
+			 * back later by upd() which is racing with us.
+			 */
+			list_del_init(&fh->p_list);
+			spin_unlock(&fhb->pb_lock);
+		} else {
+			struct file *file;
+			struct svc_export *exp;
+
+			if (atomic_read(&fh->p_count)) {
+				spin_unlock(&fhb->pb_lock);
+				break;
+			}
+
+			list_del_init(&fh->p_list);
+			fh_get_cached_values(fh, &file, &exp);
+			spin_unlock(&fhb->pb_lock);
+			spin_unlock(&k_nfsd_lock);
+
+			fh_cache_put(file, exp);
+			spin_lock(&k_nfsd_lock);
+		}
+	}
+	spin_unlock(&k_nfsd_lock);
+}
+
+static int k_nfsd_thread(void *unused)
+{
+	while (!kthread_should_stop()) {
+		schedule_timeout_interruptible(NFSD_CACHE_JIFFIES);
 
-	spin_lock(&rab->pb_lock);
-	for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
-		if (ra->p_ino == ino && ra->p_dev == dev)
+		if (kthread_should_stop())
+			break;
+
+		daemon_free_entries();
+	}
+	__set_current_state(TASK_RUNNING);
+
+	return 0;
+}
+
+/*
+ * Obtain the cached file, export and d_inode values for the FH
+ * specified by fh->auth[3]
+ */
+static inline struct fhcache *
+nfsd_get_fhcache(__u32 auth)
+{
+	struct fhcache		*fh, **fhp, **ffhp = NULL;
+	int			depth = 0;
+	unsigned int		hash;
+	struct fhcache_hbucket	*fhb;
+	struct file		*file = NULL;
+	struct svc_export	*exp = NULL;
+
+	if (!auth)
+		return NULL;
+
+	hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK;
+	fhb = &fhcache_hash[hash];
+
+	spin_lock(&fhb->pb_lock);
+	for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) {
+		if (fh->p_auth == auth) {
+			/* Same inode */
+			if (!fh->p_filp) {
+				/* Someone is racing in the same code */
+				spin_unlock(&fhb->pb_lock);
+				return NULL;
+			}
+
+			/*
+			 * Hold an extra reference to dentry/exp since these
+			 * are released in fh_put(). 'file' already has an
+			 * extra hold from the first lookup which was never
+			 * dropped.
+			 */
+			fh_cache_get(fh->p_filp, fh->p_exp);
 			goto found;
+		}
+
 		depth++;
-		if (ra->p_count == 0)
-			frap = rap;
+
+		/* Unused or different inode */
+		if (!atomic_read(&fh->p_count)) {
+			if (!ffhp || (*ffhp)->p_filp)
+				ffhp = fhp;
+		}
 	}
-	depth = nfsdstats.ra_size*11/10;
-	if (!frap) {	
-		spin_unlock(&rab->pb_lock);
+
+	if (!ffhp) {
+		spin_unlock(&fhb->pb_lock);
 		return NULL;
 	}
-	rap = frap;
-	ra = *frap;
-	ra->p_dev = dev;
-	ra->p_ino = ino;
-	ra->p_set = 0;
-	ra->p_hindex = hash;
+
+	depth = nfsdstats.ra_size*11/10;
+	fhp = ffhp;
+	fh = *ffhp;
+	fh->p_hindex = hash;
+	fh->p_auth = auth;
+
+	if (fh->p_filp)
+		fh_get_cached_values(fh, &file, &exp);
+
 found:
-	if (rap != &rab->pb_head) {
-		*rap = ra->p_next;
-		ra->p_next   = rab->pb_head;
-		rab->pb_head = ra;
+	if (fhp != &fhb->pb_head) {
+		*fhp = fh->p_next;
+		fh->p_next   = fhb->pb_head;
+		fhb->pb_head = fh;
 	}
-	ra->p_count++;
+
+	atomic_inc(&fh->p_count);
 	nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
-	spin_unlock(&rab->pb_lock);
-	return ra;
+	spin_unlock(&fhb->pb_lock);
+
+	if (file) {
+		/*
+		 * Free the existing entry. The new entry will expire
+		 * prematurely, but it will be updated to the correct expiry
+		 * and be cached for the full time duration if it is used
+		 * again after expiry.
+		 */
+		fh_cache_put(file, exp);
+	}
+	return fh;
 }
 
 /*
@@ -892,7 +1091,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
               loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
 {
 	struct inode *inode;
-	struct raparms	*ra;
 	mm_segment_t	oldfs;
 	__be32		err;
 	int		host_err;
@@ -903,11 +1101,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 	if (svc_msnfs(fhp) && !lock_may_read(inode, offset, *count))
 		goto out;
 
-	/* Get readahead parameters */
-	ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
-
-	if (ra && ra->p_set)
-		file->f_ra = ra->p_ra;
 
 	if (file->f_op->splice_read && rqstp->rq_splice_ok) {
 		struct splice_desc sd = {
@@ -926,16 +1119,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 		set_fs(oldfs);
 	}
 
-	/* Write back readahead params */
-	if (ra) {
-		struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
-		spin_lock(&rab->pb_lock);
-		ra->p_ra = file->f_ra;
-		ra->p_set = 1;
-		ra->p_count--;
-		spin_unlock(&rab->pb_lock);
-	}
-
 	if (host_err >= 0) {
 		nfsdstats.io_read += host_err;
 		*count = host_err;
@@ -1078,12 +1261,38 @@ nfsd_read(struct svc_rqst *rqstp, struct
 			goto out;
 		err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
 	} else {
-		err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
-		if (err)
-			goto out;
-		err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
-		nfsd_close(file);
+		struct fhcache	*fh;
+
+		/* Check if this fh is cached */
+		fh = nfsd_get_fhcache(fhp->fh_handle.fh_auth[3]);
+		if (fh && fh->p_filp) {
+			/* Got cached values */
+			file = fh->p_filp;
+			fhp->fh_dentry = file->f_path.dentry;
+			fhp->fh_export = fh->p_exp;
+			err = fh_verify(rqstp, fhp, S_IFREG, NFSD_MAY_READ);
+		} else {
+			/* Nothing in cache, or no free cache entry available */
+			err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ,
+					&file);
+		}
+
+		if (!err)
+			err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen,
+					    count);
+
+		if (fh) {
+			if (!fh->p_filp && file) {
+				/* Write back cached values */
+				fh_cache_upd(fh, file, fhp->fh_export);
+			}
+
+			/* Drop our reference */
+			atomic_dec(&fh->p_count);
+		} else if (file)
+			nfsd_close(file);
 	}
+
 out:
 	return err;
 }
@@ -1791,6 +2000,38 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 		goto out_nfserr;
 
 	if (type != S_IFDIR) { /* It's UNLINK */
+		int i, found = 0;
+
+		for (i = 0 ; i < FHPARM_HASH_SIZE && !found; i++) {
+			struct fhcache_hbucket *fhb = &fhcache_hash[i];
+			struct fhcache *fh;
+
+			spin_lock(&fhb->pb_lock);
+			for (fh = fhb->pb_head; fh; fh = fh->p_next) {
+				if (fh->p_filp &&
+				    fh->p_filp->f_path.dentry == rdentry) {
+					/* Found the entry for removed file */
+					struct file *file;
+					struct svc_export *exp;
+
+					fh_get_cached_values(fh, &file, &exp);
+					spin_lock(&k_nfsd_lock);
+					list_del_init(&fh->p_list);
+					spin_unlock(&k_nfsd_lock);
+
+					spin_unlock(&fhb->pb_lock);
+
+					/* Drop reference to this entry */
+					fh_cache_put(file, exp);
+
+					spin_lock(&fhb->pb_lock);
+					found = 1;
+					break;
+				}
+			}
+			spin_unlock(&fhb->pb_lock);
+		}
+
 #ifdef MSNFS
 		if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
 			(atomic_read(&rdentry->d_count) > 1)) {
@@ -2061,23 +2302,36 @@ nfsd_permission(struct svc_rqst *rqstp, 
 void
 nfsd_racache_shutdown(void)
 {
-	struct raparms *raparm, *last_raparm;
 	unsigned int i;
 
-	dprintk("nfsd: freeing readahead buffers.\n");
+	dprintk("nfsd: freeing FH buffers.\n");
 
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		raparm = raparm_hash[i].pb_head;
-		while(raparm) {
-			last_raparm = raparm;
-			raparm = raparm->p_next;
-			kfree(last_raparm);
+	/* First stop the daemon, and we will clean up here ourselves */
+        kthread_stop(k_nfsd_task);
+        k_nfsd_task = NULL;
+
+	for (i = 0; i < FHPARM_HASH_SIZE; i++) {
+		struct fhcache *fhcache, *last_fhcache;
+
+		fhcache = fhcache_hash[i].pb_head;
+		while(fhcache) {
+			last_fhcache = fhcache;
+			if (fhcache->p_filp) {
+				struct file *file;
+				struct svc_export *exp;
+
+				fh_get_cached_values(fhcache, &file, &exp);
+				list_del(&fhcache->p_list);
+				fh_cache_put(file, exp);
+			}
+			fhcache = fhcache->p_next;
+			kfree(last_fhcache);
 		}
-		raparm_hash[i].pb_head = NULL;
+		fhcache_hash[i].pb_head = NULL;
 	}
 }
 /*
- * Initialize readahead param cache
+ * Initialize file cache
  */
 int
 nfsd_racache_init(int cache_size)
@@ -2085,36 +2339,49 @@ nfsd_racache_init(int cache_size)
 	int	i;
 	int	j = 0;
 	int	nperbucket;
-	struct raparms **raparm = NULL;
+	struct fhcache **fhcache = NULL;
 
 
-	if (raparm_hash[0].pb_head)
+	if (fhcache_hash[0].pb_head)
 		return 0;
-	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
+	nperbucket = DIV_ROUND_UP(cache_size, FHPARM_HASH_SIZE);
 	if (nperbucket < 2)
 		nperbucket = 2;
-	cache_size = nperbucket * RAPARM_HASH_SIZE;
+	cache_size = nperbucket * FHPARM_HASH_SIZE;
 
-	dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
+	dprintk("nfsd: allocating %d file cache buffers.\n", cache_size);
 
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		spin_lock_init(&raparm_hash[i].pb_lock);
+	for (i = 0; i < FHPARM_HASH_SIZE; i++) {
+		spin_lock_init(&fhcache_hash[i].pb_lock);
 
-		raparm = &raparm_hash[i].pb_head;
+		fhcache = &fhcache_hash[i].pb_head;
 		for (j = 0; j < nperbucket; j++) {
-			*raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
-			if (!*raparm)
+			*fhcache = kzalloc(sizeof(struct fhcache), GFP_KERNEL);
+			if (!*fhcache) {
+				dprintk("nfsd: kmalloc failed, freeing file cache buffers\n");
 				goto out_nomem;
-			raparm = &(*raparm)->p_next;
+			}
+			INIT_LIST_HEAD(&(*fhcache)->p_list);
+			fhcache = &(*fhcache)->p_next;
 		}
-		*raparm = NULL;
+		*fhcache = NULL;
 	}
 
 	nfsdstats.ra_size = cache_size;
+
+	INIT_LIST_HEAD(&nfsd_daemon_list);
+	spin_lock_init(&k_nfsd_lock);
+	k_nfsd_task = kthread_run(k_nfsd_thread, NULL, "nfsd_cacher");
+
+	if (IS_ERR(k_nfsd_task)) {
+		printk(KERN_ERR "%s: unable to create kernel thread: %ld\n",
+		       __FUNCTION__, PTR_ERR(k_nfsd_task));
+		goto out_nomem;
+	}
+
 	return 0;
 
 out_nomem:
-	dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
 	nfsd_racache_shutdown();
 	return -ENOMEM;
 }


* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2008-12-30 10:42 [RFC PATCH 0/1] nfsd: Improve NFS server performance Krishna Kumar
       [not found] ` <20081230104245.9409.30030.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-02-04 23:19 ` J. Bruce Fields
  2009-02-05 15:08   ` Krishna Kumar2
  1 sibling, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2009-02-04 23:19 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: linux-nfs

On Tue, Dec 30, 2008 at 04:12:45PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>

Thanks for the work, and apologies for the slow response.

> 
> Patch summary:
> --------------
> Change the readahead caching on the server to a file handle caching model.
> Since file handles are unique, this patch removes all dependencies on the
> kernel readahead parameters/implementation and instead caches files based
> on file handles. This change allows the server to not have to open/close
> a file multiple times when the client reads it, and results in faster lookup
> times.

I think of open and lookup as fairly fast, so I'm surprised this makes a
great difference; do you have profile results or something to confirm
that this is in fact what made the difference?

> Also, readahead is automatically taken care of since the file is not
> closed while it is getting read (quickly) by the client.
> 
> 
> Read algo change:
> ------------------
> The new nfsd_read() is changed to:
> 	if file {
> 		Old code
> 	} else {
> 		Check if this FH is cached
> 		if fh && fh has cached file pointer:
> 			Get file pointer
> 			Update fields in fhp from cache
> 			call fh_verify
> 		else:
> 			Nothing in the cache, call nfsd_open as usual
> 
> 		nfsd_vfs_read
> 
> 		if fh {
> 			If this is a new fh entry:
> 				Save cached values
> 			Drop our reference to fh
> 		} else
> 			Close file
> 	}

When do items get removed from this cache?

> 
> 
> Performance:
> -------------
> This patch was tested with clients running 1, 4, 8, 16 --- 256 test processes,
> each doing reads of different files. Each test includes different I/O sizes.
> Many individual tests (16% of test cases) got throughput improvement in the
> 9 to 15% range. The full results are provided at the end of this post.

Could you provide details sufficient to reproduce this test if
necessary?  (At least: what was the test code, how many clients were
used, what was the client and server hardware, and what filesystem was
the server exporting?)

--b.

> 
> Please review. Any comments or improvement ideas are greatly appreciated.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
> 
> 		(#Test Processes on Client == #NFSD's on Server)
> --------------------------------------------------------------
> #Test Processes		Org BW KB/s	New BW KB/s	%
> --------------------------------------------------------------

What's the second column?

> 4	256		48151.09	50328.70	4.52
> 4	4096		47700.05	49760.34	4.31
> 4	8192		47553.34	48509.00	2.00
> 4	16384		48764.87	51208.54	5.01
> 4	32768		49306.11	50141.59	1.69
> 4	65536		48681.46	49491.32	1.66
> 4	131072		48378.02	49971.95	3.29
> 
> 8	256		38906.95	42444.95	9.09
> 8	4096		38141.46	42154.24	10.52
> 8	8192		37058.55	41241.78	11.28
> 8	16384		37446.56	40573.70	8.35
> 8	32768		36655.91	42159.85	15.01
> 8	65536		38776.11	40619.20	4.75
> 8	131072		38187.85	41119.04	7.67
> 
> 16	256		36274.49	36143.00	-0.36
> 16	4096		34320.56	37664.35	9.74
> 16	8192		35489.65	34555.43	-2.63
> 16	16384		35647.32	36289.72	1.80
> 16	32768		37037.31	36874.33	-0.44
> 16	65536		36388.14	36991.56	1.65
> 16	131072		35729.34	37588.85	5.20
> 
> 32	256		30838.89	32811.47	6.39
> 32	4096		31291.93	33439.83	6.86
> 32	8192		29885.57	33337.10	11.54
> 32	16384		30020.23	31795.97	5.91
> 32	32768		32805.03	33860.68	3.21
> 32	65536		31275.12	32997.34	5.50
> 32	131072		33391.85	34209.86	2.44
> 
> 64	256		26729.46	28077.13	5.04
> 64	4096		25705.01	27339.37	6.35
> 64	8192		27757.06	27488.04	-0.96
> 64	16384		22927.44	23938.79	4.41
> 64	32768		26956.16	27848.52	3.31
> 64	65536		27419.59	29228.76	6.59
> 64	131072		27623.29	27651.99	.10
> 
> 128	256		22463.63	22437.45	-.11
> 128	4096		22039.69	22554.03	2.33
> 128	8192		22218.42	24010.64	8.06
> 128	16384		15295.59	16745.28	9.47
> 128	32768		23319.54	23450.46	0.56
> 128	65536		22942.03	24169.26	5.34
> 128	131072		23845.27	23894.14	0.20
> 
> 256	256		15659.17	16266.38	3.87
> 256	4096		15614.72	16362.25	4.78
> 256	8192		16950.24	17092.50	0.83
> 256	16384		9253.25		10274.28	11.03
> 256	32768		17872.89	17792.93	-.44
> 256	65536		18459.78	18641.68	0.98
> 256	131072		19408.01	20538.80	5.82
> --------------------------------------------------------------
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2009-02-04 23:19 ` [RFC PATCH 0/1] nfsd: Improve NFS server performance J. Bruce Fields
@ 2009-02-05 15:08   ` Krishna Kumar2
  2009-02-05 20:24     ` J. Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: Krishna Kumar2 @ 2009-02-05 15:08 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Hi Bruce,

Thanks for your comments (also please refer to REV2 of patch as that is
much simpler).

> >
> > Patch summary:
> > --------------
> > Change the readahead caching on the server to a file handle caching model.
> > Since file handles are unique, this patch removes all dependencies on the
> > kernel readahead parameters/implementation and instead caches files based
> > on file handles. This change allows the server to not have to open/close
> > a file multiple times when the client reads it, and results in faster lookup
> > times.
>
> I think of open and lookup as fairly fast, so I'm surprised this makes a
> great difference; do you have profile results or something to confirm
> that this is in fact what made the difference?

Beyond saving the open/lookup times, the cache is updated only once. Hence no
lock plus update is required for subsequent reads - the code does a single lock
on every read operation instead of two. The time to get the cache is
approximately the same for old vs new code; but in the new code we get
file/dentry and svc_exp.

I used to have counters in nfsd_open - something like dbg_num_opens,
dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies, dgb_cache_jiffies, etc.
I can reintroduce those debugs and get a run and see how those numbers look;
is that what you are looking for?

> > Also, readahead is automatically taken care of since the file is not
> > closed while it is getting read (quickly) by the client.
> >
> >
> > Read algo change:
> > ------------------
> > The new nfsd_read() is changed to:
> >    if file {
> >       Old code
> >    } else {
> >       Check if this FH is cached
> >       if fh && fh has cached file pointer:
> >          Get file pointer
> >          Update fields in fhp from cache
> >          call fh_verify
> >       else:
> >          Nothing in the cache, call nfsd_open as usual
> >
> >       nfsd_vfs_read
> >
> >       if fh {
> >          If this is a new fh entry:
> >             Save cached values
> >          Drop our reference to fh
> >       } else
> >          Close file
> >    }
>
> When do items get removed from this cache?

At the first open, the item is kept at the end of a global list (which is
manipulated by the new daemon). After some jiffies have passed, the daemon
goes through the list till it comes to the first entry that has not
expired, and frees up all the earlier entries. If the file is being used,
it is not freed. If a file is used after its entry is freed, a new entry is
added to the end of the list. So very minimal list manipulation is required -
no sorting or moving of entries in the list.
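
A minimal sketch of that sweep, assuming a singly linked expiry list with the
oldest entry at the head (the names below are hypothetical stand-ins for the
patch's fhcache list and daemon_free_entries(); locking, refcount and in-use
checks are omitted):

#include <stdlib.h>
#include <time.h>

struct entry {
	struct entry	*next;
	time_t		expires;	/* set once, when the entry is queued */
	/* the cached file/export pointers would live here */
};

static struct entry *expiry_list;	/* oldest (earliest expiry) first */

/* Free entries from the head until the first one that has not yet expired. */
static void sweep(void)
{
	time_t now = time(NULL);

	while (expiry_list && expiry_list->expires <= now) {
		struct entry *e = expiry_list;

		expiry_list = e->next;
		/* the real code drops the cached file/export references here */
		free(e);
	}
}

int main(void)
{
	sweep();	/* the daemon calls this periodically */
	return 0;
}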

Please let me know if you would like me to write up a small text about how
this patch works.

> > Performance:
> > -------------
> > This patch was tested with clients running 1, 4, 8, 16 --- 256 test processes,
> > each doing reads of different files. Each test includes different I/O sizes.
> > Many individual tests (16% of test cases) got throughput improvement in the
> > 9 to 15% range. The full results are provided at the end of this post.
>
> Could you provide details sufficient to reproduce this test if
> necessary?  (At least: what was the test code, how many clients were
> used, what was the client and server hardware, and what filesystem was
> the server exporting?)

Sure - I will send the test code in a day (don't have access to the system
right now, sorry). But this is a script that runs a C program that forks and
then reads a file till it is killed and prints the amount of data read and
the amount of time it ran.
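
A hypothetical sketch matching that description (this is not the actual
read_files.c): each process re-reads its file until it is killed and then
reports bytes read and elapsed time; the driver script would start one
instance per test process.

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t stop;

static void on_term(int sig)
{
	(void)sig;
	stop = 1;
}

/* Re-read one file until killed, then report bytes read and run time. */
int main(int argc, char **argv)
{
	static char buf[65536];
	unsigned long long total = 0;
	time_t start = time(NULL);
	ssize_t n;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	signal(SIGTERM, on_term);

	while (!stop) {
		int fd = open(argv[1], O_RDONLY);

		if (fd < 0)
			break;
		while (!stop && (n = read(fd, buf, sizeof(buf))) > 0)
			total += (unsigned long long)n;
		close(fd);
	}
	printf("%llu bytes in %ld seconds\n", total, (long)(time(NULL) - start));
	return 0;
}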

The other details are:
      #Clients: 1
      Hardware Configuration (both systems):
            Two Dual-Core AMD Opteron processors (4 CPUs) at 3GHz
            1GB memory
            10Gbps private network
      Filesystem: ext3 (one filesystem)

Thanks,

- KK

> > Please review. Any comments or improvement ideas are greatly
appreciated.
> >
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> > ---
> >
> >       (#Test Processes on Client == #NFSD's on Server)
> > --------------------------------------------------------------
> > #Test Processes      Org BW KB/s   New BW KB/s   %
> > --------------------------------------------------------------
>
> What's the second column?
>
> > 4   256      48151.09   50328.70   4.52
> > 4   4096      47700.05   49760.34   4.31
> > 4   8192      47553.34   48509.00   2.00
> > 4   16384      48764.87   51208.54   5.01
> > 4   32768      49306.11   50141.59   1.69
> > 4   65536      48681.46   49491.32   1.66
> > 4   131072      48378.02   49971.95   3.29
> >
> > 8   256      38906.95   42444.95   9.09
> > 8   4096      38141.46   42154.24   10.52
> > 8   8192      37058.55   41241.78   11.28
> > 8   16384      37446.56   40573.70   8.35
> > 8   32768      36655.91   42159.85   15.01
> > 8   65536      38776.11   40619.20   4.75
> > 8   131072      38187.85   41119.04   7.67
> >
> > 16   256      36274.49   36143.00   -0.36
> > 16   4096      34320.56   37664.35   9.74
> > 16   8192      35489.65   34555.43   -2.63
> > 16   16384      35647.32   36289.72   1.80
> > 16   32768      37037.31   36874.33   -0.44
> > 16   65536      36388.14   36991.56   1.65
> > 16   131072      35729.34   37588.85   5.20
> >
> > 32   256      30838.89   32811.47   6.39
> > 32   4096      31291.93   33439.83   6.86
> > 32   8192      29885.57   33337.10   11.54
> > 32   16384      30020.23   31795.97   5.91
> > 32   32768      32805.03   33860.68   3.21
> > 32   65536      31275.12   32997.34   5.50
> > 32   131072      33391.85   34209.86   2.44
> >
> > 64   256      26729.46   28077.13   5.04
> > 64   4096      25705.01   27339.37   6.35
> > 64   8192      27757.06   27488.04   -0.96
> > 64   16384      22927.44   23938.79   4.41
> > 64   32768      26956.16   27848.52   3.31
> > 64   65536      27419.59   29228.76   6.59
> > 64   131072      27623.29   27651.99   .10
> >
> > 128   256      22463.63   22437.45   -.11
> > 128   4096      22039.69   22554.03   2.33
> > 128   8192      22218.42   24010.64   8.06
> > 128   16384      15295.59   16745.28   9.47
> > 128   32768      23319.54   23450.46   0.56
> > 128   65536      22942.03   24169.26   5.34
> > 128   131072      23845.27   23894.14   0.20
> >
> > 256   256      15659.17   16266.38   3.87
> > 256   4096      15614.72   16362.25   4.78
> > 256   8192      16950.24   17092.50   0.83
> > 256   16384      9253.25      10274.28   11.03
> > 256   32768      17872.89   17792.93   -.44
> > 256   65536      18459.78   18641.68   0.98
> > 256   131072      19408.01   20538.80   5.82
> > --------------------------------------------------------------
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2009-02-05 15:08   ` Krishna Kumar2
@ 2009-02-05 20:24     ` J. Bruce Fields
  2009-02-07  9:13       ` Krishna Kumar2
  0 siblings, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2009-02-05 20:24 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: linux-nfs

On Thu, Feb 05, 2009 at 08:38:19PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
> 
> Thanks for your comments (also please refer to REV2 of patch as that is
> much simpler).

Yes, apologies, I only noticed I had a later version after responding to
the wrong one....

> > I think of open and lookup as fairly fast, so I'm surprised this
> > makes a great difference; do you have profile results or something
> > to confirm that this is in fact what made the difference?
> 
> Beyond saving the open/lookup times, the cache is updated only once.
> Hence no lock plus update is required for subsequent reads - the code
> does a single lock on every read operation instead of two. The time to
> get the cache is approximately the same for old vs new code; but in
> the new code we get file/dentry and svc_exp.
> 
> I used to have counters in nfsd_open - something like dbg_num_opens,
> dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
> dgb_cache_jiffies, etc.  I can reintroduce those debugs and get a run
> and see how those numbers looks like, is that what you are looking
> for?

I'm not sure what you mean by dbg_open_jiffies--surely a single open of
a file already in the dentry cache is too fast to be measurable in
jiffies?

> > When do items get removed from this cache?
> 
> At the first open, the item is kept at the end of a global list (which is
> manipulated by the new daemon). After some jiffies are over, the daemon
> goes through the list till it comes to the first entry that has not
> expired; and frees up all the earlier entries. If the file is being used,
> it is not freed. If file is used after free, a new entry is added to the
> end of the list. So very minimal list manipulation is required - no sorting
> and moving entries in the list.

OK, yeah, I just wondered whether you could end up with a reference to a
file hanging around indefinitely even after it had been deleted, for
example.

I've heard of someone updating read-only block snapshots by stopping
mountd, flushing the export cache, unmounting the old snapshot, then
mounting the new one and restarting mountd.  A bit of a hack, but I
guess it works, as long as no clients hold locks or NFSv4 opens on the
filesystem.

An open cache may break that by holding references to the filesystem
they want to unmount.  But perhaps we should give such users a proper
interface that tells nfsd to temporarily drop state it holds on a
filesystem, and tell them to use that instead.

> Please let me know if you would like me to write up a small text about how
> this patch works.

Any explanation always welcome.

> > Could you provide details sufficient to reproduce this test if
> > necessary?  (At least: what was the test code, how many clients were
> > used, what was the client and server hardware, and what filesystem was
> > the server exporting?)
> 
> Sure - I will send the test code in a day (don't have access to the system
> right
> now, sorry. But this is a script that runs a C program that forks and then
> reads
> a file till it is killed and it prints the amount of data read and the
> amount of
> time it ran).
> 
> The other details are:
>       #Clients: 1
>       Hardware Configuration (both systems):
>             Two Dual-Core AMD Opteron (4 cpus) at 3GH.
>             1GB memory
>             10gbps private network
>       Filesystem: ext3 (one filesystem)

OK, thanks!  And what sort of disk on the server?

--b.


* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2009-02-05 20:24     ` J. Bruce Fields
@ 2009-02-07  9:13       ` Krishna Kumar2
  2009-02-09 19:06         ` J. Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: Krishna Kumar2 @ 2009-02-07  9:13 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Hi Bruce,

> > I used to have counters in nfsd_open - something like dbg_num_opens,
> > dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
> > dgb_cache_jiffies, etc.  I can reintroduce those debugs and get a run
> > and see how those numbers looks like, is that what you are looking
> > for?
>
> I'm not sure what you mean by dbg_open_jiffies--surely a single open of
> a file already in the dentry cache is too fast to be measurable in
> jiffies?

When dbg_number_of_opens is very high, I see a big difference in the open
times for original vs new (almost zero) code. I am running 8, 64, 256, etc.,
processes and each of them reads files up to 500MB (a lot of open/read/close
per file per process), so the jiffies add up (contention between parallel
opens, some processing in open, etc.). To clarify this, I will reintroduce
the debugs and get some values (it was done a long time back and I don't
remember how much difference there was), and post it along with what the
debug code is doing.

> OK, yeah, I just wondered whether you could end up with a reference to a
> file hanging around indefinitely even after it had been deleted, for
> example.

If the client deletes a file, the server immediately locates and removes the
cached entry. If the server deletes a file, my original intention was to use
inotify to inform the NFS server to delete the cache entry, but that ran into
some problems. So my solution was to fall back to the cache entry getting
deleted by the daemon after the short timeout; till then the space for the
inode is not freed. So in both cases, references to the file will not hang
around indefinitely.

> I've heard of someone updating read-only block snapshots by stopping
> mountd, flushing the export cache, unmounting the old snapshot, then
> mounting the new one and restarting mountd.  A bit of a hack, but I
> guess it works, as long as no clients hold locks or NFSv4 opens on the
> filesystem.
>
> An open cache may break that by holding references to the filesystem
> they want to unmount.  But perhaps we should give such users a proper
> interface that tells nfsd to temporarily drop state it holds on a
> filesystem, and tell them to use that instead.

I must admit that I am lost in this scenario - I was assuming that the
filesystem can be unmounted only after the nfs services are stopped, hence I
added cache cleanup on nfsd_shutdown. Is there some hook to catch the unmount
where I should clean the cache for that filesystem?

> > Please let me know if you would like me to write up a small text about how
> > this patch works.
>
> Any explanation always welcome.

Sure. I will send this text soon, along with the test program.

> > The other details are:
> >       #Clients: 1
> >       Hardware Configuration (both systems):
> >             Two Dual-Core AMD Opteron (4 cpus) at 3GH.
> >             1GB memory
> >             10gbps private network
> >       Filesystem: ext3 (one filesystem)
>
> OK, thanks!  And what sort of disk on the server?

133 GB ServeRAID (I think ST9146802SS Seagate disk), containing 256 files,
each of 500MB size.

Thanks,

- KK



* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2009-02-07  9:13       ` Krishna Kumar2
@ 2009-02-09 19:06         ` J. Bruce Fields
  2009-02-09 20:56           ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2009-02-09 19:06 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: linux-nfs

On Sat, Feb 07, 2009 at 02:43:55PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
> 
> > > I used to have counters in nfsd_open - something like dbg_num_opens,
> > > dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
> > > dgb_cache_jiffies, etc.  I can reintroduce those debugs and get a run
> > > and see how those numbers looks like, is that what you are looking
> > > for?
> >
> > I'm not sure what you mean by dbg_open_jiffies--surely a single open of
> > a file already in the dentry cache is too fast to be measurable in
> > jiffies?
> 
> When dbg_number_of_opens is very high, I see a big difference in the open
> times
> for original vs new (almost zero) code. I am running 8, 64, 256, etc,
> processes and each of them reads files upto 500MB (a lot of open/read/close
> per file per process), so the jiffies adds up (contention between parallel
> opens, some processing in open, etc). To clarify this, I will reintroduce
> the debugs and get some values (it was done a long time back and I don't
> remember how much difference was there), and post it along with what the
> debug code is doing.
> 
> > OK, yeah, I just wondered whether you could end up with a reference to a
> > file hanging around indefinitely even after it had been deleted, for
> > example.
> 
> If client deletes a file, the server immediately locates and removes the
> cached
> entry. If server deletes a file, my original intention was to use inotify
> to
> inform NFS server to delete the cache but that ran into some problems. So
> my
> solution was to fallback to the cache getting deleted by the daemon after
> the
> short timeout, till then the space for the inode is not freed. So in both
> cases,
> references to the file will not hang around indefinitely.
> 
> > I've heard of someone updating read-only block snapshots by stopping
> > mountd, flushing the export cache, unmounting the old snapshot, then
> > mounting the new one and restarting mountd.  A bit of a hack, but I
> > guess it works, as long as no clients hold locks or NFSv4 opens on the
> > filesystem.
> >
> > An open cache may break that by holding references to the filesystem
> > they want to unmount.  But perhaps we should give such users a proper
> > interface that tells nfsd to temporarily drop state it holds on a
> > filesystem, and tell them to use that instead.
> 
> I must admit that I am lost in this scenario - I was assuming that the
> filesystem can be unmounted only after nfs services are stopped, hence I
> added
> cache cleanup on nfsd_shutdown. Is there some hook to catch for the unmount
> where I should clean the cache for that filesystem?

No.  People have talked about doing that, but it hasn't happened.

But I think I'd prefer some separate operation (probably just triggered
by a write to a some new file in the nfsd filesystem) that told nfsd to
release all its references to a given filesystem.  An administrator
would have to know to do this before unmounting (or maybe mount could be
patched to do this).

Since we don't have a way to tell clients (at least v2/v3 clients) that
we've lost their state on just one filesystem, we'd have to save nfsd's
state internally but drop any hard references to filesystem objects,
then reacquire them afterward.

I'm not sure how best to do that.

That's not necessarily a prerequisite for this change; it depends on how
common that sort of use is.

--b.

> 
> > > Please let me know if you would like me to write up a small text about
> how
> > > this patch works.
> >
> > Any explanation always welcome.
> 
> Sure. I will send this text soon, along with test program.
> 
> > > The other details are:
> > >       #Clients: 1
> > >       Hardware Configuration (both systems):
> > >             Two Dual-Core AMD Opteron (4 cpus) at 3GH.
> > >             1GB memory
> > >             10gbps private network
> > >       Filesystem: ext3 (one filesystem)
> >
> > OK, thanks!  And what sort of disk on the server?
> 
> 133 GB ServeRAID (I think ST9146802SS  Seagate disk), containing 256 files,
> each
> of 500MB size.
> 
> Thanks,
> 
> - KK
> 


* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2009-02-09 19:06         ` J. Bruce Fields
@ 2009-02-09 20:56           ` Chuck Lever
  2009-02-09 21:04             ` J. Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2009-02-09 20:56 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Krishna Kumar2, linux-nfs

On Feb 9, 2009, at 2:06 PM, J. Bruce Fields wrote:
> On Sat, Feb 07, 2009 at 02:43:55PM +0530, Krishna Kumar2 wrote:
>> Hi Bruce,
>>
>>>> I used to have counters in nfsd_open - something like  
>>>> dbg_num_opens,
>>>> dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
>>>> dgb_cache_jiffies, etc.  I can reintroduce those debugs and get a  
>>>> run
>>>> and see how those numbers looks like, is that what you are looking
>>>> for?
>>>
>>> I'm not sure what you mean by dbg_open_jiffies--surely a single  
>>> open of
>>> a file already in the dentry cache is too fast to be measurable in
>>> jiffies?
>>
>> When dbg_number_of_opens is very high, I see a big difference in  
>> the open
>> times
>> for original vs new (almost zero) code. I am running 8, 64, 256, etc,
>> processes and each of them reads files upto 500MB (a lot of open/ 
>> read/close
>> per file per process), so the jiffies adds up (contention between  
>> parallel
>> opens, some processing in open, etc). To clarify this, I will  
>> reintroduce
>> the debugs and get some values (it was done a long time back and I  
>> don't
>> remember how much difference was there), and post it along with  
>> what the
>> debug code is doing.
>>
>>> OK, yeah, I just wondered whether you could end up with a  
>>> reference to a
>>> file hanging around indefinitely even after it had been deleted, for
>>> example.
>>
>> If client deletes a file, the server immediately locates and  
>> removes the
>> cached
>> entry. If server deletes a file, my original intention was to use  
>> inotify
>> to
>> inform NFS server to delete the cache but that ran into some  
>> problems. So
>> my
>> solution was to fallback to the cache getting deleted by the daemon  
>> after
>> the
>> short timeout, till then the space for the inode is not freed. So  
>> in both
>> cases,
>> references to the file will not hang around indefinitely.
>>
>>> I've heard of someone updating read-only block snapshots by stopping
>>> mountd, flushing the export cache, unmounting the old snapshot, then
>>> mounting the new one and restarting mountd.  A bit of a hack, but I
>>> guess it works, as long as no clients hold locks or NFSv4 opens on  
>>> the
>>> filesystem.
>>>
>>> An open cache may break that by holding references to the filesystem
>>> they want to unmount.  But perhaps we should give such users a  
>>> proper
>>> interface that tells nfsd to temporarily drop state it holds on a
>>> filesystem, and tell them to use that instead.
>>
>> I must admit that I am lost in this scenario - I was assuming that  
>> the
>> filesystem can be unmounted only after nfs services are stopped,  
>> hence I
>> added
>> cache cleanup on nfsd_shutdown. Is there some hook to catch for the  
>> unmount
>> where I should clean the cache for that filesystem?
>
> No.  People have talked about doing that, but it hasn't happened.

It should be noted that mountd's UMNT and UMNT_ALL requests (used by  
NFSv2/v3) are advisory, and that our NFSv4 client doesn't contact the  
server at unmount time.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance
  2009-02-09 20:56           ` Chuck Lever
@ 2009-02-09 21:04             ` J. Bruce Fields
       [not found]               ` <OFFC9EFD50.9BF4778E-ON65257583.0054B0F0-65257583.0056621F@in.ibm.com>
  0 siblings, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2009-02-09 21:04 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Krishna Kumar2, linux-nfs

On Mon, Feb 09, 2009 at 03:56:16PM -0500, Chuck Lever wrote:
> On Feb 9, 2009, at 2:06 PM, J. Bruce Fields wrote:
>>
>> No.  People have talked about doing that, but it hasn't happened.
>
> It should be noted that mountd's UMNT and UMNT_ALL requests (used by  
> NFSv2/v3) are advisory, and that our NFSv4 client doesn't contact the  
> server at unmount time.

We're not talking about a client unmounting a server, but about a server
unmounting an exported filesystem.

--b.


* Re: nfsd: Improve NFS server performance
       [not found]               ` <OFFC9EFD50.9BF4778E-ON65257583.0054B0F0-65257583.0056621F@in.ibm.com>
@ 2009-03-24 18:00                 ` J. Bruce Fields
  2009-03-24 18:57                   ` Krishna Kumar2
  0 siblings, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2009-03-24 18:00 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: Jeff Layton, linux-nfs

On Tue, Mar 24, 2009 at 09:13:32PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
> 
> I am sorry about the delay due to some unavoidable circumstances. However,
> I have
> got most of the details that you and Jeff had asked for. I am including:
> 
> 1. Patch (on latest git tree):
>       (See attached file: patch0)            (See attached file: patch1)
> 
> 2. Results (for org, using daemon, using workqueue):
>       (See attached file: result.daemon)    (See attached file: result.wq)
>   (See attached file: result.summary)
> 
> 3. Profile data (for org, using daemon, using workqueue):
>       (See attached file: profiles.bz2)
> 
> 4. A small write up:
>       (See attached file: write_up)
> 
> 5. The test program:
>       (See attached file: read_files.c)
> 
> Sorry about the large attachment in step #3 above, I didn't know if I
> should put it some place. Also if required, I can submit patches breaking
> up this monolithic one.

Thanks, I'll take a look (maybe not immediately).  Probably Jeff will
too, but as long as a) there's a large monolithic patch, and b) all of
this is in attachments--I'm afraid nobody else may bother.

So it'd be more helpful to send this information in the body of email
(more than one email if needed)--that way there's a greater chance other
people will pay attention, and (with luck) have some useful idea to
contribute.

Thanks!--b.


* Re: nfsd: Improve NFS server performance
  2009-03-24 18:00                 ` J. Bruce Fields
@ 2009-03-24 18:57                   ` Krishna Kumar2
  0 siblings, 0 replies; 11+ messages in thread
From: Krishna Kumar2 @ 2009-03-24 18:57 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

OK, I will break the attachments into text files tomorrow :) I will send
everything but the profile data once again.

thanks,
- KK

"J. Bruce Fields" <bfields@fieldses.org> wrote on 03/24/2009 11:30:36 PM:

> "J. Bruce Fields" <bfields@fieldses.org>
> 03/24/2009 11:30 PM
>
> To
>
> Krishna Kumar2/India/IBM@IBMIN
>
> cc
>
> Jeff Layton <jlayton@redhat.com>, linux-nfs@vger.kernel.org
>
> Subject
>
> Re: nfsd: Improve NFS server performance
>
> On Tue, Mar 24, 2009 at 09:13:32PM +0530, Krishna Kumar2 wrote:
> > Hi Bruce,
> >
> > I am sorry about the delay due to some unavoidable circumstances.
However,
> > I have
> > got most of the details that you and Jeff had asked for. I am
including:
> >
> > 1. Patch (on latest git tree):
> >       (See attached file: patch0)            (See attached file:
patch1)
> >
> > 2. Results (for org, using daemon, using workqueue):
> >       (See attached file: result.daemon)    (See attached file:
result.wq)
> >   (See attached file: result.summary)
> >
> > 3. Profile data (for org, using daemon, using workqueue):
> >       (See attached file: profiles.bz2)
> >
> > 4. A small write up:
> >       (See attached file: write_up)
> >
> > 5. The test program:
> >       (See attached file: read_files.c)
> >
> > Sorry about the large attachment in step #3 above, I didn't know if I
> > should put it some place. Also if required, I can submit patches
breaking
> > up this monolithic one.
>
> Thanks, I'll take a look (maybe not immediately).  Probably Jeff will
> too, but as long as a) there's a large monolithic patch, and b) all of
> this is in attachments--I'm afraid nobody else may bother.
>
> So it'd be more helpful to send this information in the body of email
> (more than one email if needed)--that way there's a greater chance other
> people will pay attention, and (with luck) have some useful idea to
> contribute.
>
> Thanks!--b.


