* Re: [PATCH 0/11] nfsd: Summary of "Improve NFS server performance"
[not found] ` <20090325133607.16437.33288.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-04-17 18:46 ` J. Bruce Fields
2009-04-17 19:30 ` Krishna Kumar2
2009-04-21 22:57 ` J. Bruce Fields
1 sibling, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-17 18:46 UTC (permalink / raw)
To: Krishna Kumar; +Cc: jlayton, linux-nfs, Krishna Kumar
On Wed, Mar 25, 2009 at 07:06:07PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
>
> Patch summary:
> --------------
> Change the caching from ino/dev to a file handle model. Advantages:
>
> 1. Since file handles are unique, this patch removes all dependencies on the
kernel readahead parameters and its implementation; and instead caches
> files based on file handles.
> 2. Allows the server to not have to open/close a file multiple times when the
> client reads it, and results in faster lookup times.
> 3. Readahead is automatically taken care of since the file is not closed while
> it is getting read (quickly) by the client.
> 4. Another optimization is to avoid getting the cache lock twice for each read
> operation (after the first time it is taken to update the cache).
> 5. (ra_size, ra_depth retain the "ra" prefix for now, since I have not checked
> whether changing that will break any user programs)
You're referring here to the "ra" line in /proc/net/rpc/nfsd?
--b.
^ permalink raw reply [flat|nested] 19+ messages in thread
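The file-handle-keyed caching idea in the summary above can be modeled in a few lines of userspace C. Everything here (the names, the direct-mapped table, the single 32-bit key) is illustrative only; the patch itself uses chained hash buckets keyed via jhash_1word on part of the handle:

```c
#include <assert.h>
#include <stdint.h>

/* Toy userspace model of the idea in the summary: key the cache on the
 * file handle itself instead of an (inode, device) pair.  Direct-mapped
 * for brevity; the patch uses chained buckets. */

#define CACHE_BITS 8
#define CACHE_SIZE (1u << CACHE_BITS)
#define CACHE_MASK (CACHE_SIZE - 1)

struct toy_entry {
	uint32_t key;      /* derived from the file handle */
	int      in_use;
	int      fd;       /* stand-in for the cached struct file */
};

static struct toy_entry toy_cache[CACHE_SIZE];

/* Mix a 32-bit word of the handle into a bucket index (jhash_1word
 * plays this role in the kernel patch). */
static unsigned int toy_hash(uint32_t key)
{
	key ^= key >> 16;
	key *= 0x45d9f3bu;
	key ^= key >> 16;
	return key & CACHE_MASK;
}

/* Return the cached fd for this handle word, or -1 on a miss. */
static int toy_lookup(uint32_t key)
{
	struct toy_entry *e = &toy_cache[toy_hash(key)];
	return (e->in_use && e->key == key) ? e->fd : -1;
}

static void toy_insert(uint32_t key, int fd)
{
	struct toy_entry *e = &toy_cache[toy_hash(key)];
	e->key = key;
	e->fd = fd;
	e->in_use = 1;
}
```

A hit returns the already-open file, which is what removes the per-read open/close that the summary's point 2 describes.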
* Re: [PATCH 0/11] nfsd: Summary of "Improve NFS server performance"
2009-04-17 18:46 ` [PATCH 0/11] nfsd: Summary of "Improve NFS server performance" J. Bruce Fields
@ 2009-04-17 19:30 ` Krishna Kumar2
0 siblings, 0 replies; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-17 19:30 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: jlayton, Krishna Kumar, linux-nfs
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/18/2009 12:16:47 AM:
> Re: [PATCH 0/11] nfsd: Summary of "Improve NFS server performance"
>
> On Wed, Mar 25, 2009 at 07:06:07PM +0530, Krishna Kumar wrote:
> > From: Krishna Kumar <krkumar2@in.ibm.com>
> >
> > Patch summary:
> > --------------
> > Change the caching from ino/dev to a file handle model. Advantages:
> >
> > 1. Since file handles are unique, this patch removes all dependencies on the
> > kernel readahead parameters and its implementation; and instead caches
> > files based on file handles.
> > 2. Allows the server to not have to open/close a file multiple times when the
> > client reads it, and results in faster lookup times.
> > 3. Readahead is automatically taken care of since the file is not closed while
> > it is getting read (quickly) by the client.
> > 4. Another optimization is to avoid getting the cache lock twice for each read
> > operation (after the first time it is taken to update the cache).
> > 5. (ra_size, ra_depth retain the "ra" prefix for now, since I have not checked
> > whether changing that will break any user programs)
>
> You're referring here to the "ra" line in /proc/net/rpc/nfsd?
Yes - at that time I was thinking of any consumer of this file. From your
comment, I see that it is not an issue now; thanks for pointing it out.
Thanks,
- KK
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/11] nfsd: ADD data structure infrastructure
[not found] ` <20090325133628.16437.11092.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-04-21 22:48 ` J. Bruce Fields
2009-04-22 5:36 ` Krishna Kumar2
2009-04-21 23:05 ` J. Bruce Fields
1 sibling, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-21 22:48 UTC (permalink / raw)
To: Krishna Kumar; +Cc: Krishna Kumar, jlayton, linux-nfs
On Wed, Mar 25, 2009 at 07:06:28PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
>
> ADD infrastructure in terms of new structures.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
>
> fs/nfsd/vfs.c | 35 +++++++++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff -ruNp org/fs/nfsd/vfs.c new1/fs/nfsd/vfs.c
> --- org/fs/nfsd/vfs.c 2009-03-25 17:31:39.000000000 +0530
> +++ new1/fs/nfsd/vfs.c 2009-03-25 17:39:43.000000000 +0530
> @@ -88,6 +88,41 @@ struct raparm_hbucket {
> #define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1)
> static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE];
>
> +/*
> + * This is a cache of file handles to quicken file lookup. This also
> + * helps prevent multiple open/close of a file when a client reads it.
> + *
> + * If you increase the number of cached files very much, you'll need to
> + * add a hash table here.
I'm a little confused here--what do you mean by "add a hash table here"?
--b.
> + */
> +struct fhcache {
> + struct fhcache *p_next;
> +
> + /* Hashed on this parameter */
> + __u32 p_auth;
> +
> + /* Cached information */
> + struct file *p_filp;
> + struct svc_export *p_exp;
> +
> + /* Refcount for overwrite */
> + atomic_t p_count;
> +
> + /* When this entry expires */
> + unsigned long p_expires;
> +
> + unsigned int p_hindex;
> +};
> +
> +struct fhcache_hbucket {
> + struct fhcache *pb_head;
> + spinlock_t pb_lock;
> +} ____cacheline_aligned_in_smp;
> +
> +#define FHPARM_HASH_BITS 8
> +#define FHPARM_HASH_SIZE (1<<FHPARM_HASH_BITS)
> +#define FHPARM_HASH_MASK (FHPARM_HASH_SIZE-1)
> +
> /*
> * Called from nfsd_lookup and encode_dirent. Check if we have crossed
> * a mount point.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/11] nfsd: Summary of "Improve NFS server performance"
[not found] ` <20090325133607.16437.33288.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-04-17 18:46 ` [PATCH 0/11] nfsd: Summary of "Improve NFS server performance" J. Bruce Fields
@ 2009-04-21 22:57 ` J. Bruce Fields
2009-04-22 5:35 ` Krishna Kumar2
1 sibling, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-21 22:57 UTC (permalink / raw)
To: Krishna Kumar; +Cc: jlayton, linux-nfs, Krishna Kumar
On Wed, Mar 25, 2009 at 07:06:07PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
>
> Patch summary:
> --------------
> Change the caching from ino/dev to a file handle model. Advantages:
>
> 1. Since file handles are unique, this patch removes all dependencies on the
> kernel readahead parameters and its implementation; and instead caches
> files based on file handles.
> 2. Allows the server to not have to open/close a file multiple times when the
> client reads it, and results in faster lookup times.
> 3. Readahead is automatically taken care of since the file is not closed while
> it is getting read (quickly) by the client.
> 4. Another optimization is to avoid getting the cache lock twice for each read
> operation (after the first time it is taken to update the cache).
> 5. (ra_size, ra_depth retain the "ra" prefix for now, since I have not checked
> whether changing that will break any user programs)
>
>
> Patches are described as:
> --------------------------
> 1. nfsd: ADD data structure infrastructure
> 2. nfsd: ADD new function infrastructure
> 3. nfsd: CHANGE old function calls to new calls
> 4. nfsd: ADD client "rm" support
> 5. nfsd: ADD nfsd_fhcache_shutdown and nfsd_fhcache_init
> 6. nfsd: CHANGE _ra* calls to _fh* calls in nfssvc
> 7. nfsd: REMOVE old infrastructure's functions
> 8. nfsd: REMOVE old infrastructure's data structures
>
>
> List of changes from Ver1:
> --------------------------
> 1. Implement entire logic of either updating-cache or do-nothing or close-file
> in fh_cache_upd. This simplifies the caller (nfsd_read).
> 2. nfsd_get_fhcache doesn't overwrite existing entries which would require the
> existing entries to be freed up - that is done by the daemon exclusively.
> This saves time at lookup by avoiding freeing entries (fh_cache_put).
> Another change is to optimize the logic of selecting a free slot.
> 3. Due to #2, fh_cache_upd doesn't have to test whether the entry is already
> on the daemon list (it never is, plus list_del_init is changed to list_del).
> 4. As a result of #2, the daemon becomes simpler - there is no race to handle
> where there is an entry on the list but it has no cached file/dentry, etc.
> 5. Made some comments clearer and easier to understand.
> 6. Jeff: Changed NFSD_CACHE_JIFFIES to use HZ.
> 7. Jeff: Changed nfsd_daemon_list to nfsd_daemon_free_list; and changed ra_init
> and ra_shutdown prefixes.
> 8. Jeff: Split patch into smaller patches. Tested each with successful builds.
> 9. Pending:
> - Bruce: But I think I'd prefer some separate operation (probably just
> triggered by a write to a some new file in the nfsd filesystem) that
> told nfsd to release all its references to a given filesystem. An
> administrator would have to know to do this before unmounting (or
> maybe mount could be patched to do this).
>
>
> Performance:
> -------------
>
> This patch was tested with clients running 2, 4, 8, 16 --- 256 test processes,
> each doing reads of different files. Each test includes different I/O sizes.
> 31/77 tests got more than 5% improvement; with 5 tests getting more than 10%
> gain (full results at the end of this post).
The numbers look promising, but very noisy. Maybe an average of a few
tests would give more stable numbers?
--b.
> Please review. Any comments or improvement ideas are appreciated.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
> (#Test Processes on Client == #NFSD's on Server)
> -----------------------------------------------------------------------
> #Test Processes Bufsize Org BW KB/s New BW KB/s %
> -----------------------------------------------------------------------
> 2 256 68022.46 71094.64 4.51
> 2 4096 67833.74 70726.38 4.26
> 2 8192 64541.14 69635.93 7.89
> 2 16384 65708.86 68994.88 5.00
> 2 32768 64272.28 68525.36 6.61
> 2 65536 64684.13 69103.28 6.83
> 2 131072 64765.67 68855.57 6.31
>
> 4 256 60849.48 64702.04 6.33
> 4 4096 60660.67 64309.37 6.01
> 4 8192 60506.00 61142.84 1.05
> 4 16384 60796.86 64069.82 5.38
> 4 32768 60947.07 64648.57 6.07
> 4 65536 60774.45 63735.24 4.87
> 4 131072 61369.66 65261.85 6.34
>
> 8 256 49239.57 54467.33 10.61
> 8 4096 50650.45 55400.01 9.37
> 8 8192 50661.58 51732.16 2.11
> 8 16384 51114.76 56025.31 9.60
> 8 32768 52367.20 54348.05 3.78
> 8 65536 51000.23 54285.63 6.44
> 8 131072 52996.73 54021.11 1.93
>
> 16 256 44534.67 45478.60 2.11
> 16 4096 43897.69 46519.89 5.97
> 16 8192 43787.87 44083.61 .67
> 16 16384 43883.62 46726.03 6.47
> 16 32768 44284.96 44035.86 -.56
> 16 65536 43804.33 44865.20 2.42
> 16 131072 44525.30 43752.62 -1.73
>
> 32 256 40420.30 42575.30 5.33
> 32 4096 39913.14 42279.21 5.92
> 32 8192 40261.19 42399.93 5.31
> 32 16384 38094.95 42645.32 11.94
> 32 32768 40610.27 43015.37 5.92
> 32 65536 41438.05 41794.76 .86
> 32 131072 41869.06 43644.07 4.23
>
> 48 256 36986.45 40185.34 8.64
> 48 4096 36585.79 38227.38 4.48
> 48 8192 38406.78 38055.91 -.91
> 48 16384 34950.05 36688.86 4.97
> 48 32768 38908.71 37900.33 -2.59
> 48 65536 39364.64 40036.67 1.70
> 48 131072 40391.56 40887.11 1.22
>
> 64 256 32821.89 34568.06 5.32
> 64 4096 35468.42 35529.29 .17
> 64 8192 34135.44 36462.31 6.81
> 64 16384 31926.51 32694.91 2.40
> 64 32768 35527.69 35234.60 -.82
> 64 65536 36066.08 36359.77 .81
> 64 131072 35969.04 37462.86 4.15
>
> 96 256 30032.74 29973.45 -.19
> 96 4096 29687.06 30881.64 4.02
> 96 8192 31142.51 32504.66 4.37
> 96 16384 29546.77 30663.39 3.77
> 96 32768 32458.94 32797.36 1.04
> 96 65536 32826.99 33451.66 1.90
> 96 131072 33601.46 34177.39 1.71
>
> 128 256 28584.59 29092.11 1.77
> 128 4096 29311.11 30097.65 2.68
> 128 8192 31398.87 33154.63 5.59
> 128 16384 28283.58 29071.45 2.78
> 128 32768 32819.93 33654.11 2.54
> 128 65536 32617.13 33778.46 3.56
> 128 131072 32972.71 34160.82 3.60
>
> 192 256 25245.92 26331.48 4.29
> 192 4096 27368.48 29318.49 7.12
> 192 8192 30173.74 31477.35 4.32
> 192 16384 26388.54 29719.15 12.62
> 192 32768 31840.91 33606.17 5.54
> 192 65536 33374.85 33607.14 .69
> 192 131072 33523.48 32601.93 -2.74
>
> 256 256 22256.91 21139.79 -5.01
> 256 4096 25192.75 24281.51 -3.61
> 256 8192 26534.95 28100.59 5.90
> 256 16384 24162.85 25607.86 5.98
> 256 32768 29218.38 29417.28 .68
> 256 65536 29609.59 30049.79 1.48
> 256 131072 30014.29 30132.33 .39
> -----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/11] nfsd: ADD data structure infrastructure
[not found] ` <20090325133628.16437.11092.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-04-21 22:48 ` [PATCH 1/11] nfsd: ADD data structure infrastructure J. Bruce Fields
@ 2009-04-21 23:05 ` J. Bruce Fields
1 sibling, 0 replies; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-21 23:05 UTC (permalink / raw)
To: Krishna Kumar; +Cc: Krishna Kumar, jlayton, linux-nfs
On Wed, Mar 25, 2009 at 07:06:28PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
>
> ADD infrastructure in terms of new structures.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
>
> fs/nfsd/vfs.c | 35 +++++++++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff -ruNp org/fs/nfsd/vfs.c new1/fs/nfsd/vfs.c
> --- org/fs/nfsd/vfs.c 2009-03-25 17:31:39.000000000 +0530
> +++ new1/fs/nfsd/vfs.c 2009-03-25 17:39:43.000000000 +0530
> @@ -88,6 +88,41 @@ struct raparm_hbucket {
> #define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1)
> static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE];
>
> +/*
> + * This is a cache of file handles to quicken file lookup. This also
> + * helps prevent multiple open/close of a file when a client reads it.
> + *
> + * If you increase the number of cached files very much, you'll need to
> + * add a hash table here.
> + */
> +struct fhcache {
> + struct fhcache *p_next;
> +
> + /* Hashed on this parameter */
> + __u32 p_auth;
> +
> + /* Cached information */
> + struct file *p_filp;
> + struct svc_export *p_exp;
> +
> + /* Refcount for overwrite */
> + atomic_t p_count;
> +
> + /* When this entry expires */
> + unsigned long p_expires;
> +
> + unsigned int p_hindex;
> +};
> +
> +struct fhcache_hbucket {
> + struct fhcache *pb_head;
> + spinlock_t pb_lock;
> +} ____cacheline_aligned_in_smp;
Hadn't run across __cacheline_aligned_in_smp before--so that just
assures an fhcache_hbucket gets its own cacheline, so that two cpu's
taking the spinlock on different buckets won't be fighting over the
cacheline?
--b.
> +
> +#define FHPARM_HASH_BITS 8
> +#define FHPARM_HASH_SIZE (1<<FHPARM_HASH_BITS)
> +#define FHPARM_HASH_MASK (FHPARM_HASH_SIZE-1)
> +
> /*
> * Called from nfsd_lookup and encode_dirent. Check if we have crossed
> * a mount point.
^ permalink raw reply [flat|nested] 19+ messages in thread
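Bruce's reading is correct: the annotation gives each bucket its own cache line so CPUs spinning on different buckets' locks do not fight over one line. A userspace model, with a GCC alignment attribute standing in for ____cacheline_aligned_in_smp and an assumed 64-byte line (the kernel uses L1_CACHE_BYTES, which varies by architecture):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Userspace model of ____cacheline_aligned_in_smp: align (and thereby
 * pad) each bucket to a full cache line, so the spinlock of bucket N
 * and the spinlock of bucket N+1 never share a line.  64 bytes is an
 * assumption; real code uses L1_CACHE_BYTES. */

#define TOY_CACHELINE 64

struct toy_bucket {
	void *head;             /* stand-in for pb_head */
	int   lock;             /* stand-in for the spinlock */
} __attribute__((aligned(TOY_CACHELINE)));

static struct toy_bucket toy_buckets[4];
```

Because sizeof must be a multiple of the declared alignment, each array element occupies at least one full line; without the attribute, several 16-byte buckets would pack into one 64-byte line and their locks would false-share.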
* Re: [PATCH 2/11] nfsd: ADD new function infrastructure
[not found] ` <20090325133647.16437.59567.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-04-22 2:54 ` J. Bruce Fields
2009-04-22 5:37 ` Krishna Kumar2
0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-22 2:54 UTC (permalink / raw)
To: Krishna Kumar; +Cc: jlayton, linux-nfs, Krishna Kumar
On Wed, Mar 25, 2009 at 07:06:47PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
>
> ADD infrastructure in terms of new functions.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
>
> fs/nfsd/vfs.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 224 insertions(+)
>
> diff -ruNp new1/fs/nfsd/vfs.c new2/fs/nfsd/vfs.c
> --- new1/fs/nfsd/vfs.c 2009-03-25 17:39:43.000000000 +0530
> +++ new2/fs/nfsd/vfs.c 2009-03-25 17:42:10.000000000 +0530
> @@ -55,11 +55,15 @@
> #include <linux/security.h>
> #endif /* CONFIG_NFSD_V4 */
> #include <linux/jhash.h>
> +#include <linux/kthread.h>
>
> #include <asm/uaccess.h>
>
> #define NFSDDBG_FACILITY NFSDDBG_FILEOP
>
> +/* Number of jiffies to cache the file before releasing */
> +#define NFSD_CACHE_JIFFIES (HZ/10)
> +
>
> /*
> * This is a cache of readahead params that help us choose the proper
> @@ -111,6 +115,9 @@ struct fhcache {
> /* When this entry expires */
> unsigned long p_expires;
>
> + /* List of entries linked to 'nfsd_daemon_free_list' */
> + struct list_head p_list;
> +
> unsigned int p_hindex;
> };
>
> @@ -122,6 +129,7 @@ struct fhcache_hbucket {
> #define FHPARM_HASH_BITS 8
> #define FHPARM_HASH_SIZE (1<<FHPARM_HASH_BITS)
> #define FHPARM_HASH_MASK (FHPARM_HASH_SIZE-1)
> +static struct fhcache_hbucket fhcache_hash[FHPARM_HASH_SIZE];
>
> /*
> * Called from nfsd_lookup and encode_dirent. Check if we have crossed
> @@ -866,6 +874,222 @@ found:
> return ra;
> }
>
> +/* Synchronization for daemon with enqueuer's */
> +static DEFINE_SPINLOCK(k_nfsd_lock);
> +
> +/* List of FH cache entries that has to be cleaned up when they expire */
> +static struct list_head nfsd_daemon_free_list;
> +
> +/*
> + * Returns cached values of 'file' and svc_export; resets these entries
> + * to NULL.
> + */
> +static inline void fh_get_cached_entries(struct fhcache *fh,
> + struct file **filep,
> + struct svc_export **expp)
> +{
> + *filep = fh->p_filp;
> + *expp = fh->p_exp;
> +
> + fh->p_filp = NULL;
> + fh->p_exp = NULL;
> +}
> +
> +/*
> + * Hold a reference to dentry and svc_export (file already has an extra
> + * reference count as it is not closed normally).
> + */
> +static inline void fh_cache_get(struct file *file, struct svc_export *exp)
> +{
> + dget(file->f_dentry);
> + cache_get(&exp->h);
> +}
> +
> +/* Drop a reference to file, dentry and svc_export */
> +static inline void fh_cache_put(struct file *file, struct svc_export *exp)
> +{
> + cache_put(&exp->h, &svc_export_cache);
> + dput(file->f_dentry);
> + fput(file);
> +}
> +
> +/*
> + * Holds a reference to dentry and svc_export, and caches both. Add fh entry
> + * to list for daemon to cleanup later. Once we add the entry to the list,
> + * we'd rather it expire prematurely rather than updating it on every read.
> + */
> +static inline void fh_cache_upd(struct fhcache *fh, struct file *file,
> + struct svc_export *exp)
> +{
> + if (fh) {
> + if (!fh->p_filp && file) {
> + struct fhcache_hbucket *fhb;
> +
> + fh_cache_get(file, exp);
> +
> + fhb = &fhcache_hash[fh->p_hindex];
> +
> + spin_lock(&fhb->pb_lock);
> + fh->p_filp = file;
> + fh->p_exp = exp;
> + fh->p_expires = jiffies + NFSD_CACHE_JIFFIES;
> +
> + spin_lock(&k_nfsd_lock);
> + list_add_tail(&fh->p_list, &nfsd_daemon_free_list);
> + spin_unlock(&k_nfsd_lock);
> +
> + spin_unlock(&fhb->pb_lock);
> + }
> +
> + /* Drop our reference */
> + atomic_dec(&fh->p_count);
> + } else if (file)
> + nfsd_close(file);
> +}
> +
> +/* Daemon cache cleanup handler */
> +void daemon_free_entries(void)
> +{
> + unsigned long now = jiffies;
> +
> + spin_lock(&k_nfsd_lock);
> + while (!list_empty(&nfsd_daemon_free_list)) {
> + struct fhcache *fh = list_entry(nfsd_daemon_free_list.next,
> + struct fhcache, p_list);
> + struct fhcache_hbucket *fhb;
> + struct file *file;
> + struct svc_export *exp;
> +
> + if (time_after(fh->p_expires, now) || now != jiffies) {
> + /*
> + * This (and all subsequent entries) have not expired;
> + * or we have spent too long in this loop.
> + */
> + break;
> + }
> +
> + fhb = &fhcache_hash[fh->p_hindex];
> +
> + /*
> + * Make sure we do not deadlock with updaters - we can free
> + * entry next time in case of a race.
> + */
> + if (!spin_trylock(&fhb->pb_lock)) {
> + /*
> + * Entry is being used, no need to free this, try later
> + */
> + break;
> + }
> +
> + if (atomic_read(&fh->p_count)) {
> + spin_unlock(&fhb->pb_lock);
> + break;
> + }
> +
> + list_del(&fh->p_list);
> + fh_get_cached_entries(fh, &file, &exp);
> + spin_unlock(&fhb->pb_lock);
> + spin_unlock(&k_nfsd_lock);
> +
> + fh_cache_put(file, exp);
> + spin_lock(&k_nfsd_lock);
> + }
> + spin_unlock(&k_nfsd_lock);
> +}
> +
> +static int k_nfsd_thread(void *unused)
> +{
> + while (!kthread_should_stop()) {
> + schedule_timeout_interruptible(NFSD_CACHE_JIFFIES);
> +
> + if (kthread_should_stop())
> + break;
> +
> + daemon_free_entries();
> + }
> + __set_current_state(TASK_RUNNING);
Is this necessary? The comment before schedule_timeout() claims "The
current task state is guaranteed to be TASK_RUNNING when this routine
returns."
> +
> + return 0;
> +}
> +
> +/*
> + * Obtain the cached file, export and d_inode values for the FH
> + * specified by fh->auth[3]
> + */
> +static inline struct fhcache *
> +nfsd_get_fhcache(__u32 auth)
> +{
> + struct fhcache *fh, **fhp, **ffhp = NULL;
> + int depth = 0;
> + unsigned int hash;
> + struct fhcache_hbucket *fhb;
> +
> + if (!auth)
> + return NULL;
> +
> + hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK;
> + fhb = &fhcache_hash[hash];
> +
> + spin_lock(&fhb->pb_lock);
> + for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) {
> + if (fh->p_auth == auth) {
> + /* Same inode */
> + if (unlikely(!fh->p_filp)) {
> + /*
> + * Someone is racing in the same code, and
> + * this is the first reference to this file.
> + */
> + spin_unlock(&fhb->pb_lock);
The locking seems over-complicated. But I have to admit, I don't
understand it yet.
> + return NULL;
> + }
> +
> + /*
> + * Hold an extra reference to dentry/exp since these
> + * are released in fh_put(). 'file' already has an
> + * extra hold from the first lookup which was never
> + * dropped.
Why not take a reference on the file instead of the dentry?
> + */
> + fh_cache_get(fh->p_filp, fh->p_exp);
> + goto found;
> + }
> +
> + depth++;
> +
> + if (!ffhp && !fh->p_filp) {
> + /*
> + * This is an unused inode (or a different one), and no
> + * entry was found till now.
> + */
> + if (!atomic_read(&fh->p_count)) /* Entry is unused */
> + ffhp = fhp;
> + }
> + }
> +
> + if (!ffhp) {
> + spin_unlock(&fhb->pb_lock);
> + return NULL;
> + }
> +
> + depth = nfsdstats.ra_size*11/10;
Help, I'm confused--what's the meaning of this calculation?
--b.
> + fhp = ffhp;
> + fh = *ffhp;
> + fh->p_hindex = hash;
> + fh->p_auth = auth;
> +
> +found:
> + if (fhp != &fhb->pb_head) {
> + *fhp = fh->p_next;
> + fh->p_next = fhb->pb_head;
> + fhb->pb_head = fh;
> + }
> +
> + atomic_inc(&fh->p_count);
> + nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
> + spin_unlock(&fhb->pb_lock);
> +
> + return fh;
> +}
> +
> /*
> * Grab and keep cached pages associated with a file in the svc_rqst
> * so that they can be passed to the network sendmsg/sendpage routines
^ permalink raw reply [flat|nested] 19+ messages in thread
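The found: path in the quoted nfsd_get_fhcache moves a hit to the head of its bucket chain using a pointer-to-previous-link, which avoids needing a doubly linked list. A standalone sketch of just that move-to-front step (names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Singly linked chain walked via a pointer to the previous link (np),
 * mirroring the fhp/p_next walk in nfsd_get_fhcache: on a hit the node
 * can be unlinked through *np and pushed to the front in O(1). */

struct node {
	struct node *next;
	int key;
};

/* Find key in the chain at *headp; on a hit, move the node to the
 * front and return it.  Returns NULL on a miss. */
static struct node *find_mtf(struct node **headp, int key)
{
	struct node *n, **np;

	for (np = headp; (n = *np) != NULL; np = &n->next) {
		if (n->key == key) {
			if (np != headp) {      /* not already at head */
				*np = n->next;  /* unlink from old spot */
				n->next = *headp;
				*headp = n;     /* push to front */
			}
			return n;
		}
	}
	return NULL;
}
```

Keeping hot entries near the head shortens the average chain walk, which is what the ra_depth statistics in the patch are measuring.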
* Re: [PATCH 0/11] nfsd: Summary of "Improve NFS server performance"
2009-04-21 22:57 ` J. Bruce Fields
@ 2009-04-22 5:35 ` Krishna Kumar2
2009-04-22 19:41 ` J. Bruce Fields
0 siblings, 1 reply; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-22 5:35 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: jlayton, Krishna Kumar, linux-nfs
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/22/2009 04:27:28 AM:
> The numbers look promising, but very noisy. Maybe an average of a few
> tests would give more stable numbers?
Is it enough if I cut out some "popular" bufsizes and #processes out of that
table and post it? If so, here is a smaller list:
----------------------------------------------------------------------
#Processes Bufsize OrgBW KB/s New BW KB/s %
----------------------------------------------------------------------
4 4096 60660.67 64309.37 6.01
4 16384 60796.86 64069.82 5.38
4 65536 60774.45 63735.24 4.87
4 131072 61369.66 65261.85 6.34
8 4096 50650.45 55400.01 9.37
8 16384 51114.76 56025.31 9.60
8 65536 51000.23 54285.63 6.44
8 131072 52996.73 54021.11 1.93
16 4096 43897.69 46519.89 5.97
16 16384 43883.62 46726.03 6.47
16 65536 43804.33 44865.20 2.42
16 131072 44525.30 43752.62 -1.73
32 4096 39913.14 42279.21 5.92
32 16384 38094.95 42645.32 11.94
32 65536 41438.05 41794.76 .86
32 131072 41869.06 43644.07 4.23
64 4096 35468.42 35529.29 .17
64 16384 31926.51 32694.91 2.40
64 65536 36066.08 36359.77 .81
64 131072 35969.04 37462.86 4.15
128 4096 29311.11 30097.65 2.68
128 16384 28283.58 29071.45 2.78
128 65536 32617.13 33778.46 3.56
128 131072 32972.71 34160.82 3.60
192 4096 27368.48 29318.49 7.12
192 16384 26388.54 29719.15 12.62
192 65536 33374.85 33607.14 .69
192 131072 33523.48 32601.93 -2.74
-----------------------------------------------------------------------
1170059.88 1223737.36 4.58%
-----------------------------------------------------------------------
Thanks,
- KK
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/11] nfsd: ADD data structure infrastructure
2009-04-21 22:48 ` [PATCH 1/11] nfsd: ADD data structure infrastructure J. Bruce Fields
@ 2009-04-22 5:36 ` Krishna Kumar2
2009-04-22 19:43 ` J. Bruce Fields
0 siblings, 1 reply; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-22 5:36 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: jlayton, Krishna Kumar, linux-nfs
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/22/2009 04:18:38 AM:
> > +/*
> > + * This is a cache of file handles to quicken file lookup. This also
> > + * helps prevent multiple open/close of a file when a client reads it.
> > + *
> > + * If you increase the number of cached files very much, you'll need to
> > + * add a hash table here.
>
> I'm a little confused here--what do you mean by "add a hash table here"?
That is also in the original code. I feel the comment is wrong as we
already have a hash table in raparm_hbucket, but I didn't want to change
typos as part of this patch.
thanks,
- KK
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/11] nfsd: ADD new function infrastructure
2009-04-22 2:54 ` [PATCH 2/11] nfsd: ADD new function infrastructure J. Bruce Fields
@ 2009-04-22 5:37 ` Krishna Kumar2
0 siblings, 0 replies; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-22 5:37 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: jlayton, Krishna Kumar, linux-nfs
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/22/2009 08:24:27 AM:
> > +static int k_nfsd_thread(void *unused)
> > +{
> > + while (!kthread_should_stop()) {
> > + schedule_timeout_interruptible(NFSD_CACHE_JIFFIES);
> > +
> > + if (kthread_should_stop())
> > + break;
> > +
> > + daemon_free_entries();
> > + }
> > + __set_current_state(TASK_RUNNING);
>
> Is this necessary? The comment before schedule_timeout() claims "The
> current task state is guaranteed to be TASK_RUNNING when this routine
> returns."
Right, it is not required.
> > +static inline struct fhcache *
> > +nfsd_get_fhcache(__u32 auth)
> > +{
> > + struct fhcache *fh, **fhp, **ffhp = NULL;
> > + int depth = 0;
> > + unsigned int hash;
> > + struct fhcache_hbucket *fhb;
> > +
> > + if (!auth)
> > + return NULL;
> > +
> > + hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK;
> > + fhb = &fhcache_hash[hash];
> > +
> > + spin_lock(&fhb->pb_lock);
> > + for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) {
> > + if (fh->p_auth == auth) {
> > + /* Same inode */
> > + if (unlikely(!fh->p_filp)) {
> > + /*
> > + * Someone is racing in the same code, and
> > + * this is the first reference to this file.
> > + */
> > + spin_unlock(&fhb->pb_lock);
>
> The locking seems over-complicated. But I have to admit, I don't
> understand it yet.
I have one new lock - k_nfsd_lock - which is used to add/delete entries on
nfsd_daemon_free_list. The lock is taken either by the daemon, or when an
operation results in creating or deleting a cache entry (read and
unlink).
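The nesting in the quoted fh_cache_upd (per-bucket pb_lock taken first, the global k_nfsd_lock held only around the list add) can be sketched in userspace with C11 atomic-flag spinlocks standing in for the kernel spinlocks; all names here are illustrative:

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal spinlock built on C11 atomic_flag. */
static atomic_flag toy_bucket_lock = ATOMIC_FLAG_INIT; /* pb_lock */
static atomic_flag toy_list_lock = ATOMIC_FLAG_INIT;   /* k_nfsd_lock */
static int toy_cached, toy_queued;

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;       /* spin until the flag was clear */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Mirrors the lock nesting in fh_cache_upd: the bucket lock covers the
 * cached-field update, and the global list lock is held only for the
 * brief list-add, keeping contention on the global lock short. */
static void toy_cache_update(void)
{
	toy_lock(&toy_bucket_lock);
	toy_cached = 1;                 /* fh->p_filp, p_exp, p_expires */

	toy_lock(&toy_list_lock);
	toy_queued = 1;                 /* list_add_tail(&fh->p_list, ...) */
	toy_unlock(&toy_list_lock);

	toy_unlock(&toy_bucket_lock);
}
```

As long as every path takes the locks in this same bucket-then-list order (the daemon uses spin_trylock on the bucket to avoid inverting it), the nesting cannot deadlock.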
>
> > + return NULL;
> > + }
> > +
> > + /*
> > + * Hold an extra reference to dentry/exp since these
> > + * are released in fh_put(). 'file' already has an
> > + * extra hold from the first lookup which was never
> > + * dropped.
>
> Why not take a reference on the file instead of the dentry?
I am holding an extra reference to the file in the first lookup, but unless
I also hold the dentry, it gets freed up by fh_put.
> > + depth = nfsdstats.ra_size*11/10;
>
> Help, I'm confused--what's the meaning of this calculation?
That is original code, and I also find it confusing :) What I understand is
that when an entry is not found in the cache, ra_depth[10] is incremented, and
when it is found, ra_depth[0] is incremented (unless the hash table entry
contains elements in the range of 20-30 or more, in which case ra_depth[2],
[3], etc., get incremented). It seems to give an indication of the efficiency
of the hash
table.
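The arithmetic in the quoted code can be modeled directly. The cache size and the histogram length below are assumptions for illustration, not the actual nfsdstats layout; the point is only how a chain depth (or the out-of-range miss marker) scales into a histogram slot:

```c
#include <assert.h>

/* Model of the ra_depth accounting as described above: the chain depth
 * at which an entry was found is scaled into a small histogram, so
 * slot 0 counts near-head hits and the top slot counts misses, which
 * the code marks with the out-of-range depth size*11/10. */

#define TOY_CACHE_SIZE 160      /* stand-in for nfsdstats.ra_size */

static unsigned int toy_depth_stats[12];

/* depth: chain position where the entry was found, or
 * TOY_CACHE_SIZE*11/10 to mark a miss (the quoted code's convention). */
static void toy_record_depth(int depth)
{
	toy_depth_stats[depth * 10 / TOY_CACHE_SIZE]++;
}
```

With these numbers a hit at the head lands in slot 0 and the miss marker 160*11/10 = 176 lands in the top slot (176*10/160 = 11), so a healthy cache shows counts piled up at the low end.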
thanks,
- KK
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/11] nfsd: Summary of "Improve NFS server performance"
2009-04-22 5:35 ` Krishna Kumar2
@ 2009-04-22 19:41 ` J. Bruce Fields
0 siblings, 0 replies; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-22 19:41 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: jlayton, Krishna Kumar, linux-nfs
On Wed, Apr 22, 2009 at 11:05:14AM +0530, Krishna Kumar2 wrote:
> "J. Bruce Fields" <bfields@fieldses.org> wrote on 04/22/2009 04:27:28 AM:
>
> > The numbers look promising, but very noisy. Maybe an average of a few
> > tests would give more stable numbers?
>
> Is it enough if I cut out some "popular" bufsizes and #processes out of that
> table and post it? If so, here is a smaller list:
By "noisy" I just mean that they look like there's a lot of randomness.
So, yes, I'm fine with a subset, but I'm curious what the variance is on
repeated runs of each test.
--b.
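One way to get the stable numbers Bruce asks for is to report, for each (processes, bufsize) cell, the mean and sample variance over several repeated runs instead of a single measurement. A minimal sketch (the throughput figures in the test are made up):

```c
#include <assert.h>

/* Mean and sample variance over n repeated throughput measurements.
 * Sample variance divides by n-1, the usual estimator for a small
 * number of benchmark runs. */

static double toy_mean(const double *x, int n)
{
	double s = 0.0;
	for (int i = 0; i < n; i++)
		s += x[i];
	return s / n;
}

static double toy_sample_variance(const double *x, int n)
{
	double m = toy_mean(x, n), s = 0.0;
	for (int i = 0; i < n; i++)
		s += (x[i] - m) * (x[i] - m);
	return s / (n - 1);
}
```

If the variance within a cell is comparable to the org-vs-new difference, that cell's percentage gain is noise; if it is much smaller, the gain is real.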
>
> ----------------------------------------------------------------------
> #Processes Bufsize OrgBW KB/s New BW KB/s %
> ----------------------------------------------------------------------
> 4 4096 60660.67 64309.37 6.01
> 4 16384 60796.86 64069.82 5.38
> 4 65536 60774.45 63735.24 4.87
> 4 131072 61369.66 65261.85 6.34
>
> 8 4096 50650.45 55400.01 9.37
> 8 16384 51114.76 56025.31 9.60
> 8 65536 51000.23 54285.63 6.44
> 8 131072 52996.73 54021.11 1.93
>
> 16 4096 43897.69 46519.89 5.97
> 16 16384 43883.62 46726.03 6.47
> 16 65536 43804.33 44865.20 2.42
> 16 131072 44525.30 43752.62 -1.73
>
> 32 4096 39913.14 42279.21 5.92
> 32 16384 38094.95 42645.32 11.94
> 32 65536 41438.05 41794.76 .86
> 32 131072 41869.06 43644.07 4.23
>
> 64 4096 35468.42 35529.29 .17
> 64 16384 31926.51 32694.91 2.40
> 64 65536 36066.08 36359.77 .81
> 64 131072 35969.04 37462.86 4.15
>
> 128 4096 29311.11 30097.65 2.68
> 128 16384 28283.58 29071.45 2.78
> 128 65536 32617.13 33778.46 3.56
> 128 131072 32972.71 34160.82 3.60
>
> 192 4096 27368.48 29318.49 7.12
> 192 16384 26388.54 29719.15 12.62
> 192 65536 33374.85 33607.14 .69
> 192 131072 33523.48 32601.93 -2.74
> -----------------------------------------------------------------------
> 1170059.88 1223737.36 4.58%
> -----------------------------------------------------------------------
>
> Thanks,
>
> - KK
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/11] nfsd: ADD data structure infrastructure
2009-04-22 5:36 ` Krishna Kumar2
@ 2009-04-22 19:43 ` J. Bruce Fields
0 siblings, 0 replies; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-22 19:43 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: jlayton, Krishna Kumar, linux-nfs
On Wed, Apr 22, 2009 at 11:06:33AM +0530, Krishna Kumar2 wrote:
> "J. Bruce Fields" <bfields@fieldses.org> wrote on 04/22/2009 04:18:38 AM:
>
> > > +/*
> > > + * This is a cache of file handles to quicken file lookup. This also
> > > + * helps prevent multiple open/close of a file when a client reads it.
> > > + *
> > > + * If you increase the number of cached files very much, you'll need to
> > > + * add a hash table here.
> >
> > I'm a little confused here--what do you mean by "add a hash table here"?
>
> That is also in the original code. I feel the comment is wrong as we
> already have a hash table in raparm_hbucket, but I didn't want to change
> typos as part of this patch.
This patch doesn't change anything--it only adds new lines. If you're
trying to make minimal changes, that's admirable, but break up the
patches in such a way that each patch shows that minimal change.
--b.
>
> thanks,
>
> - KK
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
[not found] ` <20090325133707.16437.66360.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-04-22 20:05 ` J. Bruce Fields
2009-04-23 15:55 ` Krishna Kumar2
0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-22 20:05 UTC (permalink / raw)
To: Krishna Kumar; +Cc: Krishna Kumar, jlayton, linux-nfs
On Wed, Mar 25, 2009 at 07:07:07PM +0530, Krishna Kumar wrote:
> @@ -1337,12 +1321,30 @@ nfsd_read(struct svc_rqst *rqstp, struct
> goto out;
> err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
> } else {
> - err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
> - if (err)
> - goto out;
> - err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
> - nfsd_close(file);
> + struct fhcache *fh;
> +
> + /* Check if this fh is cached */
> + fh = nfsd_get_fhcache(fhp->fh_handle.fh_auth[3]);
How do you know fh_auth[3] is sufficient to identify the file reliably?
This looks very fragile to me.
If the goal is to bypass rqst_exp_find(), exportfs_decode_fh() and
friends--that makes me nervous. We should figure out how to make those
lookups faster instead. Or at the very least, make sure we're keying on
the entire filehandle instead of just part of it.
--b.
> + if (fh && fh->p_filp) {
> + /* Got cached values */
> + file = fh->p_filp;
> + fhp->fh_dentry = file->f_dentry;
> + fhp->fh_export = fh->p_exp;
> + err = fh_verify(rqstp, fhp, S_IFREG, NFSD_MAY_READ);
> + } else {
> + /* Nothing in cache */
> + err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ,
> + &file);
> + }
> +
> + if (!err)
> + err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen,
> + count);
> +
> + /* Update cached values if required, and clean up */
> + fh_cache_upd(fh, file, fhp->fh_export);
> }
> +
> out:
> return err;
> }
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-22 20:05 ` [PATCH 3/11] nfsd: CHANGE old function calls to new calls J. Bruce Fields
@ 2009-04-23 15:55 ` Krishna Kumar2
2009-04-23 22:39 ` J. Bruce Fields
0 siblings, 1 reply; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-23 15:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: jlayton, Krishna Kumar, linux-nfs
Hi Bruce,
(combining all three of your mails into one response for easier tracking)
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/23/2009 01:35:53 AM:
> How do you know fh_auth[3] is sufficient to identify the file reliably?
> This looks very fragile to me.
>
> If the goal is to bypass rqst_exp_find(), exportfs_decode_fh() and
> friends--that makes me nervous. We should figure out how to make those
> lookups faster instead. Or at the very least, make sure we're keying on
> the entire filehandle instead of just part of it.
To understand how filehandles work, I was looking at nfsd_set_fh_dentry;
here fid is fh->auth[2] (since type is FSID_DEV). However, the unique FH
value is stored in fh->auth[3], while auth[0]/[1] contain the maj/min and
ino. fh->auth[3] is unique for different files and across file systems, but
it is not actually used to find the file (only the exp) - the stored
dev/ino (auth[0] and auth[1]) are used to find the dentry by calling the
underlying filesystem (exportfs_decode_fh).

Keying on the entire filehandle seems reasonable, but I don't think it is
required, as auth[3] seems to be allocated memory which is unique in a
system; we just don't use it for locating the file currently, and I was
proposing that we do. Please correct me if this is wrong, or whether a
better way is possible.
> This patch doesn't change anything--it only adds new lines. If you're
> trying to make minimal changes, that's admirable, but break up the
> patches in such a way that each patch shows that minimal change.
To make each revision compile cleanly, I had to add the infrastructure and
then delete the old one. So it looks like new code. The entire function and
data structures are re-written so it is difficult to break into individual
components where each will compile clean.
> By "noisy" I just mean that they look like there's a lot of randomness.
> So, yes, I'm fine with a subset, but I'm curious what the variance is on
> repeated runs of each test.
Sure, I will try to do this over the weekend and submit.
thanks,
- KK
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-23 15:55 ` Krishna Kumar2
@ 2009-04-23 22:39 ` J. Bruce Fields
2009-04-24 16:17 ` Krishna Kumar2
0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-23 22:39 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: jlayton, Krishna Kumar, linux-nfs
On Thu, Apr 23, 2009 at 09:25:34PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
>
> (combining all three of your mails into one response for easier tracking)
>
> "J. Bruce Fields" <bfields@fieldses.org> wrote on 04/23/2009 01:35:53 AM:
>
> > How do you know fh_auth[3] is sufficient to identify the file reliably?
> > This looks very fragile to me.
> >
> > If the goal is to bypass rqst_exp_find(), exportfs_decode_fh() and
> > friends--that makes me nervous. We should figure out how to make those
> > lookups faster instead. Or at the very least, make sure we're keying on
> > the entire filehandle instead of just part of it.
>
> To understand how filehandles works, I was looking at nfsd_set_fh_dentry,
> here fid is fh->auth[2] (since type is FSID_DEV). However, the unique FH
> value is stored in fh->auth[3], while auth[0]/[1] contains the maj/min and
> ino. fh->auth[3] is unique for different files and across file systems, but
> it is not actually used to find the file (only the exp) - the stored
> dev/ino (auth[0] and auth[1]) are used to find the dentry by calling the
> underlying filesystem (exportfs_decode_fh).
>
> Keying on the entire filehandle seems reasonable, but I don't think it is
> required as auth[3] seems to be allocated memory which is unique in a
> system,
You lost me. Where do you see auth[3] getting encoded, and what do you
mean by saying it's "allocated memory which is unique in a system"?
There are a lot of different filehandle encoding options. I lose track
of them myself....
> just that we don't use it for locating the file currently and I was
> proposing
> that we do. Please correct me if this is wrong, or whether a better way is
> possible.
>
> > This patch doesn't change anything--it only adds new lines. If you're
> > trying to make minimal changes, that's admirable, but break up the
> > patches in such a way that each patch shows that minimal change.
>
> To make each revision compile cleanly, I had to add the infrastructure and
> then delete the old one. So it looks like new code. The entire function and
> data structures are re-written so it is difficult to break into individual
> components where each will compile clean.
Breaking up a complex change into logical individual steps, each of
which compile and run cleanly, is sometimes tricky. So it may be that
there *is* some way to do it in this case, but that you just haven't
found it yet.
That said, it could be, as you say, that this requires just replacing
everything. In that case, please just start over from scratch and don't
feel bound to keep stuff from the old implementation that you don't
understand.
--b.
> > By "noisy" I just mean that they look like there's a lot of randomness.
> > So, yes, I'm fine with a subset, but I'm curious what the variance is on
> > repeated runs of each test.
>
> Sure, I will try to do this over the weekend and submit.
>
> thanks,
>
> - KK
>
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-23 22:39 ` J. Bruce Fields
@ 2009-04-24 16:17 ` Krishna Kumar2
2009-04-24 16:23 ` J. Bruce Fields
0 siblings, 1 reply; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-24 16:17 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: jlayton, Krishna Kumar, linux-nfs
Hi Bruce,
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/24/2009 04:09:09 AM:
> > > Keying on the entire filehandle seems reasonable, but I don't think it is
> > > required as auth[3] seems to be allocated memory which is unique in a
> > > system,
>
> You lost me. Where do you see auth[3] getting encoded, and what do you
> mean by saying it's "allocated memory which is unique in a system"?
>
> There are a lot of different filehandle encoding options. I lose track
> of them myself....
Sorry, I am wrong - I was thinking of &auth[3] and wrote too fast. I am
testing using your suggestion of hashing on the full filehandle, something
like:
{
	__u32 a = auth[0], b = auth[1], c = auth[2], d = auth[3];

	hash = jhash_3words(a, b, jhash_2words(c, d, 0), 0xfeedbeef) &
	       FHPARM_HASH_MASK;
	...
	/*
	 * Matching check uses something like:
	 * if (fh->p_auth1 == a && fh->p_auth2 == b &&
	 *     fh->p_auth3 == c && fh->p_auth4 == d)
	 */
}
Is this what you had in mind? I am testing some more with this; so far I get
different values for different files and filesystems.

I am not sure if there is an easier way to do a hash and get the unique
file associated with the filehandle - this part of the code is very
complicated.
> > just that we don't use it for locating the file currently and I was
> > proposing
> > that we do. Please correct me if this is wrong, or whether a better way
is
> > possible.
> >
> > > This patch doesn't change anything--it only adds new lines. If you're
> > > trying to make minimal changes, that's admirable, but break up the
> > > patches in such a way that each patch shows that minimal change.
> >
> > To make each revision compile cleanly, I had to add the infrastructure and
> > then delete the old one. So it looks like new code. The entire function and
> > data structures are re-written so it is difficult to break into individual
> > components where each will compile clean.
>
> Breaking up a complex change into logical individual steps, each of
> which compile and run cleanly, is sometimes tricky. So it may be that
> there *is* some way to do it in this case, but that you just haven't
> found it yet.
>
> That said, it could be, as you say, that this requires just replacing
> everything. In that case, please just start over from scratch and don't
> feel bound to keep stuff from the old implementation that you don't
> understand.
OK.
thanks,
- KK
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-24 16:17 ` Krishna Kumar2
@ 2009-04-24 16:23 ` J. Bruce Fields
2009-04-24 16:58 ` Krishna Kumar2
0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-24 16:23 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: jlayton, Krishna Kumar, linux-nfs, Greg Banks
On Fri, Apr 24, 2009 at 09:47:57PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
>
> "J. Bruce Fields" <bfields@fieldses.org> wrote on 04/24/2009 04:09:09 AM:
>
> > > Keying on the entire filehandle seems reasonable, but I don't think it is
> > > required as auth[3] seems to be allocated memory which is unique in a
> > > system,
> >
> > You lost me. Where do you see auth[3] getting encoded, and what do you
> > mean by saying it's "allocated memory which is unique in a system"?
> >
> > There are a lot of different filehandle encoding options. I lose track
> > of them myself....
>
> Sorry, I am wrong, I was thinking of &auth[3] and wrote too fast. I am
> testing using your suggestion hashing on the full filehandle, something
> like:
>
> {
> 	__u32 a = auth[0], b = auth[1], c = auth[2], d = auth[3];
>
> 	hash = jhash_3words(a, b, jhash_2words(c, d, 0), 0xfeedbeef) &
> 	       FHPARM_HASH_MASK;
> 	...
> 	/*
> 	 * Matching check uses something like:
> 	 * if (fh->p_auth1 == a && fh->p_auth2 == b &&
> 	 *     fh->p_auth3 == c && fh->p_auth4 == d)
> 	 */
> }
>
> Is what you had in mind? I am testing some more with this, so far I get
> different values for different files and filesystems.
>
> I am not sure if there is an easier way to do a hash and get the unique
> file associated with the filehandle, this part of the code is very
> complicated.
Why not just do a hash on the entire filehandle, however long it may be?
(Cc'ing Greg since he says he had some patches which did something
similar, and perhaps he could offer a suggestion.)
--b.
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-24 16:23 ` J. Bruce Fields
@ 2009-04-24 16:58 ` Krishna Kumar2
2009-04-24 19:25 ` J. Bruce Fields
0 siblings, 1 reply; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-24 16:58 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Greg Banks, jlayton, Krishna Kumar, linux-nfs
Hi Bruce,
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/24/2009 09:53:21 PM:
> > {
> > 	__u32 a = auth[0], b = auth[1], c = auth[2], d = auth[3];
> >
> > 	hash = jhash_3words(a, b, jhash_2words(c, d, 0), 0xfeedbeef) &
> > 	       FHPARM_HASH_MASK;
> > 	...
> > 	/*
> > 	 * Matching check uses something like:
> > 	 * if (fh->p_auth1 == a && fh->p_auth2 == b &&
> > 	 *     fh->p_auth3 == c && fh->p_auth4 == d)
> > 	 */
> > }
> >
> > Is what you had in mind? I am testing some more with this, so far I get
> > different values for different files and filesystems.
> >
> > I am not sure if there is an easier way to do a hash and get the unique
> > file associated with the filehandle, this part of the code is very
> > complicated.
>
> Why not just do a hash on the entire filehandle, however long it may be?
I am not sure how many numbers to hash on; usually the first 4 numbers are
the ino, inode generation, parent inode, parent inode generation, etc, and
together they are a unique match. Or filesystems can have their own encode
handlers but store similar stuff in these indices. I guess a memcmp could
also be done if I knew the length of the auth being used.
> (Cc'ing Greg since he says he had some patches which did something
> similar, and perhaps he could offer a suggestion.)
OK, will wait for response from Greg.
thanks,
- KK
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-24 16:58 ` Krishna Kumar2
@ 2009-04-24 19:25 ` J. Bruce Fields
2009-04-26 11:16 ` Krishna Kumar2
0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-04-24 19:25 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: Greg Banks, jlayton, Krishna Kumar, linux-nfs
On Fri, Apr 24, 2009 at 10:28:18PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
>
> "J. Bruce Fields" <bfields@fieldses.org> wrote on 04/24/2009 09:53:21 PM:
>
> > > {
> > > 	__u32 a = auth[0], b = auth[1], c = auth[2], d = auth[3];
> > >
> > > 	hash = jhash_3words(a, b, jhash_2words(c, d, 0), 0xfeedbeef) &
> > > 	       FHPARM_HASH_MASK;
> > > 	...
> > > 	/*
> > > 	 * Matching check uses something like:
> > > 	 * if (fh->p_auth1 == a && fh->p_auth2 == b &&
> > > 	 *     fh->p_auth3 == c && fh->p_auth4 == d)
> > > 	 */
> > > }
> > >
> > > Is what you had in mind? I am testing some more with this, so far I get
> > > different values for different files and filesystems.
> > >
> > > I am not sure if there is an easier way to do a hash and get the unique
> > > file associated with the filehandle, this part of the code is very
> > > complicated.
> >
> > Why not just do a hash on the entire filehandle, however long it may be?
>
> I am not sure how many numbers to hash on, usually the first 4 numbers are
> the ino, inode generation, parent inode, parent inode generation, etc, and
> is a unique match. Or filesystems can have their own encode handlers but
> store similar stuff in these indices. I guess a memcmp could also be done
> if I know the length of the auth being used.
Why not use the whole thing?:
fh1->fh_size == fh2->fh_size
&& memcmp(fh1->fh_base, fh2->fh_base, fh1->fh_size) == 0
--b.
>
> > (Cc'ing Greg since he says he had some patches which did something
> > similar, and perhaps he could offer a suggestion.)
>
> OK, will wait for response from Greg.
>
> thanks,
>
> - KK
>
* Re: [PATCH 3/11] nfsd: CHANGE old function calls to new calls
2009-04-24 19:25 ` J. Bruce Fields
@ 2009-04-26 11:16 ` Krishna Kumar2
0 siblings, 0 replies; 19+ messages in thread
From: Krishna Kumar2 @ 2009-04-26 11:16 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Greg Banks, jlayton, Krishna Kumar, linux-nfs
"J. Bruce Fields" <bfields@fieldses.org> wrote on 04/25/2009 12:55:06 AM:
> Why not use the whole thing?:
>
> fh1->fh_size == fh2->fh_size
> && memcmp(fh1->fh_base, fh2->fh_base, fh1->fh_size) == 0
OK - so I just add a knfsd_fh in the cache and compare fh's. That
sounds good, let me test and see what is the result.
thanks,
- KK
end of thread, other threads:[~2009-04-26 11:17 UTC | newest]
Thread overview: 19+ messages
-- links below jump to the message on this page --
[not found] <20090325133607.16437.33288.sendpatchset@localhost.localdomain>
[not found] ` <20090325133607.16437.33288.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-04-17 18:46 ` [PATCH 0/11] nfsd: Summary of "Improve NFS server performance" J. Bruce Fields
2009-04-17 19:30 ` Krishna Kumar2
2009-04-21 22:57 ` J. Bruce Fields
2009-04-22 5:35 ` Krishna Kumar2
2009-04-22 19:41 ` J. Bruce Fields
[not found] ` <20090325133628.16437.11092.sendpatchset@localhost.localdomain>
[not found] ` <20090325133628.16437.11092.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-04-21 22:48 ` [PATCH 1/11] nfsd: ADD data structure infrastructure J. Bruce Fields
2009-04-22 5:36 ` Krishna Kumar2
2009-04-22 19:43 ` J. Bruce Fields
2009-04-21 23:05 ` J. Bruce Fields
[not found] ` <20090325133647.16437.59567.sendpatchset@localhost.localdomain>
[not found] ` <20090325133647.16437.59567.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-04-22 2:54 ` [PATCH 2/11] nfsd: ADD new function infrastructure J. Bruce Fields
2009-04-22 5:37 ` Krishna Kumar2
[not found] ` <20090325133707.16437.66360.sendpatchset@localhost.localdomain>
[not found] ` <20090325133707.16437.66360.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-04-22 20:05 ` [PATCH 3/11] nfsd: CHANGE old function calls to new calls J. Bruce Fields
2009-04-23 15:55 ` Krishna Kumar2
2009-04-23 22:39 ` J. Bruce Fields
2009-04-24 16:17 ` Krishna Kumar2
2009-04-24 16:23 ` J. Bruce Fields
2009-04-24 16:58 ` Krishna Kumar2
2009-04-24 19:25 ` J. Bruce Fields
2009-04-26 11:16 ` Krishna Kumar2