* [RFC] parallel directory operations
@ 2005-02-19 17:57 Alex Tomas
  2005-02-19 18:04 ` [RFC] pdirops: vfs patch Alex Tomas
  2005-02-19 18:05 ` [RFC] pdirops: tmpfs patch Alex Tomas
  0 siblings, 2 replies; 11+ messages in thread

From: Alex Tomas @ 2005-02-19 17:57 UTC (permalink / raw)
To: Al Viro; +Cc: LKML, linux-fsdevel, alex

Good day Al and all,

could you review a couple of patches that implement parallel directory
operations for the VFS and tmpfs? In short, the idea is that we can
protect an operation by taking a semaphore associated with a set of
names. Protection at the VFS layer alone is definitely not enough, and a
filesystem will need to protect its own structures itself, but in some
cases the VFS patch is sufficient -- for example, tmpfs. Under some
loads one can see quite high activity in /tmp; with it mounted as tmpfs
on a big SMP box, we can get high contention on i_sem.

Perhaps someone could try a more-or-less real load? I wrote a simple
program that spawns a few processes, each of which chdirs to the given
directory and then loops creating and unlinking a file. The test box is
a dual PIII-1GHz.

run 1: 2 processes create/unlink a file on regular tmpfs

[root@bob root]# mount -t tmpfs none /test
[root@bob root]# (cd /test; time /root/crunl ./f 1000000 2)
#1998: 1000000 iterations, create/unlink ./f-0-1998
#1999: 1000000 iterations, create/unlink ./f-1-1999
#384: done
#384: done
wait for completion ... OK

real    0m36.224s
user    0m0.823s
sys     0m47.994s

run 2: 2 processes create/unlink a file on tmpfs + pdirops

[root@bob root]# mount -t tmpfs -o pdirops none /test
[root@bob root]# (cd /test; time /root/crunl ./f 1000000 2)
#1992: 1000000 iterations, create/unlink ./f-0-1992
#1993: 1000000 iterations, create/unlink ./f-1-1993
#384: done
#384: done
wait for completion ... OK

real    0m15.108s
user    0m0.592s
sys     0m29.406s

run 3: 1 process creates/unlinks a file on regular tmpfs

[root@bob root]# mount -t tmpfs none /test
[root@bob root]# (cd /test; time /root/crunl ./f 1000000 1)
#2004: 1000000 iterations, create/unlink ./f-0-2004
#384: done
wait for completion ... OK

real    0m11.950s
user    0m0.262s
sys     0m7.465s

run 4: 1 process creates/unlinks a file on tmpfs + pdirops

[root@bob root]# mount -t tmpfs -o pdirops none /test
[root@bob root]# (cd /test; time /root/crunl ./f 1000000 1)
#2009: 1000000 iterations, create/unlink ./f-0-2009
#384: done
wait for completion ... OK

real    0m8.047s
user    0m0.243s
sys     0m7.646s

2 processes creating/unlinking on regular tmpfs cause ~200K context
switches per second:

procs                      memory      swap          io     system          cpu
 r  b  w   swpd    free   buff  cache   si   so    bi    bo   in     cs  us sy id
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 215095   1 64 35
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 213580   1 67 32
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 214445   1 63 36
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 216250   1 63 36

2 processes creating/unlinking on tmpfs + pdirops cause only ~44 context
switches per second:

procs                      memory      swap          io     system         cpu
 r  b  w   swpd    free   buff  cache   si   so    bi    bo   in   cs  us sy id
 2  0  1      0 1001824   6632  10912    0    0     0     0 1008   41   2 98  0
 2  0  2      0 1002144   6632  10912    0    0     0     0 1008   45   2 98  0
 2  0  2      0 1001632   6632  10912    0    0     0     0 1007   47   2 98  0

The next benchmark is rename: two processes each generate a random name,
create a file, generate a new name, rename the created file to the new
name, and unlink it.

run 5: regular tmpfs

[root@bob root]# mount -t tmpfs none /test
[root@bob root]# (cd /test; time /root/rndrename ./f 1000000 2)
#2036: 1000000 iterations
#2037: 1000000 iterations
wait for completion ... OK

real    1m22.381s
user    0m10.254s
sys     1m50.214s

run 6: tmpfs + pdirops

[root@bob root]# mount -t tmpfs -o pdirops none /test
[root@bob root]# (cd /test; time /root/rndrename ./f 1000000 2)
#2044: 1000000 iterations
#2045: 1000000 iterations
wait for completion ... OK

real    0m39.403s
user    0m9.411s
sys     1m8.626s

thanks, Alex
* Re: [RFC] pdirops: vfs patch
  2005-02-19 17:57 [RFC] parallel directory operations Alex Tomas
@ 2005-02-19 18:04 ` Alex Tomas
  2005-02-20 23:35   ` Jan Blunck
  2005-02-19 18:05 ` [RFC] pdirops: tmpfs patch Alex Tomas
  1 sibling, 1 reply; 11+ messages in thread

From: Alex Tomas @ 2005-02-19 18:04 UTC (permalink / raw)
To: Al Viro; +Cc: LKML, linux-fsdevel, alex

 fs/inode.c         |    1
 fs/namei.c         |   66 ++++++++++++++++++++++++++++++++++++++---------------
 include/linux/fs.h |   11 ++++----
 3 files changed, 54 insertions(+), 24 deletions(-)

Index: linux-2.6.10/fs/namei.c
===================================================================
--- linux-2.6.10.orig/fs/namei.c	2005-01-28 19:32:13.000000000 +0300
+++ linux-2.6.10/fs/namei.c	2005-02-19 20:40:05.763437304 +0300
@@ -104,6 +104,28 @@
  * any extra contention...
  */
 
+static inline struct semaphore * lock_sem(struct inode *dir, struct qstr *name)
+{
+	if (IS_PDIROPS(dir)) {
+		struct super_block *sb;
+		/* name->hash expected to be already calculated */
+		sb = dir->i_sb;
+		BUG_ON(sb->s_pdirops_sems == NULL);
+		return sb->s_pdirops_sems + name->hash % sb->s_pdirops_size;
+	}
+	return &dir->i_sem;
+}
+
+static inline void lock_dir(struct inode *dir, struct qstr *name)
+{
+	down(lock_sem(dir, name));
+}
+
+static inline void unlock_dir(struct inode *dir, struct qstr *name)
+{
+	up(lock_sem(dir, name));
+}
+
 /* In order to reduce some races, while at the same time doing additional
  * checking and hopefully speeding things up, we copy filenames to the
  * kernel data space before using them..
@@ -380,7 +402,7 @@
 	struct dentry * result;
 	struct inode *dir = parent->d_inode;
 
-	down(&dir->i_sem);
+	lock_dir(dir, name);
 	/*
 	 * First re-do the cached lookup just in case it was created
 	 * while we waited for the directory semaphore..
@@ -406,7 +428,7 @@
 		else
 			result = dentry;
 	}
-	up(&dir->i_sem);
+	unlock_dir(dir, name);
 	return result;
 }
 
@@ -414,7 +436,7 @@
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	up(&dir->i_sem);
+	unlock_dir(dir, name);
 	if (result->d_op && result->d_op->d_revalidate) {
 		if (!result->d_op->d_revalidate(result, nd) && !d_invalidate(result)) {
 			dput(result);
@@ -1182,12 +1204,26 @@
 /*
  * p1 and p2 should be directories on the same fs.
  */
-struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
+struct dentry *lock_rename(struct dentry *p1, struct qstr *n1,
+			   struct dentry *p2, struct qstr *n2)
 {
 	struct dentry *p;
 
 	if (p1 == p2) {
-		down(&p1->d_inode->i_sem);
+		if (IS_PDIROPS(p1->d_inode)) {
+			unsigned int h1, h2;
+			h1 = n1->hash % p1->d_inode->i_sb->s_pdirops_size;
+			h2 = n2->hash % p2->d_inode->i_sb->s_pdirops_size;
+			if (h1 < h2) {
+				lock_dir(p1->d_inode, n1);
+				lock_dir(p2->d_inode, n2);
+			} else if (h1 > h2) {
+				lock_dir(p2->d_inode, n2);
+				lock_dir(p1->d_inode, n1);
+			} else
+				lock_dir(p1->d_inode, n1);
+		} else
+			down(&p1->d_inode->i_sem);
 		return NULL;
 	}
 
@@ -1195,31 +1231,35 @@
 
 	for (p = p1; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p2) {
-			down(&p2->d_inode->i_sem);
-			down(&p1->d_inode->i_sem);
+			lock_dir(p2->d_inode, n2);
+			lock_dir(p1->d_inode, n1);
 			return p;
 		}
 	}
 
 	for (p = p2; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p1) {
-			down(&p1->d_inode->i_sem);
-			down(&p2->d_inode->i_sem);
+			lock_dir(p1->d_inode, n1);
+			lock_dir(p2->d_inode, n2);
 			return p;
 		}
 	}
 
-	down(&p1->d_inode->i_sem);
-	down(&p2->d_inode->i_sem);
+	lock_dir(p1->d_inode, n1);
+	lock_dir(p2->d_inode, n2);
 	return NULL;
 }
 
-void unlock_rename(struct dentry *p1, struct dentry *p2)
+void unlock_rename(struct dentry *p1, struct qstr *n1,
+		   struct dentry *p2, struct qstr *n2)
 {
-	up(&p1->d_inode->i_sem);
+	unlock_dir(p1->d_inode, n1);
 	if (p1 != p2) {
-		up(&p2->d_inode->i_sem);
+		unlock_dir(p2->d_inode, n2);
 		up(&p1->d_inode->i_sb->s_vfs_rename_sem);
+	} else if (IS_PDIROPS(p1->d_inode) &&
+			n1->hash != n2->hash) {
+		unlock_dir(p2->d_inode, n2);
 	}
 }
 
@@ -1386,13 +1426,13 @@
 
 	dir = nd->dentry;
 	nd->flags &= ~LOOKUP_PARENT;
-	down(&dir->d_inode->i_sem);
+	lock_dir(dir->d_inode, &nd->last);
 	dentry = __lookup_hash(&nd->last, nd->dentry, nd);
 
 do_last:
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry)) {
-		up(&dir->d_inode->i_sem);
+		unlock_dir(dir->d_inode, &nd->last);
 		goto exit;
 	}
 
@@ -1401,7 +1441,7 @@
 		if (!IS_POSIXACL(dir->d_inode))
 			mode &= ~current->fs->umask;
 		error = vfs_create(dir->d_inode, dentry, mode, nd);
-		up(&dir->d_inode->i_sem);
+		unlock_dir(dir->d_inode, &nd->last);
 		dput(nd->dentry);
 		nd->dentry = dentry;
 		if (error)
@@ -1415,7 +1455,7 @@
 	/*
 	 * It already exists.
 	 */
-	up(&dir->d_inode->i_sem);
+	unlock_dir(dir->d_inode, &nd->last);
 
 	error = -EEXIST;
 	if (flag & O_EXCL)
@@ -1499,7 +1539,7 @@
 		goto exit;
 	}
 	dir = nd->dentry;
-	down(&dir->d_inode->i_sem);
+	lock_dir(dir->d_inode, &nd->last);
 	dentry = __lookup_hash(&nd->last, nd->dentry, nd);
 	putname(nd->last.name);
 	goto do_last;
@@ -1517,7 +1557,7 @@
 {
 	struct dentry *dentry;
 
-	down(&nd->dentry->d_inode->i_sem);
+	lock_dir(nd->dentry->d_inode, &nd->last);
 	dentry = ERR_PTR(-EEXIST);
 	if (nd->last_type != LAST_NORM)
 		goto fail;
@@ -1602,7 +1642,7 @@
 		}
 		dput(dentry);
 	}
-	up(&nd.dentry->d_inode->i_sem);
+	unlock_dir(nd.dentry->d_inode, &nd.last);
 	path_release(&nd);
 out:
 	putname(tmp);
@@ -1656,7 +1696,7 @@
 		error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
 		dput(dentry);
 	}
-	up(&nd.dentry->d_inode->i_sem);
+	unlock_dir(nd.dentry->d_inode, &nd.last);
 	path_release(&nd);
 out:
 	putname(tmp);
@@ -1757,14 +1797,14 @@
 		error = -EBUSY;
 		goto exit1;
 	}
-	down(&nd.dentry->d_inode->i_sem);
+	lock_dir(nd.dentry->d_inode, &nd.last);
 	dentry = lookup_hash(&nd.last, nd.dentry);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
 		error = vfs_rmdir(nd.dentry->d_inode, dentry);
 		dput(dentry);
 	}
-	up(&nd.dentry->d_inode->i_sem);
+	unlock_dir(nd.dentry->d_inode, &nd.last);
 exit1:
 	path_release(&nd);
 exit:
@@ -1826,7 +1866,7 @@
 	error = -EISDIR;
 	if (nd.last_type != LAST_NORM)
 		goto exit1;
-	down(&nd.dentry->d_inode->i_sem);
+	lock_dir(nd.dentry->d_inode, &nd.last);
 	dentry = lookup_hash(&nd.last, nd.dentry);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
@@ -1840,7 +1880,7 @@
 	exit2:
 		dput(dentry);
 	}
-	up(&nd.dentry->d_inode->i_sem);
+	unlock_dir(nd.dentry->d_inode, &nd.last);
 	if (inode)
 		iput(inode);	/* truncate the inode here */
exit1:
 	path_release(&nd);
@@ -1902,7 +1942,7 @@
 		error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
 		dput(dentry);
 	}
-	up(&nd.dentry->d_inode->i_sem);
+	unlock_dir(nd.dentry->d_inode, &nd.last);
 	path_release(&nd);
out:
 	putname(to);
@@ -1986,7 +2026,7 @@
 		error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry);
 		dput(new_dentry);
 	}
-	up(&nd.dentry->d_inode->i_sem);
+	unlock_dir(nd.dentry->d_inode, &nd.last);
out_release:
 	path_release(&nd);
out:
@@ -2174,7 +2214,7 @@
 	if (newnd.last_type != LAST_NORM)
 		goto exit2;
 
-	trap = lock_rename(new_dir, old_dir);
+	trap = lock_rename(new_dir, &newnd.last, old_dir, &oldnd.last);
 
 	old_dentry = lookup_hash(&oldnd.last, old_dir);
 	error = PTR_ERR(old_dentry);
@@ -2212,7 +2252,7 @@
exit4:
 	dput(old_dentry);
exit3:
-	unlock_rename(new_dir, old_dir);
+	unlock_rename(new_dir, &newnd.last, old_dir, &oldnd.last);
exit2:
 	path_release(&newnd);
exit1:
Index: linux-2.6.10/fs/super.c
===================================================================
--- linux-2.6.10.orig/fs/super.c	2005-01-28 19:32:13.000000000 +0300
+++ linux-2.6.10/fs/super.c	2005-02-19 18:11:50.000000000 +0300
@@ -172,6 +172,10 @@
 		down_write(&s->s_umount);
 		fs->kill_sb(s);
 		put_filesystem(fs);
+		if (s->s_flags & S_PDIROPS) {
+			BUG_ON(s->s_pdirops_sems == NULL);
+			kfree(s->s_pdirops_sems);
+		}
 		put_super(s);
 	}
 }
@@ -791,6 +795,22 @@
 
 EXPORT_SYMBOL(get_sb_single);
 
+static int init_pdirops_sems(struct super_block *sb)
+{
+	int i;
+
+	/* XXX: should be memsize dependent? -bzzz */
+	sb->s_pdirops_size = 1024;
+	sb->s_pdirops_sems = kmalloc(sizeof(struct semaphore) *
+					sb->s_pdirops_size, GFP_KERNEL);
+	if (sb->s_pdirops_sems == NULL)
+		return -ENOMEM;
+	for (i = 0; i < sb->s_pdirops_size; i++)
+		sema_init(&sb->s_pdirops_sems[i], 1);
+	return 0;
+}
+
 struct vfsmount *
 do_kern_mount(const char *fstype, int flags, const char *name, void *data)
 {
@@ -827,6 +847,11 @@
 	error = security_sb_kern_mount(sb, secdata);
 	if (error)
 		goto out_sb;
+	if (sb->s_flags & S_PDIROPS) {
+		error = init_pdirops_sems(sb);
+		if (error)
+			goto out_sb;
+	}
 	mnt->mnt_sb = sb;
 	mnt->mnt_root = dget(sb->s_root);
 	mnt->mnt_mountpoint = sb->s_root;
Index: linux-2.6.10/include/linux/fs.h
===================================================================
--- linux-2.6.10.orig/include/linux/fs.h	2005-01-28 19:32:15.000000000 +0300
+++ linux-2.6.10/include/linux/fs.h	2005-02-19 18:11:50.000000000 +0300
@@ -150,6 +150,7 @@
 #define S_DIRSYNC	64	/* Directory modifications are synchronous */
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
+#define S_PDIROPS	512	/* Parallel directory operations */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -180,6 +181,7 @@
 #define IS_NODIRATIME(inode)	__IS_FLG(inode, MS_NODIRATIME)
 #define IS_POSIXACL(inode)	__IS_FLG(inode, MS_POSIXACL)
 #define IS_ONE_SECOND(inode)	__IS_FLG(inode, MS_ONE_SECOND)
+#define IS_PDIROPS(inode)	__IS_FLG(inode, S_PDIROPS)
 
 #define IS_DEADDIR(inode)	((inode)->i_flags & S_DEAD)
 #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
@@ -797,6 +799,8 @@
 	 * even looking at it. You had been warned.
 	 */
 	struct semaphore s_vfs_rename_sem;	/* Kludge */
+	struct semaphore *s_pdirops_sems;
+	int s_pdirops_size;
 };
 
 /*
Index: linux-2.6.10/include/linux/namei.h
===================================================================
--- linux-2.6.10.orig/include/linux/namei.h	2005-01-28 19:29:50.000000000 +0300
+++ linux-2.6.10/include/linux/namei.h	2005-02-19 19:59:19.291357592 +0300
@@ -69,8 +69,10 @@
 extern int follow_down(struct vfsmount **, struct dentry **);
 extern int follow_up(struct vfsmount **, struct dentry **);
 
-extern struct dentry *lock_rename(struct dentry *, struct dentry *);
-extern void unlock_rename(struct dentry *, struct dentry *);
+extern struct dentry *lock_rename(struct dentry *, struct qstr *,
+				  struct dentry *, struct qstr *);
+extern void unlock_rename(struct dentry *, struct qstr *,
+			  struct dentry *, struct qstr *);
 
 static inline void nd_set_link(struct nameidata *nd, char *path)
 {
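The core of the patch is classic lock striping: pick one semaphore out of
a per-superblock array by hashing the name, and, when two slots must be
held at once (rename within one directory), acquire them in index order so
concurrent renames cannot deadlock. The same pattern can be sketched in
standalone userspace C; the hash function and all names below are
illustrative placeholders, not code from the patch:

```c
/* Userspace sketch of the pdirops lock-striping pattern: a name is
 * mapped to one of PDIROPS_SIZE mutexes, so operations on names that
 * land in different slots proceed in parallel.  Hypothetical
 * illustration; the hash is a toy stand-in for the kernel's qstr hash. */
#include <pthread.h>

#define PDIROPS_SIZE 1024

static pthread_mutex_t pdirops_locks[PDIROPS_SIZE];

static void pdirops_init(void)
{
	int i;

	for (i = 0; i < PDIROPS_SIZE; i++)
		pthread_mutex_init(&pdirops_locks[i], NULL);
}

/* toy name hash; any reasonable string hash works here */
static unsigned int slot_of(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h % PDIROPS_SIZE;
}

/* lock_dir()/unlock_dir() analogue: one slot per name */
static void lock_name(const char *name)
{
	pthread_mutex_lock(&pdirops_locks[slot_of(name)]);
}

static void unlock_name(const char *name)
{
	pthread_mutex_unlock(&pdirops_locks[slot_of(name)]);
}

/* lock_rename() analogue for the p1 == p2 case: take both slots in
 * index order so two racing renames can never deadlock; if both names
 * hash to the same slot, take it only once */
static void lock_name_pair(const char *n1, const char *n2)
{
	unsigned int h1 = slot_of(n1), h2 = slot_of(n2);

	if (h1 < h2) {
		pthread_mutex_lock(&pdirops_locks[h1]);
		pthread_mutex_lock(&pdirops_locks[h2]);
	} else if (h1 > h2) {
		pthread_mutex_lock(&pdirops_locks[h2]);
		pthread_mutex_lock(&pdirops_locks[h1]);
	} else {
		pthread_mutex_lock(&pdirops_locks[h1]);
	}
}

static void unlock_name_pair(const char *n1, const char *n2)
{
	unsigned int h1 = slot_of(n1), h2 = slot_of(n2);

	pthread_mutex_unlock(&pdirops_locks[h1]);
	if (h1 != h2)
		pthread_mutex_unlock(&pdirops_locks[h2]);
}
```

Note the same-slot case in lock_name_pair(): taking the lock twice would
self-deadlock, which is exactly why the posted lock_rename() compares h1
and h2 before deciding how many semaphores to take.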
* Re: [RFC] pdirops: vfs patch
  2005-02-19 18:04 ` [RFC] pdirops: vfs patch Alex Tomas
@ 2005-02-20 23:35   ` Jan Blunck
  2005-02-20 23:43     ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread

From: Jan Blunck @ 2005-02-20 23:35 UTC (permalink / raw)
To: Alex Tomas; +Cc: Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

Alex Tomas wrote:

> +static inline struct semaphore * lock_sem(struct inode *dir, struct qstr *name)
> +{
> +	if (IS_PDIROPS(dir)) {
> +		struct super_block *sb;
> +		/* name->hash expected to be already calculated */
> +		sb = dir->i_sb;
> +		BUG_ON(sb->s_pdirops_sems == NULL);
> +		return sb->s_pdirops_sems + name->hash % sb->s_pdirops_size;
> +	}
> +	return &dir->i_sem;
> +}
> +
> +static inline void lock_dir(struct inode *dir, struct qstr *name)
> +{
> +	down(lock_sem(dir, name));
> +}
> +
> @@ -1182,12 +1204,26 @@
>  /*
>   * p1 and p2 should be directories on the same fs.
>   */
> -struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
> +struct dentry *lock_rename(struct dentry *p1, struct qstr *n1,
> +			   struct dentry *p2, struct qstr *n2)
>  {
>  	struct dentry *p;
> 
>  	if (p1 == p2) {
> -		down(&p1->d_inode->i_sem);
> +		if (IS_PDIROPS(p1->d_inode)) {
> +			unsigned int h1, h2;
> +			h1 = n1->hash % p1->d_inode->i_sb->s_pdirops_size;
> +			h2 = n2->hash % p2->d_inode->i_sb->s_pdirops_size;
> +			if (h1 < h2) {
> +				lock_dir(p1->d_inode, n1);
> +				lock_dir(p2->d_inode, n2);
> +			} else if (h1 > h2) {
> +				lock_dir(p2->d_inode, n2);
> +				lock_dir(p1->d_inode, n1);
> +			} else
> +				lock_dir(p1->d_inode, n1);
> +		} else
> +			down(&p1->d_inode->i_sem);
>  		return NULL;
>  	}
> 
> @@ -1195,31 +1231,35 @@
> 
>  	for (p = p1; p->d_parent != p; p = p->d_parent) {
>  		if (p->d_parent == p2) {
> -			down(&p2->d_inode->i_sem);
> -			down(&p1->d_inode->i_sem);
> +			lock_dir(p2->d_inode, n2);
> +			lock_dir(p1->d_inode, n1);
>  			return p;
>  		}
>  	}
> 
>  	for (p = p2; p->d_parent != p; p = p->d_parent) {
>  		if (p->d_parent == p1) {
> -			down(&p1->d_inode->i_sem);
> -			down(&p2->d_inode->i_sem);
> +			lock_dir(p1->d_inode, n1);
> +			lock_dir(p2->d_inode, n2);
>  			return p;
>  		}
>  	}
> 
> -	down(&p1->d_inode->i_sem);
> -	down(&p2->d_inode->i_sem);
> +	lock_dir(p1->d_inode, n1);
> +	lock_dir(p2->d_inode, n2);
>  	return NULL;
>  }

In the worst case you have up to s_pdirops_size (i.e. 1024) different
renames concurrently altering one directory inode. Therefore you need a
lock protecting your filesystem data, and that is basically the job done
by i_sem. So in my opinion you only move "The Problem" from the VFS down
to the low-level filesystems; but then there is no need for i_sem or
your s_pdirops_sems anymore.

Regards,
Jan
* Re: [RFC] pdirops: vfs patch
  2005-02-20 23:35 ` Jan Blunck
@ 2005-02-20 23:43   ` Alex Tomas
  0 siblings, 0 replies; 11+ messages in thread

From: Alex Tomas @ 2005-02-20 23:43 UTC (permalink / raw)
To: Jan Blunck
Cc: Alex Tomas, Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

>>>>> Jan Blunck (JB) writes:

JB> In the worst case you have up to s_pdirops_size (i.e. 1024)
JB> different renames concurrently altering one directory inode.
JB> Therefore you need a lock protecting your filesystem data, and that
JB> is basically the job done by i_sem. So in my opinion you only move
JB> "The Problem" from the VFS down to the low-level filesystems; but
JB> then there is no need for i_sem or your s_pdirops_sems anymore.

1) i_sem protects the dcache too
2) tmpfs has no "own" data, so we can use it this way (see the 2nd patch)
3) I have a pdirops patch for ext3, but it needs some cleaning ...

thanks, Alex
* Re: [RFC] pdirops: tmpfs patch
  2005-02-19 17:57 [RFC] parallel directory operations Alex Tomas
  2005-02-19 18:04 ` [RFC] pdirops: vfs patch Alex Tomas
@ 2005-02-19 18:05 ` Alex Tomas
  1 sibling, 0 replies; 11+ messages in thread

From: Alex Tomas @ 2005-02-19 18:05 UTC (permalink / raw)
To: Al Viro; +Cc: LKML, linux-fsdevel, alex

Index: linux-2.6.10/mm/shmem.c
===================================================================
--- linux-2.6.10.orig/mm/shmem.c	2005-01-28 19:32:16.000000000 +0300
+++ linux-2.6.10/mm/shmem.c	2005-02-19 20:05:32.642599576 +0300
@@ -1849,7 +1849,7 @@
 #endif
 };
 
-static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes)
+static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes, struct super_block *sb)
 {
 	char *this_char, *value, *rest;
 
@@ -1858,6 +1858,9 @@
 			continue;
 		if ((value = strchr(this_char,'=')) != NULL) {
 			*value++ = 0;
+		} else if (!strcmp(this_char,"pdirops")) {
+			sb->s_flags |= S_PDIROPS;
+			continue;
 		} else {
 			printk(KERN_ERR
 			    "tmpfs: No value for mount option '%s'\n",
@@ -1928,7 +1931,7 @@
 		max_blocks = sbinfo->max_blocks;
 		max_inodes = sbinfo->max_inodes;
 	}
-	if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes))
+	if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes, sb))
 		return -EINVAL;
 	/* Keep it simple: disallow limited <-> unlimited remount */
 	if ((max_blocks || max_inodes) == !sbinfo)
@@ -1978,7 +1981,7 @@
 			inodes = blocks;
 
 		if (shmem_parse_options(data, &mode,
-					&uid, &gid, &blocks, &inodes))
+					&uid, &gid, &blocks, &inodes, sb))
 			return -EINVAL;
 	}
* Re: [RFC] pdirops: vfs patch
@ 2005-02-22 11:54 Jan Blunck
  2005-02-22 12:04 ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread

From: Jan Blunck @ 2005-02-22 11:54 UTC (permalink / raw)
To: Alex Tomas; +Cc: Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

Quoting Alex Tomas <alex@clusterfs.com>:

> 1) i_sem protects the dcache too

Where? i_sem is the per-inode lock and shouldn't be used for anything else.

> 2) tmpfs has no "own" data, so we can use it this way (see the 2nd patch)
> 3) I have a pdirops patch for ext3, but it needs some cleaning ...

I think you didn't get my point.

1) Your approach duplicates the locking effort for a regular filesystem
   (like ext2):
   a) locking with s_pdirops_sems
   b) locking the low-level filesystem data
   It's cool that it speeds up tmpfs, but I don't think that legitimizes
   the doubled locking for every other filesystem. I'm also not sure
   that it increases performance for regular filesystems, if you do the
   locking right.

2) In my opinion, a superblock-wide semaphore array which allows 1024
   different (different names and different operations) accesses to ONE
   single inode (which is the data it should protect) is not a good
   idea.

Jan
* Re: [RFC] pdirops: vfs patch
  2005-02-22 11:54 [RFC] pdirops: vfs patch Jan Blunck
@ 2005-02-22 12:04 ` Alex Tomas
  2005-02-22 13:00   ` Jan Blunck
  0 siblings, 1 reply; 11+ messages in thread

From: Alex Tomas @ 2005-02-22 12:04 UTC (permalink / raw)
To: Jan Blunck
Cc: Alex Tomas, Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

>>>>> Jan Blunck (JB) writes:

>> 1) i_sem protects the dcache too

JB> Where? i_sem is the per-inode lock and shouldn't be used for
JB> anything else.

Read the comments in fs/namei.c:real_lookup().

>> 2) tmpfs has no "own" data, so we can use it this way (see the 2nd patch)
>> 3) I have a pdirops patch for ext3, but it needs some cleaning ...

JB> I think you didn't get my point.
JB> 1) Your approach duplicates the locking effort for a regular
JB> filesystem (like ext2):
JB>    a) locking with s_pdirops_sems
JB>    b) locking the low-level filesystem data
JB> It's cool that it speeds up tmpfs, but I don't think that
JB> legitimizes the doubled locking for every other filesystem. I'm also
JB> not sure that it increases performance for regular filesystems, if
JB> you do the locking right.

We've already done this for ext3, and it works: it speeds some loads up
significantly, especially on big directories. And you can control this
via a mount option, so there is almost zero cost for a filesystem that
doesn't support pdirops.

JB> 2) In my opinion, a superblock-wide semaphore array which allows
JB> 1024 different (different names and different operations) accesses
JB> to ONE single inode (which is the data it should protect) is not a
JB> good idea.

Yes, it has some weaknesses; I'm reworking the VFS patch to avoid
inter-inode collisions.

thanks, Alex
* Re: [RFC] pdirops: vfs patch
  2005-02-22 12:04 ` Alex Tomas
@ 2005-02-22 13:00   ` Jan Blunck
  2005-02-22 13:23     ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread

From: Jan Blunck @ 2005-02-22 13:00 UTC (permalink / raw)
To: Alex Tomas; +Cc: Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

Quoting Alex Tomas <alex@clusterfs.com>:

> >> 1) i_sem protects the dcache too
>
> JB> Where? i_sem is the per-inode lock and shouldn't be used for
> JB> anything else.
>
> Read the comments in fs/namei.c:real_lookup().

i_sem does NOT protect the dcache, and not in real_lookup() either. The
lock must be acquired for ->lookup(), and because we might sleep on
i_sem, we have to take it early and check for repopulation of the
dcache.

> We've already done this for ext3, and it works: it speeds some loads up
> significantly, especially on big directories. And you can control this
> via a mount option, so there is almost zero cost for a filesystem that
> doesn't support pdirops.

Ok.

Jan
* Re: [RFC] pdirops: vfs patch
  2005-02-22 13:00 ` Jan Blunck
@ 2005-02-22 13:23   ` Alex Tomas
  2005-02-22 13:41     ` Jan Blunck
  0 siblings, 1 reply; 11+ messages in thread

From: Alex Tomas @ 2005-02-22 13:23 UTC (permalink / raw)
To: Jan Blunck
Cc: Alex Tomas, Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

>>>>> Jan Blunck (JB) writes:

JB> i_sem does NOT protect the dcache, and not in real_lookup() either.
JB> The lock must be acquired for ->lookup(), and because we might sleep
JB> on i_sem, we have to take it early and check for repopulation of the
JB> dcache.

A dentry is part of the dcache, right? And i_sem protects a dentry from
being returned with incomplete data inside, right?

thanks, Alex
* Re: [RFC] pdirops: vfs patch
  2005-02-22 13:23 ` Alex Tomas
@ 2005-02-22 13:41   ` Jan Blunck
  2005-02-23 13:55     ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread

From: Jan Blunck @ 2005-02-22 13:41 UTC (permalink / raw)
To: Alex Tomas; +Cc: Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

Quoting Alex Tomas <alex@clusterfs.com>:

> A dentry is part of the dcache, right? And i_sem protects a dentry from
> being returned with incomplete data inside, right?

Nope. d_alloc() sets d_flags to DCACHE_UNHASHED, so the dentry is not
found by __d_lookup() until it is rehashed, which is done implicitly by
->lookup().

Jan
* Re: [RFC] pdirops: vfs patch
  2005-02-22 13:41 ` Jan Blunck
@ 2005-02-23 13:55   ` Alex Tomas
  0 siblings, 0 replies; 11+ messages in thread

From: Alex Tomas @ 2005-02-23 13:55 UTC (permalink / raw)
To: Jan Blunck
Cc: Alex Tomas, Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

>>>>> Jan Blunck (JB) writes:

JB> Nope. d_alloc() sets d_flags to DCACHE_UNHASHED, so the dentry is
JB> not found by __d_lookup() until it is rehashed, which is done
JB> implicitly by ->lookup().

That means two processes can each allocate a dentry for the same name.
They would call ->lookup() each against their own dentry, fill it in and
hash it, so we'd end up with two equal dentries. Is that OK?

thanks, Alex
end of thread, other threads:[~2005-02-23 13:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-19 17:57 [RFC] parallel directory operations Alex Tomas
2005-02-19 18:04 ` [RFC] pdirops: vfs patch Alex Tomas
2005-02-20 23:35   ` Jan Blunck
2005-02-20 23:43     ` Alex Tomas
2005-02-19 18:05 ` [RFC] pdirops: tmpfs patch Alex Tomas
2005-02-22 11:54 [RFC] pdirops: vfs patch Jan Blunck
2005-02-22 12:04 ` Alex Tomas
2005-02-22 13:00   ` Jan Blunck
2005-02-22 13:23     ` Alex Tomas
2005-02-22 13:41       ` Jan Blunck
2005-02-23 13:55         ` Alex Tomas