linux-kernel.vger.kernel.org archive mirror
* [RFC] parallel directory operations
@ 2005-02-19 17:57 Alex Tomas
  2005-02-19 18:04 ` [RFC] pdirops: vfs patch Alex Tomas
  2005-02-19 18:05 ` [RFC] pdirops: tmpfs patch Alex Tomas
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Tomas @ 2005-02-19 17:57 UTC (permalink / raw)
  To: Al Viro; +Cc: LKML, linux-fsdevel, alex


Good day Al and all

could you review a couple of patches that implement $subj
for vfs and tmpfs? In short, the idea is that we can
protect directory operations by taking a semaphore chosen
by the name being operated on, instead of serializing on a
single per-directory lock. Of course, protection at the vfs
layer alone isn't enough in general, and filesystems will
still need to protect their own structures themselves, but
in some cases the vfs patch is sufficient: tmpfs, for
example. Under some loads one can see quite a lot of
activity in /tmp; when it is mounted as tmpfs on a big SMP
box, we can get high contention on i_sem.
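
A user-space sketch of the idea (the real patch is kernel code; the array name s_pdirops_sems follows Jan's later reply, while the hash function and helper names here are stand-ins of mine): hash the name being created/unlinked and take one semaphore out of a per-superblock array, so operations on different names proceed in parallel while operations on the same name still exclude each other:

```c
#include <pthread.h>

#define PDIROPS_NSEMS 1024

/* per-superblock array of locks; in the sketch, plain pthread mutexes */
static pthread_mutex_t s_pdirops_sems[PDIROPS_NSEMS];

static void pdirops_init(void)
{
    for (int i = 0; i < PDIROPS_NSEMS; i++)
        pthread_mutex_init(&s_pdirops_sems[i], NULL);
}

/* map a name to a slot; any reasonable string hash would do */
static unsigned int pdirops_hash(const char *name)
{
    unsigned int h = 0;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % PDIROPS_NSEMS;
}

/* take the lock covering this name; returns it for the matching unlock */
static pthread_mutex_t *pdirops_lock(const char *name)
{
    pthread_mutex_t *sem = &s_pdirops_sems[pdirops_hash(name)];

    pthread_mutex_lock(sem);
    return sem;
}

static void pdirops_unlock(pthread_mutex_t *sem)
{
    pthread_mutex_unlock(sem);
}
```

two different names may hash to the same slot, which is safe, just occasionally pessimistic; and an operation involving two names (rename) has to take both slots in a fixed order, e.g. lower index first, to avoid deadlock.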

perhaps someone could try a more or less realistic load?

I wrote a simple program that spawns a few processes; each
chdirs to the given directory, then loops creating and
unlinking a file. The test box is a dual PIII-1GHz:
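
The crunl source isn't included in the mail, so this is only a guess at its structure (function name and arguments are assumptions): each child works on its own file name, "<prefix>-<n>-<pid>", so the only shared resource the kernel has to serialize on is the directory itself:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* spawn nproc children; each create/unlinks its own name in dir */
static int crunl(const char *dir, const char *prefix, long iters, int nproc)
{
    if (chdir(dir) < 0) {
        perror("chdir");
        return 1;
    }
    for (int n = 0; n < nproc; n++) {
        if (fork() == 0) {
            char name[256];

            snprintf(name, sizeof(name), "%s-%d-%d",
                     prefix, n, (int)getpid());
            printf("#%d: %ld iterations, create/unlink %s\n",
                   (int)getpid(), iters, name);
            for (long i = 0; i < iters; i++) {
                int fd = open(name, O_CREAT | O_WRONLY, 0644);

                if (fd < 0)
                    _exit(1);
                close(fd);
                unlink(name);
            }
            _exit(0);
        }
    }
    printf("wait for completion ... ");
    int status, rc = 0;

    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = 1;
    printf(rc ? "FAILED\n" : "OK\n");
    return rc;
}
```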


run 1: 2 processes create/unlink file on regular tmpfs

[root@bob root]# mount -t tmpfs none /test
[root@bob root]# (cd /test; time  /root/crunl ./f 1000000 2)
#1998: 1000000 iterations, create/unlink ./f-0-1998
#1999: 1000000 iterations, create/unlink ./f-1-1999
#384: done
#384: done
wait for completion ... OK
real    0m36.224s
user    0m0.823s
sys     0m47.994s


run 2: 2 processes create/unlink file on tmpfs + pdirops

[root@bob root]# mount -t tmpfs -o pdirops none /test
[root@bob root]# (cd /test; time  /root/crunl ./f 1000000 2)
#1992: 1000000 iterations, create/unlink ./f-0-1992
#1993: 1000000 iterations, create/unlink ./f-1-1993
#384: done
#384: done
wait for completion ... OK
real    0m15.108s
user    0m0.592s
sys     0m29.406s


run 3: 1 process creates/unlinks file on regular tmpfs

[root@bob root]# mount -t tmpfs none /test
[root@bob root]# (cd /test; time  /root/crunl ./f 1000000 1)
#2004: 1000000 iterations, create/unlink ./f-0-2004
#384: done
wait for completion ... OK
real    0m11.950s
user    0m0.262s
sys     0m7.465s

run 4: 1 process creates/unlinks file on tmpfs + pdirops

[root@bob root]# mount -t tmpfs -o pdirops none /test
[root@bob root]# (cd /test; time  /root/crunl ./f 1000000 1)
#2009: 1000000 iterations, create/unlink ./f-0-2009
#384: done
wait for completion ... OK
real    0m8.047s
user    0m0.243s
sys     0m7.646s


2 processes creating/unlinking on regular tmpfs cause ~200K context switches per second:

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 215095  1 64 35
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 213580  1 67 32
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 214445  1 63 36
 2  0  0      0 1005760   6616  10928    0    0     0     0 1007 216250  1 63 36


2 processes creating/unlinking on tmpfs + pdirops cause ~44 context switches per second:

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 2  0  1      0 1001824   6632  10912    0    0     0     0 1008    41  2 98  0
 2  0  2      0 1002144   6632  10912    0    0     0     0 1008    45  2 98  0
 2  0  2      0 1001632   6632  10912    0    0     0     0 1007    47  2 98  0


the next benchmark is rename. Two processes each generate a random name,
create a file under it, generate a new random name, rename the file to the
new name, and unlink it:
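
Again the rndrename source isn't shown, so this is a guessed sketch of its inner loop (names and structure are assumptions). Worth noting for a per-name locking scheme: rename involves two names, so two semaphores must be taken per operation, in a consistent order to avoid deadlock:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* generate "<prefix>-<hex>" with a random suffix */
static void random_name(char *buf, size_t len, const char *prefix)
{
    snprintf(buf, len, "%s-%08lx", prefix, (unsigned long)random());
}

/* create under one random name, rename to another, unlink */
static int rndrename(const char *dir, const char *prefix, long iters)
{
    char a[256], b[256];

    if (chdir(dir) < 0)
        return 1;
    for (long i = 0; i < iters; i++) {
        random_name(a, sizeof(a), prefix);
        random_name(b, sizeof(b), prefix);
        int fd = open(a, O_CREAT | O_WRONLY, 0644);

        if (fd < 0)
            return 1;
        close(fd);
        if (rename(a, b) < 0 || unlink(b) < 0)
            return 1;
    }
    return 0;
}
```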

run 5: regular tmpfs

[root@bob root]# mount -t tmpfs none /test
[root@bob root]# (cd /test; time  /root/rndrename ./f 1000000 2)
#2036: 1000000 iterations
#2037: 1000000 iterations
wait for completion ... OK
real    1m22.381s
user    0m10.254s
sys     1m50.214s

run 6: tmpfs + pdirops

[root@bob root]# mount -t tmpfs -o pdirops none /test
[root@bob root]# (cd /test; time  /root/rndrename ./f 1000000 2)
#2044: 1000000 iterations
#2045: 1000000 iterations
wait for completion ... OK

real    0m39.403s
user    0m9.411s
sys     1m8.626s


thanks, Alex


* Re: [RFC] pdirops: vfs patch
@ 2005-02-22 11:54 Jan Blunck
  2005-02-22 12:04 ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Blunck @ 2005-02-22 11:54 UTC (permalink / raw)
  To: Alex Tomas; +Cc: Alexander Viro, Linux-Kernel Mailing List, linux-fsdevel

Quoting Alex Tomas <alex@clusterfs.com>:

>
> 1) i_sem protects dcache too

Where? i_sem is the per-inode lock and shouldn't be used for anything else.

> 2) tmpfs has no "own" data, so we can use it this way (see 2nd patch)
> 3) I have pdirops patch for ext3, but it needs some cleaning ...

I think you didn't get my point.

1) Your approach duplicates the locking effort for regular filesystems
(like ext2):
a) locking with s_pdirops_sems
b) locking the low-level filesystem data
It's cool that it speeds up tmpfs, but I don't think that justifies the
doubled locking for every other filesystem.
I'm also not sure that it increases performance for regular filesystems if
you do the locking right.

2) In my opinion, a superblock-wide semaphore array which allows up to 1024
concurrent accesses (different names, different operations) to ONE single
inode (which is the data it should protect) is not a good idea.

Jan



Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
2005-02-19 17:57 [RFC] parallel directory operations Alex Tomas
2005-02-19 18:04 ` [RFC] pdirops: vfs patch Alex Tomas
2005-02-20 23:35   ` Jan Blunck
2005-02-20 23:43     ` Alex Tomas
2005-02-19 18:05 ` [RFC] pdirops: tmpfs patch Alex Tomas
2005-02-22 11:54 [RFC] pdirops: vfs patch Jan Blunck
2005-02-22 12:04 ` Alex Tomas
2005-02-22 13:00   ` Jan Blunck
2005-02-22 13:23     ` Alex Tomas
2005-02-22 13:41       ` Jan Blunck
2005-02-23 13:55         ` Alex Tomas
