linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] remove BKL from ext2's readdir
@ 2003-03-15 10:13 Alex Tomas
  2003-03-15 10:36 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2003-03-15 10:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: ext2-devel, Andrew Morton


hi!

I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.
As far as I see in sources, there is no need in this BKL. So, here is small
patch and some benchmarks. of course, the result was expected pretty much ;)


             500         1000
readdir+BKL: 0m11.793s   0m23.403s
readdir-BKL: 0m6.060s    0m12.113s

description: two processes read own dir populated by files (500 and 1000 files).
this repeats for 100000 times. the iron is dual 1GHz P3.


diff -uNr linux/fs/ext2/dir.c edited/fs/ext2/dir.c
--- linux/fs/ext2/dir.c	Sat Mar 15 13:08:24 2003
+++ edited/fs/ext2/dir.c	Sat Mar 15 13:08:11 2003
@@ -259,8 +259,6 @@
 	int need_revalidate = (filp->f_version != inode->i_version);
 	int ret = 0;
 
-	lock_kernel();
-
 	if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
 		goto done;
 
@@ -313,7 +311,6 @@
 	filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
 	filp->f_version = inode->i_version;
 	UPDATE_ATIME(inode);
-	unlock_kernel();
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 10:13 [PATCH] remove BKL from ext2's readdir Alex Tomas
@ 2003-03-15 10:36 ` Andrew Morton
  2003-03-15 11:03   ` Andrew Morton
  2003-03-15 16:31   ` Robert Love
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2003-03-15 10:36 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> 
> hi!
> 
> I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.

Yes, I had this in -mm for ages, seem to have lost it.  Also removal of BKL
from lseek().

The theory is that the lock is there to avoid f_pos races against lseek.  We
ended up deciding that the way to address this is to ensure that all readdir
implementations do:

foo_readdir()
{
	loff_t pos = file->f_pos;

	....
	<code which doesn't touch file->f_pos, but which modifies pos>
	...

	file->f_pos = pos;
}

ext2 does this right and does not need the lock_kernel().  Once all
filesystems have been audited (and, if necessary, fixed) we can remove the
BKL from lseek also.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 10:36 ` Andrew Morton
@ 2003-03-15 11:03   ` Andrew Morton
  2003-03-15 16:22     ` Alex Tomas
  2003-03-15 16:31   ` Robert Love
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2003-03-15 11:03 UTC (permalink / raw)
  To: bzzz, linux-kernel, ext2-devel

Andrew Morton <akpm@digeo.com> wrote:
>
> Alex Tomas <bzzz@tmi.comex.ru> wrote:
> >
> > 
> > hi!
> > 
> > I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.
> 
> Yes, I had this in -mm for ages, seem to have lost it.  Also removal of BKL
> from lseek().
> 
> The theory is that the lock is there to avoid f_pos races against lseek.  We
> ended up deciding that the way to address this is to ensure that all readdir
> implementations do:
> 

hm, no.  lseek is using the directory's i_sem now, and readdir runs
under that too.  So we should be able to remove lock_kernel() from
the readdir implementation of all filesystems which are using
generic_file_llseek().

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 11:03   ` Andrew Morton
@ 2003-03-15 16:22     ` Alex Tomas
  2003-03-15 20:11       ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2003-03-15 16:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bzzz, linux-kernel, ext2-devel

>>>>> Andrew Morton (AM) writes:

 AM> hm, no.  lseek is using the directory's i_sem now, and readdir
 AM> runs under that too.  So we should be able to remove
 AM> lock_kernel() from the readdir implementation of all filesystems
 AM> which are using generic_file_llseek().

looks like only coda and ntfs use generic_file_llseek(). other use
default_llseek(). what's the reason do not use generic_file_llseek().
historical only? or not?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 10:36 ` Andrew Morton
  2003-03-15 11:03   ` Andrew Morton
@ 2003-03-15 16:31   ` Robert Love
  1 sibling, 0 replies; 11+ messages in thread
From: Robert Love @ 2003-03-15 16:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, ext2-devel

On Sat, 2003-03-15 at 05:36, Andrew Morton wrote:

> > I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.
> 
> Yes, I had this in -mm for ages, seem to have lost it.  Also removal of BKL
> from lseek().

I moved lseek() from the BKL to the i_sem in early 2.5.

	Robert Love


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 20:11       ` Andrew Morton
@ 2003-03-15 20:09         ` Alex Tomas
  2003-03-15 20:24           ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2003-03-15 20:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, ext2-devel

>>>>> Andrew Morton (AM) writes:


 AM> grep again.

 AM> I've made the change to ext2 and ext3.  Other filesystems will
 AM> require some thought to verify that the lock_kernel()s are not
 AM> protecting against some other random codepath.


hmm:

[root@proto edited]# head Makefile 
VERSION = 2
PATCHLEVEL = 5
SUBLEVEL = 64
EXTRAVERSION =


fs/ext2/dir.c:

struct file_operations ext2_dir_operations = {
        .read           = generic_read_dir,
        .readdir        = ext2_readdir,
        .ioctl          = ext2_ioctl,
        .fsync          = ext2_sync_file,
};



fs/read_write.c:

static inline loff_t llseek(struct file *file, loff_t offset, int origin)
{
        loff_t (*fn)(struct file *, loff_t, int);

        fn = default_llseek;
        if (file->f_op && file->f_op->llseek)
                fn = file->f_op->llseek;
        return fn(file, offset, origin);
}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 16:22     ` Alex Tomas
@ 2003-03-15 20:11       ` Andrew Morton
  2003-03-15 20:09         ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2003-03-15 20:11 UTC (permalink / raw)
  To: Alex Tomas; +Cc: bzzz, linux-kernel, ext2-devel

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> >>>>> Andrew Morton (AM) writes:
> 
>  AM> hm, no.  lseek is using the directory's i_sem now, and readdir
>  AM> runs under that too.  So we should be able to remove
>  AM> lock_kernel() from the readdir implementation of all filesystems
>  AM> which are using generic_file_llseek().
> 
> looks like only coda and ntfs use generic_file_llseek(). other use
> default_llseek(). what's the reason do not use generic_file_llseek().
> historical only? or not?

grep again.

I've made the change to ext2 and ext3.  Other filesystems will require
some thought to verify that the lock_kernel()s are not protecting
against some other random codepath.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 20:24           ` Andrew Morton
@ 2003-03-15 20:19             ` Alex Tomas
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Tomas @ 2003-03-15 20:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, ext2-devel

>>>>> Andrew Morton (AM) writes:

 AM> ah, OK.  How about this?

looks fine


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 20:09         ` Alex Tomas
@ 2003-03-15 20:24           ` Andrew Morton
  2003-03-15 20:19             ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2003-03-15 20:24 UTC (permalink / raw)
  To: Alex Tomas; +Cc: bzzz, linux-kernel, ext2-devel

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> fs/ext2/dir.c:
> 
> struct file_operations ext2_dir_operations = {
>         .read           = generic_read_dir,
>         .readdir        = ext2_readdir,
>         .ioctl          = ext2_ioctl,
>         .fsync          = ext2_sync_file,
> };

ah, OK.  How about this?

diff -puN fs/ext2/dir.c~lseek-ext2_readdir fs/ext2/dir.c
--- 25/fs/ext2/dir.c~lseek-ext2_readdir	2003-03-15 03:20:22.000000000 -0800
+++ 25-akpm/fs/ext2/dir.c	2003-03-15 12:21:56.000000000 -0800
@@ -259,8 +259,6 @@ ext2_readdir (struct file * filp, void *
 	int need_revalidate = (filp->f_version != inode->i_version);
 	int ret = 0;
 
-	lock_kernel();
-
 	if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
 		goto done;
 
@@ -313,7 +311,6 @@ done:
 	filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
 	filp->f_version = inode->i_version;
 	UPDATE_ATIME(inode);
-	unlock_kernel();
 	return 0;
 }
 
@@ -660,6 +657,7 @@ not_empty:
 }
 
 struct file_operations ext2_dir_operations = {
+	.llseek		= generic_file_llseek,
 	.read		= generic_read_dir,
 	.readdir	= ext2_readdir,
 	.ioctl		= ext2_ioctl,
diff -puN fs/ext3/dir.c~lseek-ext2_readdir fs/ext3/dir.c
--- 25/fs/ext3/dir.c~lseek-ext2_readdir	2003-03-15 03:20:22.000000000 -0800
+++ 25-akpm/fs/ext3/dir.c	2003-03-15 12:22:14.000000000 -0800
@@ -37,10 +37,11 @@ static int ext3_release_dir (struct inod
 				struct file * filp);
 
 struct file_operations ext3_dir_operations = {
+	.llseek		= generic_file_llseek,
 	.read		= generic_read_dir,
 	.readdir	= ext3_readdir,		/* we take BKL. needed?*/
 	.ioctl		= ext3_ioctl,		/* BKL held */
-	.fsync		= ext3_sync_file,		/* BKL held */
+	.fsync		= ext3_sync_file,	/* BKL held */
 #ifdef CONFIG_EXT3_INDEX
 	.release	= ext3_release_dir,
 #endif
@@ -98,8 +99,7 @@ static int ext3_readdir(struct file * fi
 	struct super_block * sb;
 	int err;
 	struct inode *inode = filp->f_dentry->d_inode;
-
-	lock_kernel();
+	int ret = 0;
 
 	sb = inode->i_sb;
 
@@ -110,8 +110,8 @@ static int ext3_readdir(struct file * fi
 	     ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
 		err = ext3_dx_readdir(filp, dirent, filldir);
 		if (err != ERR_BAD_DX_DIR) {
-			unlock_kernel();
-			return err;
+			ret = err;
+			goto out;
 		}
 		/*
 		 * We don't set the inode dirty flag since it's not
@@ -191,8 +191,8 @@ revalidate:
 				filp->f_pos = (filp->f_pos |
 						(sb->s_blocksize - 1)) + 1;
 				brelse (bh);
-				unlock_kernel();
-				return stored;
+				ret = stored;
+				goto out;
 			}
 			offset += le16_to_cpu(de->rec_len);
 			if (le32_to_cpu(de->inode)) {
@@ -222,8 +222,8 @@ revalidate:
 		brelse (bh);
 	}
 	UPDATE_ATIME(inode);
-	unlock_kernel();
-	return 0;
+out:
+	return ret;
 }
 
 #ifdef CONFIG_EXT3_INDEX
diff -puN fs/ext2/file.c~lseek-ext2_readdir fs/ext2/file.c

_



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
  2003-03-15 21:55   ` Andi Kleen
@ 2003-03-15 22:03     ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2003-03-15 22:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: bzzz, linux-kernel

Andi Kleen <ak@muc.de> wrote:
>
> Andrew Morton <akpm@digeo.com> writes:
> 
> > foo_readdir()
> > {
> > 	loff_t pos = file->f_pos;
> >
> > 	....
> > 	<code which doesn't touch file->f_pos, but which modifies pos>
> > 	...
> >
> > 	file->f_pos = pos;
> > }
> 
> At least for alpha this will require an rmb_depends() between the read
> and the write. Probably on x86 an rmb() wouldn't hurt neither.
> 
> Otherwise there is no guarantee other CPUs see that intended memory 
> modification order
> 

Won't the atomic operations in down() and up() do that?

It's all a bit moot really.  If two user threads or processes are fighting
over the value of f_pos at the same time in this manner then they're buggy
anyway.  All we need to do here is to ensure that the kernel doesn't do
anything grossly wrong or insecure.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove BKL from ext2's readdir
       [not found] ` <20030315023614.3e28e67b.akpm@digeo.com.suse.lists.linux.kernel>
@ 2003-03-15 21:55   ` Andi Kleen
  2003-03-15 22:03     ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2003-03-15 21:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bzzz, linux-kernel

Andrew Morton <akpm@digeo.com> writes:

> foo_readdir()
> {
> 	loff_t pos = file->f_pos;
>
> 	....
> 	<code which doesn't touch file->f_pos, but which modifies pos>
> 	...
>
> 	file->f_pos = pos;
> }

At least for alpha this will require an rmb_depends() between the read
and the write. Probably on x86 an rmb() wouldn't hurt neither.

Otherwise there is no guarantee other CPUs see that intended memory 
modification order

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2003-03-15 21:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-03-15 10:13 [PATCH] remove BKL from ext2's readdir Alex Tomas
2003-03-15 10:36 ` Andrew Morton
2003-03-15 11:03   ` Andrew Morton
2003-03-15 16:22     ` Alex Tomas
2003-03-15 20:11       ` Andrew Morton
2003-03-15 20:09         ` Alex Tomas
2003-03-15 20:24           ` Andrew Morton
2003-03-15 20:19             ` Alex Tomas
2003-03-15 16:31   ` Robert Love
     [not found] <m3vfyluedb.fsf@lexa.home.net.suse.lists.linux.kernel>
     [not found] ` <20030315023614.3e28e67b.akpm@digeo.com.suse.lists.linux.kernel>
2003-03-15 21:55   ` Andi Kleen
2003-03-15 22:03     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).