linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: intermediate summary of ext3-2.4-0.9.4 thread
       [not found] ` <Pine.GSO.4.21.0108022312211.1494-100000@weyl.math.psu.edu>
@ 2001-08-03 13:09   ` Daniel Phillips
  2001-08-03 14:43     ` Horst von Brand
                       ` (2 more replies)
  2001-08-03 18:36   ` intermediate summary of ext3-2.4-0.9.4 thread Matthias Andree
  1 sibling, 3 replies; 55+ messages in thread
From: Daniel Phillips @ 2001-08-03 13:09 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Horst von Brand, linux-kernel

On Friday 03 August 2001 05:13, Alexander Viro wrote:
> On Fri, 3 Aug 2001, Daniel Phillips wrote:
> > There is only one chain of directories from the fd's dentry up to
> > the root, that's the one to sync.
>
> You forgot ".. at any given moment". IOW, operation you propose is
> inherently racy. You want to do that - you do that in userland.

Are you saying that there may not be a ".." some of the time?  Or just 
that it may spontaneously be relinked?  If it does spontaneously change 
it doesn't matter, you have still made sure there is access by at least 
one path.

The trouble with doing this in userland is, the locked chain of dcache 
entries isn't there.

--
Daniel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 13:09   ` intermediate summary of ext3-2.4-0.9.4 thread Daniel Phillips
@ 2001-08-03 14:43     ` Horst von Brand
  2001-08-03 17:49     ` Mike Castle
  2001-08-03 18:08     ` Alexander Viro
  2 siblings, 0 replies; 55+ messages in thread
From: Horst von Brand @ 2001-08-03 14:43 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

Daniel Phillips <phillips@bonn-fries.net> said:
> On Friday 03 August 2001 05:13, Alexander Viro wrote:

[...]

> > You forgot ".. at any given moment". IOW, operation you propose is
> > inherently racy. You want to do that - you do that in userland.

> Are you saying that there may not be a ".." some of the time?  Or just 
> that it may spontaneously be relinked?  If it does spontaneously change 
> it doesn't matter, you have still made sure there is access by at least 
> one path.

Think "mv thisdir somewhereelse"
-- 
Dr. Horst H. von Brand                Usuario #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 13:09   ` intermediate summary of ext3-2.4-0.9.4 thread Daniel Phillips
  2001-08-03 14:43     ` Horst von Brand
@ 2001-08-03 17:49     ` Mike Castle
  2001-08-04  3:23       ` Daniel Phillips
  2001-08-03 18:08     ` Alexander Viro
  2 siblings, 1 reply; 55+ messages in thread
From: Mike Castle @ 2001-08-03 17:49 UTC (permalink / raw)
  To: linux-kernel

On Fri, Aug 03, 2001 at 03:09:06PM +0200, Daniel Phillips wrote:
> On Friday 03 August 2001 05:13, Alexander Viro wrote:
> > On Fri, 3 Aug 2001, Daniel Phillips wrote:
> > > There is only one chain of directories from the fd's dentry up to
> > > the root, that's the one to sync.
> >
> > You forgot ".. at any given moment". IOW, operation you propose is
> > inherently racy. You want to do that - you do that in userland.
> 
> Are you saying that there may not be a ".." some of the time?  Or just 
> that it may spontaneously be relinked?  If it does spontaneously change 
> it doesn't matter, you have still made sure there is access by at least 
> one path.


I read the ".." as a typo for "..."  As in Al was suggesting the sentence
should read "There is only one chain of directories at any given moment
from the fd's dentry up to the root...."

At least, that was my reading on it.

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 13:09   ` intermediate summary of ext3-2.4-0.9.4 thread Daniel Phillips
  2001-08-03 14:43     ` Horst von Brand
  2001-08-03 17:49     ` Mike Castle
@ 2001-08-03 18:08     ` Alexander Viro
  2001-08-03 18:26       ` Daniel Phillips
                         ` (2 more replies)
  2 siblings, 3 replies; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 18:08 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Horst von Brand, linux-kernel



On Fri, 3 Aug 2001, Daniel Phillips wrote:

> Are you saying that there may not be a ".." some of the time?  Or just 
> that it may spontaneously be relinked?  If it does spontaneously change 
> it doesn't matter, you have still made sure there is access by at least 
> one path.
> 
> The trouble with doing this in userland is, the locked chain of dcache 
> entries isn't there.

There is no _locked_ chain. And if you want to grab the locks on all
ancestors - think again. It means sorting the inodes by address _and_
relocking if any of them had been moved while you were locking the
previous ones. I absolutely refuse to add such crap to the tree and I
seriously suspect that Linus and Alan will do the same.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 18:08     ` Alexander Viro
@ 2001-08-03 18:26       ` Daniel Phillips
  2001-08-03 18:53         ` Alexander Viro
  2001-08-03 18:34       ` Linus Torvalds
  2001-08-03 22:29       ` Anton Altaparmakov
  2 siblings, 1 reply; 55+ messages in thread
From: Daniel Phillips @ 2001-08-03 18:26 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Horst von Brand, linux-kernel

On Friday 03 August 2001 20:08, Alexander Viro wrote:
> On Fri, 3 Aug 2001, Daniel Phillips wrote:
> > Are you saying that there may not be a ".." some of the time?  Or just
> > that it may spontaneously be relinked?  If it does spontaneously change
> > it doesn't matter, you have still made sure there is access by at least
> > one path.
> >
> > The trouble with doing this in userland is, the locked chain of dcache
> > entries isn't there.
>
> There is no _locked_ chain.

Locked as in can't be destroyed (refcount) not i_sem or such, sorry for the
loose usage.

> And if you want to grab the locks on all
> ancestors - think again. It means sorting the inodes by address _and_
> relocking if any of them had been moved while you were locking the
> previous ones. I absolutely refuse to add such crap to the tree and I
> seriously suspect that Linus and Alan will do the same.

--
Daniel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 18:08     ` Alexander Viro
  2001-08-03 18:26       ` Daniel Phillips
@ 2001-08-03 18:34       ` Linus Torvalds
  2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
  2001-08-03 22:29       ` Anton Altaparmakov
  2 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2001-08-03 18:34 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.GSO.4.21.0108031400590.3272-100000@weyl.math.psu.edu>,
Alexander Viro  <viro@math.psu.edu> wrote:
>
>
>On Fri, 3 Aug 2001, Daniel Phillips wrote:
>
>> Are you saying that there may not be a ".." some of the time?  Or just 
>> that it may spontaneously be relinked?  If it does spontaneously change 
>> it doesn't matter, you have still made sure there is access by at least 
>> one path.
>> 
>> The trouble with doing this in userland is, the locked chain of dcache 
>> entries isn't there.
>
>There is no _locked_ chain. And if you want to grab the locks on all
>ancestors - think again. It means sorting the inodes by address _and_
>relocking if any of them had been moved while you were locking the
>previous ones. I absolutely refuse to add such crap to the tree and I
>seriously suspect that Linus and Alan will do the same.

Note that while there is no "absoilutely correct" thing here, I think
the right thing to do (as in "do what the user _expects_ you to do")
would be fairly simple to implement with a simple

	fsync(int fd)
	{
		dentry = fdget(fd);
		do_fsync(dentry);
		for (;;) {
			tmp = dentry;
			dentry = dentry->d_parent;
			if (dentry == tmp)
				break;
			do_fdatasync(dentry);
		}
	}

Add dcount increments as needed (and they _are_ needed, to make sure
that we don't hold on to a dentry while the child has been moved
somewhere else and the dentry now has a zero count). And we only need to
sync up to the closest mount-point, not the global root.

Does this guarantee that we fsync the whole path in the presense of
concurrent renames? No. That's a user problem, why should we care? He
should fsync his other renames too, he didn't ask us to fsync them.

And we don't care about any other paths that the file may have. We're
syncing the path that the user opened, and no others. Again, if the user
opened another path, he should have synced _that_ one.

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
       [not found] ` <Pine.GSO.4.21.0108022312211.1494-100000@weyl.math.psu.edu>
  2001-08-03 13:09   ` intermediate summary of ext3-2.4-0.9.4 thread Daniel Phillips
@ 2001-08-03 18:36   ` Matthias Andree
  2001-08-03 19:16     ` Alexander Viro
  1 sibling, 1 reply; 55+ messages in thread
From: Matthias Andree @ 2001-08-03 18:36 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Daniel Phillips, Horst von Brand, linux-kernel

On Thu, 02 Aug 2001, Alexander Viro wrote:

> On Fri, 3 Aug 2001, Daniel Phillips wrote:
> 
> > There is only one chain of directories from the fd's dentry up to the 
> > root, that's the one to sync.
> 
> You forgot ".. at any given moment". IOW, operation you propose is inherently
> racy. You want to do that - you do that in userland.

Applications usually protect their playgrounds - separate uid for
instance. If only the application has access to that area, and itself
does not trigger races, "at any given moment" is not a restriction.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 18:26       ` Daniel Phillips
@ 2001-08-03 18:53         ` Alexander Viro
  2001-08-03 20:50           ` Daniel Phillips
  2001-08-04  3:43           ` Matthias Andree
  0 siblings, 2 replies; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 18:53 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Horst von Brand, linux-kernel



On Fri, 3 Aug 2001, Daniel Phillips wrote:

> > There is no _locked_ chain.
> 
> Locked as in can't be destroyed (refcount) not i_sem or such, sorry for the
> loose usage.

Sigh... You need i_sem for fsync(). Moreover, there is no warranty that
set of objects you sync has _anything_ to path by the time when you start
syncing the second one. Application has information about the use of
parts of tree it's interested in. Kernel doesn't. Notice that all this
wankage was full of "oh, but MTA doesn't care for symlinks", "oh, but MTA
doesn't deal with parents renamed", ad nausea. You know what it means?
Right, that kernel shouldn't try to second-guess the userland. Application
knows what fs objects it wants synced. Kernel provides a primitive for
syncing an object and leaves the choice of objects to sync to application.

Folks, putting policy into the kernel is Wrong(tm). And that's precisely
what you are advocating here.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 18:36   ` intermediate summary of ext3-2.4-0.9.4 thread Matthias Andree
@ 2001-08-03 19:16     ` Alexander Viro
  2001-08-04  3:40       ` fdatasync(2) is also there (was: intermediate summary of ext3-2.4-0.9.4 thread) Matthias Andree
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 19:16 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Daniel Phillips, Horst von Brand, linux-kernel



On Fri, 3 Aug 2001, Matthias Andree wrote:

> Applications usually protect their playgrounds - separate uid for
> instance. If only the application has access to that area, and itself
> does not trigger races, "at any given moment" is not a restriction.

Bingo. The whole thing relies on second-guessing the application.
BTW, I can think of very legitimate cases when we want to create
a bunch of files, fsync them as we go and then fsync the directory
where they had been created. Application knows what and when should
be synced _and_ it has a way to ask kernel to sync an object.

BTW, symlinks may be a problem - they can't be opened and symlink
creation is asynchronous. And there are applications that _do_
care about them - for all they care, lost symlink is as bad as
the need to scan lost+found


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 18:53         ` Alexander Viro
@ 2001-08-03 20:50           ` Daniel Phillips
  2001-08-04  3:43           ` Matthias Andree
  1 sibling, 0 replies; 55+ messages in thread
From: Daniel Phillips @ 2001-08-03 20:50 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Horst von Brand, linux-kernel

On Friday 03 August 2001 20:53, Alexander Viro wrote:
> On Fri, 3 Aug 2001, Daniel Phillips wrote:
> > > There is no _locked_ chain.
> >
> > Locked as in can't be destroyed (refcount) not i_sem or such, sorry for
> > the loose usage.
>
> Sigh... You need i_sem for fsync().

We can drop it before syncing the parent, the point is, the dcache entry
stays.

> Moreover, there is no warranty that
> set of objects you sync has _anything_ to path by the time when you start
> syncing the second one.

OK, you win, I'll provide this example:

	   A				   B

  echo hello >/a/b/c/d
  fsync(d)->
    fsync_one(d)
				rename /a/b/c/d as /x/y/z/d
      fsync_one(c)
        fsync_one(b)
          fsync_one(a)
            fsync_one(/)

where fsync_one looks roughly like our current sys_fsync.

So we fsynced and we still might have lost the path to d.

The moral of the story: if the filesystem isn't designed to provide a
guaranteed commit on rename then we shouldn't try to fix the VFS so it
sorta does.

> Application has information about the use of
> parts of tree it's interested in. Kernel doesn't. Notice that all this
> wankage was full of "oh, but MTA doesn't care for symlinks", "oh, but MTA
> doesn't deal with parents renamed", ad nausea. You know what it means?
> Right, that kernel shouldn't try to second-guess the userland. Application
> knows what fs objects it wants synced. Kernel provides a primitive for
> syncing an object and leaves the choice of objects to sync to application.
>
> Folks, putting policy into the kernel is Wrong(tm). And that's precisely
> what you are advocating here.

I needed an example where even a new, improved sys_fsync would fail to
do the right thing for Ext2 in the absence of specific coding in the
MTA.  So the MTA is either designed to handle dumb filesystems or it
isn't.  It's pretty much a moot point since we have four filesystems
now in a nearly usable state that can provide the needed commit
guarantees, which should, as has been pointed out in this thread,
provide MTA's with both faster and more reliable operation.

--
Daniel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 18:34       ` Linus Torvalds
@ 2001-08-03 22:01         ` Chris Wedgwood
  2001-08-03 22:33           ` Alexander Viro
                             ` (4 more replies)
  0 siblings, 5 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 22:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Alan Cox, Chris Mason

[-- Attachment #1: Type: text/plain, Size: 863 bytes --]

On Fri, Aug 03, 2001 at 06:34:14PM +0000, Linus Torvalds wrote:

    	fsync(int fd)
    	{
    		dentry = fdget(fd);
    		do_fsync(dentry);
    		for (;;) {
    			tmp = dentry;
    			dentry = dentry->d_parent;
    			if (dentry == tmp)
    				break;
    			do_fdatasync(dentry);
    		}
    	}

I really like this idea. Can people please try out the attached patch?

Please note, it contains a couple of things that need not be there in
the final version.

Note, there is also a reiserfs fix in here because we can call
f_op->fsync on a directory and without this fix it will BUG!  Chris,
perhaps you can suggest a better fix?


Linus, one more thing --- the first argument to ->fsync is struct file*
and nothing uses it, I'd like to blow it away or would you prefer we
wait to 2.5.x as its essentially and API change and will break XFS,
JFS, etc.



   --cw

[-- Attachment #2: 2.4.8-pre3_fsync-entire-path.patch --]
[-- Type: text/plain, Size: 2056 bytes --]

diff -Nur --exclude=*.o --exclude=*~ linux-2.4.8-pre3/fs/buffer.c linux-2.4.8-pre3+expt/fs/buffer.c
--- linux-2.4.8-pre3/fs/buffer.c	Wed Jul 25 19:03:05 2001
+++ linux-2.4.8-pre3+expt/fs/buffer.c	Sat Aug  4 10:53:07 2001
@@ -379,33 +379,45 @@
 
 asmlinkage long sys_fsync(unsigned int fd)
 {
-	struct file * file;
-	struct dentry * dentry;
-	struct inode * inode;
+	struct file *file;
+	struct dentry *dentry;
+	struct inode *inode;
 	int err;
 
 	err = -EBADF;
 	file = fget(fd);
 	if (!file)
 		goto out;
-
-	dentry = file->f_dentry;
-	inode = dentry->d_inode;
-
 	err = -EINVAL;
 	if (!file->f_op || !file->f_op->fsync)
 		goto out_putf;
 
-	/* We need to protect against concurrent writers.. */
-	down(&inode->i_sem);
-	filemap_fdatasync(inode->i_mapping);
-	err = file->f_op->fsync(file, dentry, 0);
-	filemap_fdatawait(inode->i_mapping);
-	up(&inode->i_sem);
+	/* walk the path (dentry chain) fsync'ing everything along
+	   it. This algorithm to do "do what the user _expects_ you to
+	   do" was suggested by Linus. --cw */
+	dentry = file->f_dentry;
+	do {
+		inode = dentry->d_inode;
+		/* We need to protect against concurrent writers.. */
+		down(&inode->i_sem);
+		filemap_fdatasync(inode->i_mapping);
+		err = -EINVAL;
+		if (!file->f_op || !file->f_op->fsync)
+			goto out_putf;
+		err = file->f_op->fsync(file, dentry, 0);
+		filemap_fdatawait(inode->i_mapping);
+		up(&inode->i_sem);
+
+		/* stop at root of fs */
+		if(dentry == dentry->d_parent)
+			break;
+
+		dentry = dentry->d_parent;
+	} while(dentry && !err);
 
-out_putf:
+ out_putf:
 	fput(file);
-out:
+ out:
 	return err;
 }
 
diff -Nur --exclude=*.o --exclude=*~ linux-2.4.8-pre3/fs/reiserfs/file.c linux-2.4.8-pre3+expt/fs/reiserfs/file.c
--- linux-2.4.8-pre3/fs/reiserfs/file.c	Fri Jul 20 20:37:09 2001
+++ linux-2.4.8-pre3+expt/fs/reiserfs/file.c	Sat Aug  4 09:06:14 2001
@@ -90,6 +90,9 @@
 
   lock_kernel() ;
 
+  if (S_ISDIR(p_s_inode->i_mode))
+          return reiserfs_dir_fsync(p_s_filp, p_s_dentry, datasync);
+
   if (!S_ISREG(p_s_inode->i_mode))
       BUG ();
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 18:08     ` Alexander Viro
  2001-08-03 18:26       ` Daniel Phillips
  2001-08-03 18:34       ` Linus Torvalds
@ 2001-08-03 22:29       ` Anton Altaparmakov
  2001-08-03 23:06         ` Chris Wedgwood
  2001-08-06 15:23         ` looking for resources for designing loopback filesystem dave-mlist
  2 siblings, 2 replies; 55+ messages in thread
From: Anton Altaparmakov @ 2001-08-03 22:29 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

Your patch down()s a semaphore without calling up() in the error code path...

Best regards,

Anton

At 23:01 03/08/2001, Chris Wedgwood wrote:
>On Fri, Aug 03, 2001 at 06:34:14PM +0000, Linus Torvalds wrote:
>
>         fsync(int fd)
>         {
>                 dentry = fdget(fd);
>                 do_fsync(dentry);
>                 for (;;) {
>                         tmp = dentry;
>                         dentry = dentry->d_parent;
>                         if (dentry == tmp)
>                                 break;
>                         do_fdatasync(dentry);
>                 }
>         }
>
>I really like this idea. Can people please try out the attached patch?
>
>Please note, it contains a couple of things that need not be there in
>the final version.
>
>Note, there is also a reiserfs fix in here because we can call
>f_op->fsync on a directory and without this fix it will BUG!  Chris,
>perhaps you can suggest a better fix?
>
>
>Linus, one more thing --- the first argument to ->fsync is struct file*
>and nothing uses it, I'd like to blow it away or would you prefer we
>wait to 2.5.x as its essentially and API change and will break XFS,
>JFS, etc.
>
>
>
>    --cw

-- 
   "Nothing succeeds like success." - Alexandre Dumas
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
@ 2001-08-03 22:33           ` Alexander Viro
  2001-08-03 23:16             ` Chris Wedgwood
  2001-08-03 22:45           ` Alexander Viro
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 22:33 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> I really like this idea. Can people please try out the attached patch?
> 
> Please note, it contains a couple of things that need not be there in
> the final version.

Like an oopsable race absolutely trivial to exploit for any user with write
access to fs?

	a) dentry can freed under you. Just rename the parent while you
are syncing it. Then you'll block on attempt to take ->i_sem on
grandparent and merrily go to hell when parent will be moved away and
rename(2) will do dput() on grandparent.

	b) access to ->d_parent requires at least one of the following:
dcache_lock, BKL, i_sem or ->i_zombie on inode of parent.

	c) as it is, you will get a hell of IO load on a dumb fs.
dumb == anything that uses file_fsync() as ->fsync() of directories.
You'll do full sync of fs on every bloody step.

	d) sequence of inodes you sync has only one property guaranteed:
at some moment nth inode is a parent of (n-1)th. That's it. E.g. it's easy
to get a situation when _none_ of the inodes you sync had ever been a
grandparent of the original inode.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
  2001-08-03 22:33           ` Alexander Viro
@ 2001-08-03 22:45           ` Alexander Viro
  2001-08-03 23:09             ` Chris Wedgwood
  2001-08-04  1:08           ` Andrew Morton
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 22:45 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> Linus, one more thing --- the first argument to ->fsync is struct file*
> and nothing uses it, I'd like to blow it away or would you prefer we

nfs_fsync().


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:29       ` Anton Altaparmakov
@ 2001-08-03 23:06         ` Chris Wedgwood
  2001-08-06 15:23         ` looking for resources for designing loopback filesystem dave-mlist
  1 sibling, 0 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:06 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 11:29:29PM +0100, Anton Altaparmakov wrote:

    Your patch down()s a semaphore without calling up() in the error
    code path...

Errors? Nah, you won't get errors :)


Yes, it does... I did a quick copy of the code from above and didn't
check it. I'll fix that shortly.



  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:45           ` Alexander Viro
@ 2001-08-03 23:09             ` Chris Wedgwood
  2001-08-03 23:15               ` Alexander Viro
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:09 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 06:45:47PM -0400, Alexander Viro wrote:

    nfs_fsync().

Ah, well... then I'm not sure how the loop should look.  I use
f->fsync(file, dentry, ...) where file references the original file
not any of the parent path components.

Obviously, for ext2 and reiserfs (which is what I have here) this
won't matter --- will it for NFS?  If so, so I need to open/etc. for
each parent component to get a valid struct file*?

Please bear with me, I'm still learning :)



  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:09             ` Chris Wedgwood
@ 2001-08-03 23:15               ` Alexander Viro
  2001-08-03 23:20                 ` Chris Wedgwood
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 23:15 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> On Fri, Aug 03, 2001 at 06:45:47PM -0400, Alexander Viro wrote:
> 
>     nfs_fsync().
> 
> Ah, well... then I'm not sure how the loop should look.  I use
> f->fsync(file, dentry, ...) where file references the original file
> not any of the parent path components.
> 
> Obviously, for ext2 and reiserfs (which is what I have here) this
> won't matter --- will it for NFS?  If so, so I need to open/etc. for
> each parent component to get a valid struct file*?

<shrug> credentials cache. 2.5 fodder. Notice that with NFS you don't
have fsync for directories. At all. So that's not a problem in that
particular case - you can pass NULL on all subsequent iterations.
On the first step you need struct file * - NFS needs credentials
to pass to the server.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:33           ` Alexander Viro
@ 2001-08-03 23:16             ` Chris Wedgwood
  2001-08-03 23:23               ` Alexander Viro
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:16 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 06:33:37PM -0400, Alexander Viro wrote:

    Like an oopsable race absolutely trivial to exploit for any user
    with write access to fs?

:(

    	a) dentry can freed under you. Just rename the parent while
    you are syncing it. Then you'll block on attempt to take ->i_sem
    on grandparent and merrily go to hell when parent will be moved
    away and rename(2) will do dput() on grandparent.

Can I lock a dentry to prevent this?

    	b) access to ->d_parent requires at least one of the
    following: dcache_lock, BKL, i_sem or ->i_zombie on inode of
    parent.

I assume BLK lock is undesirable here?  dcache_lock seems quite
reasonable and very cheap.  I can't see how else to do it as getting
inode of parent required d_parent.

    	c) as it is, you will get a hell of IO load on a dumb fs.
    dumb == anything that uses file_fsync() as ->fsync() of
    directories.  You'll do full sync of fs on every bloody step.

Yes, but it reality it's not that bad. As is, reiserfs and ext3 (used
to, might not now) sync pending transaction on fsync anyhow.  Run lilo
recently on reiserfs of ext3?  Ever seemed slow?  strace -f and you'll
see it calls fsync after writing to the map files each time (no idea
why) and it it still acceptable.

    	d) sequence of inodes you sync has only one property
    guaranteed: at some moment nth inode is a parent of
    (n-1)th. That's it. E.g. it's easy to get a situation when _none_
    of the inodes you sync had ever been a grandparent of the original
    inode.

I'm not sure I follow you hear... is this because of bind semantics or
somethng else?  I can see it relevant in the case of a fs mount in
/var/somewhere/blah when you fsync /var/somewhere/blah/dir/file, you
don't need to go past blah --- but how would I detect this 'edge'?




Thanks!

  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:15               ` Alexander Viro
@ 2001-08-03 23:20                 ` Chris Wedgwood
  2001-08-03 23:25                   ` Alexander Viro
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 07:15:59PM -0400, Alexander Viro wrote:

    <shrug> credentials cache. 2.5 fodder. Notice that with NFS you
    don't have fsync for directories. At all. So that's not a problem
    in that particular case - you can pass NULL on all subsequent
    iterations.  On the first step you need struct file * - NFS needs
    credentials to pass to the server.

How about passing NULL then for now? --- and explicitly requiring
->fsync accept NULL for the first argument and ignore it?  It would be
nice to know that this is the case then though, because we could stop
immediately.



  --cw


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:16             ` Chris Wedgwood
@ 2001-08-03 23:23               ` Alexander Viro
  2001-08-04  3:53                 ` Matthias Andree
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 23:23 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> Can I lock a dentry to prevent this?

dget_locked() under dcache_lock + dput() when you are done.

>     	b) access to ->d_parent requires at least one of the
>     following: dcache_lock, BKL, i_sem or ->i_zombie on inode of
>     parent.
> 
> I assume BLK lock is undesirable here?  dcache_lock seems quite
> reasonable and very cheap.  I can't see how else to do it as getting
> inode of parent required d_parent.
> 
>     	c) as it is, you will get a hell of IO load on a dumb fs.
>     dumb == anything that uses file_fsync() as ->fsync() of
>     directories.  You'll do full sync of fs on every bloody step.
> 
> Yes, but it reality it's not that bad. As is, reiserfs and ext3 (used
> to, might not now) sync pending transaction on fsync anyhow.  Run lilo
> recently on reiserfs of ext3?  Ever seemed slow?  strace -f and you'll
> see it calls fsync after writing to the map files each time (no idea
> why) and it it still acceptable.

Sure. And now each fsync() turns into a bunch of such calls.
 
>     	d) sequence of inodes you sync has only one property
>     guaranteed: at some moment nth inode is a parent of
>     (n-1)th. That's it. E.g. it's easy to get a situation when _none_
>     of the inodes you sync had ever been a grandparent of the original
>     inode.
> 
> I'm not sure I follow you hear... is this because of bind semantics or
> somethng else?  I can see it relevant in the case of a fs mount in
> /var/somewhere/blah when you fsync /var/somewhere/blah/dir/file, you
> don't need to go past blah --- but how would I detect this 'edge'?

It has nothing to bindings/mount/etc. fsync /a/b/c. While c is written
out, mv a/b/c /a/d/c. While d is written out, mv a/d/c a/b/c && mv a/d e/d
Through all these renames /a remained the grandparent of c. You won't sync it -
you sync c, then d, then e, then root.

Constructing less convoluted examples with similar results is left as an
exercise to reader...


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:20                 ` Chris Wedgwood
@ 2001-08-03 23:25                   ` Alexander Viro
  2001-08-03 23:35                     ` Chris Wedgwood
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 23:25 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> On Fri, Aug 03, 2001 at 07:15:59PM -0400, Alexander Viro wrote:
> 
>     <shrug> credentials cache. 2.5 fodder. Notice that with NFS you
>     don't have fsync for directories. At all. So that's not a problem
>     in that particular case - you can pass NULL on all subsequent
>     iterations.  On the first step you need struct file * - NFS needs
>     credentials to pass to the server.
> 
> How about passing NULL then for now? --- and explicitly requiring
> ->fsync accept NULL for the first argument and ignore it?  It would be
> nice to know that this is the case then though, because we could stop
> immediately.

You need credentials to sync a regular file on any network filesystem.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:25                   ` Alexander Viro
@ 2001-08-03 23:35                     ` Chris Wedgwood
  2001-08-03 23:41                       ` Alexander Viro
  2001-08-03 23:42                       ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Alan Cox
  0 siblings, 2 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:35 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 07:25:19PM -0400, Alexander Viro wrote:

    You need credentials to sync a regular file on any network
    filesystem.

For 2.5.x I assume your planning or a credentials cache?  Something
like dentry->d_creds or something?  If that's the case we still don't
need the struct file* to be passed --- but I suspect that's not the
case and I really don't understand.



  --cw




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:35                     ` Chris Wedgwood
@ 2001-08-03 23:41                       ` Alexander Viro
  2001-08-03 23:46                         ` Chris Wedgwood
  2001-08-03 23:42                       ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Alan Cox
  1 sibling, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 23:41 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> On Fri, Aug 03, 2001 at 07:25:19PM -0400, Alexander Viro wrote:
> 
>     You need credentials to sync a regular file on any network
>     filesystem.
> 
> For 2.5.x I assume your planning or a credentials cache?  Something
> like dentry->d_creds or something?  If that's the case we still don't
> need the struct file* to be passed --- but I suspect that's not the
> case and I really don't understand.

file->f_cred. Different people opening the same file can have different
credentials (e.g. credentials can be revoked)


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:35                     ` Chris Wedgwood
  2001-08-03 23:41                       ` Alexander Viro
@ 2001-08-03 23:42                       ` Alan Cox
  2001-08-03 23:44                         ` Chris Wedgwood
  1 sibling, 1 reply; 55+ messages in thread
From: Alan Cox @ 2001-08-03 23:42 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Alexander Viro, Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

> For 2.5.x I assume your planning or a credentials cache?  Something
> like dentry->d_creds or something?  If that's the case we still don't
> need the struct file* to be passed --- but I suspect that's not the
> case and I really don't understand.

It can't come off the dentry as multiple people can have the same file open
with different rights.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:42                       ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Alan Cox
@ 2001-08-03 23:44                         ` Chris Wedgwood
  0 siblings, 0 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alexander Viro, Linus Torvalds, linux-kernel, Chris Mason

On Sat, Aug 04, 2001 at 12:42:38AM +0100, Alan Cox wrote:

    It can't come off the dentry as multiple people can have the same
    file open with different rights.

Yes, of course.

If that's the case, I'm not sure how you pick reasonable creds. for
fsyncing path components for a network filesystem, unless you (as the
calling user) open each component and sync that bit by bit --- which
could almost be done in libc (kernel is easier though).


  --cw


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:41                       ` Alexander Viro
@ 2001-08-03 23:46                         ` Chris Wedgwood
  2001-08-03 23:53                           ` Alexander Viro
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-03 23:46 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 07:41:40PM -0400, Alexander Viro wrote:

    file->f_cred. Different people opening the same file can have
    different credentials (e.g. credentials can be revoked)

Sure, but if I

        f = open("/home/viro/file", ... );
        fsync(f);

I only have creds for 'file' --- I have no such thing for 'viro' or
'home', so I can't do anything sensible here.



  --cw





^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:46                         ` Chris Wedgwood
@ 2001-08-03 23:53                           ` Alexander Viro
  2001-08-04  7:45                             ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchange patch) Hans Reiser
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-03 23:53 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sat, 4 Aug 2001, Chris Wedgwood wrote:

> On Fri, Aug 03, 2001 at 07:41:40PM -0400, Alexander Viro wrote:
> 
>     file->f_cred. Different people opening the same file can have
>     different credentials (e.g. credentials can be revoked)
> 
> Sure, but if I
> 
>         f = open("/home/viro/file", ... );
>         fsync(f);
> 
> I only have creds for 'file' --- I have no such thing for 'viro' or
> 'home', so I can't do anything sensible here.

Credentials are about "I'm foo", not "I'm allowed to do this with that".
Server decides what you can do.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic  change patch)
  2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
  2001-08-03 22:33           ` Alexander Viro
  2001-08-03 22:45           ` Alexander Viro
@ 2001-08-04  1:08           ` Andrew Morton
  2001-08-04  1:19             ` Chris Wedgwood
  2001-08-04 17:35           ` Jan Harkes
  2001-08-06 15:02           ` Chris Mason
  4 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2001-08-04  1:08 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

Chris Wedgwood wrote:
> 
> On Fri, Aug 03, 2001 at 06:34:14PM +0000, Linus Torvalds wrote:
> 
>         fsync(int fd)
>         {
>                 dentry = fdget(fd);
>                 do_fsync(dentry);
>                 for (;;) {
>                         tmp = dentry;
>                         dentry = dentry->d_parent;
>                         if (dentry == tmp)
>                                 break;
>                         do_fdatasync(dentry);
>                 }
>         }
> 
> I really like this idea. Can people please try out the attached patch?

Ow.  You just crippled ext3.

I don't think an ext2 problem (which I don't think is a problem
at all) should be "fixed" at the VFS layer when other filesystems
are perfectly happy without it, no?

This whole thread, talking about "linux this" and "linux that"
is off-base.  It's ext2 we're talking about.  This MTA requirement
is a highly unusual and specialised thing - I don't see why the
general-purpose ext2 should bear the burden of supporting it when
other filesystems such as reiserfs (I think?) and ext3 support it
naturally and better than ext2 ever will.

-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04  1:08           ` Andrew Morton
@ 2001-08-04  1:19             ` Chris Wedgwood
  2001-08-04  1:45               ` Andrew Morton
  2001-08-04 21:07               ` Andrew Morton
  0 siblings, 2 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-04  1:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, Aug 03, 2001 at 06:08:49PM -0700, Andrew Morton wrote:

    Ow.  You just crippled ext3.

How so? The Flush all transactions on fsync behaviour that resierfs
did/does have at present too?  (There are 'fixes' to reiserfs for
this).

    I don't think an ext2 problem (which I don't think is a problem at
    all) should be "fixed" at the VFS layer when other filesystems are
    perfectly happy without it, no?

If you want to be sure that when you fsync a file, that, silly bugger
rename games further up the path aside, the entire path is also on
disk, the VFS is the only place to do it with the current fs API.

really, there is _some_ merit in the argument that

        open
        fsync
        close
<crash>

shouldn't loose the file...

    This whole thread, talking about "linux this" and "linux that" is
    off-base.  It's ext2 we're talking about.  This MTA requirement is
    a highly unusual and specialised thing - I don't see why the
    general-purpose ext2 should bear the burden of supporting it when
    other filesystems such as reiserfs (I think?) and ext3 support it
    naturally and better than ext2 ever will.

Well, since it will only sync dirty blocks, it will hardly hurt ext2
that much at all --- and it will only force the dirty blocks in path
components to be written when you fsync the file, thats probably only
a single block anyhow.

FWIW, I don't think it's unreasonable we do this, nor does it fix all
the potential MTA problems :)



Anyhow, that patch was bogus.  As soon as I get some more clues, I'll
send another one (trying to get more clueful now, not having much
success!).




  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic  change patch)
  2001-08-04  1:19             ` Chris Wedgwood
@ 2001-08-04  1:45               ` Andrew Morton
  2001-08-04  4:04                 ` Matthias Andree
  2001-08-04 21:07               ` Andrew Morton
  1 sibling, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2001-08-04  1:45 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

Chris Wedgwood wrote:
> 
> On Fri, Aug 03, 2001 at 06:08:49PM -0700, Andrew Morton wrote:
> 
>     Ow.  You just crippled ext3.
> 
> How so? The Flush all transactions on fsync behaviour that resierfs
> did/does have at present too?  (There are 'fixes' to reiserfs for
> this).

That, plus the fact that ext3 gets its synchronous-op scalability
from batching the transactions from multiple threads together.
The holding of parent->parent->i_sem across synchronous writes could
defeat that.

ext3 does physical, block-level journalling.  (its journalling layer is
actually designed to be fs-independent so one could journal other filesystems
with it).  So there is no tracking between a particular fs event and a
particular transaction.  A transaction is just a blob of blocks which
can encompass thousands of fs events.

>     I don't think an ext2 problem (which I don't think is a problem at
>     all) should be "fixed" at the VFS layer when other filesystems are
>     perfectly happy without it, no?
> 
> If you want to be sure that when you fsync a file, that, silly bugger
> rename games further up the path aside, the entire path is also on
> disk, the VFS is the only place to do it with the current fs API.
> 
> really, there is _some_ merit in the argument that
> 
>         open
>         fsync
>         close
> <crash>
> 
> shouldn't loose the file...

Agreed - I think it's the expected and sensible behaviour.  But I've seen
no complaints about it except for use in a few specialised applications.
Where "a few" == "one", actually.

>     This whole thread, talking about "linux this" and "linux that" is
>     off-base.  It's ext2 we're talking about.  This MTA requirement is
>     a highly unusual and specialised thing - I don't see why the
>     general-purpose ext2 should bear the burden of supporting it when
>     other filesystems such as reiserfs (I think?) and ext3 support it
>     naturally and better than ext2 ever will.
> 
> Well, since it will only sync dirty blocks, it will hardly hurt ext2
> that much at all --- and it will only force the dirty blocks in path
> components to be written when you fsync the file, thats probably only
> a single block anyhow.

mmm... Holding i_sem across multiple revs of the disk will hurt.  It
doesn't *need* to be held while we're waiting on IO, but fixing that
would be a big change, and there has been little motivation to change
things because it is for specialised apps.

-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 17:49     ` Mike Castle
@ 2001-08-04  3:23       ` Daniel Phillips
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Phillips @ 2001-08-04  3:23 UTC (permalink / raw)
  To: Mike Castle, linux-kernel

On Friday 03 August 2001 19:49, Mike Castle wrote:
> On Fri, Aug 03, 2001 at 03:09:06PM +0200, Daniel Phillips wrote:
> > On Friday 03 August 2001 05:13, Alexander Viro wrote:
> > > On Fri, 3 Aug 2001, Daniel Phillips wrote:
> > > > There is only one chain of directories from the fd's dentry up to
> > > > the root, that's the one to sync.
> > >
> > > You forgot ".. at any given moment". IOW, operation you propose is
> > > inherently racy. You want to do that - you do that in userland.
> >
> > Are you saying that there may not be a ".." some of the time?  Or just
> > that it may spontaneously be relinked?  If it does spontaneously change
> > it doesn't matter, you have still made sure there is access by at least
> > one path.
>
> I read the ".." as a typo for "..."  As in Al was suggesting the sentence
> should read "There is only one chain of directories at any given moment
> from the fd's dentry up to the root...."

Heh, after some practice you get good at decoding Alspeak, it's not harder
than MIX.

--
Daniel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* fdatasync(2) is also there (was: intermediate summary of ext3-2.4-0.9.4 thread)
  2001-08-03 19:16     ` Alexander Viro
@ 2001-08-04  3:40       ` Matthias Andree
  2001-08-05  0:28         ` Mike Castle
  0 siblings, 1 reply; 55+ messages in thread
From: Matthias Andree @ 2001-08-04  3:40 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

On Fri, 03 Aug 2001, Alexander Viro wrote:

> Bingo. The whole thing relies on second-guessing the application.
> BTW, I can think of very legitimate cases when we want to create
> a bunch of files, fsync them as we go and then fsync the directory
> where they had been created. Application knows what and when should
> be synced _and_ it has a way to ask kernel to sync an object.

How portable is fsync()ing the directory?

How USEFUL is it to the application if all other boxen fsync() the
directory entries along with the file?

You want to sync the files, but the directory only after you created all
of the files? Use fdatasync(2) for the files - it doesn't flush meta
data, then sync the directory. It's POSIX, it's SUS v2.

Not everyone has it, though, so you may need to fall back to fsync() on
those systems.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: intermediate summary of ext3-2.4-0.9.4 thread
  2001-08-03 18:53         ` Alexander Viro
  2001-08-03 20:50           ` Daniel Phillips
@ 2001-08-04  3:43           ` Matthias Andree
  1 sibling, 0 replies; 55+ messages in thread
From: Matthias Andree @ 2001-08-04  3:43 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Daniel Phillips, Horst von Brand, linux-kernel

On Fri, 03 Aug 2001, Alexander Viro wrote:

> Sigh... You need i_sem for fsync(). Moreover, there is no warranty that
> set of objects you sync has _anything_ to path by the time when you start
> syncing the second one. Application has information about the use of
> parts of tree it's interested in. Kernel doesn't. Notice that all this
> wankage was full of "oh, but MTA doesn't care for symlinks", "oh, but MTA
> doesn't deal with parents renamed", ad nausea. You know what it means?

I know a few MTAs, but none of them use symlinks, they always use hard
links (if at all).

MTAs don't rename parent directories.

> Folks, putting policy into the kernel is Wrong(tm). And that's precisely
> what you are advocating here.

Is putting options with a user-space interface (mount option, file
system option such as chattr) wrong as well? Is making fsync() more
compatible wrong?

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 23:23               ` Alexander Viro
@ 2001-08-04  3:53                 ` Matthias Andree
  2001-08-04  5:48                   ` Alexander Viro
  0 siblings, 1 reply; 55+ messages in thread
From: Matthias Andree @ 2001-08-04  3:53 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linux-Kernel mailing list

On Fri, 03 Aug 2001, Alexander Viro wrote:

> It has nothing to bindings/mount/etc. fsync /a/b/c. While c is written
> out, mv a/b/c /a/d/c. While d is written out, mv a/d/c a/b/c && mv a/d e/d
> Through all these renames /a remained the grandparent of c. You won't sync it -
> you sync c, then d, then e, then root.

Which looks like the right thing, what used to be a/b/c is now e/d/c --
and you synced c, d, and e.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04  1:45               ` Andrew Morton
@ 2001-08-04  4:04                 ` Matthias Andree
  2001-08-04 18:30                   ` Chris Wedgwood
  2001-08-04 21:07                   ` Andrew Morton
  0 siblings, 2 replies; 55+ messages in thread
From: Matthias Andree @ 2001-08-04  4:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Fri, 03 Aug 2001, Andrew Morton wrote:

> > really, there is _some_ merit in the argument that
> > 
> >         open
> >         fsync
> >         close
> > <crash>
> > 
> > shouldn't loose the file...

...

> mmm... Holding i_sem across multiple revs of the disk will hurt.  It
> doesn't *need* to be held while we're waiting on IO, but fixing that
> would be a big change, and there has been little motivation to change
> things because it is for specialised apps.

I have no clues of the inode, dentry whatever kernel structure names of
Linux. However, aren't we already at the point that ext3 fsync() flushes
the corresponding dirents? How difficult would it be to have the inode
track changes to its dirents such as rename or link? Without breaking it
if someone renames a parent or grandparent? I mean, FFS + softupdates
can do it. The MTA doesn't really care for the implementation, it cares
that it never loses its files over a crash -- where finding it renamed
to /mount/point/lost+found is considered a loss. For people that want
the data synced and want to sync the meta data later, there is
fdatasync() as they go and either fsync()ing the files or fsync()ing
their directory as they finish.

unlink and particularly symlink will need separate consideration, but
that's left for later.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04  3:53                 ` Matthias Andree
@ 2001-08-04  5:48                   ` Alexander Viro
  0 siblings, 0 replies; 55+ messages in thread
From: Alexander Viro @ 2001-08-04  5:48 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux-Kernel mailing list



On Sat, 4 Aug 2001, Matthias Andree wrote:

> On Fri, 03 Aug 2001, Alexander Viro wrote:
> 
> > It has nothing to bindings/mount/etc. fsync /a/b/c. While c is written
> > out, mv a/b/c /a/d/c. While d is written out, mv a/d/c a/b/c && mv a/d e/d
> > Through all these renames /a remained the grandparent of c. You won't sync it -
> > you sync c, then d, then e, then root.
> 
> Which looks like the right thing, what used to be a/b/c is now e/d/c --
> and you synced c, d, and e.

Like hell it is.

/        /a        /a/b        /a/b/c        /a/d        /e
                               ^^^^^^
/        /a        /a/b        /a/d/c        /a/d        /e
                               ^^^^^^
/        /a        /a/b        /a/b/c        /a/d        /e
                                             ^^^^
/        /a        /a/b        /a/b/c        /e/d        /e
                                             ^^^^
/        /a        /a/b        /a/b/c        /e/d        /e
                                                         ^^
/        /a        /a/b        /a/b/c        /e/d        /e
^

Result of these renames is the same as mv /a/d /e/d. See above for names
and currently synced inode during that sequence...


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchange  patch)
  2001-08-03 23:53                           ` Alexander Viro
@ 2001-08-04  7:45                             ` Hans Reiser
  2001-08-04 18:31                               ` Chris Wedgwood
  0 siblings, 1 reply; 55+ messages in thread
From: Hans Reiser @ 2001-08-04  7:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

Alexander Viro wrote:
> 
> On Sat, 4 Aug 2001, Chris Wedgwood wrote:
> 
> > On Fri, Aug 03, 2001 at 07:41:40PM -0400, Alexander Viro wrote:
> >
> >     file->f_cred. Different people opening the same file can have
> >     different credentials (e.g. credentials can be revoked)
> >
> > Sure, but if I
> >
> >         f = open("/home/viro/file", ... );
> >         fsync(f);
> >
> > I only have creds for 'file' --- I have no such thing for 'viro' or
> > 'home', so I can't do anything sensible here.
> 
> Credentials are about "I'm foo", not "I'm allowed to do this with that".
> Server decides what you can do.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Can you define f_cred for us?

Hans

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
                             ` (2 preceding siblings ...)
  2001-08-04  1:08           ` Andrew Morton
@ 2001-08-04 17:35           ` Jan Harkes
  2001-08-04 18:18             ` Chris Wedgwood
  2001-08-06 15:02           ` Chris Mason
  4 siblings, 1 reply; 55+ messages in thread
From: Jan Harkes @ 2001-08-04 17:35 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Sat, Aug 04, 2001 at 10:01:43AM +1200, Chris Wedgwood wrote:
> On Fri, Aug 03, 2001 at 06:34:14PM +0000, Linus Torvalds wrote:
> 
>     	fsync(int fd)
>     	{
>     		dentry = fdget(fd);
>     		do_fsync(dentry);
>     		for (;;) {
>     			tmp = dentry;
>     			dentry = dentry->d_parent;
>     			if (dentry == tmp)
>     				break;
>     			do_fdatasync(dentry);
>     		}
>     	}
> 
> I really like this idea. Can people please try out the attached patch?

Deadly for Coda, every fsync triggers an upcall to userspace, which
costs at least 2 context switches and a whole bunch of network traffic
as updates are sent back to the server, i.e. the fact that some parent
has a new child or timestamp is not related to or interesting for this
delivery.

The existing fsync(dir) is far better and the reasons are simple and
explained multiple times throughout this far too long thread. So let me
reiterate.

- fsync(dir) is an _explicit_ indication by an application that actually
  cares what precisely needs to be written to disk.
- fsync(dir) doesn't negatively affect applications that don't care.
- The proposed patch doesn't solve the 'oops I relinked the file, or I
  renamed a parent' problems where the path leading to a file is lost
  anyways. And the relinking is exactly what is done by both qmail and
  cyrus imap, i.e. this patch wouldn't even solve the problem that is
  being discussed.
- fsync(dir) doesn't write out anymore that has to be written, i.e. a
  open/unlink in a temporary directory doesn't need to hit the disk if
  there is an fsync in the middle (see qmail/cyrus syscall paths in a
  previous email)
- We don't need to modify the code to make fsync(dir) work! If it turns
  out to be too slow, let's fix fsync(dir) to be more efficient.

Jan


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04 17:35           ` Jan Harkes
@ 2001-08-04 18:18             ` Chris Wedgwood
  0 siblings, 0 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-04 18:18 UTC (permalink / raw)
  To: Jan Harkes, linux-kernel

(cc' list trimmed)

On Sat, Aug 04, 2001 at 01:35:00PM -0400, Jan Harkes wrote:

    Deadly for Coda, every fsync triggers an upcall to userspace,
    which costs at least 2 context switches and a whole bunch of
    network traffic as updates are sent back to the server, i.e. the
    fact that some parent has a new child or timestamp is not related
    to or interesting for this delivery.

would limiting it to just the parent dentry be acceptable?

    - fsync(dir) is an _explicit_ indication by an application that
      actually cares what precisely needs to be written to disk.

i though nothing used it (maybe qmail?)

    - fsync(dir) doesn't negatively affect applications that don't care.

fsync(dir) or fsync(file)?

    - The proposed patch doesn't solve the 'oops I relinked the file,
      or I renamed a parent' problems where the path leading to a file
      is lost anyways.

      And the relinking is exactly what is done by both qmail and
      cyrus imap, i.e. this patch wouldn't even solve the problem that
      is being discussed.

cryrus/qmail do link("tmp_place", "store/where_i_want_it") ?

do they also do fsync(open("store")) ?




  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04  4:04                 ` Matthias Andree
@ 2001-08-04 18:30                   ` Chris Wedgwood
  2001-08-05 12:15                     ` Matthias Andree
  2001-08-04 21:07                   ` Andrew Morton
  1 sibling, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-04 18:30 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

(cc' list trimmed)

On Sat, Aug 04, 2001 at 06:04:23AM +0200, Matthias Andree wrote:

    However, aren't we already at the point that ext3 fsync() flushes
    the corresponding dirents?

Ideally an acceptable solution will also obvioate the needs for +S
under ext2 and also work for reiserfs and more.

    How difficult would it be to have the inode track changes to its
    dirents such as rename or link?

For simple cases, not terrible (I don't think),  but you would have to
limit how much you tracked in extreme cases (like 2 directory inode at
most or something).

    I mean, FFS + softupdates can do it.

So run FFS :)

    The MTA doesn't really care for the implementation, it cares that
    it never loses its files over a crash

By MTA you mean Postfix.  Wietse recently stated he might start
looking at other alternatives which don't relay on undocument FFS
semantics or synchronous metadata updates.

    where finding it renamed
    to /mount/point/lost+found is considered a loss.

Actually, there is enough information on the file to recover things
for postfix.  For sendmail (which has two files, your pretty much
stuffed though).

    unlink and particularly symlink will need separate consideration,
    but that's left for later.

Unlink should be ok, I don't see why symlink should even be
considered (it would make things way too complex for little or no
gain).



Anyhow, if you read recent comments it looks like thre are filesystems
which will be badly affected by placing this logic into the VFS
(eg. Coda).  It may well be this should become a fs-specific thing
(which sucks a little, because it makes the suggestion of tracking
link/unlink directories ugly) and for some filesystems such as
resierfs and ext3, they could be modified to merge the 'related
directories' writes with the metadata write of the file itself.




  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchange patch)
  2001-08-04  7:45                             ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchange patch) Hans Reiser
@ 2001-08-04 18:31                               ` Chris Wedgwood
  2001-08-05  1:47                                 ` Alexander Viro
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-04 18:31 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Alexander Viro, Linus Torvalds, linux-kernel, Alan Cox, Chris Mason

On Sat, Aug 04, 2001 at 11:45:47AM +0400, Hans Reiser wrote:

    Can you define f_cred for us?

Surely is becomes a fs-specific opaque type, void* or whatever?  For
many filesystems it somply won't be relevant, and for some filesystems
such as NFS and Coda, presumably it will need deep fs-specific
details?



  --cw


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic  change patch)
  2001-08-04  4:04                 ` Matthias Andree
  2001-08-04 18:30                   ` Chris Wedgwood
@ 2001-08-04 21:07                   ` Andrew Morton
  1 sibling, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2001-08-04 21:07 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

Matthias Andree wrote:
> 
> aren't we already at the point that ext3 fsync() flushes
> the corresponding dirents?

_any_ synchronous operation on ext3 has flushed _everything_
by the time it returns to the caller.  Every last little bit
is on disk.

This applies to fsync() against any file/dir, write() on an
O_SYNC file, any metadata operation or write() against a `chattr +S'
object, any metadata operation or write() against a `mount -o sync'
filesystem and msync().

The only exception is pageout of mmap'ed files - you'll need to
run msync() to guarantee that these are crashproofed.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic  change patch)
  2001-08-04  1:19             ` Chris Wedgwood
  2001-08-04  1:45               ` Andrew Morton
@ 2001-08-04 21:07               ` Andrew Morton
  2001-08-05  7:25                 ` Chris Wedgwood
  2001-08-09 13:25                 ` Matthias Andree
  1 sibling, 2 replies; 55+ messages in thread
From: Andrew Morton @ 2001-08-04 21:07 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

Chris Wedgwood wrote:
> 
> On Fri, Aug 03, 2001 at 06:08:49PM -0700, Andrew Morton wrote:
> 
>     Ow.  You just crippled ext3.
> 
> How so? The Flush all transactions on fsync behaviour that resierfs
> did/does have at present too?  (There are 'fixes' to reiserfs for
> this).

Sorry, Chris - I was being even more than usually stupid.  All
you need do is overload the return value from file_operations.fsync().
It currently returns zero on success or negative error.  You can
just define a return value of "1" to mean that the fs has taken
care of syncing the directory info and no further action is needed
at the fs layer.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: fdatasync(2) is also there (was: intermediate summary of ext3-2.4-0.9.4 thread)
  2001-08-04  3:40       ` fdatasync(2) is also there (was: intermediate summary of ext3-2.4-0.9.4 thread) Matthias Andree
@ 2001-08-05  0:28         ` Mike Castle
  0 siblings, 0 replies; 55+ messages in thread
From: Mike Castle @ 2001-08-05  0:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Viro

On Sat, Aug 04, 2001 at 05:40:43AM +0200, Matthias Andree wrote:
> How portable is fsync()ing the directory?

It is as portable as assuming that the directory entries are synced during
fsync().

> 
> How USEFUL is it to the application if all other boxen fsync() the
> directory entries along with the file?

Except, you can NOT guarantee that this is the case.  The spec does NOT
require it.

> You want to sync the files, but the directory only after you created all
> of the files? Use fdatasync(2) for the files - it doesn't flush meta
> data, then sync the directory. It's POSIX, it's SUS v2.

Is fsync(dir) what you mean here?

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchange patch)
  2001-08-04 18:31                               ` Chris Wedgwood
@ 2001-08-05  1:47                                 ` Alexander Viro
  2001-08-08 17:22                                   ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchangepatch) Hans Reiser
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2001-08-05  1:47 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Hans Reiser, Linus Torvalds, linux-kernel, Alan Cox, Chris Mason



On Sun, 5 Aug 2001, Chris Wedgwood wrote:

> On Sat, Aug 04, 2001 at 11:45:47AM +0400, Hans Reiser wrote:
> 
>     Can you define f_cred for us?
> 
> Surely is becomes a fs-specific opaque type, void* or whatever?  For
> many filesystems it somply won't be relevant, and for some filesystems
> such as NFS and Coda, presumably it will need deep fs-specific
> details?

Not really. It _is_ relevant for, say it, ext2. Why? Because we need that
to handle, e.g. 5% reserve. Look:

	Disk is almost full. We have only 3% of blocks free and in that
situation only root should be able to allocate more.
	User foo creates an empty file. No blocks allocated, so we are OK.
He does truncate() to, say it, 10Mb. Still no block allocations.
	He mmaps that file. And does memset() on mmaped area. Now we have
a bunch of dirty pages. Well, eventually kswapd will write them out, right?
	What UID does kswapd run under? Exactly. Welcome to the fun - we've
managed to trick the system into allocating 10Mb of disk space for us, since
in the allocation time UID and GID were 0.
	Now that we have these blocks allocated, usual write() will succeeed
just fine - no block allocation, no checks.

See what I mean? Blind use of current->fsuid et.al. in filesystem code is a
Bad Thing(tm). We really want to know who is responsible for the operation
and "current task" is not always the right answer.

What we need is a cache of objects that would contain (at least) UID,
GID and auxillary groups. That, and ability to add extra authentication
tokens. Process should have a pointer to its current credentials (quite
possibly shared by many processes). Ditto for an opened file. When we
open a file we should simply inherit credentials of opening process.
When we do seteuid(2), etc., we should create a separate copy of current
credentials, modify that copy, drop the reference to old creds and set
the pointer to credentials to the modified copy. Fs operations should
(eventually) be changed to get credentials explicitly - as an argument.
That, BTW, would allow knfsd to avoid the silly games with setting
->fsuid for the duration of fs operation - we could simply pass the
credentials of the party responsible for NFS request to filesystem
methods.

It's nothing new - both the problem and solution are well-known since
at least early 90s. All this stuff is pretty straightforward, but it
can't be done in 2.4 - it involves changing the prototypes of fs
methods. Changes to the bodies of methods (i.e. to fs code) are
minimal - basically, it's a search-and-replace job that can be done
at once on the whole tree. However, such things do not belong to
-STABLE.

PS: probably "identity" would be a better term than "credentials". It's
a "look, I'm acting on behalf of Joe R. User and I carry a bunch of IDs to
prove that" thing.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04 21:07               ` Andrew Morton
@ 2001-08-05  7:25                 ` Chris Wedgwood
  2001-08-09 13:25                 ` Matthias Andree
  1 sibling, 0 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-05  7:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sat, Aug 04, 2001 at 02:07:59PM -0700, Andrew Morton wrote:

    Sorry, Chris - I was being even more than usually stupid.  All you
    need do is overload the return value from file_operations.fsync().
    It currently returns zero on success or negative error.  You can
    just define a return value of "1" to mean that the fs has taken
    care of syncing the directory info and no further action is needed
    at the fs layer.

Seems quite reasonable.  I'm quite keen on ehat approach except it
seems for some filesystems, walking the path is expensive.  How about
a flag in the superblock on whether we do this or not?

That said, it's been pointed out it doesn't actually help in several
cases (link/unlink) for example, so maybe I should forget the whole
thing?




  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04 18:30                   ` Chris Wedgwood
@ 2001-08-05 12:15                     ` Matthias Andree
  2001-08-05 12:32                       ` Chris Wedgwood
  0 siblings, 1 reply; 55+ messages in thread
From: Matthias Andree @ 2001-08-05 12:15 UTC (permalink / raw)
  To: linux-kernel

On Sun, 05 Aug 2001, Chris Wedgwood wrote:

> Anyhow, if you read recent comments it looks like thre are filesystems
> which will be badly affected by placing this logic into the VFS
> (eg. Coda).  It may well be this should become a fs-specific thing
> (which sucks a little, because it makes the suggestion of tracking
> link/unlink directories ugly) and for some filesystems such as

Why does it? Each file-system is self-contained with respect to hard
links. You cannot have link cross file system boundaries.

Common code can be placed into a library. (Probably 2.5 stuff though.)

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-05 12:15                     ` Matthias Andree
@ 2001-08-05 12:32                       ` Chris Wedgwood
  2001-08-05 13:02                         ` Matti Aarnio
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-05 12:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Matthias Andree


On Sun, Aug 05, 2001 at 02:15:46PM +0200, Matthias Andree wrote:

    Why does it? Each file-system is self-contained with respect to hard
    links. You cannot have link cross file system boundaries.

    Common code can be placed into a library. (Probably 2.5 stuff though.)

As pointed out by Jan Harkes, logic that works for ext2 (eg. walking
the dentry chain and sync'ing all the components) sucks for things
like Coda, where the performance impact may be noticable (actually,
I'm not conviced it will be, but what do I know).

Not only that, it doesn't help qmail, cyrus imapd or Postfix
completely.




  --cw

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-05 12:32                       ` Chris Wedgwood
@ 2001-08-05 13:02                         ` Matti Aarnio
  0 siblings, 0 replies; 55+ messages in thread
From: Matti Aarnio @ 2001-08-05 13:02 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

On Mon, Aug 06, 2001 at 12:32:42AM +1200, Chris Wedgwood wrote:
> On Sun, Aug 05, 2001 at 02:15:46PM +0200, Matthias Andree wrote:
> 
>     Why does it? Each file-system is self-contained with respect to hard
>     links. You cannot have link cross file system boundaries.
> 
>     Common code can be placed into a library. (Probably 2.5 stuff though.)
> 
> As pointed out by Jan Harkes, logic that works for ext2 (eg. walking
> the dentry chain and sync'ing all the components) sucks for things
> like Coda, where the performance impact may be noticable (actually,
> I'm not conviced it will be, but what do I know).
> 
> Not only that, it doesn't help qmail, cyrus imapd or Postfix
> completely.

	And for that matter, it (full access-tree fsync()ing) isn't
	necessary for systems which don't go around creating directories,
	instead place things into existing ones, and move things around
	in between the directories.


>   --cw

/Matti Aarnio

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
                             ` (3 preceding siblings ...)
  2001-08-04 17:35           ` Jan Harkes
@ 2001-08-06 15:02           ` Chris Mason
  2001-08-06 20:48             ` Chris Wedgwood
  4 siblings, 1 reply; 55+ messages in thread
From: Chris Mason @ 2001-08-06 15:02 UTC (permalink / raw)
  To: Chris Wedgwood, Linus Torvalds; +Cc: linux-kernel, Alan Cox



On Saturday, August 04, 2001 10:01:43 AM +1200 Chris Wedgwood <cw@f00f.org>
wrote:
 
> Note, there is also a reiserfs fix in here because we can call
> f_op->fsync on a directory and without this fix it will BUG!  Chris,
> perhaps you can suggest a better fix?
> 

Aside from the rest of the discussion on this, did you actually trigger
this BUG for fsync(dir)?  f_op->fsync should already be set to
reiserfs_dir_fsync.

-chris


^ permalink raw reply	[flat|nested] 55+ messages in thread

* looking for resources for designing loopback filesystem
  2001-08-03 22:29       ` Anton Altaparmakov
  2001-08-03 23:06         ` Chris Wedgwood
@ 2001-08-06 15:23         ` dave-mlist
  2001-08-06 16:11           ` Ville Herva
  1 sibling, 1 reply; 55+ messages in thread
From: dave-mlist @ 2001-08-06 15:23 UTC (permalink / raw)
  To: linux-kernel

I'd like to create a filesystem which mirrors another local
filesystem, but with different permissions, and with some content
invisible to some users.  Is there a kernel module that does this
already?  Is the loopback module the one?  If so, where can I learn
about applying the module to this purpose?

Dave

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: looking for resources for designing loopback filesystem
  2001-08-06 15:23         ` looking for resources for designing loopback filesystem dave-mlist
@ 2001-08-06 16:11           ` Ville Herva
  0 siblings, 0 replies; 55+ messages in thread
From: Ville Herva @ 2001-08-06 16:11 UTC (permalink / raw)
  To: dave-mlist; +Cc: linux-kernel

On Mon, Aug 06, 2001 at 08:23:47AM -0700, you [dave-mlist@bfnet.com] claimed:
> I'd like to create a filesystem which mirrors another local
> filesystem, but with different permissions, and with some content
> invisible to some users.  Is there a kernel module that does this
> already?  Is the loopback module the one?  If so, where can I learn
> about applying the module to this purpose?

Perhaps you could hack an nfs-server (knfsd or nfsd) to do the permission
tweaks and hiding and then just mount the stuff over to another location
with ordinary nfs client stuff.


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-06 15:02           ` Chris Mason
@ 2001-08-06 20:48             ` Chris Wedgwood
  0 siblings, 0 replies; 55+ messages in thread
From: Chris Wedgwood @ 2001-08-06 20:48 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, linux-kernel, Alan Cox

On Mon, Aug 06, 2001 at 11:02:27AM -0400, Chris Mason wrote:

    Aside from the rest of the discussion on this, did you actually trigger
    this BUG for fsync(dir)?  f_op->fsync should already be set to
    reiserfs_dir_fsync.

Yes, but when walking the dentry chain I got to resierfs_file_sync or
something rather than reiserfs_dir_fsync --- two possible fixes,
either have reiserfs internally deal with this (my patch just calls
the dir function from the file function) or try and make the code I
wrote aware of the difference (which technicaly is probably more
correct, I just couldn't figure out how to do it).

I actually made a note to myself to look at merging reiserfs_dir_fsync
and resierfs_file_sync at some point.




  --cw


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync  semanticchangepatch)
  2001-08-05  1:47                                 ` Alexander Viro
@ 2001-08-08 17:22                                   ` Hans Reiser
  0 siblings, 0 replies; 55+ messages in thread
From: Hans Reiser @ 2001-08-08 17:22 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Chris Wedgwood, linux-kernel, Alan Cox, Chris Mason

Does fseuid capture the meaning you intend?  And if it does, what exactly is
missing from the tradional paradigm of effective vs. real uid that cannot be
applied to kswapd?

Can you give an example of when does the filesytem need to know that the
effective user id is A but the real user id is kswapd?

I understand that kswapd is not managing its uids, but I don't yet understand
why the fs needs two fields rather than one to track the effective uid
completely not caring that the real uid is kswapd.  

I suspect I just need more details on the issue to understand you....

Hans

Alexander Viro wrote:
> 
> On Sun, 5 Aug 2001, Chris Wedgwood wrote:
> 
> > On Sat, Aug 04, 2001 at 11:45:47AM +0400, Hans Reiser wrote:
> >
> >     Can you define f_cred for us?
> >
> > Surely is becomes a fs-specific opaque type, void* or whatever?  For
> > many filesystems it somply won't be relevant, and for some filesystems
> > such as NFS and Coda, presumably it will need deep fs-specific
> > details?
> 
> Not really. It _is_ relevant for, say it, ext2. Why? Because we need that
> to handle, e.g. 5% reserve. Look:
> 
>         Disk is almost full. We have only 3% of blocks free and in that
> situation only root should be able to allocate more.
>         User foo creates an empty file. No blocks allocated, so we are OK.
> He does truncate() to, say it, 10Mb. Still no block allocations.
>         He mmaps that file. And does memset() on mmaped area. Now we have
> a bunch of dirty pages. Well, eventually kswapd will write them out, right?
>         What UID does kswapd run under? Exactly. Welcome to the fun - we've
> managed to trick the system into allocating 10Mb of disk space for us, since
> in the allocation time UID and GID were 0.
>         Now that we have these blocks allocated, usual write() will succeeed
> just fine - no block allocation, no checks.
> 
> See what I mean? Blind use of current->fsuid et.al. in filesystem code is a
> Bad Thing(tm). We really want to know who is responsible for the operation
> and "current task" is not always the right answer.
> 
> What we need is a cache of objects that would contain (at least) UID,
> GID and auxillary groups. That, and ability to add extra authentication
> tokens. Process should have a pointer to its current credentials (quite
> possibly shared by many processes). Ditto for an opened file. When we
> open a file we should simply inherit credentials of opening process.
> When we do seteuid(2), etc., we should create a separate copy of current
> credentials, modify that copy, drop the reference to old creds and set
> the pointer to credentials to the modified copy. Fs operations should
> (eventually) be changed to get credentials explicitly - as an argument.
> That, BTW, would allow knfsd to avoid the silly games with setting
> ->fsuid for the duration of fs operation - we could simply pass the
> credentials of the party responsible for NFS request to filesystem
> methods.
> 
> It's nothing new - both the problem and solution are well-known since
> at least early 90s. All this stuff is pretty straightforward, but it
> can't be done in 2.4 - it involves changing the prototypes of fs
> methods. Changes to the bodies of methods (i.e. to fs code) are
> minimal - basically, it's a search-and-replace job that can be done
> at once on the whole tree. However, such things do not belong to
> -STABLE.
> 
> PS: probably "identity" would be a better term than "credentials". It's
> a "look, I'm acting on behalf of Joe R. User and I carry a bunch of IDs to
> prove that" thing.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch)
  2001-08-04 21:07               ` Andrew Morton
  2001-08-05  7:25                 ` Chris Wedgwood
@ 2001-08-09 13:25                 ` Matthias Andree
  1 sibling, 0 replies; 55+ messages in thread
From: Matthias Andree @ 2001-08-09 13:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chris Wedgwood, linux-kernel

On Sat, 04 Aug 2001, Andrew Morton wrote:

> Sorry, Chris - I was being even more than usually stupid.  All
> you need do is overload the return value from file_operations.fsync().
> It currently returns zero on success or negative error.  You can
> just define a return value of "1" to mean that the fs has taken
> care of syncing the directory info and no further action is needed
> at the fs layer.

But ensure that the user space never sees the "1" return value from
fsync() itself.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2001-08-09 13:25 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0108030507330F.00440@starship>
     [not found] ` <Pine.GSO.4.21.0108022312211.1494-100000@weyl.math.psu.edu>
2001-08-03 13:09   ` intermediate summary of ext3-2.4-0.9.4 thread Daniel Phillips
2001-08-03 14:43     ` Horst von Brand
2001-08-03 17:49     ` Mike Castle
2001-08-04  3:23       ` Daniel Phillips
2001-08-03 18:08     ` Alexander Viro
2001-08-03 18:26       ` Daniel Phillips
2001-08-03 18:53         ` Alexander Viro
2001-08-03 20:50           ` Daniel Phillips
2001-08-04  3:43           ` Matthias Andree
2001-08-03 18:34       ` Linus Torvalds
2001-08-03 22:01         ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Chris Wedgwood
2001-08-03 22:33           ` Alexander Viro
2001-08-03 23:16             ` Chris Wedgwood
2001-08-03 23:23               ` Alexander Viro
2001-08-04  3:53                 ` Matthias Andree
2001-08-04  5:48                   ` Alexander Viro
2001-08-03 22:45           ` Alexander Viro
2001-08-03 23:09             ` Chris Wedgwood
2001-08-03 23:15               ` Alexander Viro
2001-08-03 23:20                 ` Chris Wedgwood
2001-08-03 23:25                   ` Alexander Viro
2001-08-03 23:35                     ` Chris Wedgwood
2001-08-03 23:41                       ` Alexander Viro
2001-08-03 23:46                         ` Chris Wedgwood
2001-08-03 23:53                           ` Alexander Viro
2001-08-04  7:45                             ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchange patch) Hans Reiser
2001-08-04 18:31                               ` Chris Wedgwood
2001-08-05  1:47                                 ` Alexander Viro
2001-08-08 17:22                                   ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semanticchangepatch) Hans Reiser
2001-08-03 23:42                       ` [PATCH] 2.4.8-pre3 fsync entire path (+reiserfs fsync semantic change patch) Alan Cox
2001-08-03 23:44                         ` Chris Wedgwood
2001-08-04  1:08           ` Andrew Morton
2001-08-04  1:19             ` Chris Wedgwood
2001-08-04  1:45               ` Andrew Morton
2001-08-04  4:04                 ` Matthias Andree
2001-08-04 18:30                   ` Chris Wedgwood
2001-08-05 12:15                     ` Matthias Andree
2001-08-05 12:32                       ` Chris Wedgwood
2001-08-05 13:02                         ` Matti Aarnio
2001-08-04 21:07                   ` Andrew Morton
2001-08-04 21:07               ` Andrew Morton
2001-08-05  7:25                 ` Chris Wedgwood
2001-08-09 13:25                 ` Matthias Andree
2001-08-04 17:35           ` Jan Harkes
2001-08-04 18:18             ` Chris Wedgwood
2001-08-06 15:02           ` Chris Mason
2001-08-06 20:48             ` Chris Wedgwood
2001-08-03 22:29       ` Anton Altaparmakov
2001-08-03 23:06         ` Chris Wedgwood
2001-08-06 15:23         ` looking for resources for designing loopback filesystem dave-mlist
2001-08-06 16:11           ` Ville Herva
2001-08-03 18:36   ` intermediate summary of ext3-2.4-0.9.4 thread Matthias Andree
2001-08-03 19:16     ` Alexander Viro
2001-08-04  3:40       ` fdatasync(2) is also there (was: intermediate summary of ext3-2.4-0.9.4 thread) Matthias Andree
2001-08-05  0:28         ` Mike Castle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).